├── .gitignore
├── README.md
├── algorithms
│   ├── __init__.py
│   ├── ais_gsm.py
│   ├── chains.py
│   ├── crp.py
│   ├── dumb_samplers.py
│   ├── ibp.py
│   ├── ibp_split_merge.py
│   ├── low_rank.py
│   ├── low_rank_poisson.py
│   ├── slice_sampling.py
│   ├── sparse_coding.py
│   └── variational.py
├── config_example.py
├── example.py
├── example_data
│   ├── animals-data.txt
│   ├── animals-features.txt
│   └── animals-names.txt
├── experiments.py
├── grammar.py
├── initialization.py
├── models.py
├── observations.py
├── parallel.py
├── parsing.py
├── predictive_distributions.py
├── presentation.py
├── recursive.py
├── scoring.py
├── single_process.py
├── synthetic_experiments.py
└── utils
    ├── __init__.py
    ├── distributions.py
    ├── gaussians.py
    ├── misc.py
    ├── profiler.py
    ├── psd_matrices.py
    └── storage.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | *~
3 | config.py
4 | parsetab.py
5 | parser.out
6 | debugging
7 | sandbox
8 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | This software package implements the algorithms described in the paper
2 | 
3 | > Roger B. Grosse, Ruslan Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum,
4 | > "Exploiting compositionality to explore a large space of model structures," UAI 2012.
5 | 
6 | In particular, it takes an input matrix, runs the structure search, and outputs a report
7 | summarizing the choices made at each step. There is also a script which runs the synthetic
8 | data experiments from the paper.
9 | 
10 | ## Caveats
11 | 
12 | This is a research prototype, and I've made some simplifying assumptions which may or may
13 | not match your situation. In particular,
14 | 
15 | - Matrices are assumed to be real-valued, and it handles binary matrices by treating the
16 | values as real and adding a small amount of noise to prevent degenerate solutions. (As
17 | a sanity check, I've also experimented with samplers which handle binary inputs directly,
18 | in order to check that the results were consistent with the real-valued version. However,
19 | I didn't get the algorithms working robustly enough to include in the experiments
20 | or the software package.)
21 | - It handles missing observations by explicitly sampling the missing values.
22 | This seems to work well for matrices with small numbers of missing entries, but might
23 | have poor mixing on sparse input matrices.
24 | - I haven't run the software on matrices larger than 1000 x 1000. There's no conceptual reason the
25 | algorithms can't scale beyond this, but there may be implementational reasons.
26 | 
27 | I am working on a newer version of the software package which shouldn't have these
28 | limitations.
29 | 
30 | 
31 | ## Requirements
32 | 
33 | This code base depends on a number of Python packages, most of which are pretty standard.
34 | Most of the packages are available through [Enthought Canopy](https://www.enthought.com/products/canopy/),
35 | which all academic users (including professors and postdocs) can use for free under their
36 | [academic license](https://www.enthought.com/products/canopy/academic/). We use the following
37 | Python packages which are included in Canopy:
38 | 
39 | - [NumPy](http://www.numpy.org/) (I used 1.6.1)
40 | - [Matplotlib](http://matplotlib.org/index.html) (I used 1.2.0)
41 | - [SciPy](http://www.scipy.org/) (I used 0.12.0)
42 | - [scikit-learn](http://scikit-learn.org/stable/) (I used 0.13.1)
43 | 
44 | Note: I've been told that [Anaconda Python](https://store.continuum.io/cshop/anaconda/) is an
45 | alternative distribution which includes these same packages, has a comparable academic license,
46 | and is easier to get running. I've never tried it myself, though.
47 | 
48 | There are two additional requirements, which are both `easy_install`able:
49 | 
50 | - [termcolor](https://pypi.python.org/pypi/termcolor)
51 | - [progressbar](https://code.google.com/p/python-progressbar/)
52 | 
53 | More recent versions than the ones listed above should work fine, though unfortunately
54 | the interfaces to some SciPy routines have a tendency to change without warning...
55 | 
56 | Also, if you want to distribute jobs across multiple cores or machines (highly recommended), you
57 | will need to do one of the following:
58 | 
59 | - install [GNU Parallel](www.gnu.org/software/parallel) (see Configuration section for more details)
60 | - write a scheduler which better matches your own computing resources ([see below](#ownsched))
61 | 
62 | 
63 | ## Configuration
64 | 
65 | In order to run the structure search, you need to specify some local configuration parameters
66 | in `config.py`. First, in the main project directory, copy the template:
67 | 
68 |     cp config_example.py config.py
69 | 
70 | In `config.py`, you need to specify the following paths:
71 | 
72 | - `CODE_PATH`, the directory where you keep the code for this project
73 | - `CACHE_PATH`, a directory for storing intermediate results (which can take up a fair amount of disk
74 | space and are OK to delete when the experiment is done)
75 | - `RESULTS_PATH`, the directory for storing the machine-readable results of the structure search
76 | - `REPORT_PATH`, the directory for saving human-readable reports
77 | 
78 | You also need to specify `SCHEDULER` to determine how the experiment jobs are to be run. The
79 | choices are `'single_process'`, which runs everything in a single process (not practical except
80 | for the smallest matrices), and `'parallel'`, which uses GNU Parallel to distribute the jobs
81 | across different machines, or different processes on the same machine. If you use GNU Parallel,
82 | you also need to specify:
83 | 
84 | - `JOBS_PATH`, a directory for saving the status of jobs, if you are using GNU Parallel
85 | - `DEFAULT_NUM_JOBS`, the number of jobs to run on each machine
86 | 
87 | Note that using our GNU Parallel wrapper requires the ability to `ssh` into the machines without
88 | entering a password. We realize this might not correspond to your situation, so [see below](#ownsched)
89 | for how you can write your own job scheduler module geared towards the clusters at your own institution.
90 | 
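For concreteness, here is a minimal sketch of what a `config.py` might look like; the paths
below are just placeholders, and should point to directories that exist on your machine:

    # config.py -- example values only; adjust the paths for your machine
    CODE_PATH = '/home/me/compositional_structure_search'
    CACHE_PATH = '/scratch/me/structure_search/cache'
    RESULTS_PATH = '/home/me/structure_search/results'
    REPORT_PATH = '/home/me/structure_search/reports'

    SCHEDULER = 'parallel'        # or 'single_process'
    JOBS_PATH = '/home/me/structure_search/jobs'   # only needed with GNU Parallel
    DEFAULT_NUM_JOBS = 4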
91 | 
92 | ## Running the example
93 | 
94 | We provide an example of how to run the structure search in `example.py`. This runs the
95 | structure search on the mammals dataset of Kemp et al. (2006), "Learning systems of concepts
96 | with an infinite relational model." This is a 50 x 85 matrix where the rows represent
97 | different species of mammal, the columns represent attributes, and each entry is a binary
98 | value representing subjects' judgments of whether the animal has that attribute. Our structure
99 | search did not result in a clear structure for this dataset, but it serves as an example which
100 | can be run quickly (2 CPU minutes for me).
101 | 
102 | After following the configuration directions above, run the following from the command line:
103 | 
104 |     python example.py
105 |     python experiments.py everything example
106 | 
107 | This will run the structure search, and then output the results to the shell (and also save
108 | them to the `example` subdirectory of `config.REPORT_PATH`). The results include the following:
109 | 
110 | - the best-performing structure at each level of the search, with its improvement in
111 | predictive log-likelihood for rows and columns, as well as z-scores for the improvement
112 | - the total CPU time, also broken down by model
113 | - the predictive log-likelihood scores for all structures at all levels of the search, sorted
114 | from best to worst
115 | 
116 | Note that the search parameters used in this example are probably
117 | insufficient for reliable inference; if you are interested in accurate results for this dataset,
118 | change `QuickParams` to `SmallParams` in `example.py`.
119 | 
120 | 
121 | 
122 | ## Running the structure search
123 | 
124 | Suppose you have a real-valued matrix `X` you're interested in learning the structure of,
125 | in the form of a NumPy array. The first step is to create a `DataMatrix` instance:
126 | 
127 |     from observations import DataMatrix
128 |     data_matrix = DataMatrix.from_real_values(X)
129 | 
130 | This constructor also takes some optional arguments:
131 | 
132 | - `mask`, which is a binary array determining which entries of `X` are observed. (By default,
133 | all entries are assumed to be observed.)
134 | - `row_label` and `col_label`, which are Python lists giving the label of each row or column.
135 | These are used for printing the learned clusters and binary components.
136 | 
137 | The code doesn't do any preprocessing of the data, so it's recommended that you standardize
138 | it to have zero mean and unit variance.
139 | 
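For example, here is a minimal sketch that standardizes the data and marks every entry as
observed (the preprocessing choices here are just for illustration):

    import numpy as np
    from observations import DataMatrix

    # Standardize to zero mean and unit variance, as recommended above
    # (here, each column is standardized separately).
    X = (X - X.mean(0)) / X.std(0)

    # mask[i, j] is True if entry (i, j) of X was actually observed;
    # here everything is observed, so this is equivalent to the default.
    mask = np.ones(X.shape, dtype=bool)
    data_matrix = DataMatrix.from_real_values(X, mask=mask)

Row and column labels can be passed in the same way, using the `row_label` and `col_label`
arguments described above.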
140 | Next, you want to initialize an experiment for this matrix. You do this by passing in the
141 | `DataMatrix` instance, along with a parameters object. `experiments.SmallParams` gives a
142 | reasonable set of defaults for small matrices (e.g. 200 x 200), and `experiments.LargeParams`
143 | gives a reasonable set of defaults for larger matrices (e.g. 1000 x 1000). This creates
144 | subdirectories of `config.RESULTS_PATH` and `config.REPORT_PATH` where all the computations
145 | and results will be stored. For example,
146 | 
147 |     from experiments import init_experiment, LargeParams
148 |     init_experiment('experiment_name', data_matrix, LargeParams())
149 | 
150 | You can also override the default parameters by passing keyword arguments to the parameters
151 | constructor. See `experiments.DefaultParams` for more details. Finally, from the command line,
152 | run the whole structure search using the following:
153 | 
154 |     python experiments.py everything experiment_name
155 | 
156 | You can also specify some optional command-line arguments:
157 | 
158 | - `--machines`, the list of machines to distribute the jobs to if you are using GNU Parallel.
159 | This should be a comma-separated list with no spaces. By default, it runs jobs only on the same machine.
160 | - `--njobs`, the number of jobs to run on each machine if you are using GNU Parallel. (This
161 | overrides the default value in `config.DEFAULT_NUM_JOBS`.)
162 | - `--email`, your e-mail address, if you want it to e-mail you the report when it finishes.
163 | 
164 | For example,
165 | 
166 |     python experiments.py everything experiment_name --machines machine1,machine2,machine3 --njobs 2 --email me@example.com
167 | 
168 | If all goes well, a report will be saved to `experiment_name/results.txt` under `config.REPORT_PATH`.
169 | 
170 | 
171 | ## Using your own scheduler
172 | 
173 | As mentioned above, the experiment script assumes you have GNU Parallel installed, and that you're
174 | able to SSH into machines without entering a password. This might not match your situation; for instance,
175 | your institution might use a queueing system to distribute jobs. I've tried to make it simple to adapt
176 | the experiment scripts to your own cluster setup. In particular, you need to do the following:
177 | 
178 | 1. Write a Python function which takes a list of jobs and distributes them on your cluster
179 | (a sketch of such a function is given at the end of this section). It should take two arguments:
180 |    * `script`, the name of the Python file to execute
181 |    * `jobs`, a list of jobs, where each one is a list of strings, each one corresponding to one
182 |      command line argument.
183 | 
184 |    See `single_process.run` for an example. Note that some of the arguments may contain the single quote
185 |    character, so you will have to escape them.
186 | 2. Add another case to `experiments.run_jobs` which calls your scheduler, and change `config.SCHEDULER`
187 | to the appropriate value.
188 | 3. If your scheduler should take any additional command line arguments, you can specify them in
189 | `experiments.add_scheduler_args`.
190 | 
191 | The above directions assume that all of the machines have access to a common filesystem (e.g. AFS, NFS).
192 | If this isn't the case (for instance, if you are running on Amazon EC2), you'll also need to modify
193 | the functions in `storage.py` to read and write from whatever storage system is shared between the
194 | machines.
195 | 
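As a sketch of step 1, a hypothetical scheduler module might look something like the following
(the module name and the local `subprocess` call are placeholders; substitute your own cluster's
submission command):

    # my_scheduler.py -- a hypothetical example; adapt the submission step
    # to whatever queueing system your cluster uses.
    import pipes
    import subprocess

    def run(script, jobs):
        """Distribute the given jobs; each job is a list of command-line argument strings."""
        for job in jobs:
            # Arguments may contain single quotes, so quote them for the shell.
            args = ' '.join(pipes.quote(arg) for arg in job)
            command = 'python %s %s' % (script, args)
            # Placeholder: replace this with your cluster's submission command
            # (e.g. wrap `command` in a qsub or sbatch invocation).
            subprocess.check_call(command, shell=True)

Step 2 then amounts to adding a case to `experiments.run_jobs` which calls `my_scheduler.run(script, jobs)`,
and setting `config.SCHEDULER` to the corresponding value.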
196 | 
197 | ## Organization of the code
198 | 
199 | The main code directory contains the following files which handle the logic of the experiments,
200 | and are described above:
201 | 
202 | - `experiments.py`, as mentioned above, which manages the structure search for a single input matrix
203 | - `synthetic_experiments.py`, which runs the synthetic data experiments from the paper, i.e. by
204 | generating a lot of synthetic matrices and running the structure search on each
205 | - `presentation.py`, which formats the results into tables
206 | - `parallel.py` and `single_process.py`, utilities for running jobs
207 | 
208 | The following files define the main data structures used in the structure search:
209 | 
210 | - `grammar.py`, which defines the context-free grammar
211 | - `parsing.py`, which parses string representations of the models into expression trees
212 | - `observations.py`, which defines the `DataMatrix` and `Observations` classes used to represent
213 | the input matrices
214 | - `recursive.py`, which defines the `Node` classes which store the actual decompositions
215 | - `models.py`, which defines model classes which parallel the structure of the `Node` classes, but
216 | define properties of the model itself (such as whether variance parameters for a matrix are
217 | associated with rows or columns)
218 | 
219 | The following handle the posterior inference over decompositions:
220 | 
221 | - `initialization.py`, which does the most interesting algorithmic work, namely initializing
222 | the more complex structures using algorithms particular to each production rule.
223 | - `algorithms/dumb_samplers.py`, which contains simple MCMC operators which are run after the
224 | recursive initialization procedure
225 | - the `algorithms` subdirectory contains inference algorithms corresponding to particular production
226 | rules: in particular, `chains.py`, `crp.py`, `ibp.py`, `low_rank_poisson.py`, and `sparse_coding.py`.
227 | 
228 | Finally, the following files handle the predictive likelihood scoring:
229 | 
230 | - `scoring.py`, the main procedures for predictive likelihood scoring
231 | - `predictive_distributions.py`, which converts the predictive distributions into a sum of terms
232 | as in Section 5 of the paper
233 | - `algorithms/variational.py`, which implements the variational lower bound of Section 5
234 | - `algorithms/ais_gsm.py`, which performs the additional AIS step needed for evaluating the GSM models.
235 | 
--------------------------------------------------------------------------------
/algorithms/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rgrosse/compositional_structure_search/b93f9f8d3a714213002c09403c1766b57835025e/algorithms/__init__.py
--------------------------------------------------------------------------------
/algorithms/ais_gsm.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | nax = np.newaxis
3 | import pylab
4 | 
5 | import predictive_distributions
6 | from utils import distributions
7 | import variational
8 | 
9 | SIGMOID_SCHEDULE = True
10 | 
11 | def p_s_given_z(S, Z, t, sigma_sq_approx):
12 |     return (1. - t) * distributions.gauss_loglik(S, 0., sigma_sq_approx[nax, :]) + \
13 |         t * distributions.gauss_loglik(S, 0., np.exp(Z))
14 | 
15 | def log_odds_to_prob(log_odds):
16 |     prob = np.exp(log_odds - np.logaddexp.reduce(log_odds, axis=1)[:, nax])
17 |     prob /= prob.sum(1)[:, nax]   # redundant in principle, but numerical error makes np.random.multinomial unhappy
18 |     return prob
19 | 
20 | def get_schedule(num_steps, first_odds):
21 |     tau = np.linspace(-first_odds, first_odds, num_steps)
22 |     temp = 1. / (1. 
+ np.exp(-tau)) 23 | return (temp - temp[0]) / (temp[-1] - temp[0]) 24 | 25 | 26 | 27 | class MultinomialSampler: 28 | def __init__(self, pi, A, Sigma): 29 | self.pi = pi 30 | self.A = A 31 | self.Sigma = Sigma 32 | self.nlat, nvis = A.shape 33 | if Sigma.ndim == 2: 34 | self.Lambda = np.linalg.inv(self.Sigma)[nax, :, :] 35 | else: 36 | self.Lambda = np.array([np.linalg.inv(self.Sigma[i, :, :]) 37 | for i in range(self.Sigma.shape[0])]) 38 | 39 | def random_initialization(self, variational_reps): 40 | for vr in variational_reps: 41 | assert isinstance(vr, variational.MultinomialRepresentation) 42 | return np.array([rep.sample() for rep in variational_reps]) 43 | 44 | def step(self, targets, t, U): 45 | N = targets.shape[0] 46 | diff = self.A[nax, :, :] - targets[:, nax, :] 47 | obs_term = -0.5 * np.sum(np.sum(diff[:, :, :, nax] * diff[:, :, nax, :] * 48 | self.Lambda[:, nax, :, :], axis=3), axis=2) 49 | prob = log_odds_to_prob(obs_term + np.log(self.pi)[nax, :]) 50 | return np.array([np.random.multinomial(1, prob[i, :]) 51 | for i in range(N)]) 52 | 53 | def p_star(self, t, U): 54 | # constant with respect to time, so it doesn't affect AIS output 55 | return 0 56 | 57 | def contribution(self, U): 58 | return np.dot(U, self.A) 59 | 60 | class InnerMultinomialSampler: 61 | def __init__(self, pi, A, sigma_sq_approx): 62 | self.pi = pi 63 | self.A = A 64 | self.sigma_sq_approx = sigma_sq_approx 65 | 66 | def random_initialization(self, N): 67 | return np.random.multinomial(1, self.pi, size=N) 68 | 69 | def step(self, Z0, S, t, U): 70 | N, nspars, nclusters = S.shape[0], S.shape[1], self.pi.size 71 | ev = np.zeros((N, nclusters)) 72 | for k in range(nclusters): 73 | Z = Z0 + self.A[k, :][nax, :] 74 | ev[:, k] = p_s_given_z(S, Z, t, self.sigma_sq_approx).sum(1) 75 | prob = log_odds_to_prob(ev + np.log(self.pi)[nax, :]) 76 | #return np.random.multinomial(1, prob) 77 | return np.array([np.random.multinomial(1, prob[i, :]) 78 | for i in range(N)]) 79 | 80 | def contribution(self, Z): 81 | return np.dot(Z, self.A) 82 | 83 | class BernoulliSampler: 84 | def __init__(self, pi, A, Sigma): 85 | self.pi = pi 86 | self.A = A 87 | self.Sigma = Sigma 88 | self.nlat, nvis = A.shape 89 | if Sigma.ndim == 2: 90 | self.Lambda = np.linalg.inv(self.Sigma)[nax, :, :] 91 | else: 92 | self.Lambda = np.array([np.linalg.inv(self.Sigma[i, :, :]) 93 | for i in range(self.Sigma.shape[0])]) 94 | 95 | def random_initialization(self, variational_reps): 96 | for vr in variational_reps: 97 | assert isinstance(vr, variational.BernoulliRepresentation) 98 | return np.array([rep.sample() for rep in variational_reps]) 99 | 100 | def step(self, targets, t, U): 101 | U = U.copy() 102 | N, K = U.shape 103 | for i in range(N): 104 | x = np.dot(U[i, :], self.A) 105 | curr_targets = targets[i, :] 106 | if self.Lambda.ndim == 2: 107 | Lambda = self.Lambda 108 | else: 109 | Lambda = self.Lambda[i, :, :] 110 | 111 | for k in range(K): 112 | if U[i, k]: 113 | x -= self.A[k, :] 114 | off_score = -0.5 * np.dot(x - curr_targets, np.dot(Lambda, x - curr_targets)) 115 | x_on = x + self.A[k, :] 116 | on_score = -0.5 * np.dot(x_on - curr_targets, np.dot(Lambda, x_on - curr_targets)) 117 | 118 | log_odds = np.log(self.pi[k]) + on_score - off_score 119 | prob = 1. / (1. 
+ np.exp(-log_odds)) 120 | U[i, k] = np.random.binomial(1, prob) 121 | x += U[i, k] * self.A[k, :] 122 | 123 | return U 124 | 125 | def p_star(self, t, u): 126 | # constant with respect to time, so it doesn't affect AIS output 127 | return 0 128 | 129 | def contribution(self, Z): 130 | return np.dot(Z, self.A) 131 | 132 | class InnerBernoulliSampler: 133 | def __init__(self, pi, A, sigma_sq_approx): 134 | self.pi = pi 135 | self.A = A 136 | self.sigma_sq_approx = sigma_sq_approx 137 | 138 | def step(self, Z0, S, t, U): 139 | U = U.copy() 140 | ndata, nspars, nfea = S.shape[0], S.shape[1], U.shape[1] 141 | for i in range(ndata): 142 | z = Z0[i, :] + np.dot(U[i, :], self.A) 143 | 144 | for k in range(nfea): 145 | if U[i, k]: 146 | z -= self.A[k, :] 147 | off_score = p_s_given_z(S[i, :], z, t, self.sigma_sq_approx).sum() 148 | z_on = z + self.A[k, :] 149 | on_score = p_s_given_z(S[i, :], z_on, t, self.sigma_sq_approx).sum() 150 | 151 | log_odds = np.log(self.pi[k]) + on_score - off_score 152 | prob = 1. / (1. + np.exp(-log_odds)) 153 | U[i, k] = np.random.binomial(1, prob) 154 | z += U[i, k] * self.A[k, :] 155 | 156 | return U 157 | 158 | def contribution(self, U): 159 | return np.dot(U, self.A) 160 | 161 | def random_initialization(self, N): 162 | return np.random.binomial(1, self.pi[nax, :], size=(N, self.pi.size)) 163 | 164 | def mh_multivariate_gaussian(U, f, mu, Sigma, epsilon): 165 | N, K = U.shape 166 | perturbation = np.array([np.random.multivariate_normal(np.zeros(K), Sigma) 167 | for i in range(N)]) 168 | proposal = mu[nax, :] + \ 169 | np.sqrt(1. - epsilon ** 2) * (U - mu[nax, :]) + \ 170 | epsilon * perturbation 171 | L0 = f(U) 172 | L1 = f(proposal) 173 | accept = np.random.binomial(1, np.where(L1 > L0, 1., np.exp(L1 - L0))) 174 | if np.isscalar(accept): # np.random.binomial converts length 1 arrays to scalars 175 | accept = np.array([accept]) 176 | return np.where(accept[:, nax], proposal, U) 177 | 178 | 179 | class InnerGaussianSampler: 180 | def __init__(self, mu, Sigma, sigma_sq_approx): 181 | self.mu = mu 182 | self.Sigma = Sigma 183 | self.sigma_sq_approx = sigma_sq_approx 184 | 185 | def step(self, Z0, S, t, U): 186 | EPSILON = 0.5 187 | N, K = S.shape 188 | U = U.copy() 189 | 190 | def f(U): 191 | return p_s_given_z(S, Z0 + U, t, self.sigma_sq_approx).sum(1) 192 | 193 | return mh_multivariate_gaussian(U, f, self.mu, self.Sigma, EPSILON) 194 | 195 | 196 | def contribution(self, U): 197 | return U 198 | 199 | def random_initialization(self, N): 200 | return np.array([np.random.multivariate_normal(self.mu, self.Sigma) 201 | for i in range(N)]) 202 | 203 | 204 | class GSMRepresentation: 205 | def __init__(self, S, U_all): 206 | self.S = S 207 | self.U_all = U_all 208 | 209 | def copy(self): 210 | return GSMRepresentation(self.S.copy(), self.U_all[:]) 211 | 212 | class GSMSampler: 213 | def __init__(self, scale_samplers, sigma_sq_approx, evidence_Sigma, A): 214 | self.scale_samplers = scale_samplers 215 | self.sigma_sq_approx = sigma_sq_approx 216 | self.evidence_Sigma = evidence_Sigma 217 | if evidence_Sigma.ndim == 2: 218 | self.evidence_Lambda = np.linalg.inv(evidence_Sigma) 219 | else: 220 | self.evidence_Lambda = np.array([np.linalg.inv(evidence_Sigma[i, :, :]) 221 | for i in range(evidence_Sigma.shape[0])]) 222 | self.A = A 223 | 224 | def random_initialization(self, S): 225 | N = S.shape[0] 226 | U_all = [sampler.random_initialization(N) for sampler in self.scale_samplers] 227 | return GSMRepresentation(S.copy(), U_all) 228 | 229 | def step(self, targets, t, rep): 230 | 
N, D = targets.shape 231 | K = rep.S.shape[1] 232 | rep = rep.copy() 233 | 234 | 235 | # sample S 236 | if self.evidence_Lambda.ndim == 2: 237 | Lambda_ev = np.dot(self.A, np.dot(self.evidence_Lambda, self.A.T)) 238 | else: 239 | Lambda_ev = np.array([np.dot(self.A, np.dot(self.evidence_Lambda[i, :, :], self.A.T)) 240 | for i in range(N)]) 241 | h_ev = np.dot(self.A, np.dot(self.evidence_Lambda, targets.T)).T 242 | 243 | Z = np.zeros((N, K)) 244 | for comp, samp in zip(rep.U_all, self.scale_samplers): 245 | Z += samp.contribution(comp) 246 | #sigma_sq_pri = np.exp((1. - t) * np.log(self.sigma_sq_approx)[nax, :] + 247 | # t * Z) 248 | lam_pri = (1. - t) / self.sigma_sq_approx[nax, :] + \ 249 | t * np.exp(-Z) 250 | 251 | 252 | 253 | rep.S = np.zeros((N, K)) 254 | for i in range(N): 255 | Lambda_pri = np.diag(lam_pri[i, :]) 256 | if Lambda_ev.ndim == 2: 257 | Lambda = Lambda_pri + Lambda_ev 258 | else: 259 | Lambda = Lambda_pri + Lambda_ev[i, :, :] 260 | Sigma = np.linalg.inv(Lambda) 261 | mu = np.dot(Sigma, h_ev[i, :]) 262 | rep.S[i, :] = np.random.multivariate_normal(mu, Sigma) 263 | 264 | # sample components of Z 265 | rep = rep.copy() 266 | for c in range(len(self.scale_samplers)): 267 | Z0 = np.zeros(Z.shape) 268 | for d in range(len(self.scale_samplers)): 269 | if d != c: 270 | Z0 += self.scale_samplers[d].contribution(rep.U_all[d]) 271 | 272 | rep.U_all[c] = self.scale_samplers[c].step(Z0, rep.S, t, rep.U_all[c]) 273 | 274 | # temporary 275 | #assert not stop 276 | 277 | return rep 278 | 279 | def p_star(self, t, rep): 280 | N, K = rep.S.shape 281 | Z = np.zeros((N, K)) 282 | for comp, samp in zip(rep.U_all, self.scale_samplers): 283 | Z += samp.contribution(comp) 284 | return p_s_given_z(rep.S, Z, t, self.sigma_sq_approx).sum(1) 285 | 286 | def contribution(self, rep): 287 | return np.dot(rep.S, self.A) 288 | 289 | # temporary 290 | stop = False 291 | 292 | class AISModel: 293 | def __init__(self, samplers, X, Sigma, init_partition_function): 294 | self.samplers = samplers 295 | self.X = X 296 | self.Sigma = Sigma 297 | self.init_partition_function = init_partition_function 298 | 299 | def step(self, reps, t): 300 | reps = reps[:] 301 | for i in range(len(self.samplers)): 302 | targets = self.X.copy() 303 | for j in range(len(self.samplers)): 304 | if j != i: 305 | targets -= self.samplers[j].contribution(reps[j]) 306 | reps[i] = self.samplers[i].step(targets, t, reps[i]) 307 | return reps 308 | 309 | def init_sample(self, variational_reps): 310 | N, D = self.X.shape 311 | is_gsm = np.array([isinstance(s, GSMSampler) for s in self.samplers]) 312 | gsm_idxs = np.where(is_gsm)[0] 313 | non_gsm_idxs = np.where(-is_gsm)[0] 314 | 315 | if len(gsm_idxs) == 0: 316 | raise RuntimeError('No GSM components; problem with module reloading?') 317 | if len(gsm_idxs) > 1: 318 | raise RuntimeError('Cannot handle multiple GSM components yet') 319 | gsm_sampler = self.samplers[gsm_idxs[0]] 320 | 321 | reps = [None for i in range(len(self.samplers))] 322 | discrete_part = np.zeros((N, D)) 323 | for vr_idx, sampler_idx in enumerate(non_gsm_idxs): 324 | curr_reps = [vr[vr_idx] for vr in variational_reps] 325 | reps[sampler_idx] = self.samplers[sampler_idx].random_initialization(curr_reps) 326 | discrete_part += self.samplers[sampler_idx].contribution(reps[sampler_idx]) 327 | 328 | # S = coefficients 329 | # G = GSM part = SA 330 | # E = Gaussian part 331 | # C = continuous part = S + E 332 | A = gsm_sampler.A 333 | C = self.X - discrete_part 334 | Sigma_S = np.diag(gsm_sampler.sigma_sq_approx) 335 | 
Sigma_E = self.Sigma 336 | Sigma_C = np.dot(A.T, np.dot(Sigma_S, A)) + Sigma_E 337 | Sigma_C_inv = np.linalg.inv(Sigma_C) 338 | temp = np.dot(Sigma_S, A) 339 | mu_S_given_C = np.dot(temp, np.dot(Sigma_C_inv, C.T)).T 340 | Sigma_S_given_C = np.dot(temp, np.dot(Sigma_C_inv, temp.T)) 341 | S = np.array([np.random.multivariate_normal(mu_S_given_C[i, :], Sigma_S_given_C) 342 | for i in range(N)]) 343 | 344 | reps[gsm_idxs[0]] = gsm_sampler.random_initialization(S) 345 | 346 | return reps 347 | 348 | def p_star(self, reps, t): 349 | total = 0. 350 | for sampler, rep in zip(self.samplers, reps): 351 | total += sampler.p_star(t, rep) 352 | return total 353 | 354 | def ais(ais_model, t_schedule, variational_representations): 355 | init_partition_function = ais_model.init_partition_function 356 | total = init_partition_function.copy() 357 | reps = ais_model.init_sample(variational_representations) 358 | 359 | all_deltas = [] 360 | 361 | count = 0 362 | for t0, t1 in zip(t_schedule[:-1], t_schedule[1:]): 363 | if count == 1: 364 | for i in range(100): 365 | reps = ais_model.step(reps, t0) 366 | else: 367 | reps = ais_model.step(reps, t0) 368 | delta = ais_model.p_star(reps, t1) - ais_model.p_star(reps, t0) 369 | 370 | # temporary 371 | if count > 0: 372 | total += delta 373 | 374 | all_deltas.append(delta) 375 | 376 | # temporary 377 | global stop 378 | if delta > 1.: 379 | stop = True 380 | 381 | count += 1 382 | 383 | return total 384 | 385 | # __init__(self, scale_samplers, sigma_sq_approx, evidence_Sigma): 386 | def compute_likelihood(X, components, Sigma, variational_representations, init_partition_function, 387 | t_schedule=None, num_steps=1000): 388 | 389 | samplers = [] 390 | for comp in components: 391 | if isinstance(comp, predictive_distributions.MultinomialPredictiveDistribution): 392 | sampler = MultinomialSampler(comp.pi, comp.centers, Sigma) 393 | 394 | elif isinstance(comp, predictive_distributions.BernoulliPredictiveDistribution): 395 | sampler = BernoulliSampler(comp.pi, comp.A, Sigma) 396 | 397 | elif isinstance(comp, predictive_distributions.GSMPredictiveDistribution): 398 | inner_samplers = [] 399 | for sc in comp.scale_components: 400 | if isinstance(sc, predictive_distributions.MultinomialPredictiveDistribution): 401 | inner_sampler = InnerMultinomialSampler(sc.pi, sc.centers, comp.sigma_sq_approx) 402 | 403 | elif isinstance(sc, predictive_distributions.BernoulliPredictiveDistribution): 404 | inner_sampler = InnerBernoulliSampler(sc.pi, sc.A, comp.sigma_sq_approx) 405 | 406 | else: 407 | raise RuntimeError("Can't convert to inner sampler: %s" % sc.__class__) 408 | 409 | inner_samplers.append(inner_sampler) 410 | 411 | igs = InnerGaussianSampler(comp.scale_mu, comp.scale_Sigma, comp.sigma_sq_approx) 412 | inner_samplers.append(igs) 413 | 414 | sampler = GSMSampler(inner_samplers, comp.sigma_sq_approx, Sigma, 415 | comp.A) 416 | 417 | samplers.append(sampler) 418 | 419 | ais_model = AISModel(samplers, X, Sigma, init_partition_function) 420 | 421 | if t_schedule is None: 422 | if SIGMOID_SCHEDULE: 423 | t_schedule = get_schedule(num_steps, 10.) 
424 | else: 425 | t_schedule = np.linspace(0., 1., num_steps) 426 | 427 | return ais(ais_model, t_schedule, variational_representations) 428 | 429 | 430 | 431 | -------------------------------------------------------------------------------- /algorithms/chains.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | import scipy.linalg 4 | import time 5 | 6 | from utils import misc 7 | 8 | 9 | def integration_matrix(m): 10 | return (np.arange(m)[:,nax] >= np.arange(m)[nax,:]).astype(float) 11 | 12 | def sample_single_chain(t, lambda_D, lambda_N): 13 | m = t.size 14 | diagonal = lambda_N.copy() 15 | diagonal[1:] += lambda_D 16 | diagonal[:-1] += lambda_D 17 | off_diag = -lambda_D 18 | 19 | a = np.zeros((2, m)) 20 | a[0, 1:] = off_diag 21 | a[1,:] = diagonal 22 | 23 | x = scipy.linalg.solveh_banded(a, t * lambda_N) 24 | c = scipy.linalg.cholesky_banded(a) 25 | 26 | x = x.ravel() 27 | 28 | # generate noise 29 | lower = np.zeros(c.shape) 30 | lower[0,:] = c[1,:] 31 | lower[1,:-1] = c[0,1:] 32 | u = np.random.normal(size=m) 33 | n = scipy.linalg.solve_banded((1, 0), lower, u) 34 | 35 | assert np.max(np.abs(x + n)) < 1000. 36 | 37 | return x + n 38 | 39 | def single_chain_marginal(t, lambda_D, lambda_N): 40 | m = t.size 41 | diagonal = lambda_N.copy() 42 | diagonal[1:] += lambda_D 43 | diagonal[:-1] += lambda_D 44 | off_diag = -lambda_D 45 | 46 | a = np.zeros((2, m)) 47 | a[0, 1:] = off_diag 48 | a[1,:] = diagonal 49 | 50 | x = scipy.linalg.solveh_banded(a, t * lambda_N) 51 | x = x.ravel() 52 | 53 | Lambda = np.diag(diagonal) + np.diag(off_diag, -1) + np.diag(off_diag, 1) 54 | Sigma = np.linalg.inv(Lambda) 55 | return x, np.diag(Sigma) 56 | 57 | 58 | def chain_gibbs(X, obs, D, row_ids=None, row_variance=False): 59 | m, n = X.shape 60 | if row_ids is not None: 61 | row_ids = np.array(row_ids) 62 | time_steps = (row_ids[1:] - row_ids[:-1]).astype(float) 63 | else: 64 | time_steps = np.ones(m-1) 65 | 66 | S = D.cumsum(0) 67 | N = X - S 68 | 69 | 70 | if row_variance: 71 | # UNDO: is this correct? 72 | sigma_sq_D_rows, sigma_sq_D_cols = misc.sample_noise(D[1:,:] / np.sqrt(time_steps[:,nax])) 73 | sigma_sq_N_rows, sigma_sq_N_cols = misc.sample_noise(N, obs=obs) 74 | else: 75 | sigma_sq_D_rows, sigma_sq_N_rows = np.ones(m-1), np.ones(m) 76 | sigma_sq_D_cols = misc.sample_col_noise(D[1:,:] / np.sqrt(time_steps[:,nax])) 77 | sigma_sq_N_cols = misc.sample_col_noise(N) 78 | 79 | for j in range(n): 80 | sigma_sq_D = sigma_sq_D_rows * sigma_sq_D_cols[j] 81 | sigma_sq_N = sigma_sq_N_rows * sigma_sq_N_cols[j] 82 | 83 | # UNDO 84 | sigma_sq_D = sigma_sq_D.clip(1e-4, 100.) 85 | sigma_sq_N = sigma_sq_N.clip(1e-4, 100.) 86 | 87 | S[:,j] = sample_single_chain(X[:,j], 1. / (time_steps * sigma_sq_D), obs[:,j] / sigma_sq_N) 88 | N[:,j] = X[:,j] - S[:,j] 89 | 90 | D = np.zeros(X.shape) 91 | D[0,:] = S[0,:] 92 | D[1:,:] = S[1:,:] - S[:-1,:] 93 | return D 94 | 95 | NUM_ITER = 200 96 | 97 | def sample_variance(values): 98 | a = 1. + 0.5 * values.size 99 | b = 1. + 0.5 * np.sum(values ** 2) 100 | prec = np.random.gamma(a, 1. / b) 101 | prec = np.clip(prec, 1e-4, 1e4) # avoid numerical issues 102 | return 1. / prec 103 | 104 | 105 | def fit_model(data_matrix, num_iter=NUM_ITER): 106 | N_orig, N, D = data_matrix.m_orig, data_matrix.m, data_matrix.n 107 | X = data_matrix.sample_latent_values(np.zeros((N, D)), 1.) 108 | sigma_sq_D = sigma_sq_N = 1. 
109 | fixed_variance = data_matrix.fixed_variance() 110 | 111 | row_ids = data_matrix.row_ids 112 | X_full = np.zeros((N_orig, D)) 113 | X_full[row_ids, :] = X 114 | 115 | states = np.zeros((N_orig, D)) 116 | resid = np.zeros((N, D)) 117 | diff = np.zeros((N_orig-1, D)) 118 | 119 | pbar = misc.pbar(num_iter) 120 | 121 | t0 = time.time() 122 | for it in range(num_iter): 123 | lam_N = np.zeros(N_orig) 124 | lam_N[row_ids] = 1. / sigma_sq_N 125 | for j in range(D): 126 | states[:, j] = sample_single_chain(X_full[:, j], 1. / sigma_sq_D, lam_N) 127 | resid = X - states[row_ids, :] 128 | diff = states[1:, :] - states[:-1, :] 129 | 130 | sigma_sq_D = sample_variance(diff) 131 | if not fixed_variance: 132 | sigma_sq_N = sample_variance(resid) 133 | 134 | X = data_matrix.sample_latent_values(states[row_ids, :], sigma_sq_N) 135 | X_full[row_ids, :] = X 136 | 137 | if time.time() - t0 > 3600.: # 1 hour 138 | break 139 | 140 | pbar.update(it) 141 | pbar.finish() 142 | 143 | return states, sigma_sq_D, sigma_sq_N 144 | 145 | 146 | def sample_chain(X, obs, row_ids=None): 147 | m, n = X.shape 148 | 149 | # initalize deltas 150 | X_noise = X + np.random.normal(0., 0.1, size=X.shape) 151 | D = np.zeros(X_noise.shape) 152 | D[0,:] = X_noise[0,:] 153 | D[1:,:] = X_noise[1:,:] - X_noise[:-1,:] 154 | 155 | niter = 50 # UNDO 156 | for it in range(niter): 157 | D = chain_gibbs(X, obs, D, row_ids) 158 | 159 | return D 160 | -------------------------------------------------------------------------------- /algorithms/crp.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | import sklearn.cluster 4 | import scipy.special 5 | import time 6 | 7 | import low_rank 8 | from utils import distributions, gaussians, psd_matrices, misc 9 | from_iso = gaussians.Potential.from_moments_iso 10 | 11 | 12 | np.seterr(divide='ignore', invalid='ignore') 13 | 14 | MAX_COMPONENTS = 100 15 | 16 | class CRPModel: 17 | def __init__(self, alpha, ndim, within_var_prior, between_var_prior, isotropic_w, isotropic_b): 18 | self.alpha = alpha 19 | self.ndim = ndim 20 | self.within_var_prior = within_var_prior 21 | self.between_var_prior = between_var_prior 22 | self.isotropic_w = isotropic_w 23 | self.isotropic_b = isotropic_b 24 | 25 | 26 | class CollapsedCRPState: 27 | def __init__(self, X, assignments, sigma_sq_w, sigma_sq_b): 28 | self.X = X.copy() 29 | self.assignments = assignments 30 | self.sigma_sq_w = sigma_sq_w 31 | self.sigma_sq_b = sigma_sq_b 32 | 33 | class FullCRPState: 34 | def __init__(self, X, assignments, centers, sigma_sq_w, sigma_sq_b): 35 | self.X = X 36 | self.assignments = assignments 37 | self.centers = centers 38 | self.sigma_sq_w = sigma_sq_w 39 | self.sigma_sq_b = sigma_sq_b 40 | 41 | def copy(self): 42 | if np.isscalar(self.sigma_sq_w): 43 | sigma_sq_w = self.sigma_sq_w 44 | else: 45 | sigma_sq_w = self.sigma_sq_w.copy() 46 | if np.isscalar(self.sigma_sq_b): 47 | sigma_sq_b = self.sigma_sq_b 48 | else: 49 | sigma_sq_b = self.sigma_sq_b.copy() 50 | return FullCRPState(self.X.copy(), self.assignments.copy(), self.centers.copy(), sigma_sq_w, sigma_sq_b) 51 | 52 | 53 | 54 | 55 | 56 | class CollapsedCRPCache: 57 | def __init__(self, model, X, mask, assignments, counts, obs_counts, sum_X, sum_X_sq): 58 | self.model = model 59 | self.X = X.copy() 60 | self.mask = mask.copy() 61 | self.assignments = assignments 62 | self.ncomp = assignments.max() + 1 63 | self.counts = counts 64 | self.obs_counts = obs_counts 65 | self.sum_X = sum_X 66 | 
self.sum_X_sq = sum_X_sq 67 | 68 | def copy(self): 69 | return CollapsedCRPCache(self.model, self.X, self.mask, self.assignments.copy(), self.counts.copy(), 70 | self.obs_counts.copy(), self.sum_X.copy(), self.sum_X_sq.copy()) 71 | 72 | def add(self, i, k, x): 73 | assert self.assignments[i] == -1 74 | if k == self.ncomp: 75 | self.counts = np.concatenate([self.counts, [0]]) 76 | self.obs_counts = np.vstack([self.obs_counts, np.zeros(self.model.ndim, dtype=int)]) 77 | self.sum_X = np.vstack([self.sum_X, np.zeros((1, self.model.ndim))]) 78 | self.sum_X_sq = np.vstack([self.sum_X_sq, np.zeros((1, self.model.ndim))]) 79 | self.ncomp += 1 80 | self.counts[k] += 1 81 | self.obs_counts[k, :] += self.mask[i, :] 82 | self.sum_X[k, :] += self.mask[i, :] * x 83 | self.sum_X_sq[k, :] += self.mask[i, :] * x ** 2 84 | self.assignments[i] = k 85 | self.X[i, :] = x 86 | 87 | def remove(self, i): 88 | assert self.assignments[i] != -1 89 | k = self.assignments[i] 90 | self.counts[k] -= 1 91 | self.obs_counts[k, :] -= self.mask[i, :] 92 | self.sum_X[k, :] -= self.mask[i, :] * self.X[i, :] 93 | self.sum_X_sq[k, :] -= self.mask[i, :] * self.X[i, :] ** 2 94 | self.assignments[i] = -1 95 | 96 | def replace(self, i, k): 97 | self.remove(i) 98 | self.add(i, k, self.X[i, :]) 99 | 100 | def squeeze(self, state): 101 | # renumber the clusters to eliminate empty ones 102 | for i in range(self.ncomp)[::-1]: 103 | if self.counts[i] == 0: 104 | assert np.all(state.assignments == self.assignments) 105 | self.assignments = np.where(self.assignments > i, self.assignments - 1, self.assignments) 106 | state.assignments = self.assignments.copy() 107 | self.ncomp -= 1 108 | self.counts = np.concatenate([self.counts[:i], self.counts[i+1:]]) 109 | self.obs_counts = np.vstack([self.obs_counts[:i, :], self.obs_counts[i+1:, :]]) 110 | self.sum_X = np.vstack([self.sum_X[:i, :], self.sum_X[i+1:, :]]) 111 | self.sum_X_sq = np.vstack([self.sum_X_sq[:i, :], self.sum_X_sq[i+1:, :]]) 112 | 113 | 114 | def check(self, data, state): 115 | new_cache = CollapsedCRPCache.from_state(self.model, data, state) 116 | self.check_close(new_cache) 117 | assert np.all(self.counts > 0) 118 | 119 | def check_close(self, other): 120 | assert np.all(self.counts == other.counts) 121 | assert np.all(self.obs_counts == other.obs_counts) 122 | assert np.allclose(self.sum_X, other.sum_X) 123 | assert np.allclose(self.sum_X_sq, other.sum_X_sq) 124 | assert np.all(self.assignments == other.assignments) 125 | 126 | @staticmethod 127 | def from_state(model, data, state): 128 | assignments = state.assignments.copy() 129 | ncomp = assignments.max() + 1 130 | counts = misc.get_counts(state.assignments, ncomp) 131 | obs_counts = np.zeros((ncomp, model.ndim), dtype=int) 132 | sum_X = np.zeros((ncomp, model.ndim)) 133 | sum_X_sq = np.zeros((ncomp, model.ndim)) 134 | for k in range(ncomp): 135 | obs_counts[k, :] = data.mask[assignments==k, :].sum(0) 136 | sum_X[k, :] = (data.mask * state.X)[assignments==k, :].sum(0) 137 | sum_X_sq[k, :] = (data.mask * state.X**2)[assignments==k, :].sum(0) 138 | return CollapsedCRPCache(model, state.X, data.mask, assignments, counts, obs_counts, sum_X, sum_X_sq) 139 | 140 | 141 | def crp_loglik(assignments, alpha): 142 | counts = np.bincount(assignments) 143 | N = counts.sum() 144 | K = counts.size 145 | return scipy.special.gammaln(alpha) + \ 146 | -scipy.special.gammaln(alpha + N) + \ 147 | K * np.log(alpha) + \ 148 | scipy.special.gammaln(counts).sum() 149 | 150 | 151 | def p_tilde_collapsed(model, data, state): 152 | cache = 
CollapsedCRPCache.from_state(model, data, state) 153 | ncomp = cache.counts.size 154 | total = 0. 155 | 156 | # data evidence, marginalizing out the centers 157 | ce = center_evidence(model, state, cache) 158 | for k in range(ncomp): 159 | if model.isotropic_b: 160 | prior_term = from_iso(np.zeros(model.ndim), state.sigma_sq_b) 161 | else: 162 | prior_term = gaussians.Potential.from_moments_diag(np.zeros(model.ndim), state.sigma_sq_b) 163 | evidence = ce[k] 164 | total += (prior_term + evidence).integral() 165 | 166 | # hyperparameters 167 | total += np.sum(model.within_var_prior.loglik(state.sigma_sq_w)) 168 | total += np.sum(model.between_var_prior.loglik(state.sigma_sq_b)) 169 | 170 | # partition 171 | total += crp_loglik(state.assignments, model.alpha) 172 | 173 | return total 174 | 175 | def p_tilde(model, data, state): 176 | total = 0. 177 | 178 | # data likelihood 179 | evidence = p_X_given_centers(model, data, state) 180 | total += evidence.score(state.centers[state.assignments, :]).sum() 181 | 182 | # centers likelihood 183 | if model.isotropic_b: 184 | centers_dist = from_iso(np.zeros(model.ndim), state.sigma_sq_b) 185 | else: 186 | centers_dist = gaussians.Potential.from_moments_diag(np.zeros(model.ndim), state.sigma_sq_b) 187 | total += centers_dist[nax].score(state.centers).sum() 188 | 189 | # hyperparameters 190 | total += np.sum(model.within_var_prior.loglik(state.sigma_sq_w)) 191 | total += np.sum(model.between_var_prior.loglik(state.sigma_sq_b)) 192 | 193 | # partition 194 | total += crp_loglik(state.assignments, model.alpha) 195 | 196 | return total 197 | 198 | def p_X_given_centers(model, data, state): 199 | lam = data.mask / state.sigma_sq_w 200 | h = lam * state.X 201 | temp = -0.5 * np.log(2*np.pi) + \ 202 | -0.5 * np.log(state.sigma_sq_w) + \ 203 | -0.5 * lam * state.X**2 204 | Z = (data.mask * temp).sum(1) 205 | return gaussians.Potential(h, psd_matrices.DiagonalMatrix(lam), Z) 206 | 207 | 208 | 209 | def center_evidence(model, state, cache): 210 | lam = cache.obs_counts / state.sigma_sq_w 211 | mu = np.where(cache.obs_counts > 0, cache.sum_X / cache.obs_counts, 0.) 212 | h = mu * lam 213 | if model.isotropic_w: 214 | Z = -0.5 * cache.obs_counts.sum(1) * np.log(2*np.pi) + \ 215 | -0.5 * cache.obs_counts.sum(1) * np.log(state.sigma_sq_w) + \ 216 | -0.5 * cache.sum_X_sq.sum(1) / state.sigma_sq_w 217 | else: 218 | Z = -0.5 * cache.obs_counts.sum(1) * np.log(2 * np.pi) + \ 219 | -0.5 * (cache.obs_counts * np.log(state.sigma_sq_w)).sum(1) + \ 220 | -0.5 * (cache.sum_X_sq / state.sigma_sq_w).sum(1) 221 | return gaussians.Potential(h, psd_matrices.DiagonalMatrix(lam), Z) 222 | 223 | def center_beliefs(model, state, cache): 224 | if model.isotropic_b: 225 | prior_term = from_iso(np.zeros(model.ndim), state.sigma_sq_b) 226 | else: 227 | prior_term = gaussians.Potential.from_moments_diag(np.zeros(model.ndim), state.sigma_sq_b) 228 | return (center_evidence(model, state, cache) + prior_term).renorm() 229 | 230 | def new_center_beliefs(model, state): 231 | return from_iso(np.zeros(model.ndim), state.sigma_sq_b) 232 | 233 | def center_predictive(model, state, cache, k): 234 | N, D = state.X.shape 235 | if k == cache.ncomp: 236 | return np.zeros(D), np.ones(D) * (state.sigma_sq_b + state.sigma_sq_w) 237 | else: 238 | ssq_w, ssq_b = state.sigma_sq_w, state.sigma_sq_b 239 | lam = cache.obs_counts[k, :] / ssq_w + 1. / ssq_b 240 | predictive_mu = (cache.sum_X[k, :] / ssq_w) / lam 241 | predictive_sigma_sq = 1. 
/ lam + state.sigma_sq_w 242 | return predictive_mu, predictive_sigma_sq 243 | 244 | def cond_assignments_collapsed(model, data, state, cache, i): 245 | cache.remove(i) 246 | 247 | prior_term = np.concatenate([np.log(cache.counts), [np.log(model.alpha)]]) 248 | 249 | if MAX_COMPONENTS is not None and cache.ncomp >= MAX_COMPONENTS: 250 | prior_term[-1] = -np.infty 251 | 252 | data_term = np.zeros(cache.ncomp + 1) 253 | for k in range(cache.ncomp + 1): 254 | predictive_mu, predictive_ssq = center_predictive(model, state, cache, k) 255 | data_term[k] = data[i, :].loglik(predictive_mu, predictive_ssq) 256 | 257 | cache.add(i, state.assignments[i], state.X[i, :]) 258 | 259 | return distributions.MultinomialDistribution.from_odds(prior_term + data_term) 260 | 261 | 262 | def gibbs_step_assignments_collapsed(model, data, state, cache, i): 263 | dist = cond_assignments_collapsed(model, data, state, cache, i) 264 | new_assignment = dist.sample().argmax() 265 | state.assignments[i] = new_assignment 266 | cache.remove(i) 267 | predictive_mu, predictive_ssq = center_predictive(model, state, cache, new_assignment) 268 | state.X[i, :] = data[i, :].sample_latent_values(predictive_mu, predictive_ssq) 269 | cache.add(i, new_assignment, state.X[i, :]) 270 | cache.squeeze(state) 271 | 272 | 273 | def cond_centers(model, data, state, cache): 274 | if model.isotropic_b: 275 | prior_term = from_iso(np.zeros(model.ndim), state.sigma_sq_b) 276 | else: 277 | prior_term = gaussians.Potential.from_moments_diag(np.zeros(model.ndim), state.sigma_sq_b) 278 | center_beliefs = center_evidence(model, state, cache) + prior_term 279 | return center_beliefs.renorm() 280 | 281 | def gibbs_step_centers(model, data, state, cache): 282 | cond = cond_centers(model, data, state, cache) 283 | new_centers = cond.to_distribution().sample() 284 | state.centers = new_centers 285 | 286 | def cond_sigma_sq_b(model, data, state): 287 | counts = np.bincount(state.assignments) 288 | nz = np.where(counts > 0)[0] 289 | centers = state.centers[nz, :] 290 | 291 | if model.isotropic_b: 292 | a = model.between_var_prior.a + 0.5 * nz.size * model.ndim 293 | b = model.between_var_prior.b + 0.5 * np.sum(centers**2) 294 | else: 295 | a = model.between_var_prior.a + 0.5 * nz.size * np.ones(model.ndim) 296 | b = model.between_var_prior.b + 0.5 * np.sum(centers**2, axis=0) 297 | return distributions.InverseGammaDistribution(a, b) 298 | 299 | def gibbs_step_sigma_sq_b(model, data, state): 300 | cond = cond_sigma_sq_b(model, data, state) 301 | state.sigma_sq_b = cond.sample() 302 | 303 | def cond_sigma_sq_w(model, data, state): 304 | diff = state.X - state.centers[state.assignments, :] 305 | if model.isotropic_w: 306 | a = model.within_var_prior.a + 0.5 * np.sum(data.mask) 307 | b = model.within_var_prior.b + 0.5 * np.sum(data.mask * diff**2) 308 | else: 309 | a = model.within_var_prior.a + 0.5 * np.sum(data.mask, axis=0) 310 | b = model.within_var_prior.b + 0.5 * np.sum(data.mask * diff**2, axis=0) 311 | return distributions.InverseGammaDistribution(a, b) 312 | 313 | def gibbs_step_sigma_sq_w(model, data, state): 314 | cond = cond_sigma_sq_w(model, data, state) 315 | state.sigma_sq_w = cond.sample() 316 | 317 | def gibbs_sweep_collapsed(model, data, state, fixed_variance=False): 318 | cache = CollapsedCRPCache.from_state(model, data, state) 319 | num = state.X.shape[0] 320 | for i in range(num): 321 | gibbs_step_assignments_collapsed(model, data, state, cache, i) 322 | 323 | cache = CollapsedCRPCache.from_state(model, data, state) 324 | 
gibbs_step_centers(model, data, state, cache) 325 | #assert False 326 | gibbs_step_sigma_sq_b(model, data, state) 327 | if not fixed_variance: 328 | gibbs_step_sigma_sq_w(model, data, state) 329 | 330 | 331 | 332 | 333 | NUM_ITER = 200 334 | 335 | def init_X(data_matrix): 336 | X_init = data_matrix.sample_latent_values(np.zeros((data_matrix.m, data_matrix.n)), 1.) 337 | svd_K = min(20, data_matrix.m // 4, data_matrix.n // 4) 338 | svd_K = max(svd_K, 2) # 0 and 1 cause it to crash 339 | _, _, _, _, _, X_init = low_rank.fit_model(data_matrix, svd_K, 10) 340 | return X_init 341 | 342 | 343 | def fit_model(data_matrix, isotropic_w=True, isotropic_b=True, num_iter=NUM_ITER): 344 | X_init = init_X(data_matrix) 345 | 346 | model = CRPModel(1., X_init.shape[1], distributions.InverseGammaDistribution(0.01, 0.01), 347 | distributions.InverseGammaDistribution(0.01, 0.01), isotropic_w, isotropic_b) 348 | 349 | N, D = X_init.shape 350 | 351 | k_init = min(N//4, 40) 352 | km = sklearn.cluster.KMeans(n_clusters=k_init) 353 | km.fit(X_init) 354 | init_assignments = km.labels_ 355 | 356 | 357 | 358 | sigma_sq_f = sigma_sq_n = X_init.var() / 2. 359 | if not model.isotropic_b: 360 | sigma_sq_f = X_init.var(0) / 2. 361 | state = CollapsedCRPState(X_init, init_assignments, sigma_sq_n, sigma_sq_f) 362 | state.centers = km.cluster_centers_ 363 | 364 | fixed_variance = data_matrix.fixed_variance() 365 | 366 | data = data_matrix.observations 367 | 368 | if fixed_variance: 369 | if isotropic_w: 370 | state.sigma_sq_w = 1. 371 | else: 372 | state.sigma_sq_w = np.ones(D) 373 | 374 | pbar = misc.pbar(num_iter) 375 | 376 | t0 = time.time() 377 | for it in range(num_iter): 378 | pred = state.centers[state.assignments, :] 379 | state.X = data_matrix.sample_latent_values(pred, state.sigma_sq_w) 380 | gibbs_sweep_collapsed(model, data, state, fixed_variance) 381 | 382 | if time.time() - t0 > 3600.: # 1 hour 383 | break 384 | 385 | pbar.update(it) 386 | pbar.finish() 387 | 388 | # sample the centers 389 | cache = CollapsedCRPCache.from_state(model, data, state) 390 | gibbs_step_centers(model, data, state, cache) 391 | 392 | return state 393 | 394 | 395 | -------------------------------------------------------------------------------- /algorithms/dumb_samplers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | import scipy.optimize 4 | 5 | import grammar 6 | import models 7 | import slice_sampling 8 | import sparse_coding 9 | from utils import distributions, misc 10 | 11 | 12 | def sample_variance(node): 13 | if node.isleaf() and node.distribution() == 'g': 14 | node.sample_variance() 15 | elif node.issum(): 16 | for child in node.children: 17 | sample_variance(child) 18 | elif node.isproduct(): 19 | for child in node.children: 20 | sample_variance(child) 21 | 22 | 23 | 24 | 25 | 26 | class GenericGibbsSampler: 27 | def __init__(self, node): 28 | self.node = node 29 | 30 | def step(self, niter=1, maximize=False): 31 | if not maximize: 32 | self.node.gibbs_update2() 33 | 34 | def __str__(self): 35 | return 'GenericGibbsSampler(%d)' % self.node.model.id 36 | 37 | def preserves_root_value(self): 38 | return True 39 | 40 | 41 | class GaussianSampler: 42 | def __init__(self, product_node, noise_node, side, maximize): 43 | self.product_node = product_node 44 | self.noise_node = noise_node 45 | self.side = side 46 | self.maximize = maximize 47 | 48 | def step(self): 49 | left, right = self.product_node.children 50 | m, n = self.noise_node.m, 
self.noise_node.n 51 | 52 | old = np.dot(left.value(), right.value()) + self.noise_node.value() 53 | 54 | if self.side == 'left' and ((left.isleaf() and left.distribution() == 'g') 55 | or left.isgsm()): 56 | A = np.eye(m) 57 | B = right.value() 58 | X_node = left 59 | elif self.side == 'left' and left.issum(): 60 | A = np.eye(m) 61 | B = right.value() 62 | X_node = left.children[-1] 63 | elif self.side == 'right' and ((right.isleaf() and right.distribution() == 'g') 64 | or right.isgsm()): 65 | A = left.value() 66 | B = np.eye(n) 67 | X_node = right 68 | elif self.side == 'right' and right.issum(): 69 | A = left.value() 70 | B = np.eye(n) 71 | X_node = right.children[-1] 72 | else: 73 | raise RuntimeError("shouldn't get here") 74 | 75 | X = X_node.value() 76 | N_node = self.noise_node 77 | N = N_node.value() 78 | C = np.dot(np.dot(A, X), B) + N 79 | obs = np.ones((m, n), dtype=bool) 80 | 81 | if X_node.has_rank1_variance() and N_node.has_rank1_variance(): 82 | ssq_row_N, ssq_col_N = N_node.row_col_variance() 83 | ssq_row_X, ssq_col_X = X_node.row_col_variance() 84 | d_1 = 1. / np.sqrt(ssq_row_N) 85 | d_2 = 1. / np.sqrt(ssq_col_N) 86 | d_3 = 1. / np.sqrt(ssq_row_X) 87 | d_4 = 1. / np.sqrt(ssq_col_X) 88 | 89 | if self.maximize: 90 | X_new = misc.map_gaussian_matrix_em(A, B, C, d_1, d_2, d_3, d_4, obs, X) 91 | else: 92 | X_new = misc.sample_gaussian_matrix_em(A, B, C, d_1, d_2, d_3, d_4, obs, X) 93 | 94 | else: 95 | if self.maximize: 96 | fn = misc.map_gaussian_matrix2 97 | else: 98 | fn = misc.sample_gaussian_matrix2 99 | 100 | if self.side == 'left': 101 | X_new = fn(B.T, C.T, 1. / X_node.variance().T, obs.T / N_node.variance().T).T 102 | else: 103 | X_new = fn(A, C, 1. / X_node.variance(), obs / N_node.variance()) 104 | 105 | 106 | X_node.set_value(X_new) 107 | N_new = C - np.dot(np.dot(A, X_new), B) 108 | N_node.set_value(N_new) 109 | 110 | new = np.dot(left.value(), right.value()) + self.noise_node.value() 111 | assert np.allclose(old, new) 112 | 113 | def __str__(self): 114 | return 'GaussianSampler(prod=%d, noise=%d, side=%s, maximize=%s)' % \ 115 | (self.product_node.model.id, self.noise_node.model.id, self.side, self.maximize) 116 | 117 | def preserves_root_value(self): 118 | return True 119 | 120 | 121 | class LatentValueSampler: 122 | def __init__(self, data_matrix, node): 123 | self.data_matrix = data_matrix 124 | self.node = node 125 | 126 | def step(self): 127 | pred = self.node.value() - self.node.children[-1].value() 128 | new_X = self.data_matrix.sample_latent_values(pred, self.node.children[-1].variance()) 129 | self.node.children[-1].set_value(new_X - pred) 130 | 131 | def __str__(self): 132 | return 'LatentValueSampler(%d)' % self.node.model.id 133 | 134 | def preserves_root_value(self): 135 | return False 136 | 137 | 138 | 139 | class LatentValueMaximizer: 140 | def __init__(self, data_matrix, node): 141 | self.data_matrix = data_matrix 142 | self.node = node 143 | 144 | def step(self): 145 | pred = self.node.value() - self.node.children[-1].value() 146 | new_X = np.where(self.data_matrix.observations.mask, self.node.value(), pred) 147 | self.node.children[-1].set_value(new_X - pred) 148 | 149 | def __str__(self): 150 | return 'LatentValueMaximizer(%d)' % self.node.model.id 151 | 152 | def preserves_root_value(self): 153 | return False 154 | 155 | class VarianceSampler: 156 | def __init__(self, node): 157 | self.node = node 158 | 159 | def step(self): 160 | self.node.sample_variance() 161 | 162 | def __str__(self): 163 | return 'VarianceSampler(%d)' % self.node.model.id 
164 | 165 | def preserves_root_value(self): 166 | return True 167 | 168 | class GSMScaleSampler: 169 | def __init__(self, gsm_node, maximize=False): 170 | self.gsm_node = gsm_node 171 | self.maximize = maximize 172 | 173 | def step(self): 174 | # S ~ N(0, exp(Z / 2)) 175 | # 176 | # Z = signal_node + noise_node + bias 177 | # = signal_node + gaussian_term 178 | # = scale_node + bias 179 | # 180 | # resample gaussian_term conditioned on signal_node 181 | 182 | scale_node = self.gsm_node.scale_node 183 | S = self.gsm_node.value() 184 | N, K = S.shape 185 | 186 | # resample Z 187 | Z = scale_node.value() + self.gsm_node.bias 188 | if scale_node.isleaf(): 189 | mu = self.gsm_node.bias * np.ones((N, K)) 190 | sigma_sq = scale_node.variance() 191 | else: 192 | assert scale_node.issum() 193 | mu = self.gsm_node.bias + scale_node.value() - scale_node.children[-1].value() 194 | sigma_sq = scale_node.children[-1].variance() 195 | 196 | for i in range(N): 197 | for k in range(K): 198 | log_f = sparse_coding.LogFUncollapsed(S[i, k]) 199 | if self.maximize: 200 | temp = lambda z: -log_f(z) - distributions.gauss_loglik(z, mu[i, k], sigma_sq[i, k]) 201 | Z[i, k] = scipy.optimize.fmin(temp, Z[i, k], disp=False) 202 | else: 203 | Z[i, k] = slice_sampling.slice_sample_gauss(log_f, mu[i, k], sigma_sq[i, k], Z[i, k]) 204 | 205 | # resample bias 206 | if scale_node.isleaf(): 207 | gaussian_term = Z 208 | else: 209 | signal = scale_node.value() - scale_node.children[-1].value() 210 | gaussian_term = Z - signal 211 | 212 | if not self.maximize: 213 | if self.gsm_node.bias_type == 'scalar': 214 | mu = gaussian_term.mean() 215 | lam = (1. / sigma_sq).sum() 216 | self.gsm_node.bias = np.random.normal(mu, 1. / lam) 217 | elif self.gsm_node.bias_type == 'row': 218 | mu = gaussian_term.mean(1) 219 | lam = (1. / sigma_sq).sum(1) 220 | self.gsm_node.bias = np.random.normal(mu, 1. / lam)[:, nax] 221 | elif self.gsm_node.bias_type == 'col': 222 | mu = gaussian_term.mean(0) 223 | lam = (1. / sigma_sq).sum(0) 224 | self.gsm_node.bias = np.random.normal(mu, 1. 
/ lam)[nax, :] 225 | 226 | # set noise node 227 | noise_term = gaussian_term - self.gsm_node.bias 228 | if scale_node.isleaf(): 229 | scale_node.set_value(noise_term) 230 | else: 231 | scale_node.children[-1].set_value(noise_term) 232 | 233 | def __str__(self): 234 | return 'GSMScaleSampler(%d, maximize=%s)' % (self.gsm_node.model.id, self.maximize) 235 | 236 | def preserves_root_value(self): 237 | return True 238 | 239 | 240 | def get_samplers(data_matrix, node, maximize): 241 | samplers = [] 242 | if data_matrix is not None and not maximize: 243 | samplers.append(LatentValueSampler(data_matrix, node)) 244 | if data_matrix is not None and maximize: 245 | samplers.append(LatentValueMaximizer(data_matrix, node)) 246 | 247 | if node.isleaf() and not node.model.fixed and not maximize: 248 | samplers.append(GenericGibbsSampler(node)) 249 | 250 | if node.isleaf() and node.distribution() == 'g' and not node.model.fixed_variance and not maximize: 251 | samplers.append(VarianceSampler(node)) 252 | 253 | if node.issum(): 254 | children = node.children[:-1] 255 | noise_node = node.children[-1] 256 | for child in children: 257 | left, right = child.children 258 | if ((left.isleaf() and left.distribution() == 'g') or left.issum() or left.isgsm()) and \ 259 | not left.model.fixed and not noise_node.model.fixed: 260 | samplers.append(GaussianSampler(child, noise_node, 'left', maximize)) 261 | if ((right.isleaf() and right.distribution() == 'g') or right.issum() or left.isgsm()) and \ 262 | not right.model.fixed and not noise_node.model.fixed: 263 | samplers.append(GaussianSampler(child, noise_node, 'right', maximize)) 264 | 265 | if node.isgsm(): 266 | samplers.append(GSMScaleSampler(node, maximize=maximize)) 267 | 268 | for child in node.children: 269 | samplers += get_samplers(None, child, maximize) 270 | 271 | return samplers 272 | 273 | def list_samplers(model, maximize=False): 274 | node = model.dummy() 275 | models.align(node, model) 276 | samplers = get_samplers('dummy', node, maximize) 277 | node.model.display() 278 | print 279 | for s in samplers: 280 | print s 281 | 282 | 283 | def sweep(data_matrix, root, num_iter=100, maximize=False): 284 | samplers = get_samplers(data_matrix, root, maximize) 285 | 286 | if num_iter > 1: 287 | print 'Dumb Gibbs sampling on %s...' 
% grammar.pretty_print(root.structure()) 288 | pbar = misc.pbar(num_iter) 289 | else: 290 | pbar = None 291 | 292 | for it in range(num_iter): 293 | for sampler in samplers: 294 | if sampler.preserves_root_value(): 295 | old = root.value() 296 | sampler.step() 297 | if sampler.preserves_root_value(): 298 | assert np.allclose(old, root.value()) 299 | 300 | if pbar is not None: 301 | pbar.update(it) 302 | if pbar is not None: 303 | pbar.finish() 304 | 305 | -------------------------------------------------------------------------------- /algorithms/ibp_split_merge.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | import scipy.special 4 | 5 | import ibp 6 | import observations 7 | from utils import distributions, gaussians, psd_matrices 8 | 9 | def poisson(k, lam): 10 | return -lam * k * np.log(lam) - scipy.special.gammaln(k+1) 11 | 12 | def evidence(model, data, state): 13 | K, D = state.Z.shape[1], state.X.shape[1] 14 | 15 | Lambda = np.dot(state.Z.T, state.Z) / state.sigma_sq_n + np.eye(K) / state.sigma_sq_f 16 | h = np.dot(state.Z.T, state.X) / state.sigma_sq_n 17 | 18 | # we can ignore the constant factors because they don't depend on Z 19 | pot = gaussians.Potential(h.T, psd_matrices.FullMatrix(Lambda[nax, :, :]), 0.) 20 | return pot.integral().sum() 21 | 22 | def sample_features(model, data, state): 23 | K, D = state.Z.shape[1], state.X.shape[1] 24 | 25 | Lambda = np.dot(state.Z.T, state.Z) / state.sigma_sq_n + np.eye(K) / state.sigma_sq_f 26 | h = np.dot(state.Z.T, state.X) / state.sigma_sq_n 27 | 28 | # we can ignore the constant factors because they don't depend on Z 29 | pot = gaussians.Potential(h.T, psd_matrices.FullMatrix(Lambda[nax, :, :]), 0.) 30 | return pot.to_distribution().sample().T 31 | 32 | 33 | def next_assignment_proposal(model, data, state, cache, Sigma_info, i, k): 34 | assert not cache.rows_included[i] 35 | x = state.X[i, :] 36 | 37 | evidence = np.zeros(2) 38 | for assignment in [0, 1]: 39 | mu = Sigma_info.mu_for(k, assignment) 40 | ssq = Sigma_info.sigma_sq_for(k, assignment) + state.sigma_sq_n 41 | evidence[assignment] = ibp.gauss_loglik_vec_C2(x, mu, ssq) 42 | data_odds = evidence[1] - evidence[0] 43 | 44 | if cache.counts[k] > 0: 45 | prior_odds = np.log(cache.counts[k]) - np.log(cache.num_included - cache.counts[k] + 1) 46 | else: 47 | #prior_odds = poisson(1, 0.5 * model.alpha / (i+1)) - poisson(0, 0.5 * model.alpha / (i+1)) 48 | prior_odds = np.log(model.alpha) - np.log(cache.num_included + 1) 49 | 50 | return distributions.BernoulliDistribution.from_odds(data_odds + prior_odds) 51 | 52 | 53 | def propose_assignments(model, data, state, update=False): 54 | """Generate the proposal for K columns using sequential Monte Carlo. Assumes the remaining 55 | features have been sampled, the remaining assignments are fixed, and the other features' contributions 56 | are subtracted from the data matrix. Generally K = 2.""" 57 | N, K = state.Z.shape 58 | state = state.copy() 59 | cache = ibp.IBPCache.from_state(model, data, state, np.zeros(N, dtype=bool)) 60 | 61 | proposal_prob = 0. 
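    # Sequential proposal over the K (typically 2) working columns: rows are visited in order,
    # and each z_{i, k} is drawn from its conditional given the rows added to the cache so far
    # (data odds from the feature posterior plus IBP-style prior odds). With update=False the
    # loop leaves Z unchanged and only accumulates the log-probability of the existing
    # assignments, so the same routine can be used to score a reverse move.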
62 | 63 | for i in range(N): 64 | Sigma_info = cache.fpost.Sigma_info(np.zeros(K, dtype=int)) 65 | for k in range(K): 66 | cond = next_assignment_proposal(model, data, state, cache, Sigma_info, i, k) 67 | if update: 68 | state.Z[i, k] = cond.sample() 69 | proposal_prob += cond.loglik(state.Z[i, k]) 70 | Sigma_info.update(k, state.Z[i, k]) 71 | cache.add(i, state.Z[i, :], state.X[i, :]) 72 | 73 | return state, proposal_prob 74 | 75 | CHOICES = [(0, 0), (0, 1), (1, 0), (1, 1)] 76 | 77 | def propose_assignments2(model, data, state, update=False): 78 | N, K = state.Z.shape 79 | state = state.copy() 80 | cache = ibp.IBPCache.from_state(model, data, state, np.zeros(N, dtype=bool)) 81 | 82 | proposal_prob = 0. 83 | 84 | for i in range(N): 85 | obs = data.mask[i, :] 86 | x = state.X[i, :] 87 | 88 | evidence = np.zeros(4) 89 | prior_odds = np.zeros(4) 90 | for c, (z1, z2) in enumerate(CHOICES): 91 | z = np.array([z1, z2]) 92 | mu = cache.fpost.predictive_mu(z) 93 | ssq = cache.fpost.predictive_ssq(z) + state.sigma_sq_n 94 | evidence[c] = ibp.gauss_loglik_vec_C2(x[obs], mu[obs], ssq) 95 | 96 | for k in [0, 1]: 97 | if cache.counts[k] > 0: 98 | prior_odds[c] += np.log(cache.counts[k]) - np.log(cache.num_included - cache.counts[k] + 1) 99 | else: 100 | prior_odds[c] += np.log(model.alpha) - np.log(cache.num_included + 1) 101 | 102 | odds = evidence + prior_odds 103 | dist = distributions.MultinomialDistribution.from_odds(odds) 104 | if update: 105 | state.Z[i, :] = CHOICES[dist.sample().argmax()] 106 | proposal_prob += dist.loglik(CHOICES.index(tuple(state.Z[i, :]))) 107 | cache.add(i, state.Z[i, :], state.X[i, :]) 108 | 109 | assert np.isfinite(proposal_prob) 110 | 111 | return state, proposal_prob 112 | 113 | 114 | def ibp_loglik(Z, alpha): 115 | N = Z.shape[0] 116 | idxs = np.where(Z.any(0))[0] 117 | K = idxs.size 118 | 119 | total = -alpha * (1. / np.arange(1, N+1)).sum() 120 | total += alpha * K 121 | 122 | if K > 0: 123 | m = Z[:, idxs].sum(0) 124 | total += scipy.special.gammaln(N - m + 1).sum() 125 | total += scipy.special.gammaln(m).sum() 126 | total -= K * scipy.special.gammaln(N + 1) 127 | 128 | assert np.isfinite(total) 129 | 130 | return total 131 | 132 | 133 | def choose_columns(K): 134 | if np.random.binomial(1, 0.5): 135 | k1 = 'new' 136 | else: 137 | k1 = np.random.randint(0, K) 138 | 139 | if np.random.binomial(1, 0.5): 140 | k2 = 'new' 141 | else: 142 | k2 = np.random.randint(0, K) 143 | if k2 == k1: 144 | k2 = 'new' 145 | 146 | return k1, k2 147 | 148 | def column_probability(K, k1, k2): 149 | total = 0. 
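    # Log-probability that choose_columns() returns this particular (k1, k2) pair; it enters the
    # forward and backward proposal probabilities of the split/merge Metropolis-Hastings move.
    # Each draw picks 'new' with probability 1/2 and a specific existing column with probability
    # 1/(2K); when k1 is an existing column, a second draw that collides with it is remapped to
    # 'new', which is where the 0.5 + 0.5 / K term for k2 == 'new' comes from.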
150 | if k1 == 'new': 151 | total += np.log(0.5) 152 | else: 153 | total += np.log(0.5) - np.log(K) 154 | 155 | if k2 == 'new': 156 | total += np.log(0.5 + 0.5 / K) 157 | else: 158 | assert k1 != k2 159 | total += np.log(0.5) - np.log(K) 160 | 161 | return total 162 | 163 | 164 | def backward_move_info(K_orig, k1, k2, new_reduced_state): 165 | any_ones = new_reduced_state.Z.any(0) 166 | 167 | K_back = K_orig 168 | if k1 == 'new': 169 | K_back += 1 170 | if k2 == 'new': 171 | K_back += 1 172 | if not any_ones[0]: 173 | K_back -= 1 174 | if not any_ones[1]: 175 | K_back -= 1 176 | 177 | if any_ones[0]: 178 | k1_back = 0 179 | else: 180 | k1_back = 'new' 181 | 182 | if any_ones[1]: 183 | k2_back = 1 184 | else: 185 | k2_back = 'new' 186 | 187 | return K_back, k1_back, k2_back 188 | 189 | 190 | def split_merge_step(model, data, state): 191 | N, K, D = state.X.shape[0], state.Z.shape[1], state.X.shape[1] 192 | 193 | if K <= 2: 194 | return # this case is awkward to deal with, and if it occurs, the model probably isn't too good anyway 195 | 196 | # choose random columns 197 | k1, k2 = choose_columns(K) 198 | 199 | # generate reduced problem 200 | prod = np.zeros(state.X.shape) 201 | for k in range(K): 202 | if k not in (k1, k2): 203 | prod += np.outer(state.Z[:, k], state.A[k, :]) 204 | reduced_data = observations.RealObservations(state.X - prod, np.ones(state.X.shape, dtype=bool)) 205 | reduced_X = state.X - prod 206 | reduced_state = ibp.CollapsedIBPState(reduced_X, np.zeros((N, 2), dtype=int), state.sigma_sq_f, state.sigma_sq_n) 207 | if k1 != 'new': 208 | reduced_state.Z[:, 0] = state.Z[:, k1] 209 | if k2 != 'new': 210 | reduced_state.Z[:, 1] = state.Z[:, k2] 211 | 212 | # propose assignments 213 | new_reduced_state, forward_prob = propose_assignments2(model, reduced_data, reduced_state, True) 214 | forward_prob += column_probability(K, k1, k2) 215 | 216 | # score the states 217 | old_score = ibp_loglik(reduced_state.Z, model.alpha) + evidence(model, reduced_data, reduced_state) 218 | new_score = ibp_loglik(new_reduced_state.Z, model.alpha) + evidence(model, reduced_data, new_reduced_state) 219 | 220 | # backward proposal probability 221 | K_back, k1_back, k2_back = backward_move_info(K, k1, k2, new_reduced_state) 222 | backward_prob = column_probability(K_back, k1_back, k2_back) 223 | _, proposal_prob = propose_assignments2(model, reduced_data, reduced_state, False) 224 | backward_prob += proposal_prob 225 | 226 | mh_score = new_score - old_score + backward_prob - forward_prob 227 | if mh_score > 0.: 228 | acceptance_prob = 1. 
229 | else: 230 | acceptance_prob = np.exp(mh_score) 231 | 232 | accept = np.random.binomial(1, acceptance_prob) 233 | 234 | if accept: 235 | A = sample_features(model, reduced_data, new_reduced_state) 236 | 237 | if k1 == 'new': 238 | if np.any(new_reduced_state.Z[:, 0] > 0): 239 | state.Z = np.hstack([state.Z, new_reduced_state.Z[:, 0][:, nax]]) 240 | state.A = np.vstack([state.A, A[0, :][nax, :]]) 241 | else: 242 | state.Z[:, k1] = new_reduced_state.Z[:, 0] 243 | state.A[k1, :] = A[0, :] 244 | 245 | if k2 == 'new': 246 | if np.any(new_reduced_state.Z[:, 1] > 0): 247 | state.Z = np.hstack([state.Z, new_reduced_state.Z[:, 1][:, nax]]) 248 | state.A = np.vstack([state.A, A[1, :][nax, :]]) 249 | else: 250 | state.Z[:, k2] = new_reduced_state.Z[:, 1] 251 | state.A[k2, :] = A[1, :] 252 | 253 | else: 254 | pass 255 | 256 | 257 | 258 | -------------------------------------------------------------------------------- /algorithms/low_rank.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | import scipy.linalg 4 | import time 5 | 6 | from utils import misc 7 | 8 | def sample_variance(values, axis): 9 | a = 0.01 + 0.5 * np.ones(values.shape).sum(axis) 10 | b = 0.01 + 0.5 * (values ** 2).sum(axis) 11 | prec = np.random.gamma(a, 1. / b) 12 | return 1. / prec 13 | 14 | NUM_ITER = 200 15 | 16 | 17 | def fit_model(data_matrix, K, num_iter=NUM_ITER, rotation_trick=True): 18 | N, D = data_matrix.m, data_matrix.n 19 | X = data_matrix.sample_latent_values(np.zeros((N, D)), 1.) 20 | 21 | if rotation_trick: 22 | U_, s_, V_ = scipy.linalg.svd(X, full_matrices=False) 23 | U = U_[:, :K] * np.sqrt(s_[:K][nax, :]) 24 | V = V_[:K, :] * np.sqrt(s_[:K][:, nax]) 25 | else: 26 | U = np.random.normal(size=(N, K)) 27 | V = np.random.normal(size=(K, D)) 28 | 29 | ssq_U = np.mean(U**2, axis=0) 30 | ssq_V = np.mean(V**2, axis=1) 31 | 32 | pred = np.dot(U, V) 33 | if data_matrix.observations.fixed_variance(): 34 | ssq_N = 1. 35 | else: 36 | ssq_N = np.mean((X - pred) ** 2) 37 | 38 | t0 = time.time() 39 | for it in range(num_iter): 40 | if np.any(-data_matrix.observations.mask): 41 | obs = data_matrix.observations.mask 42 | U_var = np.outer(np.ones(N), ssq_U) 43 | V_var = np.outer(ssq_V, np.ones(D)) 44 | U = misc.sample_gaussian_matrix2(V.T, X.T, 1. / U_var.T, obs.T / ssq_N).T 45 | V = misc.sample_gaussian_matrix2(U, X, 1. / V_var, obs / ssq_N) 46 | else: 47 | U = misc.sample_gaussian_matrix(np.eye(N), V, X, np.ones(N) / ssq_N, np.ones(D), np.ones(N), 1. / ssq_U) 48 | V = misc.sample_gaussian_matrix(U, np.eye(D), X, np.ones(N) / ssq_N, np.ones(D), 1. 
/ ssq_V, np.ones(D)) 49 | 50 | 51 | # rotation trick (to speed up learning the variances) 52 | if rotation_trick and it < num_iter // 4: 53 | UtU = np.dot(U.T, U) 54 | _, Q = scipy.linalg.eigh(UtU) 55 | Q = Q[:, ::-1] 56 | U = np.dot(U, Q) 57 | V = np.dot(Q.T, V) 58 | 59 | 60 | ssq_U = sample_variance(U, 0) 61 | ssq_V = sample_variance(V, 1) 62 | ssq_U = np.sqrt(ssq_U * ssq_V) 63 | ssq_V = ssq_U.copy() 64 | 65 | pred = np.dot(U, V) 66 | if not data_matrix.observations.fixed_variance(): 67 | ssq_N = sample_variance(X - pred, None) 68 | 69 | X = data_matrix.sample_latent_values(pred, ssq_N) 70 | 71 | if time.time() - t0 > 3600.: # 1 hour 72 | break 73 | 74 | return U, V, ssq_U, ssq_V, ssq_N, X 75 | 76 | 77 | -------------------------------------------------------------------------------- /algorithms/low_rank_poisson.py: -------------------------------------------------------------------------------- 1 | import itertools 2 | import numpy as np 3 | nax = np.newaxis 4 | import random 5 | import scipy.integrate 6 | import scipy.linalg 7 | import scipy.special 8 | import time 9 | 10 | from utils import distributions, gaussians, misc, psd_matrices 11 | 12 | A = 0.1 13 | B = 0.1 14 | 15 | VERBOSE = False 16 | SEED_0 = False 17 | K_INIT = 2 18 | 19 | class State: 20 | def __init__(self, U, V, ssq_U, ssq_N): 21 | self.U = U 22 | self.V = V 23 | self.ssq_U = ssq_U 24 | self.ssq_N = ssq_N 25 | 26 | def copy(self): 27 | return State(self.U.copy(), self.V.copy(), self.ssq_U.copy(), self.ssq_N) 28 | 29 | def sample_variance(values, axis, mask=None): 30 | if mask is None: 31 | mask = np.ones(values.shape, dtype=bool) 32 | a = 0.01 + 0.5 * mask.sum(axis) 33 | b = 0.01 + 0.5 * (mask * values ** 2).sum(axis) 34 | prec = np.random.gamma(a, 1. / b) 35 | return 1. / prec 36 | 37 | def p_u(u): 38 | N = u.size 39 | return -(A + 0.5 * N) * np.log(B + 0.5 * np.sum(u ** 2)) 40 | 41 | def givens_move(U, V, a, b): 42 | N = U.shape[0] 43 | theta = np.linspace(-np.pi / 4., np.pi / 4.) 
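    # A Givens rotation of columns (a, b) of U, together with the inverse rotation of rows
    # (a, b) of V, leaves the product UV unchanged. The angle is sampled from a discretized
    # conditional over this grid, scored by the prior p_u on the rotated column norms, so mass
    # can be shifted between the two components without changing the reconstruction.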
44 | uaa = np.dot(U[:, a], U[:, a]) 45 | uab = np.dot(U[:, a], U[:, b]) 46 | ubb = np.dot(U[:, b], U[:, b]) 47 | 48 | sin, cos = np.sin(theta), np.cos(theta) 49 | uaa_prime_ssq = uaa * cos ** 2 + 2 * uab * cos * sin + ubb * sin ** 2 50 | ubb_prime_ssq = uaa * sin ** 2 - 2 * uab * cos * sin + ubb * cos ** 2 51 | odds = -(A + 0.5 * N) * (np.log(B + 0.5 * uaa_prime_ssq) + np.log(B + 0.5 * ubb_prime_ssq)) 52 | p = np.exp(odds - np.logaddexp.reduce(odds)) 53 | p /= np.sum(p) 54 | idx = np.random.multinomial(1, p).argmax() 55 | 56 | theta = theta[idx] 57 | sin, cos = np.sin(theta), np.cos(theta) 58 | U[:, a], U[:, b] = cos * U[:, a] + sin * U[:, b], -sin * U[:, a] + cos * U[:, b] 59 | V[a, :], V[b, :] = cos * V[a, :] + sin * V[b, :], -sin * V[a, :] + cos * V[b, :] 60 | 61 | def givens_moves(state): 62 | U, V = state.U, state.V 63 | N, K, D = U.shape[0], U.shape[1], V.shape[1] 64 | pairs = list(itertools.combinations(range(K), 2)) 65 | if not SEED_0: 66 | random.shuffle(pairs) 67 | for a, b in pairs: 68 | givens_move(U, V, a, b) 69 | state.ssq_U = sample_variance(U, 0) 70 | 71 | def scaling_move(U, V, a): 72 | alpha_pts = np.logspace(-2., 2., 100) 73 | odds = np.zeros(len(alpha_pts)) 74 | for i, alpha in enumerate(alpha_pts): 75 | odds[i] = p_u(alpha * U[:, a]) + distributions.gauss_loglik(V[a, :] / alpha, 0., 1.).sum() 76 | p = np.exp(odds - np.logaddexp.reduce(odds)) 77 | p /= np.sum(p) 78 | idx = np.random.multinomial(1, p).argmax() 79 | alpha = alpha_pts[idx] 80 | 81 | U[:, a] *= alpha 82 | V[a, :] /= alpha 83 | 84 | def scaling_moves(state): 85 | U, V = state.U, state.V 86 | N, K, D = U.shape[0], U.shape[1], V.shape[1] 87 | for a in range(K): 88 | scaling_move(U, V, a) 89 | state.ssq_U = sample_variance(U, 0) 90 | 91 | 92 | def cond_U(X, obs, V, ssq_U, ssq_N): 93 | N, K, D = X.shape[0], V.shape[0], X.shape[1] 94 | if np.all(obs): 95 | Lambda = np.diag(1. / ssq_U) + np.dot(V, V.T) / ssq_N 96 | Lambda = Lambda[nax, :, :] 97 | else: 98 | Lambda = np.zeros((N, K, K)) 99 | for i in range(N): 100 | idxs = np.where(obs[i, :])[0] 101 | V_curr = V[:, idxs] 102 | Lambda[i, :, :] = np.diag(1. / ssq_U) + np.dot(V_curr, V_curr.T) / ssq_N 103 | h = np.dot(X * obs, V.T) / ssq_N 104 | return gaussians.Potential(h, psd_matrices.FullMatrix(Lambda), 0.) 105 | 106 | def cond_Vt(X, obs, U, ssq_N): 107 | K = U.shape[1] 108 | return cond_U(X.T, obs.T, U.T, np.ones(K), ssq_N) 109 | 110 | def sample_U_V(state, X, obs): 111 | state.U = cond_U(X, obs, state.V, state.ssq_U, state.ssq_N).to_distribution().sample() 112 | state.V = cond_Vt(X, obs, state.U, state.ssq_N).to_distribution().sample().T 113 | 114 | 115 | class InstabilityError(Exception): 116 | pass 117 | 118 | class ProposalInfo: 119 | def __init__(self, resid, obs, ssq_N): 120 | N, D = resid.shape 121 | self.resid = resid.copy() 122 | self.obs = obs.copy() 123 | self.ssq_N = ssq_N 124 | self.u = np.zeros(N) 125 | self.assigned = np.zeros(N, dtype=bool) 126 | self.lam = np.ones(D) # N(0, 1) prior 127 | self.h = np.zeros(D) 128 | self.v = None 129 | self.ssq_u = None 130 | self.num_assigned = 0 131 | self.sum_u_sq = 0. 132 | 133 | def update_u(self, i, ui): 134 | assert not self.assigned[i] 135 | self.u[i] = ui 136 | idxs = np.where(self.obs[i, :])[0] 137 | self.lam[idxs] += ui ** 2 / self.ssq_N 138 | self.h[idxs] += ui * self.resid[i, idxs] / self.ssq_N 139 | self.assigned[i] = True 140 | self.num_assigned += 1 141 | self.sum_u_sq += ui ** 2 142 | 143 | def cond_v(self): 144 | return distributions.GaussianDistribution(self.h / self.lam, 1. 
/ self.lam) 145 | 146 | def cond_ssq_u(self): 147 | a = A + 0.5 * self.num_assigned 148 | b = B + 0.5 * self.sum_u_sq 149 | return distributions.InverseGammaDistribution(a, b) 150 | 151 | def cond_u(self, i): 152 | idxs = np.where(self.obs[i, :])[0] 153 | #lam = np.dot(self.v[idxs], self.v[idxs]) / self.ssq_N + 1. / self.ssq_u 154 | v = self.v[idxs] 155 | lam = (v**2).sum() / self.ssq_N + 1. / self.ssq_u 156 | h = (self.resid[i, idxs] * v).sum() / self.ssq_N 157 | if np.abs(h / lam) < 1e-10: 158 | raise InstabilityError() 159 | return distributions.GaussianDistribution(h / lam, 1. / lam) 160 | 161 | def fit_v_and_var(self): 162 | self.v = self.cond_v().maximize() 163 | #self.v /= np.sqrt(np.mean(self.v ** 2)) 164 | self.ssq_u = self.sum_u_sq / (self.num_assigned + 1) 165 | 166 | class Proposal: 167 | def __init__(self, u, v, ssq_u): 168 | self.u = u 169 | self.v = v 170 | self.ssq_u = ssq_u 171 | 172 | 173 | def make_proposal(resid, obs, ssq_N, order=None): 174 | pi = ProposalInfo(resid, obs, ssq_N) 175 | N, D = resid.shape 176 | if order is None: 177 | order = range(N) 178 | 179 | for i in order: 180 | if i == order[0]: 181 | dist = distributions.GaussianDistribution(0., 1.) 182 | else: 183 | dist = pi.cond_u(i) 184 | pi.update_u(i, dist.sample()) 185 | pi.fit_v_and_var() 186 | 187 | v = pi.cond_v().sample() 188 | ssq_u = pi.cond_ssq_u().sample() 189 | 190 | return Proposal(pi.u.copy(), v, ssq_u) 191 | 192 | def proposal_probability(resid, obs, ssq_N, proposal, order=None): 193 | pi = ProposalInfo(resid, obs, ssq_N) 194 | N, D = resid.shape 195 | if order is None: 196 | order = range(N) 197 | 198 | total = 0. 199 | for i in order: 200 | if i == order[0]: 201 | dist = distributions.GaussianDistribution(0., 1.) 202 | else: 203 | dist = pi.cond_u(i) 204 | 205 | total += dist.loglik(proposal.u[i]) 206 | pi.update_u(i, proposal.u[i]) 207 | pi.fit_v_and_var() 208 | 209 | total += pi.cond_v().loglik(proposal.v).sum() 210 | total += pi.cond_ssq_u().loglik(proposal.ssq_u) 211 | 212 | return total 213 | 214 | 215 | def log_poisson(K, lam): 216 | return -lam + K * np.log(lam) - scipy.special.gammaln(K+1) 217 | 218 | def p_star(state, X, obs): 219 | K = state.U.shape[1] 220 | total = log_poisson(K, 1.) 221 | 222 | var_prior = distributions.InverseGammaDistribution(A, B) 223 | total += var_prior.loglik(state.ssq_U).sum() 224 | 225 | assert np.isfinite(total) 226 | 227 | U_dist = distributions.GaussianDistribution(0., state.ssq_U[nax, :]) 228 | total += U_dist.loglik(state.U).sum() 229 | 230 | assert np.isfinite(total) 231 | 232 | V_dist = distributions.GaussianDistribution(0., 1.) 
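    # V gets a fixed N(0, 1) prior; the per-component scale is carried entirely by ssq_U,
    # which scaling_moves() keeps consistent by trading magnitude between U's columns and
    # V's rows.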
233 | total += V_dist.loglik(state.V).sum() 234 | 235 | assert np.isfinite(total) 236 | 237 | pred = np.dot(state.U, state.V) 238 | X_dist = distributions.GaussianDistribution(pred, state.ssq_N) 239 | total += X_dist.loglik(X)[obs].sum() 240 | 241 | assert np.isfinite(total) 242 | 243 | return total 244 | 245 | def add_delete_move(state, X, obs): 246 | N, K, D = state.U.shape[0], state.U.shape[1], state.V.shape[1] 247 | order = np.random.permutation(N) 248 | if np.random.binomial(1, 0.5): # add move 249 | pred = np.dot(state.U, state.V) 250 | resid = X - pred 251 | try: 252 | proposal = make_proposal(resid, obs, state.ssq_N, order) 253 | except InstabilityError: 254 | return state 255 | except OverflowError: 256 | return state 257 | forward_prob = -np.log(2) + proposal_probability(resid, obs, state.ssq_N, proposal, order) 258 | backward_prob = -np.log(2) - np.log(K + 1) 259 | 260 | new_U = np.hstack([state.U, proposal.u[:, nax]]) 261 | new_V = np.vstack([state.V, proposal.v[nax, :]]) 262 | new_ssq_U = np.concatenate([state.ssq_U, [proposal.ssq_u]]) 263 | new_state = State(new_U, new_V, new_ssq_U, state.ssq_N) 264 | p_star_new = p_star(new_state, X, obs) 265 | p_star_old = p_star(state, X, obs) 266 | 267 | ratio = p_star_new - p_star_old - forward_prob + backward_prob 268 | assert np.isfinite(ratio) 269 | if np.random.binomial(1, min(np.exp(ratio), 1)): 270 | if VERBOSE: 271 | print 'Add move accepted (ratio=%1.2f)' % ratio 272 | return new_state 273 | else: 274 | if VERBOSE: 275 | print 'Add move rejected (ratio=%1.2f)' % ratio 276 | return state 277 | 278 | else: # delete move 279 | if K <= 2: # zero or one dimensions causes NumPy awkwardness 280 | return state 281 | 282 | k = np.random.randint(0, K) 283 | pred = np.dot(state.U, state.V) - np.outer(state.U[:, k], state.V[k, :]) 284 | resid = X - pred 285 | reverse_proposal = Proposal(state.U[:, k], state.V[k, :], state.ssq_U[k]) 286 | forward_prob = -np.log(2) - np.log(K) 287 | try: 288 | backward_prob = -np.log(2) + proposal_probability(resid, obs, state.ssq_N, reverse_proposal, order) 289 | except InstabilityError: 290 | return state 291 | except OverflowError: 292 | return state 293 | 294 | new_U = np.hstack([state.U[:, :k], state.U[:, k+1:]]) 295 | new_V = np.vstack([state.V[:k, :], state.V[k+1:, :]]) 296 | new_ssq_U = np.concatenate([state.ssq_U[:k], state.ssq_U[k+1:]]) 297 | new_state = State(new_U, new_V, new_ssq_U, state.ssq_N) 298 | 299 | p_star_new = p_star(new_state, X, obs) 300 | p_star_old = p_star(state, X, obs) 301 | 302 | ratio = p_star_new - p_star_old - forward_prob + backward_prob 303 | assert np.isfinite(ratio) 304 | if np.random.binomial(1, min(np.exp(ratio), 1)): 305 | if VERBOSE: 306 | print 'Delete move accepted (ratio=%1.2f)' % ratio 307 | return new_state 308 | else: 309 | if VERBOSE: 310 | print 'Delete move rejected (ratio=%1.2f)' % ratio 311 | return state 312 | 313 | 314 | 315 | NUM_ITER = 200 316 | 317 | def init_state(data_matrix, K): 318 | N, D = data_matrix.m, data_matrix.n 319 | X = data_matrix.sample_latent_values(np.zeros((N, D)), 1.) 320 | U = np.random.normal(0., 1. / np.sqrt(K), size=(N, K)) 321 | V = np.random.normal(0., 1., size=(K, D)) 322 | ssq_U = np.mean(U**2, axis=0) 323 | 324 | pred = np.dot(U, V) 325 | if data_matrix.observations.fixed_variance(): 326 | ssq_N = 1. 
327 | else: 328 | ssq_N = np.mean((X - pred) ** 2) 329 | return X, State(U, V, ssq_U, ssq_N) 330 | 331 | def fit_model(data_matrix, K=K_INIT, num_iter=NUM_ITER, name=None): 332 | if SEED_0: 333 | np.random.seed(0) 334 | N, D = data_matrix.m, data_matrix.n 335 | X, state = init_state(data_matrix, K) 336 | 337 | pbar = misc.pbar(num_iter) 338 | 339 | t0 = time.time() 340 | for it in range(num_iter): 341 | sample_U_V(state, X, data_matrix.observations.mask) 342 | 343 | old = np.dot(state.U, state.V) 344 | givens_moves(state) 345 | assert np.allclose(np.dot(state.U, state.V), old) 346 | scaling_moves(state) 347 | assert np.allclose(np.dot(state.U, state.V), old) 348 | 349 | state.ssq_U = sample_variance(state.U, 0) 350 | pred = np.dot(state.U, state.V) 351 | if not data_matrix.observations.fixed_variance(): 352 | state.ssq_N = sample_variance(X - pred, None, mask=data_matrix.observations.mask) 353 | 354 | X = data_matrix.sample_latent_values(pred, state.ssq_N) 355 | 356 | for i in range(10): 357 | state = add_delete_move(state, X, data_matrix.observations.mask) 358 | 359 | if VERBOSE: 360 | print 'K =', state.U.shape[1] 361 | print 'ssq_N =', state.ssq_N 362 | print 'X.var() =', X.var() 363 | 364 | #misc.print_dot(it+1, num_iter) 365 | pbar.update(it) 366 | 367 | if time.time() - t0 > 3600.: # 1 hour 368 | break 369 | 370 | pbar.finish() 371 | 372 | return state, X 373 | 374 | 375 | 376 | -------------------------------------------------------------------------------- /algorithms/slice_sampling.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | from utils import misc 5 | 6 | MAX_ITER = 1000 7 | 8 | def slice_sample(log_f, x0, L, U): 9 | assert L < x0 < U 10 | log_y = log_f(x0) + np.log(np.random.uniform(0, 1)) 11 | 12 | count = 0 13 | while True: 14 | x1 = np.random.uniform(L, U) 15 | if log_f(x1) >= log_y: 16 | return x1 17 | 18 | if x1 < x0: 19 | L = x1 20 | else: 21 | U = x1 22 | 23 | count += 1 24 | if count >= MAX_ITER: 25 | raise RuntimeError('Exceeded maximum iterations for slice sampling') 26 | 27 | 28 | class GaussObj: 29 | def __init__(self, log_f, mu, sigma_sq): 30 | self.log_f = log_f 31 | self.mu = mu 32 | self.sigma_sq = sigma_sq 33 | 34 | def __call__(self, x): 35 | return self.log_f(x) - 0.5 * (x - self.mu)**2 / self.sigma_sq 36 | 37 | def slice_sample_gauss(log_f, mu, sigma_sq, x0): 38 | sigma = np.sqrt(sigma_sq) 39 | temp = (x0 - mu) / sigma 40 | if not -4. <= temp <= 4.: 41 | # If x takes an extreme value, scipy.special.erf may fail, so fall back to ordinary slice sampling. 42 | # This isn't a valid sample, since it assumes a contiguous interval, which may not be the case. 43 | # Hopefully this case doesn't arise too often. 44 | return slice_sample(GaussObj(log_f, mu, sigma_sq), x0, x0 - 4. * sigma, x0 + 4. * sigma) 45 | 46 | L, U = 1e-10, 1. 
- 1e-10 47 | p0 = misc.inv_probit((x0 - mu) / sigma) 48 | log_y = log_f(x0) + np.log(np.random.uniform(0, 1)) 49 | 50 | count = 0 51 | while True: 52 | p1 = np.random.uniform(L, U) 53 | x1 = mu + misc.probit(p1) * sigma 54 | if log_f(x1) >= log_y: 55 | #if np.random.binomial(1, 0.001): 56 | # print 'Took %d iterations' % count 57 | return x1 58 | 59 | if p1 < p0: 60 | L = p1 61 | else: 62 | U = p1 63 | 64 | count += 1 65 | if count >= MAX_ITER: 66 | raise RuntimeError('Exceeded maximum iterations for slice sampling') 67 | 68 | 69 | 70 | -------------------------------------------------------------------------------- /algorithms/sparse_coding.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | import slice_sampling 5 | from utils import distributions 6 | 7 | debugger = None 8 | 9 | class SparseCodingState: 10 | def __init__(self, S, A, Z, sigma_sq_N, mu_Z, sigma_sq_Z, sigma_sq_A): 11 | self.S = S 12 | self.A = A 13 | self.Z = Z 14 | self.sigma_sq_N = sigma_sq_N 15 | self.mu_Z = mu_Z 16 | self.sigma_sq_Z = sigma_sq_Z 17 | self.sigma_sq_A = sigma_sq_A 18 | 19 | def copy(self): 20 | if np.isscalar(self.mu_Z): 21 | mu_Z = self.mu_Z 22 | else: 23 | mu_Z = self.mu_Z.copy() 24 | return SparseCodingState(self.S.copy(), self.A.copy(), self.Z.copy(), self.sigma_sq_N, mu_Z, 25 | self.sigma_sq_Z, self.sigma_sq_A) 26 | 27 | 28 | class LogFCollapsed: 29 | def __init__(self, lam, h): 30 | self.lam = lam 31 | self.h = h 32 | 33 | def __call__(self, z): 34 | sigma_sq = np.exp(z) + 1. / self.lam 35 | mu = self.h / self.lam 36 | 37 | return -0.5 * np.log(sigma_sq) + \ 38 | -0.5 * mu ** 2 / sigma_sq 39 | 40 | class LogFUncollapsed: 41 | def __init__(self, s): 42 | self.s = s 43 | 44 | def __call__(self, z): 45 | return -0.5 * z + \ 46 | -0.5 * self.s ** 2 / np.exp(z) 47 | 48 | 49 | def cond_mu_Z(state, by_column=False): 50 | if by_column: 51 | mu = state.Z.mean(0) 52 | sigma_sq = state.sigma_sq_Z / state.Z.shape[0] * np.ones(state.Z.shape[1]) 53 | else: 54 | mu = state.Z.mean() 55 | sigma_sq = state.sigma_sq_Z / state.Z.size 56 | return distributions.GaussianDistribution(mu, sigma_sq) 57 | 58 | def cond_sigma_sq_Z(state): 59 | a = 1. + 0.5 * state.Z.size 60 | b = 1. 
+ 0.5 * np.sum((state.Z - state.mu_Z) ** 2) 61 | return distributions.InverseGammaDistribution(a, b) 62 | 63 | 64 | def sample_Z(state): 65 | N, K= state.S.shape[0], state.Z.shape[1] 66 | for i in range(N): 67 | for k in range(K): 68 | log_f = LogFUncollapsed(state.S[i, k]) 69 | if np.isscalar(state.mu_Z): 70 | mu_Z = state.mu_Z 71 | else: 72 | mu_Z = state.mu_Z[k] 73 | state.Z[i, k] = slice_sampling.slice_sample_gauss(log_f, mu_Z, state.sigma_sq_Z, state.Z[i, k]) 74 | 75 | if hasattr(debugger, 'after_sample_Z'): 76 | debugger.after_sample_Z(vars()) 77 | 78 | -------------------------------------------------------------------------------- /algorithms/variational.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | import random 4 | Random = random.Random() 5 | import scipy.linalg, scipy.stats 6 | 7 | from utils import misc 8 | 9 | 10 | 11 | def perturb_simplex(q, eps=1e-5): 12 | eps = 1e-5 13 | k = q.size 14 | q = q.copy() 15 | for tr in range(10): 16 | large_inds = np.where(q > eps)[0] 17 | i = Random.choice(large_inds) 18 | j = np.random.randint(0, k) 19 | if i == j or q[j] > 1-eps: 20 | continue 21 | q[i] -= eps 22 | q[j] += eps 23 | return q 24 | 25 | def perturb_psd(S, eps=1e-5): 26 | d, V = scipy.linalg.eigh(S) 27 | d *= np.exp(np.random.normal(0., eps, size=d.shape)) 28 | return np.dot(np.dot(V, np.diag(d)), V.T) 29 | 30 | def perturb_pos(x, eps=1e-5): 31 | return x * np.exp(np.random.normal(0., eps, size=x.shape)) 32 | 33 | 34 | 35 | ALPHA = 1. 36 | class MultinomialEstimator: 37 | def __init__(self, pi, A): 38 | self.pi = pi 39 | self.nclass = pi.size 40 | self.A = A 41 | 42 | def expected_log_prob(self, rep): 43 | return np.dot(rep.q, np.log(self.pi)) 44 | 45 | def fit_representation(self, t, Sigma_N, init=None): 46 | data_term = np.zeros(self.nclass) 47 | Lambda_N = np.linalg.inv(Sigma_N) 48 | for i in range(self.nclass): 49 | diff = t - self.A[i,:] 50 | #data_term[i] = -0.5 * np.sum(diff**2 / sigma_sq_N) 51 | data_term[i] = -0.5 * np.dot(np.dot(diff, Lambda_N), diff) 52 | log_q = np.log(self.pi) + data_term 53 | log_q -= np.logaddexp.reduce(log_q) 54 | q = np.exp(log_q) 55 | return MultinomialRepresentation(q) 56 | 57 | def init_representation(self): 58 | return MultinomialRepresentation(self.pi) 59 | 60 | @staticmethod 61 | def random(k, n): 62 | pi = np.random.uniform(0., 1., size=k) 63 | pi /= pi.sum() 64 | A = np.random.normal(size=(k, n)) 65 | return MultinomialEstimator(pi, A) 66 | 67 | @staticmethod 68 | def random_u(k): 69 | u = np.random.uniform(0., 1., size=k) 70 | return u / u.sum() 71 | 72 | class MultinomialRepresentation: 73 | def __init__(self, q): 74 | self.q = q 75 | assert np.allclose(np.sum(self.q), 1.) 
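        # q is a point on the probability simplex; the methods below treat the latent variable
        # as a one-hot indicator drawn from Multinomial(1, q), so its mean is q and its
        # covariance is diag(q) - q q^T.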
76 | 77 | def expected_value(self): 78 | return self.q 79 | 80 | def covariance(self): 81 | return np.diag(self.q) - np.outer(self.q, self.q) 82 | 83 | def entropy(self): 84 | return scipy.stats.distributions.entropy(self.q) 85 | 86 | def sample(self): 87 | return np.random.multinomial(1, self.q) 88 | 89 | def perturb(self, eps): 90 | return MultinomialRepresentation(perturb_simplex(self.q, eps)) 91 | 92 | 93 | 94 | class BernoulliEstimator: 95 | def __init__(self, pi, A): 96 | self.pi = pi 97 | self.A = A 98 | self.nclass = self.pi.size 99 | 100 | def expected_log_prob(self, rep): 101 | return np.dot(rep.q, np.log(self.pi)) + np.dot(1-rep.q, np.log(1-self.pi)) 102 | 103 | def fit_representation(self, t, Sigma_N, init=None): 104 | Lambda_N = np.linalg.inv(Sigma_N) 105 | J = -np.log(self.pi) + np.log(1. - self.pi) - np.dot(self.A, np.dot(Lambda_N, t)) 106 | Lambda = np.dot(np.dot(self.A, Lambda_N), self.A.T) 107 | return BernoulliRepresentation(misc.mean_field(J, Lambda, init.q)) 108 | 109 | def init_representation(self): 110 | return BernoulliRepresentation(self.pi) 111 | 112 | @staticmethod 113 | def random(k, n): 114 | pi = np.random.uniform(0., 1., size=k) 115 | A = np.random.normal(size=(k, n)) 116 | return BernoulliEstimator(pi, A) 117 | 118 | @staticmethod 119 | def random_u(k): 120 | return np.random.uniform(0., 1., size=k) 121 | 122 | class BernoulliRepresentation: 123 | def __init__(self, q): 124 | self.q = q 125 | 126 | def expected_value(self): 127 | return self.q 128 | 129 | def covariance(self): 130 | return np.diag(self.q * (1. - self.q)) 131 | 132 | def entropy(self): 133 | #return misc.bernoulli_entropy(self.q) * np.log(2) 134 | return np.sum([scipy.stats.distributions.entropy([p, 1.-p]) for p in self.q]) 135 | 136 | def sample(self): 137 | return np.random.binomial(1, self.q) 138 | 139 | def perturb(self, eps): 140 | q = np.clip(np.random.normal(self.q, eps), 0., 1.) 141 | return BernoulliRepresentation(q) 142 | 143 | 144 | 145 | class VariationalProblem: 146 | def __init__(self, estimators, x, Sigma_N): 147 | self.estimators = estimators 148 | self.x = x 149 | self.nterms = len(estimators) 150 | self.nfea = self.x.size 151 | self.Sigma_N = Sigma_N 152 | assert Sigma_N.shape == (x.size, x.size) 153 | 154 | def objective_function(self, reps, collapse_z=False): 155 | assert len(reps) == self.nterms 156 | 157 | fobj = 0. 
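        # Accumulate the variational lower bound: for each factor, the expected log prior
        # E_q[log P(u|U)] plus the entropy H(q), followed by the expected Gaussian
        # log-likelihood of x, with the factors' means (m) and covariances (S) propagated
        # through their A matrices.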
158 | m = np.zeros(self.nfea) 159 | S = np.zeros((self.nfea, self.nfea)) 160 | for estimator, rep in zip(self.estimators, reps): 161 | # E[log P(u|U)] 162 | fobj += estimator.expected_log_prob(rep) 163 | 164 | # H(q) 165 | fobj += rep.entropy() 166 | 167 | # sufficient statistics 168 | m += np.dot(estimator.A.T, rep.expected_value()) 169 | S += misc.mult([estimator.A.T, rep.covariance(), estimator.A]) 170 | 171 | Lambda_N = np.linalg.inv(self.Sigma_N) 172 | 173 | fobj += -0.5 * self.nfea * np.log(2*np.pi) - 0.5 * misc.logdet(self.Sigma_N) 174 | diff = self.x - m 175 | fobj += -0.5 * np.dot(np.dot(diff, Lambda_N), diff) 176 | fobj += -0.5 * np.sum(S * Lambda_N) 177 | 178 | return fobj 179 | 180 | def update_one(self, reps, i): 181 | reps = reps[:] # make copy 182 | m = np.zeros(self.nfea) 183 | for j, estimator in enumerate(self.estimators): 184 | if i == j: 185 | continue 186 | m += np.dot(estimator.A.T, reps[j].expected_value()) 187 | 188 | t = self.x - m 189 | reps[i] = self.estimators[i].fit_representation(t, self.Sigma_N, reps[i]) 190 | return reps 191 | 192 | def update_all(self, reps): 193 | for i in range(self.nterms): 194 | reps = self.update_one(reps, i) 195 | return reps 196 | 197 | def solve(self): 198 | if len(self.estimators) <= 1: 199 | NUM_ITER = 1 200 | else: 201 | NUM_ITER = 10 202 | reps = [estimator.init_representation() for estimator in self.estimators] 203 | for it in range(NUM_ITER): 204 | reps = self.update_all(reps) 205 | return reps 206 | 207 | -------------------------------------------------------------------------------- /config_example.py: -------------------------------------------------------------------------------- 1 | # experiment directories 2 | RESULTS_PATH = '/path/to/results' 3 | CACHE_PATH = '/path/to/cached' 4 | REPORT_PATH = '/path/to/reports' 5 | 6 | # 'single_process' to run in a single process, 'parallel' to use GNU Parallel 7 | SCHEDULER = 'single_process' 8 | 9 | # additional options for GNU Parallel 10 | DEFAULT_NUM_JOBS = 2 11 | JOBS_PATH = '/path/to/job_info' 12 | -------------------------------------------------------------------------------- /example.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from experiments import init_experiment, QuickParams 4 | from observations import DataMatrix 5 | 6 | ### 7 | ### First follow the configuration directions in README.md. 
Then run the following: 8 | ### 9 | ### python example.py 10 | ### python experiments.py everything example 11 | ### 12 | 13 | def read_array(fname): 14 | return np.array([map(float, line.split()) for line in open(fname)]) 15 | 16 | def read_list(fname): 17 | return map(str.strip, open(fname).readlines()) 18 | 19 | def init(): 20 | X = read_array('example_data/animals-data.txt') 21 | row_labels = read_list('example_data/animals-names.txt') 22 | col_labels = read_list('example_data/animals-features.txt') 23 | 24 | # normalize to zero mean, unit variance 25 | X -= X.mean() 26 | X /= X.std() 27 | 28 | # since the data were binary, add a small amount of noise to prevent degeneracy 29 | X = np.random.normal(X, np.sqrt(0.1)) 30 | 31 | data_matrix = DataMatrix.from_real_values(X, row_labels=row_labels, col_labels=col_labels) 32 | init_experiment('example', data_matrix, QuickParams(search_depth=2)) 33 | 34 | if __name__ == '__main__': 35 | init() 36 | 37 | 38 | 39 | -------------------------------------------------------------------------------- /example_data/animals-features.txt: -------------------------------------------------------------------------------- 1 | black 2 | white 3 | blue 4 | brown 5 | gray 6 | orange 7 | red 8 | yellow 9 | patches 10 | spots 11 | stripes 12 | furry 13 | hairless 14 | toughskin 15 | big 16 | small 17 | bulbous 18 | lean 19 | flippers 20 | hands 21 | hooves 22 | pads 23 | paws 24 | longleg 25 | longneck 26 | tail 27 | chewteeth 28 | meatteeth 29 | buckteeth 30 | strainteeth 31 | horns 32 | claws 33 | tusks 34 | smelly 35 | flys 36 | hops 37 | swims 38 | tunnels 39 | walks 40 | fast 41 | slow 42 | strong 43 | weak 44 | muscle 45 | bipedal 46 | quadrapedal 47 | active 48 | inactive 49 | nocturnal 50 | hibernate 51 | agility 52 | fish 53 | meat 54 | plankton 55 | vegetation 56 | insects 57 | forager 58 | grazer 59 | hunter 60 | scavenger 61 | skimmer 62 | stalker 63 | newworld 64 | oldworld 65 | arctic 66 | coastal 67 | desert 68 | bush 69 | plains 70 | forest 71 | fields 72 | jungle 73 | mountains 74 | ocean 75 | ground 76 | water 77 | tree 78 | cave 79 | fierce 80 | timid 81 | smart 82 | group 83 | solitary 84 | nestspot 85 | domestic 86 | -------------------------------------------------------------------------------- /example_data/animals-names.txt: -------------------------------------------------------------------------------- 1 | antelope 2 | grizzly bear 3 | killer whale 4 | beaver 5 | dalmatian 6 | persian cat 7 | horse 8 | german shepherd 9 | blue whale 10 | siamese cat 11 | skunk 12 | mole 13 | tiger 14 | hippopotamus 15 | leopard 16 | moose 17 | spider monkey 18 | humpback whale 19 | elephant 20 | gorilla 21 | ox 22 | fox 23 | sheep 24 | seal 25 | chimpanzee 26 | hamster 27 | squirrel 28 | rhinoceros 29 | rabbit 30 | bat 31 | giraffe 32 | wolf 33 | chihuahua 34 | rat 35 | weasel 36 | otter 37 | buffalo 38 | zebra 39 | giant panda 40 | deer 41 | bobcat 42 | pig 43 | lion 44 | mouse 45 | polar bear 46 | collie 47 | walrus 48 | raccoon 49 | cow 50 | dolphin 51 | -------------------------------------------------------------------------------- /grammar.py: -------------------------------------------------------------------------------- 1 | 2 | import parsing 3 | 4 | START = 'g' 5 | 6 | PRODUCTION_RULES = {'low-rank': [('g', ('+', ('*', 'g', 'g'), 'g'))], 7 | 8 | 'clustering': [('g', ('+', ('*', 'm', 'g'), 'g')), 9 | ('g', ('+', ('*', 'g', 'M'), 'g'))], 10 | 11 | 'binary': [('g', ('+', ('*', 'b', 'g'), 'g')), 12 | ('g', ('+', ('*', 'g', 'B'), 'g'))], 13 | 14 | 
'chain': [('g', ('+', ('*', 'c', 'g'), 'g')), 15 | ('g', ('+', ('*', 'g', 'C'), 'g'))], 16 | 17 | 'sparsity': [('g', ('s', 'g'))], 18 | 19 | 'expand-disc': [('m', ('+', ('*', 'm', 'g'), 'g')), 20 | ('M', ('+', ('*', 'g', 'M'), 'g')), 21 | ('b', ('+', ('*', 'b', 'g'), 'g')), 22 | ('B', ('+', ('*', 'g', 'B'), 'g'))], 23 | 24 | 'm-to-b': [('m', 'b')], 25 | } 26 | 27 | 28 | def is_valid(structure): 29 | if type(structure) == str and structure != 'g': 30 | return False 31 | if type(structure) == tuple and structure[0] == 's': 32 | return False 33 | return True 34 | 35 | def list_successors_helper(structure, rule_names, is_noise, expand_noise=True): 36 | rules = reduce(list.__add__, [PRODUCTION_RULES[rn] for rn in rule_names]) 37 | 38 | if is_noise and not expand_noise: 39 | return [] 40 | 41 | if type(structure) == str: 42 | return [rhs for lhs, rhs in rules if lhs == structure] 43 | 44 | successors = [] 45 | for pos in range(len(structure)): 46 | is_noise = (structure[0] == '+' and pos == len(structure) - 1) 47 | for child_succ in list_successors_helper(structure[pos], rule_names, is_noise, expand_noise): 48 | if is_noise and type(child_succ) == tuple and child_succ[0] == 's': 49 | continue 50 | successors.append(structure[:pos] + (child_succ,) + structure[pos+1:]) 51 | return successors 52 | 53 | def list_successors(structure, rules, expand_noise=True): 54 | successors = list_successors_helper(structure, rules, False, expand_noise) 55 | return filter(is_valid, successors) 56 | 57 | def collapse_sums(structure): 58 | if type(structure) == str: 59 | return structure 60 | elif structure[0] == '+': 61 | new_structure = ('+',) 62 | for s_ in structure[1:]: 63 | s = collapse_sums(s_) 64 | if type(s) == tuple and s[0] == '+': 65 | new_structure = new_structure + s[1:] 66 | else: 67 | new_structure = new_structure + (s,) 68 | return new_structure 69 | else: 70 | return tuple([collapse_sums(s) for s in structure]) 71 | 72 | def list_collapsed_successors(structure, rule_names, expand_noise=True): 73 | return [collapse_sums(s) for s in list_successors_helper(structure, rule_names, False, expand_noise) 74 | if is_valid(collapse_sums(s))] 75 | 76 | def pretty_print(structure, spaces=True, quotes=True): 77 | if spaces: 78 | PLUS = ' + ' 79 | else: 80 | PLUS = '+' 81 | 82 | if type(structure) == str: 83 | if structure.isupper() and quotes: 84 | return structure.lower() + "'" 85 | else: 86 | return structure 87 | elif structure[0] == '+': 88 | parts = [pretty_print(s, spaces, quotes) for s in structure[1:]] 89 | return PLUS.join(parts) 90 | elif structure[0] == 's': 91 | return 's(%s)' % pretty_print(structure[1], spaces, quotes) 92 | else: 93 | assert structure[0] == '*' 94 | parts = [] 95 | for s in structure[1:]: 96 | if type(s) == str or s[0] == '*' or s[0] == 's': 97 | parts.append(pretty_print(s, spaces, quotes)) 98 | else: 99 | parts.append('(' + pretty_print(s, spaces, quotes) + ')') 100 | return ''.join(parts) 101 | 102 | def list_derivations(depth, do_print=False): 103 | derivations = [['g']] 104 | for i in range(depth): 105 | new_derivations = [] 106 | for d in derivations: 107 | new_derivations += [d + [s] for s in list_successors(d[-1])] 108 | derivations = new_derivations 109 | 110 | for d in derivations: 111 | if do_print: 112 | print [pretty_print(s) for s in d] 113 | 114 | return derivations 115 | 116 | def list_structures(depth): 117 | full = set() 118 | for i in range(1, depth+1): 119 | derivations = list_derivations(depth, False) 120 | full.update(set([d[-1] for d in derivations])) 121 | 
return full 122 | 123 | 124 | 125 | 126 | def parse(string): 127 | structure = parsing.parse(string) 128 | return collapse_sums(structure) 129 | 130 | 131 | -------------------------------------------------------------------------------- /initialization.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | from algorithms import low_rank_poisson, crp, ibp, sparse_coding, chains 5 | import grammar 6 | import observations 7 | import recursive 8 | from utils import misc 9 | 10 | debugger = None 11 | 12 | 13 | def init_low_rank(data_matrix, num_iter=200): 14 | m, n = data_matrix.m, data_matrix.n 15 | state, X = low_rank_poisson.fit_model(data_matrix, 2, num_iter=num_iter) 16 | U, V, ssq_U, ssq_N = state.U, state.V, state.ssq_U, state.ssq_N 17 | 18 | U /= ssq_U[nax, :] ** 0.25 19 | V *= ssq_U[:, nax] ** 0.25 20 | 21 | left = recursive.GaussianNode(U, 'col', np.sqrt(ssq_U)) 22 | 23 | right = recursive.GaussianNode(V, 'row', np.sqrt(ssq_U)) 24 | 25 | pred = np.dot(U, V) 26 | X = data_matrix.sample_latent_values(pred, ssq_N) 27 | noise = recursive.GaussianNode(X - pred, 'scalar', ssq_N) 28 | 29 | return recursive.SumNode([recursive.ProductNode([left, right]), noise]) 30 | 31 | def init_row_clustering(data_matrix, isotropic, num_iter=200): 32 | m, n = data_matrix.m, data_matrix.n 33 | state = crp.fit_model(data_matrix, isotropic_w=isotropic, isotropic_b=isotropic, num_iter=num_iter) 34 | 35 | U = np.zeros((m, state.assignments.max() + 1), dtype=int) 36 | U[np.arange(m), state.assignments] = 1 37 | left = recursive.MultinomialNode(U) 38 | 39 | if isotropic: 40 | right = recursive.GaussianNode(state.centers, 'scalar', state.sigma_sq_b) 41 | else: 42 | right = recursive.GaussianNode(state.centers, 'col', state.sigma_sq_b) 43 | 44 | pred = state.centers[state.assignments, :] 45 | X = data_matrix.sample_latent_values(pred, state.sigma_sq_w * np.ones((m, n))) 46 | if isotropic: 47 | noise = recursive.GaussianNode(X - pred, 'scalar', state.sigma_sq_w) 48 | else: 49 | noise = recursive.GaussianNode(X - pred, 'col', state.sigma_sq_w) 50 | 51 | return recursive.SumNode([recursive.ProductNode([left, right]), noise]) 52 | 53 | def init_col_clustering(data_matrix, isotropic, num_iter=200): 54 | return init_row_clustering(data_matrix.transpose(), isotropic, num_iter=num_iter).transpose() 55 | 56 | def init_row_binary(data_matrix, num_iter=200): 57 | state = ibp.fit_model(data_matrix, num_iter=num_iter) 58 | 59 | left = recursive.BernoulliNode(state.Z) 60 | 61 | right = recursive.GaussianNode(state.A, 'scalar', state.sigma_sq_f) 62 | 63 | pred = np.dot(state.Z, state.A) 64 | X = data_matrix.sample_latent_values(pred, state.sigma_sq_n) 65 | noise = recursive.GaussianNode(X - pred, 'scalar', state.sigma_sq_n) 66 | 67 | return recursive.SumNode([recursive.ProductNode([left, right]), noise]) 68 | 69 | def init_col_binary(data_matrix, num_iter=200): 70 | return init_row_binary(data_matrix.transpose(), num_iter=num_iter).transpose() 71 | 72 | def init_row_chain(data_matrix, num_iter=200): 73 | states, sigma_sq_D, sigma_sq_N = chains.fit_model(data_matrix, num_iter=num_iter) 74 | 75 | integ = chains.integration_matrix(data_matrix.m_orig)[data_matrix.row_ids, :] 76 | left = recursive.IntegrationNode(integ) 77 | 78 | temp = np.vstack([states[0, :][nax, :], 79 | states[1:, :] - states[:-1, :]]) 80 | right = recursive.GaussianNode(temp, 'scalar', sigma_sq_D) 81 | 82 | pred = states[data_matrix.row_ids, :] 83 | X = 
data_matrix.sample_latent_values(pred, sigma_sq_N) 84 | noise = recursive.GaussianNode(X - pred, 'scalar', sigma_sq_N) 85 | 86 | return recursive.SumNode([recursive.ProductNode([left, right]), noise]) 87 | 88 | def init_col_chain(data_matrix, num_iter=200): 89 | return init_row_chain(data_matrix.transpose(), num_iter=num_iter).transpose() 90 | 91 | def init_sparsity(data_matrix, mu_Z_mode, num_iter=200): 92 | if mu_Z_mode == 'row': 93 | return init_sparsity(data_matrix.transpose(), 'col', num_iter).transpose() 94 | elif mu_Z_mode == 'col': 95 | by_column = True 96 | elif mu_Z_mode == 'scalar': 97 | by_column = False 98 | 99 | # currently, data_matrix should always be real-valued with no missing values, so this just 100 | # passes on data_matrix.observations.values; we may want to replace it with interval observations 101 | # obtained from slice sampling 102 | S = data_matrix.sample_latent_values(np.zeros((data_matrix.m, data_matrix.n)), 103 | np.ones((data_matrix.m, data_matrix.n))) 104 | 105 | Z = np.random.normal(-1., 1., size=S.shape) 106 | 107 | # sparse_coding.py wants a full sparse coding problem, so pass in None for the things 108 | # that aren't relevant here 109 | state = sparse_coding.SparseCodingState(S, None, Z, None, -1., 1., None) 110 | 111 | pbar = misc.pbar(num_iter) 112 | for i in range(num_iter): 113 | sparse_coding.sample_Z(state) 114 | state.mu_Z = sparse_coding.cond_mu_Z(state, by_column).sample() 115 | state.sigma_sq_Z = sparse_coding.cond_sigma_sq_Z(state).sample() 116 | 117 | if hasattr(debugger, 'after_init_sparsity_iter'): 118 | debugger.after_init_sparsity_iter(locals()) 119 | 120 | pbar.update(i) 121 | pbar.finish() 122 | 123 | scale_node = recursive.GaussianNode(state.Z, 'scalar', state.sigma_sq_Z) 124 | return recursive.GSMNode(state.S, scale_node, mu_Z_mode, state.mu_Z) 125 | 126 | 127 | 128 | def initialize(data_matrix, root, old_structure, new_structure, num_iter=200): 129 | root = root.copy() 130 | if old_structure == new_structure: 131 | return root 132 | node, old_dist, rule = recursive.find_changed_node(root, old_structure, new_structure) 133 | 134 | old = root.value() 135 | 136 | # if we're replacing the root, pass on the observation model; otherwise, treat 137 | # the node we're factorizing as exact real-valued observations 138 | if node is root: 139 | inner_data_matrix = data_matrix 140 | else: 141 | row_ids = recursive.row_ids_for(data_matrix, node) 142 | col_ids = recursive.col_ids_for(data_matrix, node) 143 | m_orig, n_orig = recursive.orig_shape_for(data_matrix, node) 144 | frv = observations.DataMatrix.from_real_values 145 | inner_data_matrix = frv(node.value(), row_ids=row_ids, col_ids=col_ids, 146 | m_orig=m_orig, n_orig=n_orig) 147 | 148 | print 'Initializing %s from %s...' 
% (grammar.pretty_print(new_structure), grammar.pretty_print(old_structure)) 149 | 150 | if rule == grammar.parse("gg+g"): 151 | new_node = init_low_rank(inner_data_matrix, num_iter=num_iter) 152 | elif rule == grammar.parse("mg+g"): 153 | isotropic = (node is root) 154 | new_node = init_row_clustering(inner_data_matrix, isotropic, num_iter=num_iter) 155 | elif rule == grammar.parse("gM+g"): 156 | isotropic = (node is root) 157 | new_node = init_col_clustering(inner_data_matrix, isotropic, num_iter=num_iter) 158 | elif rule == grammar.parse("bg+g"): 159 | new_node = init_row_binary(inner_data_matrix, num_iter=num_iter) 160 | elif rule == grammar.parse("gB+g"): 161 | new_node = init_col_binary(inner_data_matrix, num_iter=num_iter) 162 | elif rule == grammar.parse("cg+g"): 163 | new_node = init_row_chain(inner_data_matrix, num_iter=num_iter) 164 | elif rule == grammar.parse("gC+g"): 165 | new_node = init_col_chain(inner_data_matrix, num_iter=num_iter) 166 | elif rule == grammar.parse("s(g)"): 167 | new_node = init_sparsity(inner_data_matrix, node.variance_type, num_iter=num_iter) 168 | else: 169 | raise RuntimeError('Unknown production rule: %s ==> %s' % (grammar.pretty_print(old_dist), 170 | grammar.pretty_print(rule))) 171 | 172 | root = recursive.splice(root, node, new_node) 173 | 174 | if isinstance(data_matrix.observations, observations.RealObservations): 175 | assert np.allclose(root.value()[data_matrix.observations.mask], old[data_matrix.observations.mask]) 176 | 177 | return root 178 | 179 | 180 | 181 | 182 | -------------------------------------------------------------------------------- /models.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | import recursive 5 | 6 | class Leaf: 7 | def __init__(self, left_side, right_side, fixed): 8 | self.left_side = left_side 9 | self.right_side = right_side 10 | self.fixed = fixed 11 | self.children = [] 12 | 13 | def structure(self): 14 | return self.distribution() 15 | 16 | def transpose(self): 17 | return self.transpose_class()(self.right_side, self.left_side) 18 | 19 | def display(self, indent=0): 20 | s = self.__class__.__name__ 21 | if self.fixed: 22 | s += ', fixed' 23 | s = ' ' * indent + s 24 | if hasattr(self, 'id'): 25 | s = '(%2d) ' % self.id + s 26 | print s 27 | 28 | def dummy(self): 29 | return self.node_class().dummy() 30 | 31 | class Gaussian(Leaf): 32 | def __init__(self, variance_type, fixed_variance, left_side, right_side, fixed): 33 | Leaf.__init__(self, left_side, right_side, fixed) 34 | if variance_type not in ['row', 'col', 'scalar']: 35 | raise RuntimeError('Unknown variance type: %s' % variance_type) 36 | self.variance_type = variance_type 37 | self.fixed_variance = fixed_variance 38 | 39 | def distribution(self): 40 | return 'g' 41 | 42 | def transpose_class(self): 43 | return Gaussian 44 | 45 | def node_class(self): 46 | return recursive.GaussianNode 47 | 48 | def transpose(self): 49 | if self.variance_type == 'row': 50 | variance_type = 'col' 51 | elif self.variance_type == 'col': 52 | variance_type = 'row' 53 | elif self.variance_type == 'scalar': 54 | variance_type = 'scalar' 55 | return Gaussian(variance_type, self.fixed_variance, self.right_side, self.left_side) 56 | 57 | def display(self, indent=0): 58 | s = 'Gaussian, %s' % self.variance_type 59 | if self.fixed_variance: 60 | s += ', fixed_variance' 61 | if self.fixed: 62 | s += ', fixed' 63 | s = ' ' * indent + s 64 | if hasattr(self, 'id'): 65 | s = '(%2d) ' % self.id + s 
66 | print s 67 | 68 | def dummy(self): 69 | return recursive.GaussianNode.dummy(self.variance_type) 70 | 71 | class Multinomial(Leaf): 72 | def distribution(self): 73 | return 'm' 74 | 75 | def transpose_class(self): 76 | return MultinomialT 77 | 78 | def node_class(self): 79 | return recursive.MultinomialNode 80 | 81 | class MultinomialT(Leaf): 82 | def distribution(self): 83 | return 'M' 84 | 85 | def transpose_class(self): 86 | return Multinomial 87 | 88 | def node_class(self): 89 | return recursive.MultinomialTNode 90 | 91 | class Bernoulli(Leaf): 92 | def distribution(self): 93 | return 'b' 94 | 95 | def transpose_class(self): 96 | return BernoulliT 97 | 98 | def node_class(self): 99 | return recursive.BernoulliNode 100 | 101 | class BernoulliT(Leaf): 102 | def distribution(self): 103 | return 'B' 104 | 105 | def transpose_class(self): 106 | return Bernoulli 107 | 108 | def node_class(self): 109 | return recursive.BernoulliTNode 110 | 111 | class Integration(Leaf): 112 | def distribution(self): 113 | return 'c' 114 | 115 | def transpose_class(self): 116 | return IntegrationT 117 | 118 | def node_class(self): 119 | return recursive.IntegrationNode 120 | 121 | 122 | 123 | class IntegrationT(Leaf): 124 | def distribution(self): 125 | return 'C' 126 | 127 | def transpose_class(self): 128 | return Integration 129 | 130 | def node_class(self): 131 | return recursive.IntegrationTNode 132 | 133 | class GSM: 134 | def __init__(self, left_side, right_side, fixed, scale_node, bias_type): 135 | self.left_side = left_side 136 | self.right_side = right_side 137 | self.fixed = fixed 138 | self.scale_node = scale_node 139 | if bias_type not in ['row', 'col', 'scalar']: 140 | raise RuntimeError('Unknown bias type: %s' % bias_type) 141 | self.bias_type = bias_type 142 | self.children = [self.scale_node] 143 | 144 | def structure(self): 145 | return ('s', self.scale_node.structure()) 146 | 147 | def transpose(self): 148 | return GSM(self.scale_node.transpose()) 149 | 150 | def display(self, indent=0): 151 | s = ' ' * indent + 'GSM' 152 | if hasattr(self, 'id'): 153 | s = '(%2d) ' % self.id + s 154 | print s 155 | 156 | self.scale_node.display(indent + 4) 157 | 158 | def dummy(self): 159 | value = np.zeros((5, 5)) 160 | if self.bias_type in ['row', 'col']: 161 | bias = np.zeros(5) 162 | else: 163 | bias = 0. 
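        # dummy() returns a small placeholder node (5 x 5 zeros) so that models.align() and
        # list_samplers() can walk the model structure without any real data attached.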
164 | return recursive.GSMNode(value, self.scale_node.dummy(), self.bias_type, bias) 165 | 166 | 167 | class Sum: 168 | def __init__(self, children, left_side, right_side, fixed): 169 | self.children = children 170 | self.left_side = left_side 171 | self.right_side = right_side 172 | self.fixed = fixed 173 | 174 | def structure(self): 175 | return ('+',) + tuple([c.structure() for c in self.children]) 176 | 177 | def transpose(self): 178 | return Sum([c.transpose() for c in self.children], self.right_side, self.left_side) 179 | 180 | def display(self, indent=0): 181 | s = ' ' * indent + 'Sum' 182 | if hasattr(self, 'id'): 183 | s = '(%2d) ' % self.id + s 184 | print s 185 | 186 | for c in self.children: 187 | c.display(indent + 4) 188 | 189 | def dummy(self): 190 | return recursive.SumNode([c.dummy() for c in self.children]) 191 | 192 | 193 | class Product: 194 | def __init__(self, left, right, left_side, right_side, fixed): 195 | self.left = left 196 | self.right = right 197 | self.children = [left, right] 198 | self.left_side = left_side 199 | self.right_side = right_side 200 | self.fixed = fixed 201 | 202 | def structure(self): 203 | return ('*',) + tuple([self.left.structure(), self.right.structure()]) 204 | 205 | def transpose(self): 206 | return Product(self.right.transpose(), self.left.transpose(), self.obs.T.copy()) 207 | 208 | def display(self, indent=0): 209 | s = ' ' * indent + 'Product' 210 | if hasattr(self, 'id'): 211 | s = '(%2d) ' % self.id + s 212 | print s 213 | 214 | for c in [self.left, self.right]: 215 | c.display(indent + 4) 216 | 217 | def dummy(self): 218 | return recursive.ProductNode([self.left.dummy(), self.right.dummy()]) 219 | 220 | 221 | def continuous_left(structure): 222 | if type(structure) == str: 223 | return structure in ['g', 's', 'k'] 224 | elif type(structure) == tuple and structure[0] == '+': 225 | return any([continuous_left(c) for c in structure[1:]]) 226 | elif type(structure) == tuple and structure[0] == '*': 227 | assert len(structure) == 3 228 | return continuous_left(structure[1]) 229 | elif type(structure) == tuple and structure[0] == 's': 230 | return True 231 | else: 232 | raise RuntimeError('Invalid structure: %s' % structure) 233 | 234 | def continuous_right(structure): 235 | if type(structure) == str: 236 | return structure == 'g' 237 | elif type(structure) == tuple and structure[0] == '+': 238 | return any([continuous_right(c) for c in structure[1:]]) 239 | elif type(structure) == tuple and structure[0] == '*': 240 | assert len(structure) == 3 241 | return continuous_right(structure[2]) 242 | elif type(structure) == tuple and structure[0] == 's': 243 | return True 244 | else: 245 | raise RuntimeError('Invalid structure: %s' % str(structure)) 246 | 247 | 248 | 249 | 250 | dist2class = {'g': Gaussian, 251 | 'm': Multinomial, 252 | 'M': MultinomialT, 253 | 'b': Bernoulli, 254 | 'B': BernoulliT, 255 | 'c': Integration, 256 | 'C': IntegrationT, 257 | } 258 | 259 | def get_model_helper(structure, left_side, right_side, fixed, fixed_variance, variance_type): 260 | if type(structure) == str: 261 | if structure == 'g': 262 | return Gaussian(variance_type, fixed_variance, left_side, right_side, fixed) 263 | else: 264 | return dist2class[structure](left_side, right_side, fixed) 265 | 266 | elif type(structure) == tuple and structure[0] == '+': 267 | child_models = [get_model_helper(s, left_side, right_side, False, fixed_variance, variance_type) 268 | for s in structure[1:]] 269 | return Sum(child_models, left_side, right_side, fixed) 270 | 271 | 
elif type(structure) == tuple and structure[0] == '*': 272 | assert len(structure) == 3 273 | 274 | iv = continuous_right(structure[1]) and continuous_left(structure[2]) 275 | if iv: 276 | left_variance_type = 'col' 277 | right_variance_type = 'row' 278 | else: 279 | left_variance_type = right_variance_type = 'scalar' 280 | 281 | left_fixed = (structure[2] == 'C') 282 | right_fixed = (structure[1] == 'c') 283 | 284 | left = get_model_helper(structure[1], left_side, False, left_fixed, left_fixed, left_variance_type) 285 | right = get_model_helper(structure[2], False, right_side, right_fixed, right_fixed, right_variance_type) 286 | return Product(left, right, left_side, right_side, fixed) 287 | 288 | elif type(structure) == tuple and structure[0] == 's': 289 | assert len(structure) == 2 290 | 291 | scale_node = get_model_helper(structure[1], left_side, right_side, False, False, 'scalar') 292 | return GSM(left_side, right_side, fixed, scale_node, variance_type) 293 | 294 | else: 295 | raise RuntimeError('Invalid structure: %s' % structure) 296 | 297 | 298 | def assign_ids(model_node, next_id=1): 299 | model_node.id = next_id 300 | next_id += 1 301 | for child in model_node.children: 302 | next_id = assign_ids(child, next_id) 303 | return next_id 304 | 305 | 306 | def get_model(structure, fixed_noise_variance=False): 307 | model = get_model_helper(structure, True, True, False, fixed_noise_variance, 'scalar') 308 | assign_ids(model) 309 | return model 310 | 311 | def align(node, model_node): 312 | assert node.model is None 313 | node.model = model_node 314 | for nchild, mchild in zip(node.children, model_node.children): 315 | align(nchild, mchild) 316 | 317 | -------------------------------------------------------------------------------- /observations.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | from utils import distributions, misc 5 | 6 | class DataMatrix: 7 | def __init__(self, observations, row_ids=None, col_ids=None, row_labels=None, col_labels=None, 8 | m_orig=None, n_orig=None): 9 | self.m, self.n = observations.shape 10 | self.observations = observations 11 | 12 | if row_ids is None: # indices from the original matrix (used for chain models) 13 | row_ids = np.arange(self.m) 14 | self.row_ids = np.array(row_ids) 15 | if col_ids is None: 16 | col_ids = np.arange(self.n) 17 | self.col_ids = np.array(col_ids) 18 | 19 | if row_labels is None: # e.g. 
entity or attribute names 20 | row_labels = range(self.m) 21 | self.row_labels = list(row_labels) # make sure it's not an array 22 | if col_labels is None: 23 | col_labels = range(self.n) 24 | self.col_labels = list(col_labels) 25 | 26 | if m_orig is None: # size of the original matrix (used for chain models) 27 | m_orig = self.m 28 | self.m_orig = m_orig 29 | if n_orig is None: 30 | n_orig = self.n 31 | self.n_orig = n_orig 32 | 33 | def transpose(self): 34 | return DataMatrix(self.observations.transpose(), self.col_ids, self.row_ids, self.col_labels, self.row_labels, 35 | self.n_orig, self.m_orig) 36 | 37 | def copy(self): 38 | return DataMatrix(self.observations.copy(), self.row_ids.copy(), self.col_ids.copy(), list(self.row_labels), 39 | list(self.col_labels), self.m_orig, self.n_orig) # a copy keeps the original dimensions in their original order 40 | 41 | def __getitem__(self, slc): 42 | rslc, cslc = misc.extract_slices(slc) 43 | return DataMatrix(self.observations[slc], self.row_ids[rslc], self.col_ids[cslc], 44 | misc.slice_list(self.row_labels, rslc), misc.slice_list(self.col_labels, cslc), 45 | self.m_orig, self.n_orig) 46 | 47 | def sample_latent_values(self, predictions, noise): 48 | return self.observations.sample_latent_values(predictions, noise) 49 | 50 | def loglik(self, predictions, noise): 51 | return self.observations.loglik(predictions, noise) 52 | 53 | def fixed_variance(self): 54 | return self.observations.fixed_variance() 55 | 56 | @staticmethod 57 | def from_decomp(decomp): 58 | obs = RealObservations(decomp.root.value(), decomp.obs) 59 | return DataMatrix(obs, decomp.row_ids, decomp.col_ids, decomp.row_labels, decomp.col_labels) 60 | 61 | @staticmethod 62 | def from_real_values(values, mask=None, **kwargs): 63 | if mask is None: 64 | mask = np.ones(values.shape, dtype=bool) 65 | observations = RealObservations(values, mask) 66 | return DataMatrix(observations, **kwargs) 67 | 68 | class RealObservations: 69 | def __init__(self, values, mask): 70 | self.values = values 71 | self.mask = mask 72 | self.shape = values.shape 73 | assert isinstance(self.values, np.ndarray) and self.values.dtype == float 74 | assert isinstance(self.mask, np.ndarray) and self.mask.dtype == bool 75 | 76 | def sample_latent_values(self, predictions, noise): 77 | missing_values = np.random.normal(predictions, np.sqrt(noise)) 78 | return np.where(self.mask, self.values, missing_values) 79 | 80 | def copy(self): 81 | return RealObservations(self.values.copy(), self.mask.copy()) 82 | 83 | def transpose(self): 84 | return RealObservations(self.values.T, self.mask.T) 85 | 86 | def loglik(self, predictions, noise): 87 | if not np.isscalar(noise): 88 | noise = noise[self.mask] 89 | return distributions.gauss_loglik(self.values[self.mask], predictions[self.mask], noise).sum() 90 | 91 | def loglik_each(self, predictions, noise): 92 | return np.where(self.mask, 93 | distributions.gauss_loglik(self.values, predictions, noise), 94 | 0.)
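# note: loglik() sums the Gaussian log-density over observed (mask == True) entries only,
# while loglik_each() returns a per-entry array with 0. at unobserved positions.
# Hypothetical example (not part of the original file):
#   obs = RealObservations(np.array([[1., 2.]]), np.array([[True, False]]))
#   obs.loglik(np.zeros((1, 2)), 1.)   # only the first entry contributes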
95 | 96 | def fixed_variance(self): 97 | return False 98 | 99 | def variance_estimate(self): 100 | return (self.values[self.mask] ** 2).mean() 101 | 102 | def __getitem__(self, slc): 103 | return RealObservations(self.values[slc], self.mask[slc]) 104 | 105 | -------------------------------------------------------------------------------- /parallel.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import os 3 | import re 4 | import smtplib 5 | import socket 6 | import subprocess 7 | import sys 8 | 9 | import config 10 | 11 | def _status_path(key): 12 | return os.path.join(config.JOBS_PATH, key) 13 | 14 | def _status_file(key, host=None): 15 | if host is not None: 16 | return os.path.join(_status_path(key), 'status-%s.txt' % host) 17 | else: 18 | return os.path.join(_status_path(key), 'status.txt') 19 | 20 | def _run_job(script, key, args): 21 | if key != 'None': 22 | outstr = open(_status_file(key, socket.gethostname()), 'a') 23 | print >> outstr, 'running:', args 24 | outstr.close() 25 | 26 | ret = subprocess.call('python %s %s' % (script, args), shell=True) 27 | 28 | if key != 'None': 29 | outstr = open(_status_file(key, socket.gethostname()), 'a') 30 | if ret == 0: 31 | print >> outstr, 'finished:', args 32 | else: 33 | print >> outstr, 'failed:', args 34 | outstr.close() 35 | 36 | def _executable_exists(command): 37 | # taken from stackoverflow.com/questions/377017/test-if-executable-exists-in-python 38 | def is_exe(fpath): 39 | return os.path.isfile(fpath) and os.access(fpath, os.X_OK) 40 | 41 | for path in os.environ['PATH'].split(os.pathsep): 42 | path = path.strip('"') 43 | exe_file = os.path.join(path, command) 44 | if is_exe(exe_file): 45 | return True 46 | 47 | return False 48 | 49 | def _remove_status_files(key): 50 | fnames = os.listdir(_status_path(key)) 51 | for fname in fnames: 52 | if re.match(r'status-.*.txt', fname): 53 | full_path = os.path.join(_status_path(key), fname) 54 | os.remove(full_path) 55 | 56 | def escape(job): 57 | return ' '.join(["'" + arg.replace("'", r"\'") + "'" 58 | for arg in job]) 59 | 60 | def run_command(command, jobs, machines=None, chdir=None): 61 | args = ['parallel', '--gnu'] 62 | if machines is not None: 63 | for m in machines: 64 | args += ['--sshlogin', m] 65 | 66 | if chdir is not None: 67 | command = 'cd %s; %s' % (chdir, command) 68 | args += [command] 69 | 70 | p = subprocess.Popen(args, shell=False, stdin=subprocess.PIPE) 71 | p.communicate('\n'.join(map(escape, jobs))) 72 | 73 | def run(script, jobs, machines=None, key=None, email=False, rm_status=True): 74 | if not _executable_exists('parallel'): 75 | raise RuntimeError('GNU Parallel executable not found.') 76 | if not hasattr(config, 'JOBS_PATH'): 77 | raise RuntimeError('Need to specify JOBS_PATH in config.py') 78 | if not os.path.exists(config.JOBS_PATH): 79 | raise RuntimeError('Path chosen for config.JOBS_PATH does not exist: %s' % config.JOBS_PATH) 80 | 81 | if key is not None: 82 | if not os.path.exists(_status_path(key)): 83 | os.mkdir(_status_path(key)) 84 | 85 | outstr = open(_status_file(key), 'w') 86 | for job in jobs: 87 | print >> outstr, 'queued:', job 88 | outstr.close() 89 | 90 | if rm_status: 91 | _remove_status_files(key) 92 | 93 | command = 'python parallel.py %s %s' % (key, script) 94 | run_command(command, jobs, machines=machines, chdir=os.getcwd()) 95 | 96 | if email: 97 | if key is not None: 98 | subject = '%s jobs finished' % key 99 | p = subprocess.Popen(['check_status', key], stdout=subprocess.PIPE) 100 
| body, _ = p.communicate() 101 | else: 102 | subject = 'jobs finished' 103 | body = '' 104 | 105 | msg = '\r\n'.join(['From: %s' % config.EMAIL, 106 | 'To: %s' % config.EMAIL, 107 | 'Subject: %s' % subject, 108 | '', 109 | body]) 110 | 111 | s = smtplib.SMTP('localhost') 112 | s.sendmail(config.EMAIL, [config.EMAIL], msg) 113 | s.quit() 114 | 115 | def isint(p): 116 | try: 117 | int(p) 118 | return True 119 | except: 120 | return False 121 | 122 | def parse_machines(s, njobs): 123 | if s is None: 124 | return s 125 | parts = s.split(',') 126 | return ['%d/%s' % (njobs, p) for p in parts] 127 | 128 | def list_jobs(key, status_val): 129 | status_files = [os.path.join(_status_path(key), 'status.txt')] 130 | status_files += glob.glob('%s/status-*.txt' % _status_path(key)) 131 | 132 | status = {} 133 | for fname in status_files: 134 | for line_ in open(fname).readlines(): 135 | line = line_.strip() 136 | sv, args = line.split(':') 137 | args = args.strip() 138 | status[args] = sv 139 | 140 | return [k for k, v in status.items() if v == status_val] 141 | 142 | 143 | if __name__ == '__main__': 144 | assert len(sys.argv) == 4 145 | key = sys.argv[1] 146 | script = sys.argv[2] 147 | args = sys.argv[3] 148 | _run_job(script, key, args) 149 | -------------------------------------------------------------------------------- /parsing.py: -------------------------------------------------------------------------------- 1 | 2 | tokens = ('LETTER', 'PLUS', 'LPAREN', 'RPAREN', 'GSM') 3 | 4 | t_LETTER = r'[gmbcMBC]' 5 | t_PLUS = r'\+' 6 | t_LPAREN = r'\(' 7 | t_RPAREN = r'\)' 8 | t_GSM = r's' 9 | 10 | t_ignore = ' ' 11 | 12 | def t_error(t): 13 | raise RuntimeError("Illegal character: '%s'" % t.value[0]) 14 | 15 | import ply.lex as lex 16 | lex.lex() 17 | 18 | def p_expression_plus(t): 19 | """expression : expression PLUS term""" 20 | t[0] = ('+', t[1], t[3]) 21 | 22 | def p_expression_term(t): 23 | """expression : term""" 24 | t[0] = t[1] 25 | 26 | def p_term_times(t): 27 | """term : factor factor""" 28 | t[0] = ('*', t[1], t[2]) 29 | 30 | def p_term_factor(t): 31 | """term : factor""" 32 | t[0] = t[1] 33 | 34 | def p_factor_gsm(t): 35 | """factor : GSM LPAREN expression RPAREN""" 36 | t[0] = ('s', t[3]) 37 | 38 | def p_factor_group(t): 39 | """factor : LPAREN expression RPAREN""" 40 | t[0] = t[2] 41 | 42 | def p_factor_letter(t): 43 | """factor : LETTER""" 44 | t[0] = t[1] 45 | 46 | def p_error(t): 47 | raise RuntimeError("Syntax error at '%s'" % t[1]) 48 | 49 | import ply.yacc as yacc 50 | yacc.yacc() 51 | 52 | def parse(s): 53 | return yacc.parse(s) 54 | 55 | 56 | -------------------------------------------------------------------------------- /predictive_distributions.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | from utils import misc 5 | 6 | class PredictiveDistribution: 7 | def __slice__(self, slc): 8 | return self.__getitem__(slc) 9 | 10 | 11 | class GaussianPredictiveDistribution(PredictiveDistribution): 12 | def __init__(self, mu, Sigma): 13 | self.mu = mu.copy() 14 | self.Sigma = Sigma.copy() 15 | 16 | def __getitem__(self, slc): 17 | return GaussianPredictiveDistribution(self.mu, self.Sigma) 18 | 19 | def generate_data(self, N): 20 | return np.array([np.random.multivariate_normal(self.mu, self.Sigma) 21 | for i in range(N)]) 22 | 23 | class MultinomialPredictiveDistribution(PredictiveDistribution): 24 | def __init__(self, pi, centers): 25 | self.pi = pi.copy() 26 | self.centers = centers.copy() 27 | 28 | 
def __getitem__(self, slc): 29 | return MultinomialPredictiveDistribution(self.pi, self.centers[:, slc]) 30 | 31 | @staticmethod 32 | def random(K, N): 33 | pi = np.random.uniform(0., 1., size=K) 34 | pi /= pi.sum() 35 | centers = np.random.normal(size=(K, N)) 36 | return MultinomialPredictiveDistribution(pi, centers) 37 | 38 | def generate_data(self, N): 39 | Z = np.random.multinomial(1, self.pi, size=N) 40 | return np.dot(Z, self.centers) 41 | 42 | class BernoulliPredictiveDistribution(PredictiveDistribution): 43 | def __init__(self, pi, A): 44 | self.pi = pi.copy() 45 | self.A = A.copy() 46 | 47 | def __getitem__(self, slc): 48 | return BernoulliPredictiveDistribution(self.pi, self.A[:, slc]) 49 | 50 | @staticmethod 51 | def random(K, N): 52 | pi = np.random.uniform(0., 1., size=K) 53 | A = np.random.normal(size=(K, N)) 54 | return BernoulliPredictiveDistribution(pi, A) 55 | 56 | def generate_data(self, N): 57 | Z = np.random.binomial(1, self.pi[nax, :], size=(N, self.pi.size)) 58 | return np.dot(Z, self.A) 59 | 60 | 61 | class PredictiveInfo: 62 | def __init__(self, components, mu, Sigma): 63 | self.components = components 64 | self.mu = mu 65 | self.Sigma = Sigma 66 | 67 | def predictive_for_row(self, i, idxs): 68 | components = [c[idxs] for c in self.components] 69 | if self.mu.ndim == 2: 70 | return components, self.mu[i, idxs], self.Sigma[i, :, :][idxs[:, nax], idxs[nax, :]] 71 | else: 72 | assert self.mu.ndim == 1 73 | return components, self.mu[idxs], self.Sigma[idxs[:, nax], idxs[nax, :]] 74 | 75 | def predictive_for_rows(self, rows): 76 | if self.mu.ndim == 1: 77 | N, D = rows.size, self.mu.size 78 | return self.components, np.tile(self.mu[nax, :], (N, 1)), np.tile(self.Sigma[nax, :, :], (N, 1, 1)) 79 | else: 80 | return self.components, self.mu[rows], self.Sigma[rows, :, :] 81 | 82 | def generate_data(self, N): 83 | D = self.Sigma.shape[0] 84 | X = np.zeros((N, D)) 85 | for c in self.components: 86 | X += c.generate_data(N) 87 | X += np.array([np.random.multivariate_normal(self.mu, self.Sigma) 88 | for i in range(N)]) 89 | return X 90 | 91 | class GSMPredictiveDistribution(PredictiveDistribution): 92 | def __init__(self, scale_components, scale_mu, scale_Sigma, sigma_sq_approx, A): 93 | self.scale_components = scale_components 94 | self.scale_mu = scale_mu 95 | self.scale_Sigma = scale_Sigma 96 | self.sigma_sq_approx = sigma_sq_approx 97 | self.A = A.copy() 98 | 99 | def __getitem__(self, slc): 100 | return GSMPredictiveDistribution(self.scale_components, self.scale_mu, self.scale_Sigma, 101 | self.sigma_sq_approx, self.A[:, slc]) 102 | 103 | def generate_data(self, N): 104 | K, D = self.A.shape 105 | Z = np.zeros((N, K)) 106 | for sc in self.scale_components: 107 | Z += sc.generate_data(N) 108 | Z += np.array([np.random.multivariate_normal(self.scale_mu, self.scale_Sigma) 109 | for i in range(N)]) 110 | S = np.random.normal(0., np.exp(0.5 * Z)) 111 | return np.dot(S, self.A) 112 | 113 | 114 | 115 | 116 | 117 | ######################## computing the predictive distributions ################ 118 | 119 | class FixedTerm: 120 | def __init__(self, values): 121 | self.values = values 122 | 123 | class GaussianTerm: 124 | def __init__(self, values, mu, Sigma): 125 | self.values = values 126 | self.mu = mu 127 | self.Sigma = Sigma 128 | 129 | class ChainTerm: 130 | def __init__(self, values, mu_delta, Sigma_delta): 131 | self.values = values 132 | self.mu_delta = mu_delta 133 | self.Sigma_delta = Sigma_delta 134 | 135 | def extract_terms(node): 136 | if node.isleaf(): 137 | assert 
node.distribution() in ['g', 'm', 'b'] 138 | if node.distribution() == 'g': 139 | mu = np.zeros(node.n) 140 | sigma_sq_row, sigma_sq_col = node.row_col_variance() 141 | Sigma = np.diag(sigma_sq_row.mean() * sigma_sq_col) 142 | return [GaussianTerm(node.value(), mu, Sigma)] 143 | else: 144 | return [FixedTerm(node.value())] 145 | 146 | elif node.issum(): 147 | child_terms = [extract_terms(child) for child in node.children] 148 | return reduce(list.__add__, child_terms) 149 | 150 | elif node.isgsm(): 151 | return [FixedTerm(node.value())] 152 | 153 | elif node.isproduct(): 154 | left, right = node.children 155 | 156 | if left.isleaf() and left.distribution() == 'c': 157 | child_terms = extract_terms(right) 158 | terms = [] 159 | for ct in child_terms: 160 | if isinstance(ct, FixedTerm): 161 | # fixed terms inside chains remain fixed 162 | terms.append(FixedTerm(ct.values.cumsum(0))) 163 | elif isinstance(ct, GaussianTerm): 164 | # Gaussians become chains 165 | terms.append(ChainTerm(ct.values.cumsum(0), ct.mu, ct.Sigma)) 166 | elif isinstance(ct, ChainTerm): 167 | # freeze nested chains since these are annoying 168 | terms.append(FixedTerm(ct.values.cumsum(0))) 169 | else: 170 | raise RuntimeError('Unknown term') 171 | return terms 172 | 173 | else: 174 | child_terms = extract_terms(left) 175 | V = right.value() 176 | terms = [] 177 | for ct in child_terms: 178 | # same distribution, but multiplied by V on the right 179 | if isinstance(ct, FixedTerm): 180 | terms.append(FixedTerm(np.dot(ct.values, V))) 181 | elif isinstance(ct, GaussianTerm): 182 | mu = np.dot(ct.mu, V) 183 | Sigma = np.dot(V.T, np.dot(ct.Sigma, V)) 184 | terms.append(GaussianTerm(np.dot(ct.values, V), mu, Sigma)) 185 | elif isinstance(ct, ChainTerm): 186 | mu = np.dot(ct.mu_delta, V) 187 | Sigma = np.dot(V.T, np.dot(ct.Sigma_delta, V)) 188 | terms.append(ChainTerm(np.dot(ct.values, V), mu, Sigma)) 189 | else: 190 | raise RuntimeError('Unknown term') 191 | return terms 192 | 193 | def collect_terms(terms): 194 | fixed_values = 0. 195 | gaussian_values = 0. 196 | gaussian_mu = 0. 197 | gaussian_Sigma = 0. 198 | chain_values = 0. 199 | chain_mu = 0. 200 | chain_Sigma = 0. 
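# accumulate the values (and means/covariances) of all terms of each type, and track which
# types actually occur so that absent ones can be returned as None below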
201 | has_fixed = has_gaussian = has_chain = False 202 | 203 | for term in terms: 204 | if isinstance(term, FixedTerm): 205 | fixed_values += term.values 206 | has_fixed = True 207 | elif isinstance(term, GaussianTerm): 208 | gaussian_values += term.values 209 | gaussian_mu += term.mu 210 | gaussian_Sigma += term.Sigma 211 | has_gaussian = True 212 | elif isinstance(term, ChainTerm): 213 | chain_values += term.values 214 | chain_mu += term.mu_delta 215 | chain_Sigma += term.Sigma_delta 216 | has_chain = True 217 | else: 218 | raise RuntimeError('Unknown term') 219 | 220 | if has_fixed: 221 | fixed_term = FixedTerm(fixed_values) 222 | else: 223 | fixed_term = None 224 | 225 | if has_gaussian: 226 | gaussian_term = GaussianTerm(gaussian_values, gaussian_mu, gaussian_Sigma) 227 | else: 228 | gaussian_term = None 229 | 230 | if has_chain: 231 | chain_term = ChainTerm(chain_values, chain_mu, chain_Sigma) 232 | else: 233 | chain_term = None 234 | 235 | return fixed_term, gaussian_term, chain_term 236 | 237 | 238 | def compute_gaussian_part(training_data_matrix, root, N): 239 | fixed_term, gaussian_term, chain_term = collect_terms(extract_terms(root)) 240 | assert gaussian_term is not None 241 | D = gaussian_term.values.shape[1] 242 | 243 | if chain_term is None: 244 | return gaussian_term.mu, gaussian_term.Sigma 245 | 246 | X = training_data_matrix.sample_latent_values(root.predictions(), root.children[-1].sigma_sq) 247 | 248 | mu_0 = np.zeros(D) 249 | Sigma_v = chain_term.Sigma_delta 250 | 251 | y = np.zeros((D, N)) 252 | for i, row in enumerate(training_data_matrix.row_ids): 253 | if fixed_term is not None: 254 | y[:, row] = X[i, :] - gaussian_term.mu - fixed_term.values[i, :] 255 | else: 256 | y[:, row] = X[i, :] - gaussian_term.mu 257 | 258 | mask = np.zeros(N, dtype=bool) 259 | mask[training_data_matrix.row_ids] = True 260 | 261 | mu_chains, Sigma_chains = misc.kalman_filter_codiag2( 262 | mu_0, Sigma_v, np.linalg.inv(gaussian_term.Sigma), y, mask) 263 | mu_total = mu_chains.T + gaussian_term.mu[nax, :] 264 | Sigma_total = np.zeros((N, D, D)) 265 | for i in range(N): 266 | Sigma_total[i, :, :] = Sigma_chains[:, :, i] + gaussian_term.Sigma 267 | 268 | return mu_total, Sigma_total 269 | 270 | 271 | 272 | def extract_non_gaussian_part(node): 273 | if node.isleaf(): 274 | assert node.distribution() in ['g', 'm', 'b'] 275 | if node.distribution() == 'g': 276 | return [] 277 | elif node.distribution() == 'm': 278 | pi = (1. + node.value().sum(0)) / (node.n + node.m) 279 | return [MultinomialPredictiveDistribution(pi, np.eye(node.n))] 280 | elif node.distribution() == 'b': 281 | pi = (1. + node.value().sum(0)) / (2. 
+ node.m) 282 | return [BernoulliPredictiveDistribution(pi, np.eye(node.n))] 283 | 284 | elif node.issum(): 285 | child_components = [extract_non_gaussian_part(child) for child in node.children] 286 | return reduce(list.__add__, child_components) 287 | 288 | elif node.isproduct(): 289 | left, right = node.children 290 | 291 | if left.isleaf() and left.distribution() == 'c': 292 | return [] 293 | 294 | else: 295 | child_components = extract_non_gaussian_part(left) 296 | components = [] 297 | for cp in child_components: 298 | if isinstance(cp, MultinomialPredictiveDistribution): 299 | components.append(MultinomialPredictiveDistribution(cp.pi, np.dot(cp.centers, right.value()))) 300 | elif isinstance(cp, BernoulliPredictiveDistribution): 301 | components.append(BernoulliPredictiveDistribution(cp.pi, np.dot(cp.A, right.value()))) 302 | elif isinstance(cp, GSMPredictiveDistribution): 303 | components.append(GSMPredictiveDistribution(cp.scale_components, cp.scale_mu, cp.scale_Sigma, 304 | cp.sigma_sq_approx, np.dot(cp.A, right.value()))) 305 | return components 306 | 307 | elif node.isgsm(): 308 | scale_node = node.scale_node 309 | scale_components = extract_non_gaussian_part(scale_node) 310 | fixed_term, gaussian_term, chain_term = collect_terms(extract_terms(scale_node)) 311 | assert chain_term is None 312 | scale_mu, scale_Sigma = gaussian_term.mu, gaussian_term.Sigma 313 | if node.bias_type == 'col': 314 | scale_mu += node.bias.ravel() 315 | elif node.bias_type == 'scalar': 316 | scale_mu += node.bias 317 | else: 318 | raise RuntimeError('Invalid bias type: %s' % node.bias_type) 319 | sigma_sq_approx = (node.value() ** 2).mean(0) 320 | return [GSMPredictiveDistribution(scale_components, scale_mu, scale_Sigma, 321 | sigma_sq_approx, np.eye(node.n))] 322 | 323 | 324 | 325 | 326 | def compute_predictive_info(train_data_matrix, root, N): 327 | components = extract_non_gaussian_part(root) 328 | mu, Sigma = compute_gaussian_part(train_data_matrix, root, N) 329 | return PredictiveInfo(components, mu, Sigma) 330 | 331 | 332 | def remove_gsm(predictive_info): 333 | new_components = [] 334 | new_mu, new_Sigma = predictive_info.mu.copy(), predictive_info.Sigma.copy() 335 | for c in predictive_info.components: 336 | if isinstance(c, GSMPredictiveDistribution): 337 | #new_Sigma += np.diag(c.sigma_sq_approx) 338 | new_Sigma += np.dot(c.A.T, np.dot(np.diag(c.sigma_sq_approx), c.A)) 339 | else: 340 | new_components.append(c) 341 | return PredictiveInfo(new_components, new_mu, new_Sigma) 342 | 343 | def has_gsm(predictive_info): 344 | for c in predictive_info.components: 345 | if isinstance(c, GSMPredictiveDistribution): 346 | return True 347 | return False 348 | 349 | -------------------------------------------------------------------------------- /presentation.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import numpy as np 3 | import sys 4 | 5 | import grammar 6 | 7 | 8 | 9 | 10 | def format_table(table, sep=' '): 11 | num_cols = len(table[0]) 12 | if any([len(row) != num_cols for row in table]): 13 | raise RuntimeError('Number of columns must match.') 14 | 15 | widths = [max([len(row[i]) for row in table]) 16 | for i in range(num_cols)] 17 | format_string = sep.join(['%' + str(w) + 's' for w in widths]) 18 | return [format_string % tuple(row) for row in table] 19 | 20 | def format_table_latex(table): 21 | return [l + ' \\\\' for l in format_table(table, ' & ')] 22 | 23 | class Failure: 24 | def __init__(self, structure, level, all_failed, 
name=None): 25 | self.structure = structure 26 | self.level = level 27 | self.all_failed = all_failed 28 | self.name = name 29 | 30 | def print_failed_structures(failures, outfile=sys.stdout): 31 | if failures: 32 | print >> outfile, 'The inference algorithms failed for the following structures:' 33 | print >> outfile 34 | print >> outfile, '%30s%8s %s' % \ 35 | ('structure', 'level', 'notes') 36 | print >> outfile 37 | for f in failures: 38 | line = '%30s%8d ' % (grammar.pretty_print(f.structure), f.level) 39 | if f.name: 40 | line += '(for %s) ' % f.name 41 | if not f.all_failed: 42 | line += '(only some jobs failed) ' 43 | print >> outfile, line 44 | print >> outfile 45 | print >> outfile 46 | 47 | 48 | class ModelScore: 49 | def __init__(self, structure, row_score, col_score, total, row_improvement, col_improvement, 50 | z_score_row, z_score_col): 51 | self.structure = structure 52 | self.row_score = row_score 53 | self.col_score = col_score 54 | self.total = total 55 | self.row_improvement = row_improvement 56 | self.col_improvement = col_improvement 57 | self.z_score_row = z_score_row 58 | self.z_score_col = z_score_col 59 | 60 | def print_scores(level, model_scores, outfile=sys.stdout): 61 | print >> outfile, 'The following are the top-scoring structures for level %d:' % level 62 | print >> outfile 63 | print >> outfile, '%30s%10s%10s%13s%13s%13s%10s%10s' % \ 64 | ('structure', 'row', 'col', 'total', 'row impvt.', 'col impvt.', 'z (row)', 'z (col)') 65 | print >> outfile 66 | for ms in model_scores: 67 | print >> outfile, '%30s%10.2f%10.2f%13.2f%13.2f%13.2f%10.2f%10.2f' % \ 68 | (grammar.pretty_print(ms.structure), ms.row_score, ms.col_score, ms.total, 69 | ms.row_improvement, ms.col_improvement, ms.z_score_row, ms.z_score_col) 70 | print >> outfile 71 | print >> outfile 72 | 73 | 74 | def print_model_sequence(model_scores, outfile=sys.stdout): 75 | print >> outfile, "Here are the best-performing structures in each level of the search:" 76 | print >> outfile 77 | print >> outfile, '%10s%25s%13s%13s%10s%10s' % \ 78 | ('level', 'structure', 'row impvt.', 'col impvt.', 'z (row)', 'z (col)') 79 | print >> outfile 80 | for i, ms in enumerate(model_scores): 81 | print >> outfile, '%10d%25s%13.2f%13.2f%10.2f%10.2f' % \ 82 | (i+1, grammar.pretty_print(ms.structure), ms.row_improvement, ms.col_improvement, 83 | ms.z_score_row, ms.z_score_col) 84 | print >> outfile 85 | print >> outfile 86 | 87 | 88 | class RunningTime: 89 | def __init__(self, level, structure, num_samples, total_time): 90 | self.level = level 91 | self.structure = structure 92 | self.num_samples = num_samples 93 | self.total_time = total_time 94 | 95 | def format_time(t): 96 | if t < 60.: 97 | return '%1.1f seconds' % t 98 | elif t < 3600.: 99 | return '%1.1f minutes' % (t / 60.) 100 | else: 101 | return '%1.1f hours' % (t / 3600.) 102 | 103 | def print_running_times(running_times, outfile=sys.stdout): 104 | total = sum([rt.total_time for rt in running_times]) 105 | print >> outfile, 'Total CPU time was %s. 
Here is the breakdown:' % format_time(total) 106 | print >> outfile 107 | print >> outfile, '%30s%8s %s' % \ 108 | ('structure', 'level', 'time') 109 | print >> outfile 110 | running_times = sorted(running_times, key=lambda rt: rt.total_time, reverse=True) 111 | for rt in running_times: 112 | time_str = '%d x %s' % (rt.num_samples, format_time(rt.total_time / rt.num_samples)) 113 | print >> outfile, '%30s%8d %s' % (grammar.pretty_print(rt.structure), rt.level, time_str) 114 | print >> outfile 115 | print >> outfile 116 | 117 | 118 | class FinalResult: 119 | def __init__(self, expt_name, structure): 120 | self.expt_name = expt_name 121 | self.structure = structure 122 | 123 | def print_learned_structures(results, outfile=sys.stdout): 124 | def sortkey(result): 125 | return result.expt_name.split('_')[-1] 126 | results = sorted(results, key=sortkey) 127 | 128 | print >> outfile, 'The learned structures:' 129 | print >> outfile 130 | print >> outfile, '%25s%25s' % ('experiment', 'structure') 131 | print >> outfile 132 | for r in results: 133 | print >> outfile, '%25s%25s' % (r.expt_name, grammar.pretty_print(r.structure)) 134 | print >> outfile 135 | print >> outfile 136 | 137 | 138 | 139 | class LatentVariables: 140 | def __init__(self, label, z): 141 | self.label = label 142 | self.z = z 143 | 144 | def print_components(model, structure, row_or_col, items, outfile=sys.stdout): 145 | cluster_members = collections.defaultdict(list) 146 | if model == 'clustering': 147 | for item in items: 148 | z = item.z if np.isscalar(item.z) else item.z.argmax() 149 | cluster_members[z].append(item.label) 150 | 151 | component_type, component_type_pl = 'Cluster', 'clusters' 152 | elif model == 'binary': 153 | for item in items: 154 | for i, zi in enumerate(item.z): 155 | if zi: 156 | cluster_members[i].append(item.label) 157 | component_type, component_type_pl = 'Component', 'components' 158 | 159 | cluster_ids = sorted(cluster_members.keys(), key=lambda k: len(cluster_members[k]), reverse=True) 160 | 161 | row_col_str = {'row': 'row', 'col': 'column'}[row_or_col] 162 | print >> outfile, 'For structure %s, the following %s %s were found:' % \ 163 | (grammar.pretty_print(structure), row_col_str, component_type_pl) 164 | print >> outfile 165 | 166 | for i, cid in enumerate(cluster_ids): 167 | print >> outfile, ' %s %d:' % (component_type, i+1) 168 | print >> outfile 169 | for label in cluster_members[cid]: 170 | print >> outfile, ' %s' % label 171 | print >> outfile 172 | print >> outfile 173 | 174 | 175 | 176 | 177 | -------------------------------------------------------------------------------- /scoring.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | from algorithms import ais_gsm, variational 5 | import observations 6 | import predictive_distributions 7 | from utils import misc 8 | 9 | 10 | CACHE = False 11 | cached_pi = None 12 | 13 | def score_row_predictive_variational(train_data_matrix, root, test_data_matrix, num_steps_ais=2000): 14 | N = test_data_matrix.m_orig 15 | predictive_info_orig = predictive_distributions.compute_predictive_info(train_data_matrix, root, N) 16 | predictive_info = predictive_distributions.remove_gsm(predictive_info_orig) 17 | 18 | result = np.zeros(test_data_matrix.m) 19 | pbar = misc.pbar(test_data_matrix.m) 20 | for i, row in enumerate(test_data_matrix.row_ids): 21 | idxs = np.where(test_data_matrix.observations.mask[i, :])[0] 22 | 23 | components, mu, Sigma = 
predictive_info.predictive_for_row(row, idxs) 24 | 25 | estimators = [] 26 | for comp in components: 27 | if isinstance(comp, predictive_distributions.MultinomialPredictiveDistribution): 28 | estimators.append(variational.MultinomialEstimator(comp.pi, comp.centers)) 29 | elif isinstance(comp, predictive_distributions.BernoulliPredictiveDistribution): 30 | estimators.append(variational.BernoulliEstimator(comp.pi, comp.A)) 31 | else: 32 | raise RuntimeError('Unknown predictive distribution') 33 | 34 | assert isinstance(test_data_matrix.observations, observations.RealObservations) 35 | 36 | problem = variational.VariationalProblem(estimators, test_data_matrix.observations.values[i, idxs] - mu, 37 | Sigma) 38 | reps = problem.solve() 39 | result[i] = problem.objective_function(reps) 40 | 41 | if predictive_distributions.has_gsm(predictive_info_orig): 42 | components, mu, Sigma = predictive_info_orig.predictive_for_row(row, idxs) 43 | assert np.allclose(mu, 0.) # can't do chains yet 44 | X = test_data_matrix.observations.values[i, idxs] 45 | X = X[nax, :] 46 | result[i] = ais_gsm.compute_likelihood(X, components, Sigma, [reps], np.array([result[i]]), 47 | num_steps=num_steps_ais)[0] 48 | 49 | pbar.update(i) 50 | pbar.finish() 51 | 52 | 53 | return result 54 | 55 | def score_col_predictive_variational(train_data_matrix, root, test_data_matrix, num_steps_ais=2000): 56 | return score_row_predictive_variational(train_data_matrix.transpose(), root.transpose(), 57 | test_data_matrix.transpose(), num_steps_ais=num_steps_ais) 58 | 59 | 60 | 61 | def no_structure_row_loglik(train_data, row_test_data): 62 | sigma_sq = train_data.observations.variance_estimate() 63 | return np.array([row_test_data.observations[i, :].loglik(np.zeros(row_test_data.n), sigma_sq) 64 | for i in range(row_test_data.m)]) 65 | 66 | 67 | def no_structure_col_loglik(train_data, col_test_data): 68 | return no_structure_row_loglik(train_data.transpose(), col_test_data.transpose()) 69 | 70 | def evaluate_model(train_data, root, row_test_data, col_test_data, label='', avg_col_mean=True, 71 | init_row_loglik=None, init_col_loglik=None, num_steps_ais=2000, max_dim=None): 72 | 73 | print 'Scoring row predictive likelihood...' 74 | row_loglik_all = score_row_predictive_variational( 75 | train_data[:, :max_dim], root[:, :max_dim], row_test_data[:, :max_dim], num_steps_ais=num_steps_ais) 76 | if avg_col_mean: 77 | if init_row_loglik is None: 78 | init_row_loglik = no_structure_row_loglik(train_data[:, :max_dim], row_test_data[:, :max_dim]) 79 | row_loglik_all = np.logaddexp(row_loglik_all + np.log(0.99), 80 | init_row_loglik + np.log(0.01)) 81 | 82 | print 'Scoring column predictive likelihood...' 
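# column scores reuse the row scoring routine on the transposed data and model
# (see score_col_predictive_variational above)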
83 | col_loglik_all = score_col_predictive_variational( 84 | train_data[:max_dim, :], root[:max_dim, :], col_test_data[:max_dim, :], num_steps_ais=num_steps_ais) 85 | if avg_col_mean: 86 | if init_col_loglik is None: 87 | init_col_loglik = no_structure_col_loglik(train_data[:max_dim, :], col_test_data[:max_dim, :]) 88 | col_loglik_all = np.logaddexp(col_loglik_all + np.log(0.99), 89 | init_col_loglik + np.log(0.01)) 90 | 91 | return row_loglik_all, col_loglik_all 92 | 93 | 94 | 95 | -------------------------------------------------------------------------------- /single_process.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | 3 | 4 | def run(script, jobs): 5 | for job in jobs: 6 | subprocess.call(['python', script] + list(job)) 7 | 8 | 9 | -------------------------------------------------------------------------------- /synthetic_experiments.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import collections 3 | import numpy as np 4 | nax = np.newaxis 5 | import os 6 | import StringIO 7 | import sys 8 | 9 | import config 10 | import experiments 11 | import observations 12 | import presentation 13 | from utils import misc, storage 14 | 15 | 16 | NUM_ROWS = 200 17 | NUM_COLS = 200 18 | NUM_COMPONENTS = 10 19 | 20 | DEFAULT_SEARCH_DEPTH = 3 21 | DEFAULT_PREFIX = 'synthetic' 22 | 23 | def generate_ar(nrows, ncols, a): 24 | X = np.zeros((nrows, ncols)) 25 | X[0,:] = np.random.normal(size=ncols) 26 | for i in range(1, nrows): 27 | X[i,:] = a * X[i-1,:] + np.random.normal(0., np.sqrt(1-a**2), size=ncols) 28 | return X 29 | 30 | def generate_data(data_str, nrows, ncols, ncomp, return_components=False): 31 | IBP_ALPHA = 2. 32 | pi_crp = np.ones(ncomp) / ncomp 33 | pi_ibp = np.ones(ncomp) * IBP_ALPHA / ncomp 34 | 35 | if data_str[-1] == 'T': 36 | data_str = data_str[:-1] 37 | transpose = True 38 | nrows, ncols = ncols, nrows 39 | else: 40 | transpose = False 41 | 42 | if data_str == 'pmf': 43 | U = np.random.normal(0., 1., size=(nrows, ncomp)) 44 | V = np.random.normal(0., 1., size=(ncomp, ncols)) 45 | data = np.dot(U, V) 46 | components = (U, V) 47 | 48 | elif data_str == 'mog': 49 | U = np.random.multinomial(1, pi_crp, size=nrows) 50 | V = np.random.normal(0., 1., size=(ncomp, ncols)) 51 | data = np.dot(U, V) 52 | components = (U, V) 53 | 54 | elif data_str == 'ibp': 55 | U = np.random.binomial(1, pi_ibp[nax,:], size=(nrows, ncomp)) 56 | V = np.random.normal(0., 1., size=(ncomp, ncols)) 57 | data = np.dot(U, V) 58 | components = (U, V) 59 | 60 | elif data_str == 'sparse': 61 | Z = np.random.normal(0., 1., size=(nrows, ncomp)) 62 | U = np.random.normal(0., np.exp(Z)) 63 | V = np.random.normal(0., 1., size=(ncomp, ncols)) 64 | data = np.dot(U, V) 65 | components = (U, V) 66 | 67 | 68 | elif data_str == 'gsm': 69 | U_inner = np.random.normal(0., 1., size=(nrows, 1)) 70 | V_inner = np.random.normal(0., 1., size=(1, ncomp)) 71 | Z = np.random.normal(U_inner * V_inner, 1.) 72 | #Z = 2. 
* Z / np.sqrt(np.mean(Z**2)) 73 | 74 | U = np.random.normal(0., np.exp(Z)) 75 | V = np.random.normal(0., 1., size=(ncomp, ncols)) 76 | data = np.dot(U, V) 77 | components = (U, V) 78 | 79 | elif data_str == 'irm': 80 | U = np.random.multinomial(1, pi_crp, size=nrows) 81 | R = np.random.normal(0., 1., size=(ncomp, ncomp)) 82 | V = np.random.multinomial(1, pi_crp, size=ncols).T 83 | data = np.dot(np.dot(U, R), V) 84 | components = (U, R, V) 85 | 86 | elif data_str == 'bmf': 87 | U = np.random.binomial(1, pi_ibp[nax,:], size=(nrows, ncomp)) 88 | R = np.random.normal(0., 1., size=(ncomp, ncomp)) 89 | V = np.random.binomial(1, pi_ibp[nax,:], size=(ncols, ncomp)).T 90 | data = np.dot(np.dot(U, R), V) 91 | components = (U, R, V) 92 | 93 | elif data_str == 'mgb': 94 | U = np.random.multinomial(1, pi_crp, size=nrows) 95 | R = np.random.normal(0., 1., size=(ncomp, ncomp)) 96 | V = np.random.binomial(1, pi_ibp[nax,:], size=(ncols, ncomp)).T 97 | data = np.dot(np.dot(U, R), V) 98 | components = (U, R, V) 99 | 100 | elif data_str == 'chain': 101 | data = generate_ar(nrows, ncols, 0.9) 102 | components = (data) 103 | 104 | elif data_str == 'kf': 105 | U = generate_ar(nrows, ncomp, 0.9) 106 | V = np.random.normal(size=(ncomp, ncols)) 107 | data = np.dot(U, V) 108 | components = (U, V) 109 | 110 | elif data_str == 'bctf': 111 | temp1, (U1, V1) = generate_data('mog', nrows, ncols, ncomp, True) 112 | F1 = np.random.normal(temp1, 1.) 113 | temp2, (U2, V2) = generate_data('mog', nrows, ncols, ncomp, True) 114 | F2 = np.random.normal(temp2, 1.) 115 | data = np.dot(F1, F2.T) 116 | components = (U1, V1, F1, U2, V2, F2) 117 | 118 | 119 | data /= np.std(data) 120 | 121 | if transpose: 122 | data = data.T 123 | 124 | if return_components: 125 | return data, components 126 | else: 127 | return data 128 | 129 | 130 | NOISE_STR_VALUES = ['0.1', '1.0', '3.0', '10.0'] 131 | ALL_MODELS = ['pmf', 'mog', 'ibp', 'chain', 'irm', 'bmf', 'kf', 'bctf', 'sparse', 'gsm'] 132 | 133 | 134 | def experiment_name(prefix, noise_str, model): 135 | return '%s_%s_%s' % (prefix, noise_str, model) 136 | 137 | def all_experiment_names(prefix): 138 | return [experiment_name(prefix, noise_str, model) 139 | for noise_str in NOISE_STR_VALUES 140 | for model in ALL_MODELS 141 | ] 142 | 143 | def load_params(prefix): 144 | expt_name = all_experiment_names(prefix)[0] 145 | return storage.load(experiments.params_file(expt_name)) 146 | 147 | def initial_samples_jobs(prefix, level): 148 | return reduce(list.__add__, [experiments.initial_samples_jobs(name, level) 149 | for name in all_experiment_names(prefix)]) 150 | 151 | def initial_samples_key(prefix, level): 152 | return '%s_init_%d' % (prefix, level) 153 | 154 | def evaluation_jobs(prefix, level): 155 | return reduce(list.__add__, [experiments.evaluation_jobs(name, level) 156 | for name in all_experiment_names(prefix)]) 157 | 158 | def evaluation_key(prefix, level): 159 | return '%s_eval_%d' % (prefix, level) 160 | 161 | def final_model_jobs(prefix): 162 | return reduce(list.__add__, [experiments.final_model_jobs(name) 163 | for name in all_experiment_names(prefix)]) 164 | 165 | def final_model_key(prefix): 166 | return '%s_final' % prefix 167 | 168 | def report_dir(prefix): 169 | return os.path.join(config.REPORT_PATH, prefix) 170 | 171 | def report_file(prefix): 172 | return os.path.join(report_dir(prefix), 'results.txt') 173 | 174 | 175 | def init_experiment(prefix, debug, search_depth=3): 176 | experiments.check_required_directories() 177 | 178 | for noise_str in NOISE_STR_VALUES: 179 | for 
model in ALL_MODELS: 180 | name = experiment_name(prefix, noise_str, model) 181 | if debug: 182 | params = experiments.QuickParams(search_depth=search_depth) 183 | else: 184 | params = experiments.SmallParams(search_depth=search_depth) 185 | data, components = generate_data(model, NUM_ROWS, NUM_COLS, NUM_COMPONENTS, True) 186 | clean_data_matrix = observations.DataMatrix.from_real_values(data) 187 | noise_var = float(noise_str) 188 | noisy_data = np.random.normal(data, np.sqrt(noise_var)) 189 | data_matrix = observations.DataMatrix.from_real_values(noisy_data) 190 | experiments.init_experiment(name, data_matrix, params, components, 191 | clean_data_matrix=clean_data_matrix) 192 | 193 | 194 | def init_level(prefix, level): 195 | for name in all_experiment_names(prefix): 196 | experiments.init_level(name, level) 197 | 198 | def collect_scores_for_level(prefix, level): 199 | for name in all_experiment_names(prefix): 200 | experiments.collect_scores_for_level(name, level) 201 | 202 | def run_everything(prefix, args): 203 | params = load_params(prefix) 204 | init_level(prefix, 1) 205 | experiments.run_jobs(evaluation_jobs(prefix, 1), args, evaluation_key(prefix, 1)) 206 | collect_scores_for_level(prefix, 1) 207 | for level in range(2, params.search_depth + 1): 208 | init_level(prefix, level) 209 | experiments.run_jobs(initial_samples_jobs(prefix, level), args, initial_samples_key(prefix, level)) 210 | experiments.run_jobs(evaluation_jobs(prefix, level), args, evaluation_key(prefix, level)) 211 | collect_scores_for_level(prefix, level) 212 | experiments.run_jobs(final_model_jobs(prefix), args, final_model_key(prefix)) 213 | 214 | 215 | def print_failures(prefix, outfile=sys.stdout): 216 | params = load_params(prefix) 217 | failures = [] 218 | for level in range(1, params.search_depth + 1): 219 | ok_counts = collections.defaultdict(int) 220 | fail_counts = collections.defaultdict(int) 221 | for expt_name in all_experiment_names(prefix): 222 | for _, structure in storage.load(experiments.structures_file(expt_name, level)): 223 | for split_id in range(params.num_splits): 224 | for sample_id in range(params.num_samples): 225 | ok = False 226 | fname = experiments.scores_file(expt_name, level, structure, split_id, sample_id) 227 | if storage.exists(fname): 228 | row_loglik, col_loglik = storage.load(fname) 229 | if np.all(np.isfinite(row_loglik)) and np.all(np.isfinite(col_loglik)): 230 | ok = True 231 | 232 | if ok: 233 | ok_counts[structure] += 1 234 | else: 235 | fail_counts[structure] += 1 236 | 237 | for structure in fail_counts: 238 | if ok_counts[structure] > 0: 239 | failures.append(presentation.Failure(structure, level, False)) 240 | else: 241 | failures.append(presentation.Failure(structure, level, True)) 242 | 243 | presentation.print_failed_structures(failures, outfile) 244 | 245 | def print_learned_structures(prefix, outfile=sys.stdout): 246 | results = [] 247 | for expt_name in all_experiment_names(prefix): 248 | structure, _ = experiments.final_structure(expt_name) 249 | results.append(presentation.FinalResult(expt_name, structure)) 250 | presentation.print_learned_structures(results, outfile) 251 | 252 | def summarize_results(prefix, outfile=sys.stdout): 253 | print_learned_structures(prefix, outfile) 254 | print_failures(prefix, outfile) 255 | 256 | def save_report(name, email=None): 257 | # write to stdout 258 | summarize_results(name) 259 | 260 | # write to report file 261 | if not os.path.exists(report_dir(name)): 262 | os.mkdir(report_dir(name)) 263 | summarize_results(name, 
open(report_file(name), 'w')) 264 | 265 | if email is not None and email.find('@') != -1: 266 | header = 'experiment %s finished' % name 267 | buff = StringIO.StringIO() 268 | print >> buff, 'These results are best viewed in a monospace font.' 269 | print >> buff 270 | summarize_results(name, buff) 271 | body = buff.getvalue() 272 | buff.close() 273 | misc.send_email(header, body, email) 274 | 275 | 276 | 277 | if __name__ == '__main__': 278 | command = sys.argv[1] 279 | parser = argparse.ArgumentParser() 280 | parser.add_argument('command') 281 | 282 | if command == 'generate': 283 | parser.add_argument('--debug', action='store_true', default=False) 284 | parser.add_argument('--search_depth', type=int, default=DEFAULT_SEARCH_DEPTH) 285 | parser.add_argument('--prefix', type=str, default=DEFAULT_PREFIX) 286 | args = parser.parse_args() 287 | init_experiment(args.prefix, args.debug, args.search_depth) 288 | 289 | elif command == 'init': 290 | parser.add_argument('level', type=int) 291 | parser.add_argument('--prefix', type=str, default=DEFAULT_PREFIX) 292 | experiments.add_scheduler_args(parser) 293 | args = parser.parse_args() 294 | init_level(args.prefix, args.level) 295 | if args.level > 1: 296 | experiments.run_jobs(initial_samples_jobs(args.prefix, args.level), args, 297 | initial_samples_key(args.prefix, args.level)) 298 | 299 | elif command == 'eval': 300 | parser.add_argument('level', type=int) 301 | parser.add_argument('--prefix', type=str, default=DEFAULT_PREFIX) 302 | experiments.add_scheduler_args(parser) 303 | args = parser.parse_args() 304 | experiments.run_jobs(evaluation_jobs(args.prefix, args.level), args, 305 | evaluation_key(args.prefix, args.level)) 306 | collect_scores_for_level(args.prefix, args.level) 307 | 308 | elif command == 'final': 309 | # parser.add_argument('level', type=int)  # final-model jobs are not per-level, so no level argument is needed 310 | parser.add_argument('--prefix', type=str, default=DEFAULT_PREFIX) 311 | experiments.add_scheduler_args(parser) 312 | args = parser.parse_args() 313 | experiments.run_jobs(final_model_jobs(args.prefix), args, 314 | final_model_key(args.prefix)) 315 | 316 | elif command == 'everything': 317 | parser.add_argument('--prefix', type=str, default=DEFAULT_PREFIX) 318 | experiments.add_scheduler_args(parser) 319 | args = parser.parse_args() 320 | run_everything(args.prefix, args) 321 | 322 | else: 323 | raise RuntimeError('Unknown command: %s' % command) 324 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | import distributions 2 | import gaussians 3 | import misc 4 | import profiler 5 | import psd_matrices 6 | import storage 7 | -------------------------------------------------------------------------------- /utils/distributions.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | import scipy.special 4 | 5 | # temporary 6 | ALPHA_CRP = 5 7 | 8 | 9 | gammaln = scipy.special.gammaln 10 | 11 | def uni_gauss_information_to_expectation(lam, J): 12 | sigma_sq = 1. / lam 13 | mu = -sigma_sq * J 14 | return sigma_sq, mu 15 | 16 | def uni_gauss_expectation_to_information(sigma_sq, mu): 17 | lam = 1. 
/ sigma_sq 18 | J = -lam * mu 19 | return lam, J 20 | 21 | def gauss_loglik(x, mu, sigma_sq): 22 | return -0.5 * np.log(2*np.pi) - 0.5 * np.log(sigma_sq) \ 23 | - 0.5 * (x - mu)**2 / sigma_sq 24 | 25 | def sample_dirichlet(alpha): 26 | temp = np.random.gamma(alpha) 27 | return temp / np.sum(temp) 28 | 29 | def dirichlet_loglik(alpha, U): 30 | norm = gammaln(alpha.sum(-1)) - gammaln(alpha).sum(-1) 31 | return norm + ((alpha - 1.) * np.log(U)).sum(-1) # Dirichlet log-density: normalizer plus sum_k (alpha_k - 1) log u_k 32 | 33 | def dirichlet_multinomial_loglik(alpha, U): 34 | c = U.sum(0) 35 | assert alpha.ndim == 1 and alpha.shape == c.shape 36 | return gammaln(alpha + c).sum(-1) - gammaln(alpha).sum(-1) + \ 37 | gammaln(alpha.sum()) - gammaln(alpha.sum() + c.sum()) 38 | 39 | 40 | def check_dirichlet_multinomial_loglik(): 41 | U = np.array([[1, 0], 42 | [1, 0], 43 | [0, 1], 44 | [1, 0]]) 45 | alpha = np.array([1., 1.]) 46 | assert np.allclose(dirichlet_multinomial_loglik(alpha, U), np.log(1./2 * 2./3 * 1./4 * 3./5)) 47 | 48 | def beta_bernoulli_loglik(alpha0, alpha1, U): 49 | M = U.shape[0] 50 | c = U.sum(0) 51 | assert alpha0.ndim == 1 and alpha0.shape == alpha1.shape == c.shape 52 | temp = gammaln(alpha0 + M - c) - gammaln(alpha0) + \ 53 | gammaln(alpha1 + c) - gammaln(alpha1) + \ 54 | gammaln(alpha0 + alpha1 ) - gammaln(alpha0 + alpha1 + M) 55 | return temp.sum() 56 | 57 | def check_beta_bernoulli_loglik(): 58 | U = np.array([[1, 0], 59 | [1, 1], 60 | [0, 1], 61 | [0, 1]]) 62 | alpha0 = np.array([2., 2.]) 63 | alpha1 = np.array([1., 1.]) 64 | result = beta_bernoulli_loglik(alpha0, alpha1, U) 65 | assert np.allclose(result, np.log(1./3) + np.log(2./4) + np.log(2./5) + np.log(3./6) + 66 | np.log(2./3) + np.log(1./4) + np.log(2./5) + np.log(3./6)) 67 | 68 | 69 | 70 | class GammaDistribution: 71 | def __init__(self, a, b): 72 | if np.shape(a) != np.shape(b): 73 | raise RuntimeError('a and b should be the same shape') 74 | self.a = a 75 | self.b = b 76 | 77 | def expectation(self): 78 | return self.a / self.b 79 | 80 | def variance(self): 81 | return self.a / self.b**2 82 | 83 | def expectation_log(self): 84 | return scipy.special.basic.digamma(self.a) - np.log(self.b) 85 | 86 | def entropy(self): 87 | return scipy.special.gammaln(self.a) - (self.a - 1.) * scipy.special.basic.digamma(self.a) - np.log(self.b) + self.a 88 | 89 | def sample(self): 90 | return np.random.gamma(self.a, 1./self.b) 91 | 92 | def loglik(self, tau): 93 | return self.a * np.log(self.b) - scipy.special.gammaln(self.a) + (self.a - 1.) * np.log(tau) - self.b * tau 94 | 95 | def perturb(self, eps=1e-5): 96 | a = self.a * np.exp(np.random.normal(0., eps, size=self.a.shape)) 97 | b = self.b * np.exp(np.random.normal(0., eps, size=self.b.shape)) 98 | return GammaDistribution(a, b) 99 | 100 | def copy(self): 101 | try: 102 | return GammaDistribution(self.a.copy(), self.b.copy()) 103 | except: # not arrays 104 | return GammaDistribution(self.a, self.b) 105 | 106 | class InverseGammaDistribution: 107 | def __init__(self, a, b): 108 | self.a = a 109 | self.b = b 110 | 111 | def sample(self): 112 | return 1. / np.random.gamma(self.a, 1. / self.b) 113 | 114 | def loglik(self, tau): 115 | return GammaDistribution(self.a, self.b).loglik(1. 
/ tau) - 2 * np.log(tau) 116 | 117 | 118 | class MultinomialDistribution: 119 | def __init__(self, log_p): 120 | # take log_p rather than p as an argument because of underflow 121 | self.log_p = log_p 122 | self.p = np.exp(log_p) 123 | self.p /= self.p.sum(-1)[..., nax] # should already be normalized, but sometimes numerical error causes problems 124 | 125 | def expectation(self): 126 | return self.p 127 | 128 | def sample(self): 129 | #return np.random.multinomial(1, self.p) 130 | shape = self.p.shape[:-1] 131 | pr = int(np.prod(shape)) 132 | p = self.p.reshape((pr, self.p.shape[-1])) 133 | temp = np.array([np.random.multinomial(1, p[i, :]) 134 | for i in range(pr)]) 135 | return temp.reshape(shape + (self.p.shape[-1],)) 136 | 137 | def loglik(self, a): 138 | a = np.array(a) 139 | if not np.issubdtype(a.dtype, int): 140 | raise RuntimeError('a must be an integer array') 141 | if np.shape(a) != np.shape(self.p)[:a.ndim]: 142 | raise RuntimeError('sizes do not match') 143 | 144 | if a.ndim == self.p.ndim: 145 | if not (np.all((a == 0) + (a == 1)) and a.sum(-1) == 1): 146 | raise RuntimeError('a must be 1-of-n') 147 | return np.sum(a * self.log_p) 148 | elif a.ndim == self.p.ndim - 1: 149 | shp = np.shape(self.log_p)[:-1] 150 | size = np.prod(shp).astype(int) 151 | log_p_ = self.log_p.reshape((size, np.shape(self.log_p)[-1])) 152 | a_ = a.ravel() 153 | result = log_p_[np.arange(size), a_] 154 | return result.reshape(shp) 155 | else: 156 | raise RuntimeError('sizes do not match') 157 | 158 | def __slice__(self, slc): 159 | return MultinomialDistribution(self.log_p[slc]) 160 | 161 | @staticmethod 162 | def from_odds(odds): 163 | return MultinomialDistribution(odds - np.logaddexp.reduce(odds, axis=-1)[..., nax]) 164 | 165 | class BernoulliDistribution: 166 | def __init__(self, odds): 167 | self.odds = odds 168 | 169 | def _p(self): 170 | return 1. / (1 + np.exp(-self.odds)) 171 | 172 | def expectation(self): 173 | return self._p() 174 | 175 | def variance(self): 176 | p = self._p() 177 | return p * (1. 
- p) 178 | 179 | def sample(self): 180 | return np.random.binomial(1, self._p()) 181 | 182 | def loglik(self, a): 183 | if not np.issubdtype(a.dtype, int): 184 | raise RuntimeError('a must be an integer array') 185 | if not np.all((a==0) + (a==1)): 186 | raise RuntimeError('a must be a binary array') 187 | 188 | log_p = -np.logaddexp(0., -self.odds) 189 | log_1_minus_p = -np.logaddexp(0., self.odds) 190 | return a * log_p + (1-a) * log_1_minus_p 191 | 192 | @staticmethod 193 | def from_odds(odds): 194 | return BernoulliDistribution(odds) 195 | 196 | 197 | 198 | class GaussianDistribution: 199 | def __init__(self, mu, sigma_sq): 200 | self.mu = mu 201 | self.sigma_sq = sigma_sq 202 | 203 | def loglik(self, x): 204 | return -0.5 * np.log(2*np.pi) + \ 205 | -0.5 * np.log(self.sigma_sq) + \ 206 | -0.5 * (x - self.mu) ** 2 / self.sigma_sq 207 | 208 | def sample(self): 209 | return np.random.normal(self.mu, self.sigma_sq) 210 | 211 | def maximize(self): 212 | return self.mu 213 | -------------------------------------------------------------------------------- /utils/gaussians.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | import psd_matrices 5 | 6 | #from profiler import profiled 7 | import profiler 8 | profiled = profiler.profiled('gaussians') 9 | from misc import _err_string, process_slice, my_sum, match_shapes, dot, full_shape, broadcast, set_err_info, transp 10 | 11 | 12 | class Potential(): 13 | def __init__(self, J, Lambda, Z): 14 | J, Lambda, Z = match_shapes([('J', J, 1), ('Lambda', Lambda, 0), ('Z', Z, 0)]) 15 | self._J = J 16 | self._Lambda = Lambda 17 | self._Z = Z 18 | self.shape = full_shape([J.shape[:-1], Lambda.shape, Z.shape]) 19 | self.ndim = J.ndim - 1 20 | self.dim = J.shape[-1] 21 | self.shape_str = '%s J=%s Z=%s %s' % (Lambda.__class__, J.shape, Z.shape, Lambda.shape_str) 22 | self.mutable = False 23 | 24 | def set_mutable(self, m): 25 | # copy everything, just in case 26 | self._J = self._J.copy() 27 | self._Lambda = self._Lambda.copy() 28 | self._Z = self._Z.copy() 29 | 30 | self.mutable = m 31 | self._Lambda.set_mutable(m) 32 | 33 | 34 | @profiled 35 | def full(self): 36 | return Potential(self._J, self._Lambda.full(), self._Z) 37 | 38 | @profiled 39 | def copy(self): 40 | return Potential(self._J.copy(), self._Lambda.copy(), self._Z.copy()) 41 | 42 | @profiled 43 | def score(self, x): 44 | return -0.5 * self._Lambda.qform(x) + (self._J * x).sum(-1) + self._Z 45 | 46 | @profiled 47 | def flip(self): 48 | return Potential(-self._J, self._Lambda, self._Z) 49 | 50 | @profiled 51 | def translate(self, dmu): 52 | new_J = self._J + self._Lambda.dot(dmu) 53 | linv = self._Lambda.pinv() 54 | new_Z = self._Z + 0.5 * linv.qform(self._J) - 0.5 * linv.qform(new_J) 55 | return Potential(new_J, self._Lambda, new_Z) 56 | 57 | def __getitem__(self, slc): 58 | return self.__slice__(slc) 59 | 60 | @profiled 61 | def __slice__(self, slc): 62 | J_slc = process_slice(slc, self._J.shape, 1) 63 | Lambda_slc = process_slice(slc, self._Lambda.shape, 0) 64 | Z_slc = process_slice(slc, self._Z.shape, 0) 65 | return Potential(self._J[J_slc], self._Lambda[Lambda_slc], self._Z[Z_slc]) 66 | 67 | def __setitem__(self, slc, other): 68 | return self.__setslice__(slc, other) 69 | 70 | @profiled 71 | def __setslice__(self, slc, other): 72 | if not self.mutable: 73 | raise RuntimeError('Attempt to modify immutable potential') 74 | J_slc = process_slice(slc, self._J.shape, 1) 75 | Lambda_slc = process_slice(slc, 
self._Lambda.shape, 0) 76 | Z_slc = process_slice(slc, self._Z.shape, 0) 77 | self._J[J_slc] = other._J 78 | self._Lambda[Lambda_slc] = other._Lambda 79 | self._Z[Z_slc] = other._Z 80 | 81 | @profiled 82 | def __add__(self, other): 83 | return Potential(self._J + other._J, self._Lambda + other._Lambda, self._Z + other._Z) 84 | 85 | @profiled 86 | def __sub__(self, other): 87 | return Potential(self._J - other._J, self._Lambda - other._Lambda, self._Z - other._Z) 88 | 89 | @profiled 90 | def __mul__(self, other): 91 | other = np.asarray(other) 92 | return Potential(self._J * other[..., nax], self._Lambda * other, self._Z * other) 93 | 94 | @profiled 95 | def __rmul__(self, other): 96 | return self * other 97 | 98 | @profiled 99 | def sum(self, axis): 100 | assert type(axis) == int and 0 <= axis < self.ndim 101 | return Potential(my_sum(self._J, axis, self.shape[axis]), 102 | my_sum(self._Lambda, axis, self.shape[axis]), 103 | my_sum(self._Z, axis, self.shape[axis])) 104 | 105 | @profiled 106 | def conv(self, other): 107 | J1, J2, Lambda1, Lambda2, Z1, Z2 = self._J, other._J, self._Lambda, other._Lambda, self._Z, other._Z 108 | LL = Lambda1 + Lambda2 109 | P = LL.pinv() 110 | Lambda_c = Lambda1.conv(Lambda2) 111 | J_c = Lambda1.dot(P.dot(J2)) + Lambda2.dot(P.dot(J1)) 112 | Z_c = 0.5 * P.qform(J1 - J2) + 0.5 * self.dim * np.log(2*np.pi) - 0.5 * LL.logdet() + Z1 + Z2 113 | return Potential(J_c, Lambda_c, Z_c) 114 | 115 | @profiled 116 | def transform(self, A): 117 | J = dot(transp(A), self._J) 118 | Lambda = self._Lambda.alat(transp(A)) 119 | return Potential(J, Lambda, self._Z) 120 | 121 | @profiled 122 | def rescale(self, a): 123 | a = np.array(a) 124 | J = a[..., nax] * self._J 125 | Lambda = self._Lambda.rescale(a) 126 | return Potential(J, Lambda, self._Z) 127 | 128 | 129 | 130 | @profiled 131 | def integral(self): 132 | J, Lambda, Z = self._J, self._Lambda, self._Z 133 | linv = Lambda.pinv() 134 | return 0.5 * self.dim * np.log(2*np.pi) - 0.5 * Lambda.logdet() + 0.5 * linv.qform(J) + Z 135 | 136 | @profiled 137 | def renorm(self): 138 | return Potential(self._J, self._Lambda, self._Z - self.integral()) 139 | 140 | @profiled 141 | def add_dummy_dimension(self): 142 | J = np.zeros(self._J.shape[:-1] + (self.dim + 1,)) 143 | J[..., 1:] = self._J 144 | Lambda = self._Lambda.add_dummy_dimension() 145 | return Potential(J, Lambda, self._Z) 146 | 147 | @profiled 148 | def to_eig(self): 149 | return Potential(self._J, self._Lambda.to_eig(), self._Z) 150 | 151 | @staticmethod 152 | @profiled 153 | def from_moments(mu, Sigma): 154 | return Distribution(mu, Sigma).to_potential() 155 | 156 | @staticmethod 157 | @profiled 158 | def from_moments_full(mu, Sigma): 159 | return Distribution(mu, psd_matrices.FullMatrix(Sigma)).to_potential() 160 | 161 | @staticmethod 162 | @profiled 163 | def from_moments_diag(mu, sigma_sq): 164 | return Distribution(mu, psd_matrices.DiagonalMatrix(sigma_sq)).to_potential() 165 | 166 | @staticmethod 167 | @profiled 168 | def from_moments_iso(mu, sigma_sq): 169 | sigma_sq = np.asarray(sigma_sq) 170 | return Distribution(mu, psd_matrices.EyeMatrix(sigma_sq, mu.shape[-1])).to_potential() 171 | 172 | @staticmethod 173 | @profiled 174 | def from_moments_eig(mu, d, Q, s_perp): 175 | return Distribution(mu, psd_matrices.FixedEigMatrix(d, Q, s_perp)).to_potential() 176 | 177 | @profiled 178 | def allclose(self, other): 179 | J_err = _err_string(self._J, other._J) 180 | Lambda_err = _err_string(self._Lambda.full()._S, other._Lambda.full()._S) 181 | Z_err = _err_string(self._Z, 
other._Z) 182 | set_err_info('gaussians', [('J', J_err), ('Lambda', Lambda_err), ('Z', Z_err)]) 183 | 184 | return np.allclose(self._J, other._J) and \ 185 | self._Lambda.allclose(other._Lambda) and \ 186 | np.allclose(self._Z, other._Z) 187 | 188 | @profiled 189 | def to_distribution(self): 190 | Sigma = self._Lambda.inv() 191 | mu = Sigma.dot(self._J) 192 | Z = self._Z + 0.5 * self.dim * np.log(2*np.pi) + 0.5 * Sigma.logdet() + 0.5 * Sigma.qform(self._J) 193 | return Distribution(mu, Sigma, Z) 194 | 195 | @staticmethod 196 | def random(J_shape, Z_shape, Lambda, dim): 197 | J = np.random.normal(size=J_shape + (dim,)) 198 | Z = np.random.normal(size=Z_shape) 199 | return Potential(J, Lambda, Z) 200 | 201 | @profiled 202 | def conditionals(self, X): 203 | return Conditionals.from_potential(self, X) 204 | 205 | @profiled 206 | def mu(self): 207 | return self._Lambda.pinv().dot(self._J) 208 | 209 | 210 | class Distribution: 211 | def __init__(self, mu, Sigma, Z=0.): 212 | mu, Sigma, Z = match_shapes([('mu', mu, 1), ('Sigma', Sigma, 0), ('Z', Z, 0)]) 213 | self._mu = mu 214 | self._Sigma = Sigma 215 | self._Z = Z 216 | self.dim = mu.shape[-1] 217 | self.ndim = mu.ndim - 1 218 | self.shape = full_shape([Sigma.shape, mu.shape[:-1], Z.shape]) 219 | self.shape_str = '%s mu=%s Z=%s %s' % (Sigma.__class__, mu.shape, Z.shape, Sigma.shape_str) 220 | 221 | def allclose(self, other): 222 | return np.allclose(self._mu, other._mu) and \ 223 | np.allclose(self._Z, other._Z) and \ 224 | self._Sigma.allclose(other._Sigma) 225 | 226 | @profiled 227 | def full(self): 228 | return Distribution(self._mu, self._Sigma.full(), self._Z) 229 | 230 | @profiled 231 | def __add__(self, other): 232 | return Distribution(self._mu + other._mu, self._Sigma + other._Sigma, self._Z + other._Z) 233 | 234 | @profiled 235 | def translate(self, dmu): 236 | return Distribution(self._mu + dmu, self._Sigma, self._Z) 237 | 238 | @profiled 239 | def to_potential(self): 240 | Lambda = self._Sigma.inv() 241 | J = Lambda.dot(self._mu) 242 | Z = -0.5 * self.dim * np.log(2*np.pi) - 0.5 * self._Sigma.logdet() - 0.5 * self._Sigma.qform(J) + self._Z 243 | return Potential(J, Lambda, Z) 244 | 245 | @profiled 246 | def sample(self): 247 | return self._mu + self._Sigma.sqrt_dot(np.random.normal(size=self.shape + (self.dim,))) 248 | 249 | @profiled 250 | def transform(self, A): 251 | return Distribution(dot(A, self._mu), self._Sigma.alat(A), self._Z) 252 | 253 | @profiled 254 | def __slice__(self, slc): 255 | mu_slc = process_slice(slc, self._mu.shape, 1) 256 | Sigma_slc = process_slice(slc, self._Sigma.shape, 0) 257 | Z_slc = process_slice(slc, self._Z.shape, 0) 258 | return Distribution(self._mu[mu_slc], self._Sigma[Sigma_slc], self._Z[Z_slc]) 259 | 260 | @profiled 261 | def loglik(self, x): 262 | return self.to_potential().score(x) 263 | 264 | @staticmethod 265 | def from_moments_full(mu, Sigma, Z=0.): 266 | return Distribution(mu, psd_matrices.FullMatrix(Sigma), Z) 267 | 268 | @staticmethod 269 | def from_moments_diag(mu, sigma_sq, Z=0.): 270 | return Distribution(mu, psd_matrices.FullMatrix(np.diag(sigma_sq)), Z) 271 | 272 | @staticmethod 273 | def from_moments_iso(mu, sigma_sq, Z=0.): 274 | dim = mu.shape[-1] 275 | return Distribution(mu, psd_matrices.EyeMatrix(sigma_sq, dim), Z) 276 | 277 | def mu(self): 278 | return self._mu 279 | 280 | def Sigma(self): 281 | return self._Sigma.full()._S 282 | 283 | def Z(self): 284 | return self._Z 285 | 286 | 287 | class Conditionals: 288 | def __init__(self, Lambda, J_diff, Z_diff, X): 289 | Lambda, 
J_diff, X = match_shapes([('Lambda', Lambda, 0), ('J_diff', J_diff, 1), ('X', X, 1)]) 290 | self._Lambda = Lambda 291 | self._J_diff = J_diff.copy() 292 | self._Z_diff = Z_diff.copy() 293 | self._X = X.copy() 294 | self.dim = self._J_diff.shape[-1] 295 | self.ndim = self._J_diff.ndim - 1 296 | self.shape = full_shape([Lambda.shape, J_diff.shape[:-1], X.shape[:-1]]) 297 | self.shape_str = '%s J_diff=%s X=%s %s' % (Lambda.__class__, J_diff.shape, X.shape, Lambda.shape_str) 298 | 299 | ## can't have EigMatrix of zero dimensions, since NumPy doesn't like zero-dimensional object arrays 300 | #if self.shape == () and isinstance(Lambda, psd_matrices.EigMatrix): 301 | # self._Lambda = self._Lambda.full() 302 | 303 | def allclose(self, other): 304 | return self._Lambda.allclose(other._Lambda) and \ 305 | np.allclose(self._J_diff, other._J_diff) and \ 306 | np.allclose(self._Z_diff, other._Z_diff) and \ 307 | np.allclose(self._X, other._X) 308 | 309 | @profiled 310 | def __slice__(self, slc): 311 | Lambda_slc = process_slice(slc, self._Lambda.shape, 0) 312 | J_slc = process_slice(slc, self._J_diff.shape, 1) 313 | Z_slc = process_slice(slc, self._Z_diff.shape, 0) 314 | X_slc = process_slice(slc, self._X.shape, 1) 315 | return Conditionals(self._Lambda[Lambda_slc], self._J_diff[J_slc], self._Z_diff[Z_slc], self._X[X_slc]) 316 | 317 | @profiled 318 | def conditional_for(self, i): 319 | Lambda = psd_matrices.EyeMatrix(self._Lambda.elt(i, i), 1) 320 | return Potential(self._J_diff[..., i:i+1].copy(), Lambda, self._Z_diff).translate(self._X[..., i:i+1]) 321 | 322 | @profiled 323 | def assign(self, j, x_new): 324 | diff = x_new - self._X[..., j] 325 | self._X[..., j] = x_new 326 | self._Z_diff += self._J_diff[..., j] * diff + \ 327 | -0.5 * self._Lambda.elt(j, j) * diff ** 2 328 | self._J_diff -= diff[..., nax] * self._Lambda.col(j) 329 | 330 | 331 | @profiled 332 | def assign_one(self, idx, j, x_new): 333 | if type(idx) == int: 334 | idx = (idx,) 335 | diff = x_new - self._X[idx + (j,)] 336 | self._X[idx + (j,)] = x_new 337 | Lambda_idx = broadcast(idx, self._Lambda.shape) 338 | self._Z_diff[idx] += self._J_diff[idx + (j,)] * diff + \ 339 | -0.5 * self._Lambda[Lambda_idx].elt(j, j) * diff ** 2 340 | self._J_diff[idx + (slice(None),)] -= diff * self._Lambda[Lambda_idx].col(j) 341 | 342 | @staticmethod 343 | @profiled 344 | def from_potential(pot, X): 345 | J_diff = pot._J - pot._Lambda.dot(X) 346 | Z_diff = pot.score(X) 347 | return Conditionals(pot._Lambda, J_diff, Z_diff, X) 348 | 349 | 350 | -------------------------------------------------------------------------------- /utils/misc.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import itertools 3 | import numpy as np 4 | nax = np.newaxis 5 | import progressbar 6 | import scipy.linalg, scipy.integrate 7 | import smtplib 8 | import sys 9 | import termcolor 10 | 11 | 12 | def is_diag(A): 13 | return A.shape[0] == A.shape[1] and np.all(A == np.diag(np.diag(A))) 14 | 15 | def my_svd(A): 16 | m, n = A.shape 17 | if is_diag(A): 18 | return np.eye(m), np.diag(A), np.eye(m) 19 | else: 20 | return scipy.linalg.svd(A, full_matrices=False) 21 | 22 | def map_gaussian_matrix(A, B, C, d_1, d_2, d_3, d_4): 23 | """sample X, where P(X) \propto e^{-J(X)} and 24 | J(X) = 1/2 \|D_1(AXB - C)D_2\|^2 + 1/2 \|D_3 X D_4\|^2.""" 25 | A_tilde = d_1[:, nax] * A / d_3[nax, :] 26 | B_tilde = (1. 
/ d_4[:, nax]) * B * d_2[nax, :] 27 | C_tilde = d_1[:, nax] * C * d_2[nax, :] 28 | 29 | U_A, lambda_A, Vt_A = my_svd(A_tilde) 30 | V_A = Vt_A.T 31 | 32 | U_B, lambda_B, Vt_B = my_svd(B_tilde) 33 | V_B = Vt_B.T 34 | 35 | Lambda = lambda_A[:, nax] * lambda_B[nax, :] 36 | Y = Lambda * np.dot(np.dot(U_A.T, C_tilde), V_B) / (1. + Lambda**2) 37 | X_tilde = np.dot(np.dot(V_A, Y), U_B.T) 38 | X = (1. / d_3[:, nax]) * X_tilde * (1. / d_4[nax, :]) 39 | 40 | return X 41 | 42 | def map_gaussian_matrix_em(A, B, C, d_1, d_2, d_3, d_4, obs, X): 43 | C_ = np.where(obs, C, np.dot(np.dot(A, X), B)) 44 | return map_gaussian_matrix(A, B, C_, d_1, d_2, d_3, d_4) 45 | 46 | 47 | def sample_gaussian_matrix(A, B, C, d_1, d_2, d_3, d_4): 48 | """sample X, where P(X) \propto e^{-J(X)} and 49 | J(X) = 1/2 \|D_1(AXB - C)D_2\|^2 + 1/2 \|D_3 X D_4\|^2.""" 50 | A_tilde = d_1[:, nax] * A / d_3[nax, :] 51 | B_tilde = (1. / d_4[:, nax]) * B * d_2[nax, :] 52 | C_tilde = d_1[:, nax] * C * d_2[nax, :] 53 | 54 | U_A, lambda_A, Vt_A = my_svd(A_tilde) 55 | V_A = Vt_A.T 56 | 57 | U_B, lambda_B, Vt_B = my_svd(B_tilde) 58 | V_B = Vt_B.T 59 | 60 | Lambda = lambda_A[:, nax] * lambda_B[nax, :] 61 | Y_mean = Lambda * np.dot(np.dot(U_A.T, C_tilde), V_B) / (1. + Lambda**2) 62 | Y_var = 1. / (1. + Lambda**2) 63 | Y = np.random.normal(Y_mean, np.sqrt(Y_var)) 64 | X_tilde = np.dot(np.dot(V_A, Y), U_B.T) 65 | X = (1. / d_3[:, nax]) * X_tilde * (1. / d_4[nax, :]) 66 | 67 | return X 68 | 69 | def sample_gaussian_matrix_em(A, B, C, d_1, d_2, d_3, d_4, obs, X): 70 | C_ = np.where(obs, C, np.dot(np.dot(A, X), B)) 71 | return sample_gaussian_matrix(A, B, C_, d_1, d_2, d_3, d_4) 72 | 73 | 74 | 75 | def sample_gaussian_matrix2(A, B, W_X, W_N): 76 | nrows, ncols = A.shape[1], B.shape[1] 77 | X = np.zeros((nrows, ncols)) 78 | for j in range(ncols): 79 | Lambda = np.dot(np.dot(A.T, np.diag(W_N[:,j])), A) + np.diag(W_X[:,j]) 80 | Sigma = np.linalg.inv(Lambda) 81 | mu = mult([Sigma, A.T, W_N[:,j] * B[:,j]]) 82 | X[:,j] = np.random.multivariate_normal(mu, Sigma) 83 | return X 84 | 85 | def map_gaussian_matrix2(A, B, W_X, W_N): 86 | nrows, ncols = A.shape[1], B.shape[1] 87 | X = np.zeros((nrows, ncols)) 88 | for j in range(ncols): 89 | Lambda = np.dot(np.dot(A.T, np.diag(W_N[:,j])), A) + np.diag(W_X[:,j]) 90 | Sigma = np.linalg.inv(Lambda) 91 | mu = mult([Sigma, A.T, W_N[:,j] * B[:,j]]) 92 | X[:,j] = mu 93 | return X 94 | 95 | 96 | 97 | def mult(matrices): 98 | """Matrix multiplication""" 99 | prod = matrices[0] 100 | for mat in matrices[1:]: 101 | prod = np.dot(prod, mat) 102 | return prod 103 | 104 | 105 | 106 | def mean_field(J, Lambda, z_init=None): 107 | n = J.size 108 | assert J.shape == (n,) and Lambda.shape == (n, n) 109 | if z_init is not None: 110 | z = z_init.copy() 111 | else: 112 | z = np.zeros(n) 113 | 114 | # move quadratic potentials for one variable to unary terms 115 | J = J + 0.5 * Lambda[range(n), range(n)] 116 | Lambda[range(n), range(n)] = 0. 117 | 118 | for tr in range(100): 119 | for j in range(n): 120 | Lambda_term = np.dot(Lambda, z) 121 | odds = -J - Lambda_term 122 | odds = odds.clip(-100., 100.) # to avoid the overflow warnings 123 | z_new = 1. / (1. 
+ np.exp(-odds)) 124 | z[j] = 0.8*z[j] + 0.2*z_new[j] 125 | 126 | return z 127 | 128 | NEWLINE_EVERY = 50 129 | dummy_count = [0] 130 | def print_dot(count=None, max=None): 131 | print_count = (count is not None) 132 | if count is None: 133 | dummy_count[0] += 1 134 | count = dummy_count[0] 135 | sys.stdout.write('.') 136 | sys.stdout.flush() 137 | if count % NEWLINE_EVERY == 0: 138 | if print_count: 139 | if max is not None: 140 | sys.stdout.write(' [%d/%d]' % (count, max)) 141 | else: 142 | sys.stdout.write(' [%d]' % count) 143 | sys.stdout.write('\n') 144 | elif count == max: 145 | sys.stdout.write('\n') 146 | sys.stdout.flush() 147 | 148 | 149 | 150 | def sample_noise(N, obs=None, b0=1.): 151 | if obs is None: 152 | obs = np.ones(N.shape, dtype=bool) 153 | 154 | nrows, ncols = N.shape 155 | ssq_rows, ssq_cols = sample_noise_tied(N, obs, b0) 156 | lambda_rows = 1. / ssq_rows 157 | lambda_cols = 1. / ssq_cols 158 | 159 | a0 = 1. 160 | 161 | for tr in range(10): 162 | a = a0 + 0.5 * obs.sum(1) 163 | b = b0 + 0.5 * np.sum(obs * N**2 * lambda_cols[nax,:], axis=1) 164 | lambda_rows = np.random.gamma(a, 1. / b) 165 | 166 | if np.isscalar(lambda_rows): # np.random.gamma converts singleton arrays into scalars 167 | lambda_rows = np.array([lambda_rows]) 168 | 169 | a = a0 + 0.5 * obs.sum(0) 170 | b = b0 + 0.5 * np.sum(obs * N**2 * lambda_rows[:,nax], axis=0) 171 | lambda_cols = np.random.gamma(a, 1. / b) 172 | 173 | if np.isscalar(lambda_cols): 174 | lambda_cols = np.array([lambda_cols]) 175 | 176 | return 1. / lambda_rows, 1. / lambda_cols 177 | 178 | def sample_noise_tied(N, obs=None, b0=1.): 179 | if obs is None: 180 | obs = np.ones(N.shape, dtype=bool) 181 | nrows, ncols = N.shape 182 | a0 = 1. 183 | a = a0 + 0.5 * obs.sum() 184 | b = b0 + 0.5 * np.sum(obs * N**2) 185 | prec = np.random.gamma(a, 1. / b) 186 | 187 | return np.ones(nrows) / np.sqrt(prec), np.ones(ncols) / np.sqrt(prec) 188 | 189 | def sample_col_noise(N): 190 | nrows, ncols = N.shape 191 | A0 = 1. 192 | B0 = 1. 193 | B0 = np.mean(N**2) # UNDO 194 | a = A0 + 0.5 * nrows 195 | b = B0 + 0.5 * np.sum(N**2, axis=0) 196 | return 1. / np.random.gamma(a, 1. / b) 197 | 198 | 199 | 200 | 201 | def kalman_filter_diag(mu_0, sigma_sq_0, sigma_sq_v, lam, y): 202 | ndim, ntime = y.shape 203 | mu_forward = np.zeros((ndim, ntime)) 204 | sigma_sq_forward = np.zeros((ndim, ntime)) 205 | mu_forward[:, 0] = mu_0 206 | sigma_sq_forward[:, 0] = sigma_sq_0 207 | 208 | a = b = 1. 209 | 210 | mu = np.zeros((ndim, ntime)) 211 | sigma_sq = np.zeros((ndim, ntime)) 212 | 213 | # forward propagation 214 | for t in range(ntime): 215 | # execute dynamics 216 | if t > 0: 217 | mu_forward[:, t] = a * mu[:, t-1] 218 | sigma_sq_forward[:, t] = a**2 * sigma_sq[:, t-1] + sigma_sq_v 219 | 220 | # account for observations 221 | lambda_post = 1. / sigma_sq_forward[:, t] + b**2 * lam[:, t] 222 | h_post = mu_forward[:, t] / sigma_sq_forward[:, t] + \ 223 | b * y[:, t] * lam[:, t] 224 | mu[:, t] = h_post / lambda_post 225 | sigma_sq[:, t] = 1. / lambda_post 226 | 227 | h_backward = np.zeros((ndim, ntime)) 228 | lambda_backward = np.zeros((ndim, ntime)) 229 | 230 | # backward_propagation 231 | for t in range(ntime-1)[::-1]: 232 | lambda_post = lambda_backward[:, t+1] + b**2 * lam[:, t+1] 233 | h_post = h_backward[:, t+1] + b * lam[:, t+1] * y[:, t+1] 234 | 235 | lambda_backward[:, t] = a**2 / (sigma_sq_v + 1. / lambda_post) 236 | h_backward[:, t] = a * h_post / (sigma_sq_v * lambda_post + 1.) 237 | 238 | # combine both directions 239 | lambda_forward = 1. 
/ sigma_sq_forward 240 | h_forward = mu_forward / sigma_sq_forward 241 | lambda_post = lambda_forward + lambda_backward + b**2 * lam 242 | h_post = h_forward + h_backward + b * lam * y 243 | sigma_sq_post = 1. / lambda_post 244 | mu_post = h_post / lambda_post 245 | 246 | assert np.all(np.isfinite(mu_post)) 247 | 248 | return mu_post, sigma_sq_post 249 | 250 | def kalman_filter_codiag(mu_0, sigma_sq_0, sigma_sq_v, Lambda, y, mask): 251 | assert np.isscalar(sigma_sq_0) and np.isscalar(sigma_sq_v) 252 | ndim, ntime = y.shape 253 | d, Q = scipy.linalg.eigh(Lambda) 254 | mu_0_proj = np.dot(Q.T, mu_0) 255 | y_proj = np.dot(Q.T, y) 256 | lam = d[:, nax] * mask[nax, :] 257 | mu_post_proj, sigma_sq_post_proj = kalman_filter_diag( 258 | mu_0_proj, sigma_sq_0, sigma_sq_v, lam, y_proj) 259 | mu_post = np.dot(Q, mu_post_proj) 260 | Sigma_post = np.array([np.dot(Q, np.dot(np.diag(sigma_sq_post_proj[:, t]), Q.T)) 261 | for t in range(ntime)]).T 262 | return mu_post, Sigma_post 263 | 264 | def kalman_filter_codiag2(mu_0, Sigma_v, Lambda, y, mask): 265 | ndim, ntime = y.shape 266 | d, Q = scipy.linalg.eigh(Sigma_v) 267 | idxs = np.where(d > 1e-6)[0] 268 | d, Q = d[idxs], Q[:, idxs] 269 | sqrt_d = d ** 0.5 270 | S = np.dot(Q, np.diag(sqrt_d)) 271 | 272 | mu_0_trans = np.dot(Q.T, mu_0) / sqrt_d 273 | Lambda_trans = np.dot(S.T, np.dot(Lambda, S)) 274 | y_trans = np.dot(Q.T, y) / sqrt_d[:, nax] 275 | mu_trans, Sigma_trans = kalman_filter_codiag(mu_0_trans, 1e5, 1., Lambda_trans, y_trans, mask) 276 | mu = np.dot(S, mu_trans) 277 | Sigma = np.array([np.dot(S, np.dot(Sigma_trans[:, :, t], S.T)) 278 | for t in range(ntime)]).T 279 | return mu, Sigma 280 | 281 | 282 | 283 | def logdet(A): 284 | """Compute the log-determinant of a symmetric positive definite matrix A using the Cholesky factorization.""" 285 | L = np.linalg.cholesky(A) 286 | return 2 * np.sum(np.log(np.diag(L))) 287 | 288 | def slice_list(lst, slc): 289 | """Slice a Python list as if it were an array.""" 290 | if isinstance(slc, np.ndarray): 291 | slc = slc.ravel() 292 | idxs = np.arange(len(lst))[slc] 293 | return [lst[i] for i in idxs] 294 | 295 | def extract_slices(slc): 296 | if type(slc) == tuple: 297 | result = [] 298 | for s in slc: 299 | if isinstance(s, np.ndarray): 300 | result.append(s.ravel()) 301 | else: 302 | result.append(s) 303 | return tuple(result) 304 | else: 305 | return slc 306 | 307 | def _err_string(arr1, arr2): 308 | try: 309 | if np.allclose(arr1, arr2): 310 | return 'OK' 311 | elif arr1.shape == arr2.shape: 312 | return 'off by %s' % np.abs(arr1 - arr2).max() 313 | else: 314 | return 'incorrect shapes: %s and %s' % (arr1.shape, arr2.shape) 315 | except: 316 | return 'error comparing' 317 | 318 | err_info = collections.defaultdict(list) 319 | def set_err_info(key, info): 320 | err_info[key] = info 321 | 322 | def summarize_error(key): 323 | """Print a helpful description of the reason a condition was not satisfied. 
Intended usage: 324 | assert pot1.allclose(pot2), summarize_error()""" 325 | if type(err_info[key]) == str: 326 | return ' ' + err_info[key] 327 | else: 328 | return '\n' + '\n'.join([' %s: %s' % (name, err) for name, err in err_info[key]]) + '\n' 329 | 330 | 331 | def broadcast(idx, shape): 332 | result = [] 333 | for i, d in zip(idx, shape): 334 | if d == 1: 335 | result.append(0) 336 | else: 337 | result.append(i) 338 | return tuple(result) 339 | 340 | def full_shape(shapes): 341 | """The shape of the full array that results from broadcasting the arrays of the given shapes.""" 342 | return tuple(np.array(shapes).max(0)) 343 | 344 | 345 | def array_map(fn, arrs, n): 346 | """Takes a list of arrays a_1, ..., a_n where the elements of the first n dimensions line up. For every possible 347 | index into the first n dimensions, apply fn to the corresponding slices, and combine the results into 348 | an n-dimensional array. Supports broadcasting but does not prepend 1's to the shapes.""" 349 | # we shouldn't need a special case for n == 0, but NumPy complains about indexing into a zero-dimensional 350 | # array a using a[(Ellipsis,)]. 351 | if n == 0: 352 | return fn(*arrs) 353 | 354 | full_shape = tuple(np.array([a.shape[:n] for a in arrs]).max(0)) 355 | result = None 356 | for full_idx in itertools.product(*map(range, full_shape)): 357 | inputs = [a[broadcast(full_idx, a.shape[:n]) + (Ellipsis,)] for a in arrs] 358 | curr = fn(*inputs) 359 | 360 | if result is None: 361 | if type(curr) == tuple: 362 | result = tuple(np.zeros(full_shape + np.asarray(c).shape) for c in curr) 363 | else: 364 | result = np.zeros(full_shape + np.asarray(curr).shape) 365 | 366 | if type(curr) == tuple: 367 | for i, c in enumerate(curr): 368 | result[i][full_idx + (Ellipsis,)] = c 369 | else: 370 | result[full_idx + (Ellipsis,)] = curr 371 | return result 372 | 373 | def extend_slice(slc, n): 374 | if not isinstance(slc, tuple): 375 | slc = (slc,) 376 | if any([isinstance(s, np.ndarray) for s in slc]): 377 | raise NotImplementedError('Advanced slicing not implemented yet') 378 | return slc + (slice(None),) * n 379 | 380 | def process_slice(slc, shape, n): 381 | """Takes a slice and returns the appropriate slice into an array that's being broadcast (i.e. 
by 382 | converting the appropriate entries to 0's and :'s.""" 383 | if not isinstance(slc, tuple): 384 | slc = (slc,) 385 | slc = list(slc) 386 | ndim = len(shape) - n 387 | assert ndim >= 0 388 | shape_idx = 0 389 | for slice_idx, s in enumerate(slc): 390 | if s == nax: 391 | continue 392 | if shape[shape_idx] == 1: 393 | if type(s) == int: 394 | slc[slice_idx] = 0 395 | else: 396 | slc[slice_idx] = slice(None) 397 | shape_idx += 1 398 | if shape_idx != ndim: 399 | raise IndexError('Must have %d terms in the slice object' % ndim) 400 | return extend_slice(tuple(slc), n) 401 | 402 | def my_sum(a, axis, count): 403 | """For an array a which might be broadcast, return the value of a.sum() were a to be expanded out in full.""" 404 | if a.shape[axis] == count: 405 | return a.sum(axis) 406 | elif a.shape[axis] == 1: 407 | return count * a.sum(axis) 408 | else: 409 | raise IndexError('Cannot be broadcast: a.shape=%s, axis=%d, count=%d' % (a.shape, axis, count)) 410 | 411 | 412 | 413 | def match_shapes(arrs): 414 | """Prepend 1's to the shapes so that the dimensions line up.""" 415 | #temp = [(name, np.asarray(a), deg) for name, a, deg in arrs] 416 | #ndim = max([a.ndim - deg for _, a, deg in arrs]) 417 | 418 | temp = [a for name, a, deg in arrs] 419 | for i in range(len(temp)): 420 | if np.isscalar(temp[i]): 421 | temp[i] = np.array(temp[i]) 422 | ndim = max([a.ndim - deg for a, (_, _, deg) in zip(temp, arrs)]) 423 | 424 | prep_arrs = [] 425 | for name, a, deg in arrs: 426 | if np.isscalar(a): 427 | a = np.asarray(a) 428 | if a.ndim < deg: 429 | raise RuntimeError('%s.ndim must be at least %d' % (name, deg)) 430 | if a.ndim < ndim + deg: 431 | #a = a.reshape((1,) * (ndim + deg - a.ndim) + a.shape) 432 | slc = (nax,) * (ndim + deg - a.ndim) + (Ellipsis,) 433 | a = a[slc] 434 | prep_arrs.append(a) 435 | 436 | return prep_arrs 437 | 438 | def lstsq(A, b): 439 | # do this rather than call lstsq to support efficient broadcasting 440 | P = array_map(np.linalg.pinv, [A], A.ndim - 2) 441 | return array_map(np.dot, [P, b], A.ndim - 2) 442 | 443 | def dot(A, b): 444 | return array_map(np.dot, [A, b], A.ndim - 2) 445 | 446 | def vdot(x, y): 447 | return (x*y).sum(-1) 448 | 449 | def transp(A): 450 | return A.swapaxes(-2, -1) 451 | 452 | 453 | def get_counts(array, n): 454 | result = np.zeros(n, dtype=int) 455 | ans = np.bincount(array) 456 | result[:ans.size] = ans 457 | return result 458 | 459 | 460 | def log_erfc_helper(x): 461 | p = 0.47047 462 | a1 = 0.3480242 463 | a2 = -0.0958798 464 | a3 = 0.7478556 465 | t = 1. / (1 + p*x) 466 | return np.log(a1 * t + a2 * t**2 + a3 * t**3) - x ** 2 467 | 468 | def log_erfc(x): 469 | return np.where(x > 0., log_erfc_helper(x), np.log(2. - np.exp(log_erfc_helper(-x)))) 470 | 471 | def log_inv_probit(x): 472 | return log_erfc(-x / np.sqrt(2.)) - np.log(2.) 473 | 474 | def inv_probit(x): 475 | return 0.5 * scipy.special.erfc(-x / np.sqrt(2.)) 476 | 477 | def log_erfcinv(log_y): 478 | a = 0.140012 479 | log_term = log_y + np.log(2 - np.exp(log_y)) 480 | 481 | temp1 = 2 / (np.pi * a) + 0.5 * log_term 482 | temp2 = temp1 ** 2 - log_term / a 483 | temp3 = np.sqrt(temp2) - temp1 484 | return np.sign(1. 
- np.exp(log_y)) * np.sqrt(temp3) 485 | 486 | def log_probit(log_p): 487 | return -np.sqrt(2) * log_erfcinv(log_p + np.log(2)) 488 | 489 | def probit(p): 490 | return -np.sqrt(2) * scipy.special.erfcinv(2 * p) 491 | 492 | 493 | def check_close(a, b): 494 | if not np.allclose([a], [b]): # array brackets to avoid an error comparing inf and inf 495 | if np.isscalar(a) and np.isscalar(b): 496 | raise RuntimeError('a=%f, b=%f' % (a, b)) 497 | else: 498 | raise RuntimeError('Off by %f' % np.max(np.abs(a - b))) 499 | 500 | COLORS = ['red', 'green', 'yellow', 'blue', 'magenta', 'cyan'] 501 | 502 | def print_integers_colored(a): 503 | print '[', 504 | for ai in a: 505 | color = COLORS[ai % len(COLORS)] 506 | print termcolor.colored(str(ai), color, attrs=['bold']), 507 | print ']' 508 | 509 | def pbar(maxval): 510 | widgets = [progressbar.Percentage(), ' ', progressbar.Bar(), progressbar.ETA()] 511 | return progressbar.ProgressBar(widgets=widgets, maxval=maxval).start() 512 | 513 | 514 | def send_email(header, body, address): 515 | msg = '\r\n'.join(['From: %s' % address, 516 | 'To: %s' % address, 517 | 'Subject: %s' % header, 518 | '', 519 | body]) 520 | 521 | s = smtplib.SMTP('localhost') 522 | s.sendmail(address, [address], msg) 523 | s.quit() 524 | 525 | 526 | -------------------------------------------------------------------------------- /utils/profiler.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import functools 3 | import sys 4 | import time 5 | 6 | ENABLE_PROFILER = False 7 | TOP_ONLY = True 8 | 9 | depth = collections.defaultdict(int) 10 | counts = collections.defaultdict(lambda: collections.defaultdict(int)) 11 | total_time = collections.defaultdict(lambda: collections.defaultdict(float)) 12 | 13 | def reset(category=None): 14 | global counts, total_time 15 | if category is None: 16 | counts = collections.defaultdict(lambda: collections.defaultdict(int)) 17 | total_time = collections.defaultdict(lambda: collections.defaultdict(float)) 18 | else: 19 | counts[category] = collections.defaultdict(int) 20 | total_time[category] = collections.defaultdict(float) 21 | 22 | def get_key(name, args): 23 | k = [] 24 | for arg in args: 25 | if hasattr(arg, 'shape_str'): 26 | k.append((str(arg.__class__), arg.shape_str)) 27 | elif hasattr(arg, 'shape'): 28 | k.append((str(arg.__class__), arg.shape)) 29 | return (name,) + tuple(k) 30 | 31 | 32 | class profiled: 33 | def __init__(self, category): 34 | self.category = category 35 | 36 | def __call__(self, fn): 37 | if not ENABLE_PROFILER: 38 | return fn 39 | 40 | name = fn.__name__ 41 | 42 | @functools.wraps(fn) 43 | def profiled_fn(*args, **kwargs): 44 | global depth 45 | t0 = time.clock() 46 | depth[self.category] += 1 47 | ans = fn(*args, **kwargs) 48 | depth[self.category] -= 1 49 | if depth[self.category] == 0 or not TOP_ONLY: 50 | key = get_key(name, args) 51 | counts[self.category][key] += 1 52 | total_time[self.category][key] += time.clock() - t0 53 | return ans 54 | 55 | return profiled_fn 56 | 57 | 58 | def summarize(category, cutoff=0.5, outstr=sys.stdout): 59 | tt = total_time[category] 60 | c = counts[category] 61 | srtd = sorted(tt.keys(), key=lambda k: tt[k], reverse=True) 62 | for k in srtd: 63 | if tt[k] < cutoff: 64 | continue 65 | print >> outstr, '%1.2f seconds for %d calls' % (tt[k], c[k]) 66 | print >> outstr, k[0] 67 | for tp, sz in k[1:]: 68 | print >> outstr, ' %s %s' % (tp, sz) 69 | print >> outstr 70 | 71 | 72 | 73 | 
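For reference, here is a minimal usage sketch for the profiler module above. It is illustrative only: the category name `'demo'` and the function `gram` are made up, and `ENABLE_PROFILER` has to be switched on before any functions are decorated, since the decorator otherwise returns the function unchanged.

    import numpy as np
    import profiler

    profiler.ENABLE_PROFILER = True        # must be True at decoration time
    profiled = profiler.profiled('demo')   # 'demo' is an arbitrary category name

    @profiled
    def gram(A):
        # arguments exposing .shape (or .shape_str) become part of the timing key
        return np.dot(A.T, A)

    for _ in range(100):
        gram(np.random.normal(size=(200, 50)))

    profiler.summarize('demo', cutoff=0.)  # total time and call counts per (name, shapes) key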
-------------------------------------------------------------------------------- /utils/storage.py: -------------------------------------------------------------------------------- 1 | import cPickle 2 | import os 3 | 4 | def ensure_directory(d, trial=False): 5 | parts = d.split('/') 6 | for i in range(2, len(parts)+1): 7 | fname = '/'.join(parts[:i]) 8 | if not os.path.exists(fname): 9 | print 'Creating', fname 10 | if not trial: 11 | try: 12 | os.mkdir(fname) 13 | except: 14 | pass 15 | 16 | def load(fname): 17 | return cPickle.load(open(fname, 'rb')) 18 | 19 | def dump(obj, fname): 20 | d, f = os.path.split(fname) 21 | ensure_directory(d) 22 | cPickle.dump(obj, open(fname, 'wb'), protocol=2)   # protocol 2 is a binary format, so open in binary mode 23 | 24 | 25 | def exists(fname): 26 | return os.path.exists(fname) 27 | 28 | def mkdir(dirname): 29 | os.mkdir(dirname) 30 | 31 | def join(*args): 32 | return os.path.join(*args) 33 | 34 | --------------------------------------------------------------------------------
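As a quick illustration of how these storage helpers fit together, here is a small round-trip sketch. The directory `/tmp/storage_demo` and the file name are made up for the example; in the actual experiments the files would live under the directories configured in `config.py`.

    import numpy as np
    import storage

    X = np.random.normal(size=(5, 3))
    fname = storage.join('/tmp/storage_demo', 'level1', 'state.pk')  # hypothetical cache file
    storage.dump(X, fname)      # creates the intermediate directories, then pickles X
    assert storage.exists(fname)
    assert np.allclose(storage.load(fname), X)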