├── .gitignore
├── README.md
├── algorithms
│   ├── __init__.py
│   ├── ais_gsm.py
│   ├── chains.py
│   ├── crp.py
│   ├── dumb_samplers.py
│   ├── ibp.py
│   ├── ibp_split_merge.py
│   ├── low_rank.py
│   ├── low_rank_poisson.py
│   ├── slice_sampling.py
│   ├── sparse_coding.py
│   └── variational.py
├── config_example.py
├── example.py
├── example_data
│   ├── animals-data.txt
│   ├── animals-features.txt
│   └── animals-names.txt
├── experiments.py
├── grammar.py
├── initialization.py
├── models.py
├── observations.py
├── parallel.py
├── parsing.py
├── predictive_distributions.py
├── presentation.py
├── recursive.py
├── scoring.py
├── single_process.py
├── synthetic_experiments.py
└── utils
    ├── __init__.py
    ├── distributions.py
    ├── gaussians.py
    ├── misc.py
    ├── profiler.py
    ├── psd_matrices.py
    └── storage.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | *~
3 | config.py
4 | parsetab.py
5 | parser.out
6 | debugging
7 | sandbox
8 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | This software package implements the algorithms described in the paper
2 | 
3 | > Roger B. Grosse, Ruslan Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum,
4 | > "Exploiting compositionality to explore a large space of model structures," UAI 2012.
5 | 
6 | In particular, it takes an input matrix, runs the structure search, and outputs a report
7 | summarizing the choices made at each step. There is also a script which runs the synthetic
8 | data experiments from the paper.
9 | 
10 | ## Caveats
11 | 
12 | This is a research prototype, and I've made some simplifying assumptions which may or may
13 | not match your situation. In particular,
14 | 
15 | - Matrices are assumed to be real-valued, and it handles binary matrices by treating the
16 | values as real and adding a small amount of noise to prevent degenerate solutions. (As
17 | a sanity check, I've also experimented with samplers which handle binary inputs directly,
18 | in order to check that the results were consistent with the real-valued version. However,
19 | I didn't get the algorithms working robustly enough to include in the experiments
20 | or the software package.)
21 | - It handles missing observations by explicitly sampling the missing values.
22 | This seems to work well for matrices with small numbers of missing entries, but might
23 | have poor mixing on sparse input matrices.
24 | - I haven't run the software on matrices larger than 1000 x 1000. There's no conceptual reason the
25 | algorithms can't scale beyond this, but there may be implementational reasons.
26 | 
27 | I am working on a newer version of the software package which shouldn't have these
28 | limitations.
29 | 
30 | 
31 | ## Requirements
32 | 
33 | This code base depends on a number of Python packages, most of which are pretty standard.
34 | Most of the packages are available through [Enthought Canopy](https://www.enthought.com/products/canopy/),
35 | which all academic users (including professors and postdocs) can use for free under their
36 | [academic license](https://www.enthought.com/products/canopy/academic/). We use the following
37 | Python packages which are included in Canopy:
38 | 
39 | - [NumPy](http://www.numpy.org/) (I used 1.6.1)
40 | - [Matplotlib](http://matplotlib.org/index.html) (I used 1.2.0)
41 | - [SciPy](http://www.scipy.org/) (I used 0.12.0)
42 | - [scikit-learn](http://scikit-learn.org/stable/) (I used 0.13.1)
43 | 
44 | Note: I've been told that [Anaconda Python](https://store.continuum.io/cshop/anaconda/) is an
45 | alternative distribution which includes these same packages, has a comparable academic license,
46 | and is easier to get running. I've never tried it myself, though.
47 | 
48 | There are two additional requirements, which are both `easy_install`able:
49 | 
50 | - [termcolor](https://pypi.python.org/pypi/termcolor)
51 | - [progressbar](https://code.google.com/p/python-progressbar/)
52 | 
53 | More recent versions than the ones listed above should work fine, though unfortunately
54 | the interfaces to some SciPy routines have a tendency to change without warning...
55 | 
56 | Also, if you want to distribute jobs across multiple cores or machines (highly recommended), you
57 | will need to do one of the following:
58 | 
59 | - install [GNU Parallel](www.gnu.org/software/parallel) (see Configuration section for more details)
60 | - write a scheduler which better matches your own computing resources ([see below](#ownsched))
61 | 
62 | 
63 | ## Configuration
64 | 
65 | In order to run the structure search, you need to specify some local configuration parameters
66 | in `config.py`. First, in the main project directory, copy the template:
67 | 
68 |     cp config_example.py config.py
69 | 
70 | In `config.py`, you need to specify the following paths:
71 | 
72 | - `CODE_PATH`, the directory where you keep the code for this project
73 | - `CACHE_PATH`, a directory for storing intermediate results (which can take up a fair amount of disk
74 | space and are OK to delete when the experiment is done)
75 | - `RESULTS_PATH`, the directory for storing the machine-readable results of the structure search
76 | - `REPORT_PATH`, the directory for saving human-readable reports
77 | 
78 | You also need to specify `SCHEDULER` to determine how the experiment jobs are to be run. The
79 | choices are `'single_process'`, which runs everything in a single process (not practical except
80 | for the smallest matrices), and `'parallel'`, which uses GNU Parallel to distribute the jobs
81 | across different machines, or different processes on the same machine. If you use GNU Parallel,
82 | you also need to specify:
83 | 
84 | - `JOBS_PATH`, a directory for saving the status of jobs, if you are using GNU Parallel
85 | - `DEFAULT_NUM_JOBS`, the number of jobs to run on each machine
86 | 
87 | Note that using our GNU Parallel wrapper requires the ability to `ssh` into the machines without
88 | entering a password. We realize this might not correspond to your situation, so [see below](#ownsched)
89 | for how you can write your own job scheduler module geared towards the clusters at your own institution.
90 | 
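For concreteness, here is a minimal sketch of what a `config.py` might look like; the paths
below are just placeholders, and should point to directories that exist on your machine:

    # config.py -- example values only; adjust the paths for your machine
    CODE_PATH = '/home/me/compositional_structure_search'
    CACHE_PATH = '/scratch/me/structure_search/cache'
    RESULTS_PATH = '/home/me/structure_search/results'
    REPORT_PATH = '/home/me/structure_search/reports'

    SCHEDULER = 'parallel'        # or 'single_process'
    JOBS_PATH = '/home/me/structure_search/jobs'   # only needed with GNU Parallel
    DEFAULT_NUM_JOBS = 4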
91 | 
92 | ## Running the example
93 | 
94 | We provide an example of how to run the structure search in `example.py`. This runs the
95 | structure search on the mammals dataset of Kemp et al. (2006), "Learning systems of concepts
96 | with an infinite relational model." This is a 50 x 85 matrix where the rows represent
97 | different species of mammal, the columns represent attributes, and each entry is a binary
98 | value representing subjects' judgments of whether the animal has that attribute. Our structure
99 | search did not result in a clear structure for this dataset, but it serves as an example which
100 | can be run quickly (2 CPU minutes for me).
101 | 
102 | After following the configuration directions above, run the following from the command line:
103 | 
104 |     python example.py
105 |     python experiments.py everything example
106 | 
107 | This will run the structure search, and then output the results to the shell (and also save
108 | them to the `example` subdirectory of `config.REPORT_PATH`). The results include the following:
109 | 
110 | - the best-performing structure at each level of the search, with its improvement in
111 | predictive log-likelihood for rows and columns, as well as z-scores for the improvement
112 | - the total CPU time, also broken down by model
113 | - the predictive log-likelihood scores for all structures at all levels of the search, sorted
114 | from best to worst
115 | 
116 | Note that the search parameters used in this example are probably
117 | insufficient for reliable inference; if you are interested in accurate results for this dataset,
118 | change `QuickParams` to `SmallParams` in `example.py`.
119 | 
120 | 
121 | 
122 | ## Running the structure search
123 | 
124 | Suppose you have a real-valued matrix `X` you're interested in learning the structure of,
125 | in the form of a NumPy array. The first step is to create a `DataMatrix` instance:
126 | 
127 |     from observations import DataMatrix
128 |     data_matrix = DataMatrix.from_real_values(X)
129 | 
130 | This constructor also takes some optional arguments:
131 | 
132 | - `mask`, which is a binary array determining which entries of `X` are observed. (By default,
133 | all entries are assumed to be observed.)
134 | - `row_label` and `col_label`, which are Python lists giving the label of each row or column.
135 | These are used for printing the learned clusters and binary components.
136 | 
137 | The code doesn't do any preprocessing of the data, so it's recommended that you standardize
138 | it to have zero mean and unit variance.
139 | 
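For example, here is a minimal sketch that standardizes the data and marks every entry as
observed (the preprocessing choices here are just for illustration):

    import numpy as np
    from observations import DataMatrix

    # Standardize to zero mean and unit variance, as recommended above
    # (here, each column is standardized separately).
    X = (X - X.mean(0)) / X.std(0)

    # mask[i, j] is True if entry (i, j) of X was actually observed;
    # here everything is observed, so this is equivalent to the default.
    mask = np.ones(X.shape, dtype=bool)
    data_matrix = DataMatrix.from_real_values(X, mask=mask)

Row and column labels can be passed in the same way, using the `row_label` and `col_label`
arguments described above.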
140 | Next, you want to initialize an experiment for this matrix. You do this by passing in the
141 | `DataMatrix` instance, along with a parameters object. `experiments.SmallParams` gives a
142 | reasonable set of defaults for small matrices (e.g. 200 x 200), and `experiments.LargeParams`
143 | gives a reasonable set of defaults for larger matrices (e.g. 1000 x 1000). This creates
144 | subdirectories of `config.RESULTS_PATH` and `config.REPORT_PATH` where all the computations
145 | and results will be stored. For example,
146 | 
147 |     from experiments import init_experiment, LargeParams
148 |     init_experiment('experiment_name', data_matrix, LargeParams())
149 | 
150 | You can also override the default parameters by passing keyword arguments to the parameters
151 | constructor. See `experiments.DefaultParams` for more details. Finally, from the command line,
152 | run the whole structure search using the following:
153 | 
154 |     python experiments.py everything experiment_name
155 | 
156 | You can also specify some optional command-line arguments:
157 | 
158 | - `--machines`, the list of machines to distribute the jobs to if you are using GNU Parallel.
159 | This should be a comma-separated list with no spaces. By default, it runs jobs only on the same machine.
160 | - `--njobs`, the number of jobs to run on each machine if you are using GNU Parallel. (This
161 | overrides the default value in `config.DEFAULT_NUM_JOBS`.)
162 | - `--email`, your e-mail address, if you want it to e-mail you the report when it finishes.
163 | 
164 | For example,
165 | 
166 |     python experiments.py everything experiment_name --machines machine1,machine2,machine3 --njobs 2 --email me@example.com
167 | 
168 | If all goes well, a report will be saved to `experiment_name/results.txt` under `config.REPORT_PATH`.
169 | 
170 | 
171 | ## Using your own scheduler
172 | 
173 | As mentioned above, the experiment script assumes you have GNU Parallel installed, and that you're
174 | able to SSH into machines without entering a password. This might not match your situation; for instance,
175 | your institution might use a queueing system to distribute jobs. I've tried to make it simple to adapt
176 | the experiment scripts to your own cluster setup. In particular, you need to do the following:
177 | 
178 | 1. Write a Python function which takes a list of jobs and distributes them on your cluster
179 | (a sketch of such a function is given at the end of this section). It should take two arguments:
180 |    * `script`, the name of the Python file to execute
181 |    * `jobs`, a list of jobs, where each one is a list of strings, each one corresponding to one
182 |      command line argument.
183 | 
184 |    See `single_process.run` for an example. Note that some of the arguments may contain the single quote
185 |    character, so you will have to escape them.
186 | 2. Add another case to `experiments.run_jobs` which calls your scheduler, and change `config.SCHEDULER`
187 | to the appropriate value.
188 | 3. If your scheduler should take any additional command line arguments, you can specify them in
189 | `experiments.add_scheduler_args`.
190 | 
191 | The above directions assume that all of the machines have access to a common filesystem (e.g. AFS, NFS).
192 | If this isn't the case (for instance, if you are running on Amazon EC2), you'll also need to modify
193 | the functions in `storage.py` to read and write from whatever storage system is shared between the
194 | machines.
195 | 
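As a sketch of step 1, a hypothetical scheduler module might look something like the following
(the module name and the local `subprocess` call are placeholders; substitute your own cluster's
submission command):

    # my_scheduler.py -- a hypothetical example; adapt the submission step
    # to whatever queueing system your cluster uses.
    import pipes
    import subprocess

    def run(script, jobs):
        """Distribute the given jobs; each job is a list of command-line argument strings."""
        for job in jobs:
            # Arguments may contain single quotes, so quote them for the shell.
            args = ' '.join(pipes.quote(arg) for arg in job)
            command = 'python %s %s' % (script, args)
            # Placeholder: replace this with your cluster's submission command
            # (e.g. wrap `command` in a qsub or sbatch invocation).
            subprocess.check_call(command, shell=True)

Step 2 then amounts to adding a case to `experiments.run_jobs` which calls `my_scheduler.run(script, jobs)`,
and setting `config.SCHEDULER` to the corresponding value.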
196 | 
197 | ## Organization of the code
198 | 
199 | The main code directory contains the following files which handle the logic of the experiments,
200 | and are described above:
201 | 
202 | - `experiments.py`, as mentioned above, which manages the structure search for a single input matrix
203 | - `synthetic_experiments.py`, which runs the synthetic data experiments from the paper, i.e. by
204 | generating a lot of synthetic matrices and running the structure search on each
205 | - `presentation.py`, which formats the results into tables
206 | - `parallel.py` and `single_process.py`, utilities for running jobs
207 | 
208 | The following files define the main data structures used in the structure search:
209 | 
210 | - `grammar.py`, which defines the context-free grammar
211 | - `parsing.py`, which parses string representations of the models into expression trees
212 | - `observations.py`, which defines the `DataMatrix` and `Observations` classes used to represent
213 | the input matrices
214 | - `recursive.py`, which defines the `Node` classes which store the actual decompositions
215 | - `models.py`, which defines model classes which parallel the structure of the `Node` classes, but
216 | define properties of the model itself (such as whether variance parameters for a matrix are
217 | associated with rows or columns)
218 | 
219 | The following handle the posterior inference over decompositions:
220 | 
221 | - `initialization.py`, which does the most interesting algorithmic work, namely initializing
222 | the more complex structures using algorithms particular to each production rule.
223 | - `algorithms/dumb_samplers.py`, which contains simple MCMC operators which are run after the
224 | recursive initialization procedure
225 | - the `algorithms` subdirectory contains inference algorithms corresponding to particular production
226 | rules: in particular, `chains.py`, `crp.py`, `ibp.py`, `low_rank_poisson.py`, and `sparse_coding.py`.
227 | 
228 | Finally, the following files handle the predictive likelihood scoring:
229 | 
230 | - `scoring.py`, the main procedures for predictive likelihood scoring
231 | - `predictive_distributions.py`, which converts the predictive distributions into a sum of terms
232 | as in Section 5 of the paper
233 | - `algorithms/variational.py`, which implements the variational lower bound of Section 5
234 | - `algorithms/ais_gsm.py`, which performs the additional AIS step needed for evaluating the GSM models.
235 | 
--------------------------------------------------------------------------------
/algorithms/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rgrosse/compositional_structure_search/b93f9f8d3a714213002c09403c1766b57835025e/algorithms/__init__.py
--------------------------------------------------------------------------------
/algorithms/ais_gsm.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | nax = np.newaxis
3 | import pylab
4 | 
5 | import predictive_distributions
6 | from utils import distributions
7 | import variational
8 | 
9 | SIGMOID_SCHEDULE = True
10 | 
11 | def p_s_given_z(S, Z, t, sigma_sq_approx):
12 |     return (1. - t) * distributions.gauss_loglik(S, 0., sigma_sq_approx[nax, :]) + \
13 |         t * distributions.gauss_loglik(S, 0., np.exp(Z))
14 | 
15 | def log_odds_to_prob(log_odds):
16 |     prob = np.exp(log_odds - np.logaddexp.reduce(log_odds, axis=1)[:, nax])
17 |     prob /= prob.sum(1)[:, nax]   # redundant in principle, but numerical error makes np.random.multinomial unhappy
18 |     return prob
19 | 
20 | def get_schedule(num_steps, first_odds):
21 |     tau = np.linspace(-first_odds, first_odds, num_steps)
22 |     temp = 1. / (1. 
+ np.exp(-tau)) 23 | return (temp - temp[0]) / (temp[-1] - temp[0]) 24 | 25 | 26 | 27 | class MultinomialSampler: 28 | def __init__(self, pi, A, Sigma): 29 | self.pi = pi 30 | self.A = A 31 | self.Sigma = Sigma 32 | self.nlat, nvis = A.shape 33 | if Sigma.ndim == 2: 34 | self.Lambda = np.linalg.inv(self.Sigma)[nax, :, :] 35 | else: 36 | self.Lambda = np.array([np.linalg.inv(self.Sigma[i, :, :]) 37 | for i in range(self.Sigma.shape[0])]) 38 | 39 | def random_initialization(self, variational_reps): 40 | for vr in variational_reps: 41 | assert isinstance(vr, variational.MultinomialRepresentation) 42 | return np.array([rep.sample() for rep in variational_reps]) 43 | 44 | def step(self, targets, t, U): 45 | N = targets.shape[0] 46 | diff = self.A[nax, :, :] - targets[:, nax, :] 47 | obs_term = -0.5 * np.sum(np.sum(diff[:, :, :, nax] * diff[:, :, nax, :] * 48 | self.Lambda[:, nax, :, :], axis=3), axis=2) 49 | prob = log_odds_to_prob(obs_term + np.log(self.pi)[nax, :]) 50 | return np.array([np.random.multinomial(1, prob[i, :]) 51 | for i in range(N)]) 52 | 53 | def p_star(self, t, U): 54 | # constant with respect to time, so it doesn't affect AIS output 55 | return 0 56 | 57 | def contribution(self, U): 58 | return np.dot(U, self.A) 59 | 60 | class InnerMultinomialSampler: 61 | def __init__(self, pi, A, sigma_sq_approx): 62 | self.pi = pi 63 | self.A = A 64 | self.sigma_sq_approx = sigma_sq_approx 65 | 66 | def random_initialization(self, N): 67 | return np.random.multinomial(1, self.pi, size=N) 68 | 69 | def step(self, Z0, S, t, U): 70 | N, nspars, nclusters = S.shape[0], S.shape[1], self.pi.size 71 | ev = np.zeros((N, nclusters)) 72 | for k in range(nclusters): 73 | Z = Z0 + self.A[k, :][nax, :] 74 | ev[:, k] = p_s_given_z(S, Z, t, self.sigma_sq_approx).sum(1) 75 | prob = log_odds_to_prob(ev + np.log(self.pi)[nax, :]) 76 | #return np.random.multinomial(1, prob) 77 | return np.array([np.random.multinomial(1, prob[i, :]) 78 | for i in range(N)]) 79 | 80 | def contribution(self, Z): 81 | return np.dot(Z, self.A) 82 | 83 | class BernoulliSampler: 84 | def __init__(self, pi, A, Sigma): 85 | self.pi = pi 86 | self.A = A 87 | self.Sigma = Sigma 88 | self.nlat, nvis = A.shape 89 | if Sigma.ndim == 2: 90 | self.Lambda = np.linalg.inv(self.Sigma)[nax, :, :] 91 | else: 92 | self.Lambda = np.array([np.linalg.inv(self.Sigma[i, :, :]) 93 | for i in range(self.Sigma.shape[0])]) 94 | 95 | def random_initialization(self, variational_reps): 96 | for vr in variational_reps: 97 | assert isinstance(vr, variational.BernoulliRepresentation) 98 | return np.array([rep.sample() for rep in variational_reps]) 99 | 100 | def step(self, targets, t, U): 101 | U = U.copy() 102 | N, K = U.shape 103 | for i in range(N): 104 | x = np.dot(U[i, :], self.A) 105 | curr_targets = targets[i, :] 106 | if self.Lambda.ndim == 2: 107 | Lambda = self.Lambda 108 | else: 109 | Lambda = self.Lambda[i, :, :] 110 | 111 | for k in range(K): 112 | if U[i, k]: 113 | x -= self.A[k, :] 114 | off_score = -0.5 * np.dot(x - curr_targets, np.dot(Lambda, x - curr_targets)) 115 | x_on = x + self.A[k, :] 116 | on_score = -0.5 * np.dot(x_on - curr_targets, np.dot(Lambda, x_on - curr_targets)) 117 | 118 | log_odds = np.log(self.pi[k]) + on_score - off_score 119 | prob = 1. / (1. 
+ np.exp(-log_odds)) 120 | U[i, k] = np.random.binomial(1, prob) 121 | x += U[i, k] * self.A[k, :] 122 | 123 | return U 124 | 125 | def p_star(self, t, u): 126 | # constant with respect to time, so it doesn't affect AIS output 127 | return 0 128 | 129 | def contribution(self, Z): 130 | return np.dot(Z, self.A) 131 | 132 | class InnerBernoulliSampler: 133 | def __init__(self, pi, A, sigma_sq_approx): 134 | self.pi = pi 135 | self.A = A 136 | self.sigma_sq_approx = sigma_sq_approx 137 | 138 | def step(self, Z0, S, t, U): 139 | U = U.copy() 140 | ndata, nspars, nfea = S.shape[0], S.shape[1], U.shape[1] 141 | for i in range(ndata): 142 | z = Z0[i, :] + np.dot(U[i, :], self.A) 143 | 144 | for k in range(nfea): 145 | if U[i, k]: 146 | z -= self.A[k, :] 147 | off_score = p_s_given_z(S[i, :], z, t, self.sigma_sq_approx).sum() 148 | z_on = z + self.A[k, :] 149 | on_score = p_s_given_z(S[i, :], z_on, t, self.sigma_sq_approx).sum() 150 | 151 | log_odds = np.log(self.pi[k]) + on_score - off_score 152 | prob = 1. / (1. + np.exp(-log_odds)) 153 | U[i, k] = np.random.binomial(1, prob) 154 | z += U[i, k] * self.A[k, :] 155 | 156 | return U 157 | 158 | def contribution(self, U): 159 | return np.dot(U, self.A) 160 | 161 | def random_initialization(self, N): 162 | return np.random.binomial(1, self.pi[nax, :], size=(N, self.pi.size)) 163 | 164 | def mh_multivariate_gaussian(U, f, mu, Sigma, epsilon): 165 | N, K = U.shape 166 | perturbation = np.array([np.random.multivariate_normal(np.zeros(K), Sigma) 167 | for i in range(N)]) 168 | proposal = mu[nax, :] + \ 169 | np.sqrt(1. - epsilon ** 2) * (U - mu[nax, :]) + \ 170 | epsilon * perturbation 171 | L0 = f(U) 172 | L1 = f(proposal) 173 | accept = np.random.binomial(1, np.where(L1 > L0, 1., np.exp(L1 - L0))) 174 | if np.isscalar(accept): # np.random.binomial converts length 1 arrays to scalars 175 | accept = np.array([accept]) 176 | return np.where(accept[:, nax], proposal, U) 177 | 178 | 179 | class InnerGaussianSampler: 180 | def __init__(self, mu, Sigma, sigma_sq_approx): 181 | self.mu = mu 182 | self.Sigma = Sigma 183 | self.sigma_sq_approx = sigma_sq_approx 184 | 185 | def step(self, Z0, S, t, U): 186 | EPSILON = 0.5 187 | N, K = S.shape 188 | U = U.copy() 189 | 190 | def f(U): 191 | return p_s_given_z(S, Z0 + U, t, self.sigma_sq_approx).sum(1) 192 | 193 | return mh_multivariate_gaussian(U, f, self.mu, self.Sigma, EPSILON) 194 | 195 | 196 | def contribution(self, U): 197 | return U 198 | 199 | def random_initialization(self, N): 200 | return np.array([np.random.multivariate_normal(self.mu, self.Sigma) 201 | for i in range(N)]) 202 | 203 | 204 | class GSMRepresentation: 205 | def __init__(self, S, U_all): 206 | self.S = S 207 | self.U_all = U_all 208 | 209 | def copy(self): 210 | return GSMRepresentation(self.S.copy(), self.U_all[:]) 211 | 212 | class GSMSampler: 213 | def __init__(self, scale_samplers, sigma_sq_approx, evidence_Sigma, A): 214 | self.scale_samplers = scale_samplers 215 | self.sigma_sq_approx = sigma_sq_approx 216 | self.evidence_Sigma = evidence_Sigma 217 | if evidence_Sigma.ndim == 2: 218 | self.evidence_Lambda = np.linalg.inv(evidence_Sigma) 219 | else: 220 | self.evidence_Lambda = np.array([np.linalg.inv(evidence_Sigma[i, :, :]) 221 | for i in range(evidence_Sigma.shape[0])]) 222 | self.A = A 223 | 224 | def random_initialization(self, S): 225 | N = S.shape[0] 226 | U_all = [sampler.random_initialization(N) for sampler in self.scale_samplers] 227 | return GSMRepresentation(S.copy(), U_all) 228 | 229 | def step(self, targets, t, rep): 230 | 
N, D = targets.shape 231 | K = rep.S.shape[1] 232 | rep = rep.copy() 233 | 234 | 235 | # sample S 236 | if self.evidence_Lambda.ndim == 2: 237 | Lambda_ev = np.dot(self.A, np.dot(self.evidence_Lambda, self.A.T)) 238 | else: 239 | Lambda_ev = np.array([np.dot(self.A, np.dot(self.evidence_Lambda[i, :, :], self.A.T)) 240 | for i in range(N)]) 241 | h_ev = np.dot(self.A, np.dot(self.evidence_Lambda, targets.T)).T 242 | 243 | Z = np.zeros((N, K)) 244 | for comp, samp in zip(rep.U_all, self.scale_samplers): 245 | Z += samp.contribution(comp) 246 | #sigma_sq_pri = np.exp((1. - t) * np.log(self.sigma_sq_approx)[nax, :] + 247 | # t * Z) 248 | lam_pri = (1. - t) / self.sigma_sq_approx[nax, :] + \ 249 | t * np.exp(-Z) 250 | 251 | 252 | 253 | rep.S = np.zeros((N, K)) 254 | for i in range(N): 255 | Lambda_pri = np.diag(lam_pri[i, :]) 256 | if Lambda_ev.ndim == 2: 257 | Lambda = Lambda_pri + Lambda_ev 258 | else: 259 | Lambda = Lambda_pri + Lambda_ev[i, :, :] 260 | Sigma = np.linalg.inv(Lambda) 261 | mu = np.dot(Sigma, h_ev[i, :]) 262 | rep.S[i, :] = np.random.multivariate_normal(mu, Sigma) 263 | 264 | # sample components of Z 265 | rep = rep.copy() 266 | for c in range(len(self.scale_samplers)): 267 | Z0 = np.zeros(Z.shape) 268 | for d in range(len(self.scale_samplers)): 269 | if d != c: 270 | Z0 += self.scale_samplers[d].contribution(rep.U_all[d]) 271 | 272 | rep.U_all[c] = self.scale_samplers[c].step(Z0, rep.S, t, rep.U_all[c]) 273 | 274 | # temporary 275 | #assert not stop 276 | 277 | return rep 278 | 279 | def p_star(self, t, rep): 280 | N, K = rep.S.shape 281 | Z = np.zeros((N, K)) 282 | for comp, samp in zip(rep.U_all, self.scale_samplers): 283 | Z += samp.contribution(comp) 284 | return p_s_given_z(rep.S, Z, t, self.sigma_sq_approx).sum(1) 285 | 286 | def contribution(self, rep): 287 | return np.dot(rep.S, self.A) 288 | 289 | # temporary 290 | stop = False 291 | 292 | class AISModel: 293 | def __init__(self, samplers, X, Sigma, init_partition_function): 294 | self.samplers = samplers 295 | self.X = X 296 | self.Sigma = Sigma 297 | self.init_partition_function = init_partition_function 298 | 299 | def step(self, reps, t): 300 | reps = reps[:] 301 | for i in range(len(self.samplers)): 302 | targets = self.X.copy() 303 | for j in range(len(self.samplers)): 304 | if j != i: 305 | targets -= self.samplers[j].contribution(reps[j]) 306 | reps[i] = self.samplers[i].step(targets, t, reps[i]) 307 | return reps 308 | 309 | def init_sample(self, variational_reps): 310 | N, D = self.X.shape 311 | is_gsm = np.array([isinstance(s, GSMSampler) for s in self.samplers]) 312 | gsm_idxs = np.where(is_gsm)[0] 313 | non_gsm_idxs = np.where(-is_gsm)[0] 314 | 315 | if len(gsm_idxs) == 0: 316 | raise RuntimeError('No GSM components; problem with module reloading?') 317 | if len(gsm_idxs) > 1: 318 | raise RuntimeError('Cannot handle multiple GSM components yet') 319 | gsm_sampler = self.samplers[gsm_idxs[0]] 320 | 321 | reps = [None for i in range(len(self.samplers))] 322 | discrete_part = np.zeros((N, D)) 323 | for vr_idx, sampler_idx in enumerate(non_gsm_idxs): 324 | curr_reps = [vr[vr_idx] for vr in variational_reps] 325 | reps[sampler_idx] = self.samplers[sampler_idx].random_initialization(curr_reps) 326 | discrete_part += self.samplers[sampler_idx].contribution(reps[sampler_idx]) 327 | 328 | # S = coefficients 329 | # G = GSM part = SA 330 | # E = Gaussian part 331 | # C = continuous part = S + E 332 | A = gsm_sampler.A 333 | C = self.X - discrete_part 334 | Sigma_S = np.diag(gsm_sampler.sigma_sq_approx) 335 | 
Sigma_E = self.Sigma 336 | Sigma_C = np.dot(A.T, np.dot(Sigma_S, A)) + Sigma_E 337 | Sigma_C_inv = np.linalg.inv(Sigma_C) 338 | temp = np.dot(Sigma_S, A) 339 | mu_S_given_C = np.dot(temp, np.dot(Sigma_C_inv, C.T)).T 340 | Sigma_S_given_C = np.dot(temp, np.dot(Sigma_C_inv, temp.T)) 341 | S = np.array([np.random.multivariate_normal(mu_S_given_C[i, :], Sigma_S_given_C) 342 | for i in range(N)]) 343 | 344 | reps[gsm_idxs[0]] = gsm_sampler.random_initialization(S) 345 | 346 | return reps 347 | 348 | def p_star(self, reps, t): 349 | total = 0. 350 | for sampler, rep in zip(self.samplers, reps): 351 | total += sampler.p_star(t, rep) 352 | return total 353 | 354 | def ais(ais_model, t_schedule, variational_representations): 355 | init_partition_function = ais_model.init_partition_function 356 | total = init_partition_function.copy() 357 | reps = ais_model.init_sample(variational_representations) 358 | 359 | all_deltas = [] 360 | 361 | count = 0 362 | for t0, t1 in zip(t_schedule[:-1], t_schedule[1:]): 363 | if count == 1: 364 | for i in range(100): 365 | reps = ais_model.step(reps, t0) 366 | else: 367 | reps = ais_model.step(reps, t0) 368 | delta = ais_model.p_star(reps, t1) - ais_model.p_star(reps, t0) 369 | 370 | # temporary 371 | if count > 0: 372 | total += delta 373 | 374 | all_deltas.append(delta) 375 | 376 | # temporary 377 | global stop 378 | if delta > 1.: 379 | stop = True 380 | 381 | count += 1 382 | 383 | return total 384 | 385 | # __init__(self, scale_samplers, sigma_sq_approx, evidence_Sigma): 386 | def compute_likelihood(X, components, Sigma, variational_representations, init_partition_function, 387 | t_schedule=None, num_steps=1000): 388 | 389 | samplers = [] 390 | for comp in components: 391 | if isinstance(comp, predictive_distributions.MultinomialPredictiveDistribution): 392 | sampler = MultinomialSampler(comp.pi, comp.centers, Sigma) 393 | 394 | elif isinstance(comp, predictive_distributions.BernoulliPredictiveDistribution): 395 | sampler = BernoulliSampler(comp.pi, comp.A, Sigma) 396 | 397 | elif isinstance(comp, predictive_distributions.GSMPredictiveDistribution): 398 | inner_samplers = [] 399 | for sc in comp.scale_components: 400 | if isinstance(sc, predictive_distributions.MultinomialPredictiveDistribution): 401 | inner_sampler = InnerMultinomialSampler(sc.pi, sc.centers, comp.sigma_sq_approx) 402 | 403 | elif isinstance(sc, predictive_distributions.BernoulliPredictiveDistribution): 404 | inner_sampler = InnerBernoulliSampler(sc.pi, sc.A, comp.sigma_sq_approx) 405 | 406 | else: 407 | raise RuntimeError("Can't convert to inner sampler: %s" % sc.__class__) 408 | 409 | inner_samplers.append(inner_sampler) 410 | 411 | igs = InnerGaussianSampler(comp.scale_mu, comp.scale_Sigma, comp.sigma_sq_approx) 412 | inner_samplers.append(igs) 413 | 414 | sampler = GSMSampler(inner_samplers, comp.sigma_sq_approx, Sigma, 415 | comp.A) 416 | 417 | samplers.append(sampler) 418 | 419 | ais_model = AISModel(samplers, X, Sigma, init_partition_function) 420 | 421 | if t_schedule is None: 422 | if SIGMOID_SCHEDULE: 423 | t_schedule = get_schedule(num_steps, 10.) 
424 | else: 425 | t_schedule = np.linspace(0., 1., num_steps) 426 | 427 | return ais(ais_model, t_schedule, variational_representations) 428 | 429 | 430 | 431 | -------------------------------------------------------------------------------- /algorithms/chains.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | import scipy.linalg 4 | import time 5 | 6 | from utils import misc 7 | 8 | 9 | def integration_matrix(m): 10 | return (np.arange(m)[:,nax] >= np.arange(m)[nax,:]).astype(float) 11 | 12 | def sample_single_chain(t, lambda_D, lambda_N): 13 | m = t.size 14 | diagonal = lambda_N.copy() 15 | diagonal[1:] += lambda_D 16 | diagonal[:-1] += lambda_D 17 | off_diag = -lambda_D 18 | 19 | a = np.zeros((2, m)) 20 | a[0, 1:] = off_diag 21 | a[1,:] = diagonal 22 | 23 | x = scipy.linalg.solveh_banded(a, t * lambda_N) 24 | c = scipy.linalg.cholesky_banded(a) 25 | 26 | x = x.ravel() 27 | 28 | # generate noise 29 | lower = np.zeros(c.shape) 30 | lower[0,:] = c[1,:] 31 | lower[1,:-1] = c[0,1:] 32 | u = np.random.normal(size=m) 33 | n = scipy.linalg.solve_banded((1, 0), lower, u) 34 | 35 | assert np.max(np.abs(x + n)) < 1000. 36 | 37 | return x + n 38 | 39 | def single_chain_marginal(t, lambda_D, lambda_N): 40 | m = t.size 41 | diagonal = lambda_N.copy() 42 | diagonal[1:] += lambda_D 43 | diagonal[:-1] += lambda_D 44 | off_diag = -lambda_D 45 | 46 | a = np.zeros((2, m)) 47 | a[0, 1:] = off_diag 48 | a[1,:] = diagonal 49 | 50 | x = scipy.linalg.solveh_banded(a, t * lambda_N) 51 | x = x.ravel() 52 | 53 | Lambda = np.diag(diagonal) + np.diag(off_diag, -1) + np.diag(off_diag, 1) 54 | Sigma = np.linalg.inv(Lambda) 55 | return x, np.diag(Sigma) 56 | 57 | 58 | def chain_gibbs(X, obs, D, row_ids=None, row_variance=False): 59 | m, n = X.shape 60 | if row_ids is not None: 61 | row_ids = np.array(row_ids) 62 | time_steps = (row_ids[1:] - row_ids[:-1]).astype(float) 63 | else: 64 | time_steps = np.ones(m-1) 65 | 66 | S = D.cumsum(0) 67 | N = X - S 68 | 69 | 70 | if row_variance: 71 | # UNDO: is this correct? 72 | sigma_sq_D_rows, sigma_sq_D_cols = misc.sample_noise(D[1:,:] / np.sqrt(time_steps[:,nax])) 73 | sigma_sq_N_rows, sigma_sq_N_cols = misc.sample_noise(N, obs=obs) 74 | else: 75 | sigma_sq_D_rows, sigma_sq_N_rows = np.ones(m-1), np.ones(m) 76 | sigma_sq_D_cols = misc.sample_col_noise(D[1:,:] / np.sqrt(time_steps[:,nax])) 77 | sigma_sq_N_cols = misc.sample_col_noise(N) 78 | 79 | for j in range(n): 80 | sigma_sq_D = sigma_sq_D_rows * sigma_sq_D_cols[j] 81 | sigma_sq_N = sigma_sq_N_rows * sigma_sq_N_cols[j] 82 | 83 | # UNDO 84 | sigma_sq_D = sigma_sq_D.clip(1e-4, 100.) 85 | sigma_sq_N = sigma_sq_N.clip(1e-4, 100.) 86 | 87 | S[:,j] = sample_single_chain(X[:,j], 1. / (time_steps * sigma_sq_D), obs[:,j] / sigma_sq_N) 88 | N[:,j] = X[:,j] - S[:,j] 89 | 90 | D = np.zeros(X.shape) 91 | D[0,:] = S[0,:] 92 | D[1:,:] = S[1:,:] - S[:-1,:] 93 | return D 94 | 95 | NUM_ITER = 200 96 | 97 | def sample_variance(values): 98 | a = 1. + 0.5 * values.size 99 | b = 1. + 0.5 * np.sum(values ** 2) 100 | prec = np.random.gamma(a, 1. / b) 101 | prec = np.clip(prec, 1e-4, 1e4) # avoid numerical issues 102 | return 1. / prec 103 | 104 | 105 | def fit_model(data_matrix, num_iter=NUM_ITER): 106 | N_orig, N, D = data_matrix.m_orig, data_matrix.m, data_matrix.n 107 | X = data_matrix.sample_latent_values(np.zeros((N, D)), 1.) 108 | sigma_sq_D = sigma_sq_N = 1. 
109 | fixed_variance = data_matrix.fixed_variance() 110 | 111 | row_ids = data_matrix.row_ids 112 | X_full = np.zeros((N_orig, D)) 113 | X_full[row_ids, :] = X 114 | 115 | states = np.zeros((N_orig, D)) 116 | resid = np.zeros((N, D)) 117 | diff = np.zeros((N_orig-1, D)) 118 | 119 | pbar = misc.pbar(num_iter) 120 | 121 | t0 = time.time() 122 | for it in range(num_iter): 123 | lam_N = np.zeros(N_orig) 124 | lam_N[row_ids] = 1. / sigma_sq_N 125 | for j in range(D): 126 | states[:, j] = sample_single_chain(X_full[:, j], 1. / sigma_sq_D, lam_N) 127 | resid = X - states[row_ids, :] 128 | diff = states[1:, :] - states[:-1, :] 129 | 130 | sigma_sq_D = sample_variance(diff) 131 | if not fixed_variance: 132 | sigma_sq_N = sample_variance(resid) 133 | 134 | X = data_matrix.sample_latent_values(states[row_ids, :], sigma_sq_N) 135 | X_full[row_ids, :] = X 136 | 137 | if time.time() - t0 > 3600.: # 1 hour 138 | break 139 | 140 | pbar.update(it) 141 | pbar.finish() 142 | 143 | return states, sigma_sq_D, sigma_sq_N 144 | 145 | 146 | def sample_chain(X, obs, row_ids=None): 147 | m, n = X.shape 148 | 149 | # initalize deltas 150 | X_noise = X + np.random.normal(0., 0.1, size=X.shape) 151 | D = np.zeros(X_noise.shape) 152 | D[0,:] = X_noise[0,:] 153 | D[1:,:] = X_noise[1:,:] - X_noise[:-1,:] 154 | 155 | niter = 50 # UNDO 156 | for it in range(niter): 157 | D = chain_gibbs(X, obs, D, row_ids) 158 | 159 | return D 160 | -------------------------------------------------------------------------------- /algorithms/crp.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | import sklearn.cluster 4 | import scipy.special 5 | import time 6 | 7 | import low_rank 8 | from utils import distributions, gaussians, psd_matrices, misc 9 | from_iso = gaussians.Potential.from_moments_iso 10 | 11 | 12 | np.seterr(divide='ignore', invalid='ignore') 13 | 14 | MAX_COMPONENTS = 100 15 | 16 | class CRPModel: 17 | def __init__(self, alpha, ndim, within_var_prior, between_var_prior, isotropic_w, isotropic_b): 18 | self.alpha = alpha 19 | self.ndim = ndim 20 | self.within_var_prior = within_var_prior 21 | self.between_var_prior = between_var_prior 22 | self.isotropic_w = isotropic_w 23 | self.isotropic_b = isotropic_b 24 | 25 | 26 | class CollapsedCRPState: 27 | def __init__(self, X, assignments, sigma_sq_w, sigma_sq_b): 28 | self.X = X.copy() 29 | self.assignments = assignments 30 | self.sigma_sq_w = sigma_sq_w 31 | self.sigma_sq_b = sigma_sq_b 32 | 33 | class FullCRPState: 34 | def __init__(self, X, assignments, centers, sigma_sq_w, sigma_sq_b): 35 | self.X = X 36 | self.assignments = assignments 37 | self.centers = centers 38 | self.sigma_sq_w = sigma_sq_w 39 | self.sigma_sq_b = sigma_sq_b 40 | 41 | def copy(self): 42 | if np.isscalar(self.sigma_sq_w): 43 | sigma_sq_w = self.sigma_sq_w 44 | else: 45 | sigma_sq_w = self.sigma_sq_w.copy() 46 | if np.isscalar(self.sigma_sq_b): 47 | sigma_sq_b = self.sigma_sq_b 48 | else: 49 | sigma_sq_b = self.sigma_sq_b.copy() 50 | return FullCRPState(self.X.copy(), self.assignments.copy(), self.centers.copy(), sigma_sq_w, sigma_sq_b) 51 | 52 | 53 | 54 | 55 | 56 | class CollapsedCRPCache: 57 | def __init__(self, model, X, mask, assignments, counts, obs_counts, sum_X, sum_X_sq): 58 | self.model = model 59 | self.X = X.copy() 60 | self.mask = mask.copy() 61 | self.assignments = assignments 62 | self.ncomp = assignments.max() + 1 63 | self.counts = counts 64 | self.obs_counts = obs_counts 65 | self.sum_X = sum_X 66 | 
self.sum_X_sq = sum_X_sq 67 | 68 | def copy(self): 69 | return CollapsedCRPCache(self.model, self.X, self.mask, self.assignments.copy(), self.counts.copy(), 70 | self.obs_counts.copy(), self.sum_X.copy(), self.sum_X_sq.copy()) 71 | 72 | def add(self, i, k, x): 73 | assert self.assignments[i] == -1 74 | if k == self.ncomp: 75 | self.counts = np.concatenate([self.counts, [0]]) 76 | self.obs_counts = np.vstack([self.obs_counts, np.zeros(self.model.ndim, dtype=int)]) 77 | self.sum_X = np.vstack([self.sum_X, np.zeros((1, self.model.ndim))]) 78 | self.sum_X_sq = np.vstack([self.sum_X_sq, np.zeros((1, self.model.ndim))]) 79 | self.ncomp += 1 80 | self.counts[k] += 1 81 | self.obs_counts[k, :] += self.mask[i, :] 82 | self.sum_X[k, :] += self.mask[i, :] * x 83 | self.sum_X_sq[k, :] += self.mask[i, :] * x ** 2 84 | self.assignments[i] = k 85 | self.X[i, :] = x 86 | 87 | def remove(self, i): 88 | assert self.assignments[i] != -1 89 | k = self.assignments[i] 90 | self.counts[k] -= 1 91 | self.obs_counts[k, :] -= self.mask[i, :] 92 | self.sum_X[k, :] -= self.mask[i, :] * self.X[i, :] 93 | self.sum_X_sq[k, :] -= self.mask[i, :] * self.X[i, :] ** 2 94 | self.assignments[i] = -1 95 | 96 | def replace(self, i, k): 97 | self.remove(i) 98 | self.add(i, k, self.X[i, :]) 99 | 100 | def squeeze(self, state): 101 | # renumber the clusters to eliminate empty ones 102 | for i in range(self.ncomp)[::-1]: 103 | if self.counts[i] == 0: 104 | assert np.all(state.assignments == self.assignments) 105 | self.assignments = np.where(self.assignments > i, self.assignments - 1, self.assignments) 106 | state.assignments = self.assignments.copy() 107 | self.ncomp -= 1 108 | self.counts = np.concatenate([self.counts[:i], self.counts[i+1:]]) 109 | self.obs_counts = np.vstack([self.obs_counts[:i, :], self.obs_counts[i+1:, :]]) 110 | self.sum_X = np.vstack([self.sum_X[:i, :], self.sum_X[i+1:, :]]) 111 | self.sum_X_sq = np.vstack([self.sum_X_sq[:i, :], self.sum_X_sq[i+1:, :]]) 112 | 113 | 114 | def check(self, data, state): 115 | new_cache = CollapsedCRPCache.from_state(self.model, data, state) 116 | self.check_close(new_cache) 117 | assert np.all(self.counts > 0) 118 | 119 | def check_close(self, other): 120 | assert np.all(self.counts == other.counts) 121 | assert np.all(self.obs_counts == other.obs_counts) 122 | assert np.allclose(self.sum_X, other.sum_X) 123 | assert np.allclose(self.sum_X_sq, other.sum_X_sq) 124 | assert np.all(self.assignments == other.assignments) 125 | 126 | @staticmethod 127 | def from_state(model, data, state): 128 | assignments = state.assignments.copy() 129 | ncomp = assignments.max() + 1 130 | counts = misc.get_counts(state.assignments, ncomp) 131 | obs_counts = np.zeros((ncomp, model.ndim), dtype=int) 132 | sum_X = np.zeros((ncomp, model.ndim)) 133 | sum_X_sq = np.zeros((ncomp, model.ndim)) 134 | for k in range(ncomp): 135 | obs_counts[k, :] = data.mask[assignments==k, :].sum(0) 136 | sum_X[k, :] = (data.mask * state.X)[assignments==k, :].sum(0) 137 | sum_X_sq[k, :] = (data.mask * state.X**2)[assignments==k, :].sum(0) 138 | return CollapsedCRPCache(model, state.X, data.mask, assignments, counts, obs_counts, sum_X, sum_X_sq) 139 | 140 | 141 | def crp_loglik(assignments, alpha): 142 | counts = np.bincount(assignments) 143 | N = counts.sum() 144 | K = counts.size 145 | return scipy.special.gammaln(alpha) + \ 146 | -scipy.special.gammaln(alpha + N) + \ 147 | K * np.log(alpha) + \ 148 | scipy.special.gammaln(counts).sum() 149 | 150 | 151 | def p_tilde_collapsed(model, data, state): 152 | cache = 
CollapsedCRPCache.from_state(model, data, state) 153 | ncomp = cache.counts.size 154 | total = 0. 155 | 156 | # data evidence, marginalizing out the centers 157 | ce = center_evidence(model, state, cache) 158 | for k in range(ncomp): 159 | if model.isotropic_b: 160 | prior_term = from_iso(np.zeros(model.ndim), state.sigma_sq_b) 161 | else: 162 | prior_term = gaussians.Potential.from_moments_diag(np.zeros(model.ndim), state.sigma_sq_b) 163 | evidence = ce[k] 164 | total += (prior_term + evidence).integral() 165 | 166 | # hyperparameters 167 | total += np.sum(model.within_var_prior.loglik(state.sigma_sq_w)) 168 | total += np.sum(model.between_var_prior.loglik(state.sigma_sq_b)) 169 | 170 | # partition 171 | total += crp_loglik(state.assignments, model.alpha) 172 | 173 | return total 174 | 175 | def p_tilde(model, data, state): 176 | total = 0. 177 | 178 | # data likelihood 179 | evidence = p_X_given_centers(model, data, state) 180 | total += evidence.score(state.centers[state.assignments, :]).sum() 181 | 182 | # centers likelihood 183 | if model.isotropic_b: 184 | centers_dist = from_iso(np.zeros(model.ndim), state.sigma_sq_b) 185 | else: 186 | centers_dist = gaussians.Potential.from_moments_diag(np.zeros(model.ndim), state.sigma_sq_b) 187 | total += centers_dist[nax].score(state.centers).sum() 188 | 189 | # hyperparameters 190 | total += np.sum(model.within_var_prior.loglik(state.sigma_sq_w)) 191 | total += np.sum(model.between_var_prior.loglik(state.sigma_sq_b)) 192 | 193 | # partition 194 | total += crp_loglik(state.assignments, model.alpha) 195 | 196 | return total 197 | 198 | def p_X_given_centers(model, data, state): 199 | lam = data.mask / state.sigma_sq_w 200 | h = lam * state.X 201 | temp = -0.5 * np.log(2*np.pi) + \ 202 | -0.5 * np.log(state.sigma_sq_w) + \ 203 | -0.5 * lam * state.X**2 204 | Z = (data.mask * temp).sum(1) 205 | return gaussians.Potential(h, psd_matrices.DiagonalMatrix(lam), Z) 206 | 207 | 208 | 209 | def center_evidence(model, state, cache): 210 | lam = cache.obs_counts / state.sigma_sq_w 211 | mu = np.where(cache.obs_counts > 0, cache.sum_X / cache.obs_counts, 0.) 212 | h = mu * lam 213 | if model.isotropic_w: 214 | Z = -0.5 * cache.obs_counts.sum(1) * np.log(2*np.pi) + \ 215 | -0.5 * cache.obs_counts.sum(1) * np.log(state.sigma_sq_w) + \ 216 | -0.5 * cache.sum_X_sq.sum(1) / state.sigma_sq_w 217 | else: 218 | Z = -0.5 * cache.obs_counts.sum(1) * np.log(2 * np.pi) + \ 219 | -0.5 * (cache.obs_counts * np.log(state.sigma_sq_w)).sum(1) + \ 220 | -0.5 * (cache.sum_X_sq / state.sigma_sq_w).sum(1) 221 | return gaussians.Potential(h, psd_matrices.DiagonalMatrix(lam), Z) 222 | 223 | def center_beliefs(model, state, cache): 224 | if model.isotropic_b: 225 | prior_term = from_iso(np.zeros(model.ndim), state.sigma_sq_b) 226 | else: 227 | prior_term = gaussians.Potential.from_moments_diag(np.zeros(model.ndim), state.sigma_sq_b) 228 | return (center_evidence(model, state, cache) + prior_term).renorm() 229 | 230 | def new_center_beliefs(model, state): 231 | return from_iso(np.zeros(model.ndim), state.sigma_sq_b) 232 | 233 | def center_predictive(model, state, cache, k): 234 | N, D = state.X.shape 235 | if k == cache.ncomp: 236 | return np.zeros(D), np.ones(D) * (state.sigma_sq_b + state.sigma_sq_w) 237 | else: 238 | ssq_w, ssq_b = state.sigma_sq_w, state.sigma_sq_b 239 | lam = cache.obs_counts[k, :] / ssq_w + 1. / ssq_b 240 | predictive_mu = (cache.sum_X[k, :] / ssq_w) / lam 241 | predictive_sigma_sq = 1. 
/ lam + state.sigma_sq_w 242 | return predictive_mu, predictive_sigma_sq 243 | 244 | def cond_assignments_collapsed(model, data, state, cache, i): 245 | cache.remove(i) 246 | 247 | prior_term = np.concatenate([np.log(cache.counts), [np.log(model.alpha)]]) 248 | 249 | if MAX_COMPONENTS is not None and cache.ncomp >= MAX_COMPONENTS: 250 | prior_term[-1] = -np.infty 251 | 252 | data_term = np.zeros(cache.ncomp + 1) 253 | for k in range(cache.ncomp + 1): 254 | predictive_mu, predictive_ssq = center_predictive(model, state, cache, k) 255 | data_term[k] = data[i, :].loglik(predictive_mu, predictive_ssq) 256 | 257 | cache.add(i, state.assignments[i], state.X[i, :]) 258 | 259 | return distributions.MultinomialDistribution.from_odds(prior_term + data_term) 260 | 261 | 262 | def gibbs_step_assignments_collapsed(model, data, state, cache, i): 263 | dist = cond_assignments_collapsed(model, data, state, cache, i) 264 | new_assignment = dist.sample().argmax() 265 | state.assignments[i] = new_assignment 266 | cache.remove(i) 267 | predictive_mu, predictive_ssq = center_predictive(model, state, cache, new_assignment) 268 | state.X[i, :] = data[i, :].sample_latent_values(predictive_mu, predictive_ssq) 269 | cache.add(i, new_assignment, state.X[i, :]) 270 | cache.squeeze(state) 271 | 272 | 273 | def cond_centers(model, data, state, cache): 274 | if model.isotropic_b: 275 | prior_term = from_iso(np.zeros(model.ndim), state.sigma_sq_b) 276 | else: 277 | prior_term = gaussians.Potential.from_moments_diag(np.zeros(model.ndim), state.sigma_sq_b) 278 | center_beliefs = center_evidence(model, state, cache) + prior_term 279 | return center_beliefs.renorm() 280 | 281 | def gibbs_step_centers(model, data, state, cache): 282 | cond = cond_centers(model, data, state, cache) 283 | new_centers = cond.to_distribution().sample() 284 | state.centers = new_centers 285 | 286 | def cond_sigma_sq_b(model, data, state): 287 | counts = np.bincount(state.assignments) 288 | nz = np.where(counts > 0)[0] 289 | centers = state.centers[nz, :] 290 | 291 | if model.isotropic_b: 292 | a = model.between_var_prior.a + 0.5 * nz.size * model.ndim 293 | b = model.between_var_prior.b + 0.5 * np.sum(centers**2) 294 | else: 295 | a = model.between_var_prior.a + 0.5 * nz.size * np.ones(model.ndim) 296 | b = model.between_var_prior.b + 0.5 * np.sum(centers**2, axis=0) 297 | return distributions.InverseGammaDistribution(a, b) 298 | 299 | def gibbs_step_sigma_sq_b(model, data, state): 300 | cond = cond_sigma_sq_b(model, data, state) 301 | state.sigma_sq_b = cond.sample() 302 | 303 | def cond_sigma_sq_w(model, data, state): 304 | diff = state.X - state.centers[state.assignments, :] 305 | if model.isotropic_w: 306 | a = model.within_var_prior.a + 0.5 * np.sum(data.mask) 307 | b = model.within_var_prior.b + 0.5 * np.sum(data.mask * diff**2) 308 | else: 309 | a = model.within_var_prior.a + 0.5 * np.sum(data.mask, axis=0) 310 | b = model.within_var_prior.b + 0.5 * np.sum(data.mask * diff**2, axis=0) 311 | return distributions.InverseGammaDistribution(a, b) 312 | 313 | def gibbs_step_sigma_sq_w(model, data, state): 314 | cond = cond_sigma_sq_w(model, data, state) 315 | state.sigma_sq_w = cond.sample() 316 | 317 | def gibbs_sweep_collapsed(model, data, state, fixed_variance=False): 318 | cache = CollapsedCRPCache.from_state(model, data, state) 319 | num = state.X.shape[0] 320 | for i in range(num): 321 | gibbs_step_assignments_collapsed(model, data, state, cache, i) 322 | 323 | cache = CollapsedCRPCache.from_state(model, data, state) 324 | 
gibbs_step_centers(model, data, state, cache) 325 | #assert False 326 | gibbs_step_sigma_sq_b(model, data, state) 327 | if not fixed_variance: 328 | gibbs_step_sigma_sq_w(model, data, state) 329 | 330 | 331 | 332 | 333 | NUM_ITER = 200 334 | 335 | def init_X(data_matrix): 336 | X_init = data_matrix.sample_latent_values(np.zeros((data_matrix.m, data_matrix.n)), 1.) 337 | svd_K = min(20, data_matrix.m // 4, data_matrix.n // 4) 338 | svd_K = max(svd_K, 2) # 0 and 1 cause it to crash 339 | _, _, _, _, _, X_init = low_rank.fit_model(data_matrix, svd_K, 10) 340 | return X_init 341 | 342 | 343 | def fit_model(data_matrix, isotropic_w=True, isotropic_b=True, num_iter=NUM_ITER): 344 | X_init = init_X(data_matrix) 345 | 346 | model = CRPModel(1., X_init.shape[1], distributions.InverseGammaDistribution(0.01, 0.01), 347 | distributions.InverseGammaDistribution(0.01, 0.01), isotropic_w, isotropic_b) 348 | 349 | N, D = X_init.shape 350 | 351 | k_init = min(N//4, 40) 352 | km = sklearn.cluster.KMeans(n_clusters=k_init) 353 | km.fit(X_init) 354 | init_assignments = km.labels_ 355 | 356 | 357 | 358 | sigma_sq_f = sigma_sq_n = X_init.var() / 2. 359 | if not model.isotropic_b: 360 | sigma_sq_f = X_init.var(0) / 2. 361 | state = CollapsedCRPState(X_init, init_assignments, sigma_sq_n, sigma_sq_f) 362 | state.centers = km.cluster_centers_ 363 | 364 | fixed_variance = data_matrix.fixed_variance() 365 | 366 | data = data_matrix.observations 367 | 368 | if fixed_variance: 369 | if isotropic_w: 370 | state.sigma_sq_w = 1. 371 | else: 372 | state.sigma_sq_w = np.ones(D) 373 | 374 | pbar = misc.pbar(num_iter) 375 | 376 | t0 = time.time() 377 | for it in range(num_iter): 378 | pred = state.centers[state.assignments, :] 379 | state.X = data_matrix.sample_latent_values(pred, state.sigma_sq_w) 380 | gibbs_sweep_collapsed(model, data, state, fixed_variance) 381 | 382 | if time.time() - t0 > 3600.: # 1 hour 383 | break 384 | 385 | pbar.update(it) 386 | pbar.finish() 387 | 388 | # sample the centers 389 | cache = CollapsedCRPCache.from_state(model, data, state) 390 | gibbs_step_centers(model, data, state, cache) 391 | 392 | return state 393 | 394 | 395 | -------------------------------------------------------------------------------- /algorithms/dumb_samplers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | import scipy.optimize 4 | 5 | import grammar 6 | import models 7 | import slice_sampling 8 | import sparse_coding 9 | from utils import distributions, misc 10 | 11 | 12 | def sample_variance(node): 13 | if node.isleaf() and node.distribution() == 'g': 14 | node.sample_variance() 15 | elif node.issum(): 16 | for child in node.children: 17 | sample_variance(child) 18 | elif node.isproduct(): 19 | for child in node.children: 20 | sample_variance(child) 21 | 22 | 23 | 24 | 25 | 26 | class GenericGibbsSampler: 27 | def __init__(self, node): 28 | self.node = node 29 | 30 | def step(self, niter=1, maximize=False): 31 | if not maximize: 32 | self.node.gibbs_update2() 33 | 34 | def __str__(self): 35 | return 'GenericGibbsSampler(%d)' % self.node.model.id 36 | 37 | def preserves_root_value(self): 38 | return True 39 | 40 | 41 | class GaussianSampler: 42 | def __init__(self, product_node, noise_node, side, maximize): 43 | self.product_node = product_node 44 | self.noise_node = noise_node 45 | self.side = side 46 | self.maximize = maximize 47 | 48 | def step(self): 49 | left, right = self.product_node.children 50 | m, n = self.noise_node.m, 
self.noise_node.n 51 | 52 | old = np.dot(left.value(), right.value()) + self.noise_node.value() 53 | 54 | if self.side == 'left' and ((left.isleaf() and left.distribution() == 'g') 55 | or left.isgsm()): 56 | A = np.eye(m) 57 | B = right.value() 58 | X_node = left 59 | elif self.side == 'left' and left.issum(): 60 | A = np.eye(m) 61 | B = right.value() 62 | X_node = left.children[-1] 63 | elif self.side == 'right' and ((right.isleaf() and right.distribution() == 'g') 64 | or right.isgsm()): 65 | A = left.value() 66 | B = np.eye(n) 67 | X_node = right 68 | elif self.side == 'right' and right.issum(): 69 | A = left.value() 70 | B = np.eye(n) 71 | X_node = right.children[-1] 72 | else: 73 | raise RuntimeError("shouldn't get here") 74 | 75 | X = X_node.value() 76 | N_node = self.noise_node 77 | N = N_node.value() 78 | C = np.dot(np.dot(A, X), B) + N 79 | obs = np.ones((m, n), dtype=bool) 80 | 81 | if X_node.has_rank1_variance() and N_node.has_rank1_variance(): 82 | ssq_row_N, ssq_col_N = N_node.row_col_variance() 83 | ssq_row_X, ssq_col_X = X_node.row_col_variance() 84 | d_1 = 1. / np.sqrt(ssq_row_N) 85 | d_2 = 1. / np.sqrt(ssq_col_N) 86 | d_3 = 1. / np.sqrt(ssq_row_X) 87 | d_4 = 1. / np.sqrt(ssq_col_X) 88 | 89 | if self.maximize: 90 | X_new = misc.map_gaussian_matrix_em(A, B, C, d_1, d_2, d_3, d_4, obs, X) 91 | else: 92 | X_new = misc.sample_gaussian_matrix_em(A, B, C, d_1, d_2, d_3, d_4, obs, X) 93 | 94 | else: 95 | if self.maximize: 96 | fn = misc.map_gaussian_matrix2 97 | else: 98 | fn = misc.sample_gaussian_matrix2 99 | 100 | if self.side == 'left': 101 | X_new = fn(B.T, C.T, 1. / X_node.variance().T, obs.T / N_node.variance().T).T 102 | else: 103 | X_new = fn(A, C, 1. / X_node.variance(), obs / N_node.variance()) 104 | 105 | 106 | X_node.set_value(X_new) 107 | N_new = C - np.dot(np.dot(A, X_new), B) 108 | N_node.set_value(N_new) 109 | 110 | new = np.dot(left.value(), right.value()) + self.noise_node.value() 111 | assert np.allclose(old, new) 112 | 113 | def __str__(self): 114 | return 'GaussianSampler(prod=%d, noise=%d, side=%s, maximize=%s)' % \ 115 | (self.product_node.model.id, self.noise_node.model.id, self.side, self.maximize) 116 | 117 | def preserves_root_value(self): 118 | return True 119 | 120 | 121 | class LatentValueSampler: 122 | def __init__(self, data_matrix, node): 123 | self.data_matrix = data_matrix 124 | self.node = node 125 | 126 | def step(self): 127 | pred = self.node.value() - self.node.children[-1].value() 128 | new_X = self.data_matrix.sample_latent_values(pred, self.node.children[-1].variance()) 129 | self.node.children[-1].set_value(new_X - pred) 130 | 131 | def __str__(self): 132 | return 'LatentValueSampler(%d)' % self.node.model.id 133 | 134 | def preserves_root_value(self): 135 | return False 136 | 137 | 138 | 139 | class LatentValueMaximizer: 140 | def __init__(self, data_matrix, node): 141 | self.data_matrix = data_matrix 142 | self.node = node 143 | 144 | def step(self): 145 | pred = self.node.value() - self.node.children[-1].value() 146 | new_X = np.where(self.data_matrix.observations.mask, self.node.value(), pred) 147 | self.node.children[-1].set_value(new_X - pred) 148 | 149 | def __str__(self): 150 | return 'LatentValueMaximizer(%d)' % self.node.model.id 151 | 152 | def preserves_root_value(self): 153 | return False 154 | 155 | class VarianceSampler: 156 | def __init__(self, node): 157 | self.node = node 158 | 159 | def step(self): 160 | self.node.sample_variance() 161 | 162 | def __str__(self): 163 | return 'VarianceSampler(%d)' % self.node.model.id 
164 | 165 | def preserves_root_value(self): 166 | return True 167 | 168 | class GSMScaleSampler: 169 | def __init__(self, gsm_node, maximize=False): 170 | self.gsm_node = gsm_node 171 | self.maximize = maximize 172 | 173 | def step(self): 174 | # S ~ N(0, exp(Z / 2)) 175 | # 176 | # Z = signal_node + noise_node + bias 177 | # = signal_node + gaussian_term 178 | # = scale_node + bias 179 | # 180 | # resample gaussian_term conditioned on signal_node 181 | 182 | scale_node = self.gsm_node.scale_node 183 | S = self.gsm_node.value() 184 | N, K = S.shape 185 | 186 | # resample Z 187 | Z = scale_node.value() + self.gsm_node.bias 188 | if scale_node.isleaf(): 189 | mu = self.gsm_node.bias * np.ones((N, K)) 190 | sigma_sq = scale_node.variance() 191 | else: 192 | assert scale_node.issum() 193 | mu = self.gsm_node.bias + scale_node.value() - scale_node.children[-1].value() 194 | sigma_sq = scale_node.children[-1].variance() 195 | 196 | for i in range(N): 197 | for k in range(K): 198 | log_f = sparse_coding.LogFUncollapsed(S[i, k]) 199 | if self.maximize: 200 | temp = lambda z: -log_f(z) - distributions.gauss_loglik(z, mu[i, k], sigma_sq[i, k]) 201 | Z[i, k] = scipy.optimize.fmin(temp, Z[i, k], disp=False) 202 | else: 203 | Z[i, k] = slice_sampling.slice_sample_gauss(log_f, mu[i, k], sigma_sq[i, k], Z[i, k]) 204 | 205 | # resample bias 206 | if scale_node.isleaf(): 207 | gaussian_term = Z 208 | else: 209 | signal = scale_node.value() - scale_node.children[-1].value() 210 | gaussian_term = Z - signal 211 | 212 | if not self.maximize: 213 | if self.gsm_node.bias_type == 'scalar': 214 | mu = gaussian_term.mean() 215 | lam = (1. / sigma_sq).sum() 216 | self.gsm_node.bias = np.random.normal(mu, 1. / lam) 217 | elif self.gsm_node.bias_type == 'row': 218 | mu = gaussian_term.mean(1) 219 | lam = (1. / sigma_sq).sum(1) 220 | self.gsm_node.bias = np.random.normal(mu, 1. / lam)[:, nax] 221 | elif self.gsm_node.bias_type == 'col': 222 | mu = gaussian_term.mean(0) 223 | lam = (1. / sigma_sq).sum(0) 224 | self.gsm_node.bias = np.random.normal(mu, 1. 
/ lam)[nax, :] 225 | 226 | # set noise node 227 | noise_term = gaussian_term - self.gsm_node.bias 228 | if scale_node.isleaf(): 229 | scale_node.set_value(noise_term) 230 | else: 231 | scale_node.children[-1].set_value(noise_term) 232 | 233 | def __str__(self): 234 | return 'GSMScaleSampler(%d, maximize=%s)' % (self.gsm_node.model.id, self.maximize) 235 | 236 | def preserves_root_value(self): 237 | return True 238 | 239 | 240 | def get_samplers(data_matrix, node, maximize): 241 | samplers = [] 242 | if data_matrix is not None and not maximize: 243 | samplers.append(LatentValueSampler(data_matrix, node)) 244 | if data_matrix is not None and maximize: 245 | samplers.append(LatentValueMaximizer(data_matrix, node)) 246 | 247 | if node.isleaf() and not node.model.fixed and not maximize: 248 | samplers.append(GenericGibbsSampler(node)) 249 | 250 | if node.isleaf() and node.distribution() == 'g' and not node.model.fixed_variance and not maximize: 251 | samplers.append(VarianceSampler(node)) 252 | 253 | if node.issum(): 254 | children = node.children[:-1] 255 | noise_node = node.children[-1] 256 | for child in children: 257 | left, right = child.children 258 | if ((left.isleaf() and left.distribution() == 'g') or left.issum() or left.isgsm()) and \ 259 | not left.model.fixed and not noise_node.model.fixed: 260 | samplers.append(GaussianSampler(child, noise_node, 'left', maximize)) 261 | if ((right.isleaf() and right.distribution() == 'g') or right.issum() or left.isgsm()) and \ 262 | not right.model.fixed and not noise_node.model.fixed: 263 | samplers.append(GaussianSampler(child, noise_node, 'right', maximize)) 264 | 265 | if node.isgsm(): 266 | samplers.append(GSMScaleSampler(node, maximize=maximize)) 267 | 268 | for child in node.children: 269 | samplers += get_samplers(None, child, maximize) 270 | 271 | return samplers 272 | 273 | def list_samplers(model, maximize=False): 274 | node = model.dummy() 275 | models.align(node, model) 276 | samplers = get_samplers('dummy', node, maximize) 277 | node.model.display() 278 | print 279 | for s in samplers: 280 | print s 281 | 282 | 283 | def sweep(data_matrix, root, num_iter=100, maximize=False): 284 | samplers = get_samplers(data_matrix, root, maximize) 285 | 286 | if num_iter > 1: 287 | print 'Dumb Gibbs sampling on %s...' 
% grammar.pretty_print(root.structure()) 288 | pbar = misc.pbar(num_iter) 289 | else: 290 | pbar = None 291 | 292 | for it in range(num_iter): 293 | for sampler in samplers: 294 | if sampler.preserves_root_value(): 295 | old = root.value() 296 | sampler.step() 297 | if sampler.preserves_root_value(): 298 | assert np.allclose(old, root.value()) 299 | 300 | if pbar is not None: 301 | pbar.update(it) 302 | if pbar is not None: 303 | pbar.finish() 304 | 305 | -------------------------------------------------------------------------------- /algorithms/ibp_split_merge.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | import scipy.special 4 | 5 | import ibp 6 | import observations 7 | from utils import distributions, gaussians, psd_matrices 8 | 9 | def poisson(k, lam): 10 | return -lam * k * np.log(lam) - scipy.special.gammaln(k+1) 11 | 12 | def evidence(model, data, state): 13 | K, D = state.Z.shape[1], state.X.shape[1] 14 | 15 | Lambda = np.dot(state.Z.T, state.Z) / state.sigma_sq_n + np.eye(K) / state.sigma_sq_f 16 | h = np.dot(state.Z.T, state.X) / state.sigma_sq_n 17 | 18 | # we can ignore the constant factors because they don't depend on Z 19 | pot = gaussians.Potential(h.T, psd_matrices.FullMatrix(Lambda[nax, :, :]), 0.) 20 | return pot.integral().sum() 21 | 22 | def sample_features(model, data, state): 23 | K, D = state.Z.shape[1], state.X.shape[1] 24 | 25 | Lambda = np.dot(state.Z.T, state.Z) / state.sigma_sq_n + np.eye(K) / state.sigma_sq_f 26 | h = np.dot(state.Z.T, state.X) / state.sigma_sq_n 27 | 28 | # we can ignore the constant factors because they don't depend on Z 29 | pot = gaussians.Potential(h.T, psd_matrices.FullMatrix(Lambda[nax, :, :]), 0.) 30 | return pot.to_distribution().sample().T 31 | 32 | 33 | def next_assignment_proposal(model, data, state, cache, Sigma_info, i, k): 34 | assert not cache.rows_included[i] 35 | x = state.X[i, :] 36 | 37 | evidence = np.zeros(2) 38 | for assignment in [0, 1]: 39 | mu = Sigma_info.mu_for(k, assignment) 40 | ssq = Sigma_info.sigma_sq_for(k, assignment) + state.sigma_sq_n 41 | evidence[assignment] = ibp.gauss_loglik_vec_C2(x, mu, ssq) 42 | data_odds = evidence[1] - evidence[0] 43 | 44 | if cache.counts[k] > 0: 45 | prior_odds = np.log(cache.counts[k]) - np.log(cache.num_included - cache.counts[k] + 1) 46 | else: 47 | #prior_odds = poisson(1, 0.5 * model.alpha / (i+1)) - poisson(0, 0.5 * model.alpha / (i+1)) 48 | prior_odds = np.log(model.alpha) - np.log(cache.num_included + 1) 49 | 50 | return distributions.BernoulliDistribution.from_odds(data_odds + prior_odds) 51 | 52 | 53 | def propose_assignments(model, data, state, update=False): 54 | """Generate the proposal for K columns using sequential Monte Carlo. Assumes the remaining 55 | features have been sampled, the remaining assignments are fixed, and the other features' contributions 56 | are subtracted from the data matrix. Generally K = 2.""" 57 | N, K = state.Z.shape 58 | state = state.copy() 59 | cache = ibp.IBPCache.from_state(model, data, state, np.zeros(N, dtype=bool)) 60 | 61 | proposal_prob = 0. 
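    # Sequential proposal over the K (typically 2) working columns: rows are visited in order,
    # and each z_{i, k} is drawn from its conditional given the rows added to the cache so far
    # (data odds from the feature posterior plus IBP-style prior odds). With update=False the
    # loop leaves Z unchanged and only accumulates the log-probability of the existing
    # assignments, so the same routine can be used to score a reverse move.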
62 | 63 | for i in range(N): 64 | Sigma_info = cache.fpost.Sigma_info(np.zeros(K, dtype=int)) 65 | for k in range(K): 66 | cond = next_assignment_proposal(model, data, state, cache, Sigma_info, i, k) 67 | if update: 68 | state.Z[i, k] = cond.sample() 69 | proposal_prob += cond.loglik(state.Z[i, k]) 70 | Sigma_info.update(k, state.Z[i, k]) 71 | cache.add(i, state.Z[i, :], state.X[i, :]) 72 | 73 | return state, proposal_prob 74 | 75 | CHOICES = [(0, 0), (0, 1), (1, 0), (1, 1)] 76 | 77 | def propose_assignments2(model, data, state, update=False): 78 | N, K = state.Z.shape 79 | state = state.copy() 80 | cache = ibp.IBPCache.from_state(model, data, state, np.zeros(N, dtype=bool)) 81 | 82 | proposal_prob = 0. 83 | 84 | for i in range(N): 85 | obs = data.mask[i, :] 86 | x = state.X[i, :] 87 | 88 | evidence = np.zeros(4) 89 | prior_odds = np.zeros(4) 90 | for c, (z1, z2) in enumerate(CHOICES): 91 | z = np.array([z1, z2]) 92 | mu = cache.fpost.predictive_mu(z) 93 | ssq = cache.fpost.predictive_ssq(z) + state.sigma_sq_n 94 | evidence[c] = ibp.gauss_loglik_vec_C2(x[obs], mu[obs], ssq) 95 | 96 | for k in [0, 1]: 97 | if cache.counts[k] > 0: 98 | prior_odds[c] += np.log(cache.counts[k]) - np.log(cache.num_included - cache.counts[k] + 1) 99 | else: 100 | prior_odds[c] += np.log(model.alpha) - np.log(cache.num_included + 1) 101 | 102 | odds = evidence + prior_odds 103 | dist = distributions.MultinomialDistribution.from_odds(odds) 104 | if update: 105 | state.Z[i, :] = CHOICES[dist.sample().argmax()] 106 | proposal_prob += dist.loglik(CHOICES.index(tuple(state.Z[i, :]))) 107 | cache.add(i, state.Z[i, :], state.X[i, :]) 108 | 109 | assert np.isfinite(proposal_prob) 110 | 111 | return state, proposal_prob 112 | 113 | 114 | def ibp_loglik(Z, alpha): 115 | N = Z.shape[0] 116 | idxs = np.where(Z.any(0))[0] 117 | K = idxs.size 118 | 119 | total = -alpha * (1. / np.arange(1, N+1)).sum() 120 | total += alpha * K 121 | 122 | if K > 0: 123 | m = Z[:, idxs].sum(0) 124 | total += scipy.special.gammaln(N - m + 1).sum() 125 | total += scipy.special.gammaln(m).sum() 126 | total -= K * scipy.special.gammaln(N + 1) 127 | 128 | assert np.isfinite(total) 129 | 130 | return total 131 | 132 | 133 | def choose_columns(K): 134 | if np.random.binomial(1, 0.5): 135 | k1 = 'new' 136 | else: 137 | k1 = np.random.randint(0, K) 138 | 139 | if np.random.binomial(1, 0.5): 140 | k2 = 'new' 141 | else: 142 | k2 = np.random.randint(0, K) 143 | if k2 == k1: 144 | k2 = 'new' 145 | 146 | return k1, k2 147 | 148 | def column_probability(K, k1, k2): 149 | total = 0. 
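    # Log-probability that choose_columns() returns this particular (k1, k2) pair; it enters the
    # forward and backward proposal probabilities of the split/merge Metropolis-Hastings move.
    # Each draw picks 'new' with probability 1/2 and a specific existing column with probability
    # 1/(2K); when k1 is an existing column, a second draw that collides with it is remapped to
    # 'new', which is where the 0.5 + 0.5 / K term for k2 == 'new' comes from.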
150 | if k1 == 'new': 151 | total += np.log(0.5) 152 | else: 153 | total += np.log(0.5) - np.log(K) 154 | 155 | if k2 == 'new': 156 | total += np.log(0.5 + 0.5 / K) 157 | else: 158 | assert k1 != k2 159 | total += np.log(0.5) - np.log(K) 160 | 161 | return total 162 | 163 | 164 | def backward_move_info(K_orig, k1, k2, new_reduced_state): 165 | any_ones = new_reduced_state.Z.any(0) 166 | 167 | K_back = K_orig 168 | if k1 == 'new': 169 | K_back += 1 170 | if k2 == 'new': 171 | K_back += 1 172 | if not any_ones[0]: 173 | K_back -= 1 174 | if not any_ones[1]: 175 | K_back -= 1 176 | 177 | if any_ones[0]: 178 | k1_back = 0 179 | else: 180 | k1_back = 'new' 181 | 182 | if any_ones[1]: 183 | k2_back = 1 184 | else: 185 | k2_back = 'new' 186 | 187 | return K_back, k1_back, k2_back 188 | 189 | 190 | def split_merge_step(model, data, state): 191 | N, K, D = state.X.shape[0], state.Z.shape[1], state.X.shape[1] 192 | 193 | if K <= 2: 194 | return # this case is awkward to deal with, and if it occurs, the model probably isn't too good anyway 195 | 196 | # choose random columns 197 | k1, k2 = choose_columns(K) 198 | 199 | # generate reduced problem 200 | prod = np.zeros(state.X.shape) 201 | for k in range(K): 202 | if k not in (k1, k2): 203 | prod += np.outer(state.Z[:, k], state.A[k, :]) 204 | reduced_data = observations.RealObservations(state.X - prod, np.ones(state.X.shape, dtype=bool)) 205 | reduced_X = state.X - prod 206 | reduced_state = ibp.CollapsedIBPState(reduced_X, np.zeros((N, 2), dtype=int), state.sigma_sq_f, state.sigma_sq_n) 207 | if k1 != 'new': 208 | reduced_state.Z[:, 0] = state.Z[:, k1] 209 | if k2 != 'new': 210 | reduced_state.Z[:, 1] = state.Z[:, k2] 211 | 212 | # propose assignments 213 | new_reduced_state, forward_prob = propose_assignments2(model, reduced_data, reduced_state, True) 214 | forward_prob += column_probability(K, k1, k2) 215 | 216 | # score the states 217 | old_score = ibp_loglik(reduced_state.Z, model.alpha) + evidence(model, reduced_data, reduced_state) 218 | new_score = ibp_loglik(new_reduced_state.Z, model.alpha) + evidence(model, reduced_data, new_reduced_state) 219 | 220 | # backward proposal probability 221 | K_back, k1_back, k2_back = backward_move_info(K, k1, k2, new_reduced_state) 222 | backward_prob = column_probability(K_back, k1_back, k2_back) 223 | _, proposal_prob = propose_assignments2(model, reduced_data, reduced_state, False) 224 | backward_prob += proposal_prob 225 | 226 | mh_score = new_score - old_score + backward_prob - forward_prob 227 | if mh_score > 0.: 228 | acceptance_prob = 1. 
229 | else: 230 | acceptance_prob = np.exp(mh_score) 231 | 232 | accept = np.random.binomial(1, acceptance_prob) 233 | 234 | if accept: 235 | A = sample_features(model, reduced_data, new_reduced_state) 236 | 237 | if k1 == 'new': 238 | if np.any(new_reduced_state.Z[:, 0] > 0): 239 | state.Z = np.hstack([state.Z, new_reduced_state.Z[:, 0][:, nax]]) 240 | state.A = np.vstack([state.A, A[0, :][nax, :]]) 241 | else: 242 | state.Z[:, k1] = new_reduced_state.Z[:, 0] 243 | state.A[k1, :] = A[0, :] 244 | 245 | if k2 == 'new': 246 | if np.any(new_reduced_state.Z[:, 1] > 0): 247 | state.Z = np.hstack([state.Z, new_reduced_state.Z[:, 1][:, nax]]) 248 | state.A = np.vstack([state.A, A[1, :][nax, :]]) 249 | else: 250 | state.Z[:, k2] = new_reduced_state.Z[:, 1] 251 | state.A[k2, :] = A[1, :] 252 | 253 | else: 254 | pass 255 | 256 | 257 | 258 | -------------------------------------------------------------------------------- /algorithms/low_rank.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | import scipy.linalg 4 | import time 5 | 6 | from utils import misc 7 | 8 | def sample_variance(values, axis): 9 | a = 0.01 + 0.5 * np.ones(values.shape).sum(axis) 10 | b = 0.01 + 0.5 * (values ** 2).sum(axis) 11 | prec = np.random.gamma(a, 1. / b) 12 | return 1. / prec 13 | 14 | NUM_ITER = 200 15 | 16 | 17 | def fit_model(data_matrix, K, num_iter=NUM_ITER, rotation_trick=True): 18 | N, D = data_matrix.m, data_matrix.n 19 | X = data_matrix.sample_latent_values(np.zeros((N, D)), 1.) 20 | 21 | if rotation_trick: 22 | U_, s_, V_ = scipy.linalg.svd(X, full_matrices=False) 23 | U = U_[:, :K] * np.sqrt(s_[:K][nax, :]) 24 | V = V_[:K, :] * np.sqrt(s_[:K][:, nax]) 25 | else: 26 | U = np.random.normal(size=(N, K)) 27 | V = np.random.normal(size=(K, D)) 28 | 29 | ssq_U = np.mean(U**2, axis=0) 30 | ssq_V = np.mean(V**2, axis=1) 31 | 32 | pred = np.dot(U, V) 33 | if data_matrix.observations.fixed_variance(): 34 | ssq_N = 1. 35 | else: 36 | ssq_N = np.mean((X - pred) ** 2) 37 | 38 | t0 = time.time() 39 | for it in range(num_iter): 40 | if np.any(-data_matrix.observations.mask): 41 | obs = data_matrix.observations.mask 42 | U_var = np.outer(np.ones(N), ssq_U) 43 | V_var = np.outer(ssq_V, np.ones(D)) 44 | U = misc.sample_gaussian_matrix2(V.T, X.T, 1. / U_var.T, obs.T / ssq_N).T 45 | V = misc.sample_gaussian_matrix2(U, X, 1. / V_var, obs / ssq_N) 46 | else: 47 | U = misc.sample_gaussian_matrix(np.eye(N), V, X, np.ones(N) / ssq_N, np.ones(D), np.ones(N), 1. / ssq_U) 48 | V = misc.sample_gaussian_matrix(U, np.eye(D), X, np.ones(N) / ssq_N, np.ones(D), 1. 
/ ssq_V, np.ones(D)) 49 | 50 | 51 | # rotation trick (to speed up learning the variances) 52 | if rotation_trick and it < num_iter // 4: 53 | UtU = np.dot(U.T, U) 54 | _, Q = scipy.linalg.eigh(UtU) 55 | Q = Q[:, ::-1] 56 | U = np.dot(U, Q) 57 | V = np.dot(Q.T, V) 58 | 59 | 60 | ssq_U = sample_variance(U, 0) 61 | ssq_V = sample_variance(V, 1) 62 | ssq_U = np.sqrt(ssq_U * ssq_V) 63 | ssq_V = ssq_U.copy() 64 | 65 | pred = np.dot(U, V) 66 | if not data_matrix.observations.fixed_variance(): 67 | ssq_N = sample_variance(X - pred, None) 68 | 69 | X = data_matrix.sample_latent_values(pred, ssq_N) 70 | 71 | if time.time() - t0 > 3600.: # 1 hour 72 | break 73 | 74 | return U, V, ssq_U, ssq_V, ssq_N, X 75 | 76 | 77 | -------------------------------------------------------------------------------- /algorithms/low_rank_poisson.py: -------------------------------------------------------------------------------- 1 | import itertools 2 | import numpy as np 3 | nax = np.newaxis 4 | import random 5 | import scipy.integrate 6 | import scipy.linalg 7 | import scipy.special 8 | import time 9 | 10 | from utils import distributions, gaussians, misc, psd_matrices 11 | 12 | A = 0.1 13 | B = 0.1 14 | 15 | VERBOSE = False 16 | SEED_0 = False 17 | K_INIT = 2 18 | 19 | class State: 20 | def __init__(self, U, V, ssq_U, ssq_N): 21 | self.U = U 22 | self.V = V 23 | self.ssq_U = ssq_U 24 | self.ssq_N = ssq_N 25 | 26 | def copy(self): 27 | return State(self.U.copy(), self.V.copy(), self.ssq_U.copy(), self.ssq_N) 28 | 29 | def sample_variance(values, axis, mask=None): 30 | if mask is None: 31 | mask = np.ones(values.shape, dtype=bool) 32 | a = 0.01 + 0.5 * mask.sum(axis) 33 | b = 0.01 + 0.5 * (mask * values ** 2).sum(axis) 34 | prec = np.random.gamma(a, 1. / b) 35 | return 1. / prec 36 | 37 | def p_u(u): 38 | N = u.size 39 | return -(A + 0.5 * N) * np.log(B + 0.5 * np.sum(u ** 2)) 40 | 41 | def givens_move(U, V, a, b): 42 | N = U.shape[0] 43 | theta = np.linspace(-np.pi / 4., np.pi / 4.) 
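    # A Givens rotation of columns (a, b) of U, together with the inverse rotation of rows
    # (a, b) of V, leaves the product UV unchanged. The angle is sampled from a discretized
    # conditional over this grid, scored by the prior p_u on the rotated column norms, so mass
    # can be shifted between the two components without changing the reconstruction.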
44 | uaa = np.dot(U[:, a], U[:, a]) 45 | uab = np.dot(U[:, a], U[:, b]) 46 | ubb = np.dot(U[:, b], U[:, b]) 47 | 48 | sin, cos = np.sin(theta), np.cos(theta) 49 | uaa_prime_ssq = uaa * cos ** 2 + 2 * uab * cos * sin + ubb * sin ** 2 50 | ubb_prime_ssq = uaa * sin ** 2 - 2 * uab * cos * sin + ubb * cos ** 2 51 | odds = -(A + 0.5 * N) * (np.log(B + 0.5 * uaa_prime_ssq) + np.log(B + 0.5 * ubb_prime_ssq)) 52 | p = np.exp(odds - np.logaddexp.reduce(odds)) 53 | p /= np.sum(p) 54 | idx = np.random.multinomial(1, p).argmax() 55 | 56 | theta = theta[idx] 57 | sin, cos = np.sin(theta), np.cos(theta) 58 | U[:, a], U[:, b] = cos * U[:, a] + sin * U[:, b], -sin * U[:, a] + cos * U[:, b] 59 | V[a, :], V[b, :] = cos * V[a, :] + sin * V[b, :], -sin * V[a, :] + cos * V[b, :] 60 | 61 | def givens_moves(state): 62 | U, V = state.U, state.V 63 | N, K, D = U.shape[0], U.shape[1], V.shape[1] 64 | pairs = list(itertools.combinations(range(K), 2)) 65 | if not SEED_0: 66 | random.shuffle(pairs) 67 | for a, b in pairs: 68 | givens_move(U, V, a, b) 69 | state.ssq_U = sample_variance(U, 0) 70 | 71 | def scaling_move(U, V, a): 72 | alpha_pts = np.logspace(-2., 2., 100) 73 | odds = np.zeros(len(alpha_pts)) 74 | for i, alpha in enumerate(alpha_pts): 75 | odds[i] = p_u(alpha * U[:, a]) + distributions.gauss_loglik(V[a, :] / alpha, 0., 1.).sum() 76 | p = np.exp(odds - np.logaddexp.reduce(odds)) 77 | p /= np.sum(p) 78 | idx = np.random.multinomial(1, p).argmax() 79 | alpha = alpha_pts[idx] 80 | 81 | U[:, a] *= alpha 82 | V[a, :] /= alpha 83 | 84 | def scaling_moves(state): 85 | U, V = state.U, state.V 86 | N, K, D = U.shape[0], U.shape[1], V.shape[1] 87 | for a in range(K): 88 | scaling_move(U, V, a) 89 | state.ssq_U = sample_variance(U, 0) 90 | 91 | 92 | def cond_U(X, obs, V, ssq_U, ssq_N): 93 | N, K, D = X.shape[0], V.shape[0], X.shape[1] 94 | if np.all(obs): 95 | Lambda = np.diag(1. / ssq_U) + np.dot(V, V.T) / ssq_N 96 | Lambda = Lambda[nax, :, :] 97 | else: 98 | Lambda = np.zeros((N, K, K)) 99 | for i in range(N): 100 | idxs = np.where(obs[i, :])[0] 101 | V_curr = V[:, idxs] 102 | Lambda[i, :, :] = np.diag(1. / ssq_U) + np.dot(V_curr, V_curr.T) / ssq_N 103 | h = np.dot(X * obs, V.T) / ssq_N 104 | return gaussians.Potential(h, psd_matrices.FullMatrix(Lambda), 0.) 105 | 106 | def cond_Vt(X, obs, U, ssq_N): 107 | K = U.shape[1] 108 | return cond_U(X.T, obs.T, U.T, np.ones(K), ssq_N) 109 | 110 | def sample_U_V(state, X, obs): 111 | state.U = cond_U(X, obs, state.V, state.ssq_U, state.ssq_N).to_distribution().sample() 112 | state.V = cond_Vt(X, obs, state.U, state.ssq_N).to_distribution().sample().T 113 | 114 | 115 | class InstabilityError(Exception): 116 | pass 117 | 118 | class ProposalInfo: 119 | def __init__(self, resid, obs, ssq_N): 120 | N, D = resid.shape 121 | self.resid = resid.copy() 122 | self.obs = obs.copy() 123 | self.ssq_N = ssq_N 124 | self.u = np.zeros(N) 125 | self.assigned = np.zeros(N, dtype=bool) 126 | self.lam = np.ones(D) # N(0, 1) prior 127 | self.h = np.zeros(D) 128 | self.v = None 129 | self.ssq_u = None 130 | self.num_assigned = 0 131 | self.sum_u_sq = 0. 132 | 133 | def update_u(self, i, ui): 134 | assert not self.assigned[i] 135 | self.u[i] = ui 136 | idxs = np.where(self.obs[i, :])[0] 137 | self.lam[idxs] += ui ** 2 / self.ssq_N 138 | self.h[idxs] += ui * self.resid[i, idxs] / self.ssq_N 139 | self.assigned[i] = True 140 | self.num_assigned += 1 141 | self.sum_u_sq += ui ** 2 142 | 143 | def cond_v(self): 144 | return distributions.GaussianDistribution(self.h / self.lam, 1. 
/ self.lam) 145 | 146 | def cond_ssq_u(self): 147 | a = A + 0.5 * self.num_assigned 148 | b = B + 0.5 * self.sum_u_sq 149 | return distributions.InverseGammaDistribution(a, b) 150 | 151 | def cond_u(self, i): 152 | idxs = np.where(self.obs[i, :])[0] 153 | #lam = np.dot(self.v[idxs], self.v[idxs]) / self.ssq_N + 1. / self.ssq_u 154 | v = self.v[idxs] 155 | lam = (v**2).sum() / self.ssq_N + 1. / self.ssq_u 156 | h = (self.resid[i, idxs] * v).sum() / self.ssq_N 157 | if np.abs(h / lam) < 1e-10: 158 | raise InstabilityError() 159 | return distributions.GaussianDistribution(h / lam, 1. / lam) 160 | 161 | def fit_v_and_var(self): 162 | self.v = self.cond_v().maximize() 163 | #self.v /= np.sqrt(np.mean(self.v ** 2)) 164 | self.ssq_u = self.sum_u_sq / (self.num_assigned + 1) 165 | 166 | class Proposal: 167 | def __init__(self, u, v, ssq_u): 168 | self.u = u 169 | self.v = v 170 | self.ssq_u = ssq_u 171 | 172 | 173 | def make_proposal(resid, obs, ssq_N, order=None): 174 | pi = ProposalInfo(resid, obs, ssq_N) 175 | N, D = resid.shape 176 | if order is None: 177 | order = range(N) 178 | 179 | for i in order: 180 | if i == order[0]: 181 | dist = distributions.GaussianDistribution(0., 1.) 182 | else: 183 | dist = pi.cond_u(i) 184 | pi.update_u(i, dist.sample()) 185 | pi.fit_v_and_var() 186 | 187 | v = pi.cond_v().sample() 188 | ssq_u = pi.cond_ssq_u().sample() 189 | 190 | return Proposal(pi.u.copy(), v, ssq_u) 191 | 192 | def proposal_probability(resid, obs, ssq_N, proposal, order=None): 193 | pi = ProposalInfo(resid, obs, ssq_N) 194 | N, D = resid.shape 195 | if order is None: 196 | order = range(N) 197 | 198 | total = 0. 199 | for i in order: 200 | if i == order[0]: 201 | dist = distributions.GaussianDistribution(0., 1.) 202 | else: 203 | dist = pi.cond_u(i) 204 | 205 | total += dist.loglik(proposal.u[i]) 206 | pi.update_u(i, proposal.u[i]) 207 | pi.fit_v_and_var() 208 | 209 | total += pi.cond_v().loglik(proposal.v).sum() 210 | total += pi.cond_ssq_u().loglik(proposal.ssq_u) 211 | 212 | return total 213 | 214 | 215 | def log_poisson(K, lam): 216 | return -lam + K * np.log(lam) - scipy.special.gammaln(K+1) 217 | 218 | def p_star(state, X, obs): 219 | K = state.U.shape[1] 220 | total = log_poisson(K, 1.) 221 | 222 | var_prior = distributions.InverseGammaDistribution(A, B) 223 | total += var_prior.loglik(state.ssq_U).sum() 224 | 225 | assert np.isfinite(total) 226 | 227 | U_dist = distributions.GaussianDistribution(0., state.ssq_U[nax, :]) 228 | total += U_dist.loglik(state.U).sum() 229 | 230 | assert np.isfinite(total) 231 | 232 | V_dist = distributions.GaussianDistribution(0., 1.) 
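    # V gets a fixed N(0, 1) prior; the per-component scale is carried entirely by ssq_U,
    # which scaling_moves() keeps consistent by trading magnitude between U's columns and
    # V's rows.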
233 | total += V_dist.loglik(state.V).sum() 234 | 235 | assert np.isfinite(total) 236 | 237 | pred = np.dot(state.U, state.V) 238 | X_dist = distributions.GaussianDistribution(pred, state.ssq_N) 239 | total += X_dist.loglik(X)[obs].sum() 240 | 241 | assert np.isfinite(total) 242 | 243 | return total 244 | 245 | def add_delete_move(state, X, obs): 246 | N, K, D = state.U.shape[0], state.U.shape[1], state.V.shape[1] 247 | order = np.random.permutation(N) 248 | if np.random.binomial(1, 0.5): # add move 249 | pred = np.dot(state.U, state.V) 250 | resid = X - pred 251 | try: 252 | proposal = make_proposal(resid, obs, state.ssq_N, order) 253 | except InstabilityError: 254 | return state 255 | except OverflowError: 256 | return state 257 | forward_prob = -np.log(2) + proposal_probability(resid, obs, state.ssq_N, proposal, order) 258 | backward_prob = -np.log(2) - np.log(K + 1) 259 | 260 | new_U = np.hstack([state.U, proposal.u[:, nax]]) 261 | new_V = np.vstack([state.V, proposal.v[nax, :]]) 262 | new_ssq_U = np.concatenate([state.ssq_U, [proposal.ssq_u]]) 263 | new_state = State(new_U, new_V, new_ssq_U, state.ssq_N) 264 | p_star_new = p_star(new_state, X, obs) 265 | p_star_old = p_star(state, X, obs) 266 | 267 | ratio = p_star_new - p_star_old - forward_prob + backward_prob 268 | assert np.isfinite(ratio) 269 | if np.random.binomial(1, min(np.exp(ratio), 1)): 270 | if VERBOSE: 271 | print 'Add move accepted (ratio=%1.2f)' % ratio 272 | return new_state 273 | else: 274 | if VERBOSE: 275 | print 'Add move rejected (ratio=%1.2f)' % ratio 276 | return state 277 | 278 | else: # delete move 279 | if K <= 2: # zero or one dimensions causes NumPy awkwardness 280 | return state 281 | 282 | k = np.random.randint(0, K) 283 | pred = np.dot(state.U, state.V) - np.outer(state.U[:, k], state.V[k, :]) 284 | resid = X - pred 285 | reverse_proposal = Proposal(state.U[:, k], state.V[k, :], state.ssq_U[k]) 286 | forward_prob = -np.log(2) - np.log(K) 287 | try: 288 | backward_prob = -np.log(2) + proposal_probability(resid, obs, state.ssq_N, reverse_proposal, order) 289 | except InstabilityError: 290 | return state 291 | except OverflowError: 292 | return state 293 | 294 | new_U = np.hstack([state.U[:, :k], state.U[:, k+1:]]) 295 | new_V = np.vstack([state.V[:k, :], state.V[k+1:, :]]) 296 | new_ssq_U = np.concatenate([state.ssq_U[:k], state.ssq_U[k+1:]]) 297 | new_state = State(new_U, new_V, new_ssq_U, state.ssq_N) 298 | 299 | p_star_new = p_star(new_state, X, obs) 300 | p_star_old = p_star(state, X, obs) 301 | 302 | ratio = p_star_new - p_star_old - forward_prob + backward_prob 303 | assert np.isfinite(ratio) 304 | if np.random.binomial(1, min(np.exp(ratio), 1)): 305 | if VERBOSE: 306 | print 'Delete move accepted (ratio=%1.2f)' % ratio 307 | return new_state 308 | else: 309 | if VERBOSE: 310 | print 'Delete move rejected (ratio=%1.2f)' % ratio 311 | return state 312 | 313 | 314 | 315 | NUM_ITER = 200 316 | 317 | def init_state(data_matrix, K): 318 | N, D = data_matrix.m, data_matrix.n 319 | X = data_matrix.sample_latent_values(np.zeros((N, D)), 1.) 320 | U = np.random.normal(0., 1. / np.sqrt(K), size=(N, K)) 321 | V = np.random.normal(0., 1., size=(K, D)) 322 | ssq_U = np.mean(U**2, axis=0) 323 | 324 | pred = np.dot(U, V) 325 | if data_matrix.observations.fixed_variance(): 326 | ssq_N = 1. 
327 | else: 328 | ssq_N = np.mean((X - pred) ** 2) 329 | return X, State(U, V, ssq_U, ssq_N) 330 | 331 | def fit_model(data_matrix, K=K_INIT, num_iter=NUM_ITER, name=None): 332 | if SEED_0: 333 | np.random.seed(0) 334 | N, D = data_matrix.m, data_matrix.n 335 | X, state = init_state(data_matrix, K) 336 | 337 | pbar = misc.pbar(num_iter) 338 | 339 | t0 = time.time() 340 | for it in range(num_iter): 341 | sample_U_V(state, X, data_matrix.observations.mask) 342 | 343 | old = np.dot(state.U, state.V) 344 | givens_moves(state) 345 | assert np.allclose(np.dot(state.U, state.V), old) 346 | scaling_moves(state) 347 | assert np.allclose(np.dot(state.U, state.V), old) 348 | 349 | state.ssq_U = sample_variance(state.U, 0) 350 | pred = np.dot(state.U, state.V) 351 | if not data_matrix.observations.fixed_variance(): 352 | state.ssq_N = sample_variance(X - pred, None, mask=data_matrix.observations.mask) 353 | 354 | X = data_matrix.sample_latent_values(pred, state.ssq_N) 355 | 356 | for i in range(10): 357 | state = add_delete_move(state, X, data_matrix.observations.mask) 358 | 359 | if VERBOSE: 360 | print 'K =', state.U.shape[1] 361 | print 'ssq_N =', state.ssq_N 362 | print 'X.var() =', X.var() 363 | 364 | #misc.print_dot(it+1, num_iter) 365 | pbar.update(it) 366 | 367 | if time.time() - t0 > 3600.: # 1 hour 368 | break 369 | 370 | pbar.finish() 371 | 372 | return state, X 373 | 374 | 375 | 376 | -------------------------------------------------------------------------------- /algorithms/slice_sampling.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | from utils import misc 5 | 6 | MAX_ITER = 1000 7 | 8 | def slice_sample(log_f, x0, L, U): 9 | assert L < x0 < U 10 | log_y = log_f(x0) + np.log(np.random.uniform(0, 1)) 11 | 12 | count = 0 13 | while True: 14 | x1 = np.random.uniform(L, U) 15 | if log_f(x1) >= log_y: 16 | return x1 17 | 18 | if x1 < x0: 19 | L = x1 20 | else: 21 | U = x1 22 | 23 | count += 1 24 | if count >= MAX_ITER: 25 | raise RuntimeError('Exceeded maximum iterations for slice sampling') 26 | 27 | 28 | class GaussObj: 29 | def __init__(self, log_f, mu, sigma_sq): 30 | self.log_f = log_f 31 | self.mu = mu 32 | self.sigma_sq = sigma_sq 33 | 34 | def __call__(self, x): 35 | return self.log_f(x) - 0.5 * (x - self.mu)**2 / self.sigma_sq 36 | 37 | def slice_sample_gauss(log_f, mu, sigma_sq, x0): 38 | sigma = np.sqrt(sigma_sq) 39 | temp = (x0 - mu) / sigma 40 | if not -4. <= temp <= 4.: 41 | # If x takes an extreme value, scipy.special.erf may fail, so fall back to ordinary slice sampling. 42 | # This isn't a valid sample, since it assumes a contiguous interval, which may not be the case. 43 | # Hopefully this case doesn't arise too often. 44 | return slice_sample(GaussObj(log_f, mu, sigma_sq), x0, x0 - 4. * sigma, x0 + 4. * sigma) 45 | 46 | L, U = 1e-10, 1. 
- 1e-10 47 | p0 = misc.inv_probit((x0 - mu) / sigma) 48 | log_y = log_f(x0) + np.log(np.random.uniform(0, 1)) 49 | 50 | count = 0 51 | while True: 52 | p1 = np.random.uniform(L, U) 53 | x1 = mu + misc.probit(p1) * sigma 54 | if log_f(x1) >= log_y: 55 | #if np.random.binomial(1, 0.001): 56 | # print 'Took %d iterations' % count 57 | return x1 58 | 59 | if p1 < p0: 60 | L = p1 61 | else: 62 | U = p1 63 | 64 | count += 1 65 | if count >= MAX_ITER: 66 | raise RuntimeError('Exceeded maximum iterations for slice sampling') 67 | 68 | 69 | 70 | -------------------------------------------------------------------------------- /algorithms/sparse_coding.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | import slice_sampling 5 | from utils import distributions 6 | 7 | debugger = None 8 | 9 | class SparseCodingState: 10 | def __init__(self, S, A, Z, sigma_sq_N, mu_Z, sigma_sq_Z, sigma_sq_A): 11 | self.S = S 12 | self.A = A 13 | self.Z = Z 14 | self.sigma_sq_N = sigma_sq_N 15 | self.mu_Z = mu_Z 16 | self.sigma_sq_Z = sigma_sq_Z 17 | self.sigma_sq_A = sigma_sq_A 18 | 19 | def copy(self): 20 | if np.isscalar(self.mu_Z): 21 | mu_Z = self.mu_Z 22 | else: 23 | mu_Z = self.mu_Z.copy() 24 | return SparseCodingState(self.S.copy(), self.A.copy(), self.Z.copy(), self.sigma_sq_N, mu_Z, 25 | self.sigma_sq_Z, self.sigma_sq_A) 26 | 27 | 28 | class LogFCollapsed: 29 | def __init__(self, lam, h): 30 | self.lam = lam 31 | self.h = h 32 | 33 | def __call__(self, z): 34 | sigma_sq = np.exp(z) + 1. / self.lam 35 | mu = self.h / self.lam 36 | 37 | return -0.5 * np.log(sigma_sq) + \ 38 | -0.5 * mu ** 2 / sigma_sq 39 | 40 | class LogFUncollapsed: 41 | def __init__(self, s): 42 | self.s = s 43 | 44 | def __call__(self, z): 45 | return -0.5 * z + \ 46 | -0.5 * self.s ** 2 / np.exp(z) 47 | 48 | 49 | def cond_mu_Z(state, by_column=False): 50 | if by_column: 51 | mu = state.Z.mean(0) 52 | sigma_sq = state.sigma_sq_Z / state.Z.shape[0] * np.ones(state.Z.shape[1]) 53 | else: 54 | mu = state.Z.mean() 55 | sigma_sq = state.sigma_sq_Z / state.Z.size 56 | return distributions.GaussianDistribution(mu, sigma_sq) 57 | 58 | def cond_sigma_sq_Z(state): 59 | a = 1. + 0.5 * state.Z.size 60 | b = 1. 
+ 0.5 * np.sum((state.Z - state.mu_Z) ** 2) 61 | return distributions.InverseGammaDistribution(a, b) 62 | 63 | 64 | def sample_Z(state): 65 | N, K= state.S.shape[0], state.Z.shape[1] 66 | for i in range(N): 67 | for k in range(K): 68 | log_f = LogFUncollapsed(state.S[i, k]) 69 | if np.isscalar(state.mu_Z): 70 | mu_Z = state.mu_Z 71 | else: 72 | mu_Z = state.mu_Z[k] 73 | state.Z[i, k] = slice_sampling.slice_sample_gauss(log_f, mu_Z, state.sigma_sq_Z, state.Z[i, k]) 74 | 75 | if hasattr(debugger, 'after_sample_Z'): 76 | debugger.after_sample_Z(vars()) 77 | 78 | -------------------------------------------------------------------------------- /algorithms/variational.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | import random 4 | Random = random.Random() 5 | import scipy.linalg, scipy.stats 6 | 7 | from utils import misc 8 | 9 | 10 | 11 | def perturb_simplex(q, eps=1e-5): 12 | eps = 1e-5 13 | k = q.size 14 | q = q.copy() 15 | for tr in range(10): 16 | large_inds = np.where(q > eps)[0] 17 | i = Random.choice(large_inds) 18 | j = np.random.randint(0, k) 19 | if i == j or q[j] > 1-eps: 20 | continue 21 | q[i] -= eps 22 | q[j] += eps 23 | return q 24 | 25 | def perturb_psd(S, eps=1e-5): 26 | d, V = scipy.linalg.eigh(S) 27 | d *= np.exp(np.random.normal(0., eps, size=d.shape)) 28 | return np.dot(np.dot(V, np.diag(d)), V.T) 29 | 30 | def perturb_pos(x, eps=1e-5): 31 | return x * np.exp(np.random.normal(0., eps, size=x.shape)) 32 | 33 | 34 | 35 | ALPHA = 1. 36 | class MultinomialEstimator: 37 | def __init__(self, pi, A): 38 | self.pi = pi 39 | self.nclass = pi.size 40 | self.A = A 41 | 42 | def expected_log_prob(self, rep): 43 | return np.dot(rep.q, np.log(self.pi)) 44 | 45 | def fit_representation(self, t, Sigma_N, init=None): 46 | data_term = np.zeros(self.nclass) 47 | Lambda_N = np.linalg.inv(Sigma_N) 48 | for i in range(self.nclass): 49 | diff = t - self.A[i,:] 50 | #data_term[i] = -0.5 * np.sum(diff**2 / sigma_sq_N) 51 | data_term[i] = -0.5 * np.dot(np.dot(diff, Lambda_N), diff) 52 | log_q = np.log(self.pi) + data_term 53 | log_q -= np.logaddexp.reduce(log_q) 54 | q = np.exp(log_q) 55 | return MultinomialRepresentation(q) 56 | 57 | def init_representation(self): 58 | return MultinomialRepresentation(self.pi) 59 | 60 | @staticmethod 61 | def random(k, n): 62 | pi = np.random.uniform(0., 1., size=k) 63 | pi /= pi.sum() 64 | A = np.random.normal(size=(k, n)) 65 | return MultinomialEstimator(pi, A) 66 | 67 | @staticmethod 68 | def random_u(k): 69 | u = np.random.uniform(0., 1., size=k) 70 | return u / u.sum() 71 | 72 | class MultinomialRepresentation: 73 | def __init__(self, q): 74 | self.q = q 75 | assert np.allclose(np.sum(self.q), 1.) 
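        # q is a point on the probability simplex; the methods below treat the latent variable
        # as a one-hot indicator drawn from Multinomial(1, q), so its mean is q and its
        # covariance is diag(q) - q q^T.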
76 | 77 | def expected_value(self): 78 | return self.q 79 | 80 | def covariance(self): 81 | return np.diag(self.q) - np.outer(self.q, self.q) 82 | 83 | def entropy(self): 84 | return scipy.stats.distributions.entropy(self.q) 85 | 86 | def sample(self): 87 | return np.random.multinomial(1, self.q) 88 | 89 | def perturb(self, eps): 90 | return MultinomialRepresentation(perturb_simplex(self.q, eps)) 91 | 92 | 93 | 94 | class BernoulliEstimator: 95 | def __init__(self, pi, A): 96 | self.pi = pi 97 | self.A = A 98 | self.nclass = self.pi.size 99 | 100 | def expected_log_prob(self, rep): 101 | return np.dot(rep.q, np.log(self.pi)) + np.dot(1-rep.q, np.log(1-self.pi)) 102 | 103 | def fit_representation(self, t, Sigma_N, init=None): 104 | Lambda_N = np.linalg.inv(Sigma_N) 105 | J = -np.log(self.pi) + np.log(1. - self.pi) - np.dot(self.A, np.dot(Lambda_N, t)) 106 | Lambda = np.dot(np.dot(self.A, Lambda_N), self.A.T) 107 | return BernoulliRepresentation(misc.mean_field(J, Lambda, init.q)) 108 | 109 | def init_representation(self): 110 | return BernoulliRepresentation(self.pi) 111 | 112 | @staticmethod 113 | def random(k, n): 114 | pi = np.random.uniform(0., 1., size=k) 115 | A = np.random.normal(size=(k, n)) 116 | return BernoulliEstimator(pi, A) 117 | 118 | @staticmethod 119 | def random_u(k): 120 | return np.random.uniform(0., 1., size=k) 121 | 122 | class BernoulliRepresentation: 123 | def __init__(self, q): 124 | self.q = q 125 | 126 | def expected_value(self): 127 | return self.q 128 | 129 | def covariance(self): 130 | return np.diag(self.q * (1. - self.q)) 131 | 132 | def entropy(self): 133 | #return misc.bernoulli_entropy(self.q) * np.log(2) 134 | return np.sum([scipy.stats.distributions.entropy([p, 1.-p]) for p in self.q]) 135 | 136 | def sample(self): 137 | return np.random.binomial(1, self.q) 138 | 139 | def perturb(self, eps): 140 | q = np.clip(np.random.normal(self.q, eps), 0., 1.) 141 | return BernoulliRepresentation(q) 142 | 143 | 144 | 145 | class VariationalProblem: 146 | def __init__(self, estimators, x, Sigma_N): 147 | self.estimators = estimators 148 | self.x = x 149 | self.nterms = len(estimators) 150 | self.nfea = self.x.size 151 | self.Sigma_N = Sigma_N 152 | assert Sigma_N.shape == (x.size, x.size) 153 | 154 | def objective_function(self, reps, collapse_z=False): 155 | assert len(reps) == self.nterms 156 | 157 | fobj = 0. 
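        # Accumulate the variational lower bound: for each factor, the expected log prior
        # E_q[log P(u|U)] plus the entropy H(q), followed by the expected Gaussian
        # log-likelihood of x, with the factors' means (m) and covariances (S) propagated
        # through their A matrices.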
158 | m = np.zeros(self.nfea) 159 | S = np.zeros((self.nfea, self.nfea)) 160 | for estimator, rep in zip(self.estimators, reps): 161 | # E[log P(u|U)] 162 | fobj += estimator.expected_log_prob(rep) 163 | 164 | # H(q) 165 | fobj += rep.entropy() 166 | 167 | # sufficient statistics 168 | m += np.dot(estimator.A.T, rep.expected_value()) 169 | S += misc.mult([estimator.A.T, rep.covariance(), estimator.A]) 170 | 171 | Lambda_N = np.linalg.inv(self.Sigma_N) 172 | 173 | fobj += -0.5 * self.nfea * np.log(2*np.pi) - 0.5 * misc.logdet(self.Sigma_N) 174 | diff = self.x - m 175 | fobj += -0.5 * np.dot(np.dot(diff, Lambda_N), diff) 176 | fobj += -0.5 * np.sum(S * Lambda_N) 177 | 178 | return fobj 179 | 180 | def update_one(self, reps, i): 181 | reps = reps[:] # make copy 182 | m = np.zeros(self.nfea) 183 | for j, estimator in enumerate(self.estimators): 184 | if i == j: 185 | continue 186 | m += np.dot(estimator.A.T, reps[j].expected_value()) 187 | 188 | t = self.x - m 189 | reps[i] = self.estimators[i].fit_representation(t, self.Sigma_N, reps[i]) 190 | return reps 191 | 192 | def update_all(self, reps): 193 | for i in range(self.nterms): 194 | reps = self.update_one(reps, i) 195 | return reps 196 | 197 | def solve(self): 198 | if len(self.estimators) <= 1: 199 | NUM_ITER = 1 200 | else: 201 | NUM_ITER = 10 202 | reps = [estimator.init_representation() for estimator in self.estimators] 203 | for it in range(NUM_ITER): 204 | reps = self.update_all(reps) 205 | return reps 206 | 207 | -------------------------------------------------------------------------------- /config_example.py: -------------------------------------------------------------------------------- 1 | # experiment directories 2 | RESULTS_PATH = '/path/to/results' 3 | CACHE_PATH = '/path/to/cached' 4 | REPORT_PATH = '/path/to/reports' 5 | 6 | # 'single_process' to run in a single process, 'parallel' to use GNU Parallel 7 | SCHEDULER = 'single_process' 8 | 9 | # additional options for GNU Parallel 10 | DEFAULT_NUM_JOBS = 2 11 | JOBS_PATH = '/path/to/job_info' 12 | -------------------------------------------------------------------------------- /example.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from experiments import init_experiment, QuickParams 4 | from observations import DataMatrix 5 | 6 | ### 7 | ### First follow the configuration directions in README.md. 
Then run the following: 8 | ### 9 | ### python example.py 10 | ### python experiments.py everything example 11 | ### 12 | 13 | def read_array(fname): 14 | return np.array([map(float, line.split()) for line in open(fname)]) 15 | 16 | def read_list(fname): 17 | return map(str.strip, open(fname).readlines()) 18 | 19 | def init(): 20 | X = read_array('example_data/animals-data.txt') 21 | row_labels = read_list('example_data/animals-names.txt') 22 | col_labels = read_list('example_data/animals-features.txt') 23 | 24 | # normalize to zero mean, unit variance 25 | X -= X.mean() 26 | X /= X.std() 27 | 28 | # since the data were binary, add a small amount of noise to prevent degeneracy 29 | X = np.random.normal(X, np.sqrt(0.1)) 30 | 31 | data_matrix = DataMatrix.from_real_values(X, row_labels=row_labels, col_labels=col_labels) 32 | init_experiment('example', data_matrix, QuickParams(search_depth=2)) 33 | 34 | if __name__ == '__main__': 35 | init() 36 | 37 | 38 | 39 | -------------------------------------------------------------------------------- /example_data/animals-features.txt: -------------------------------------------------------------------------------- 1 | black 2 | white 3 | blue 4 | brown 5 | gray 6 | orange 7 | red 8 | yellow 9 | patches 10 | spots 11 | stripes 12 | furry 13 | hairless 14 | toughskin 15 | big 16 | small 17 | bulbous 18 | lean 19 | flippers 20 | hands 21 | hooves 22 | pads 23 | paws 24 | longleg 25 | longneck 26 | tail 27 | chewteeth 28 | meatteeth 29 | buckteeth 30 | strainteeth 31 | horns 32 | claws 33 | tusks 34 | smelly 35 | flys 36 | hops 37 | swims 38 | tunnels 39 | walks 40 | fast 41 | slow 42 | strong 43 | weak 44 | muscle 45 | bipedal 46 | quadrapedal 47 | active 48 | inactive 49 | nocturnal 50 | hibernate 51 | agility 52 | fish 53 | meat 54 | plankton 55 | vegetation 56 | insects 57 | forager 58 | grazer 59 | hunter 60 | scavenger 61 | skimmer 62 | stalker 63 | newworld 64 | oldworld 65 | arctic 66 | coastal 67 | desert 68 | bush 69 | plains 70 | forest 71 | fields 72 | jungle 73 | mountains 74 | ocean 75 | ground 76 | water 77 | tree 78 | cave 79 | fierce 80 | timid 81 | smart 82 | group 83 | solitary 84 | nestspot 85 | domestic 86 | -------------------------------------------------------------------------------- /example_data/animals-names.txt: -------------------------------------------------------------------------------- 1 | antelope 2 | grizzly bear 3 | killer whale 4 | beaver 5 | dalmatian 6 | persian cat 7 | horse 8 | german shepherd 9 | blue whale 10 | siamese cat 11 | skunk 12 | mole 13 | tiger 14 | hippopotamus 15 | leopard 16 | moose 17 | spider monkey 18 | humpback whale 19 | elephant 20 | gorilla 21 | ox 22 | fox 23 | sheep 24 | seal 25 | chimpanzee 26 | hamster 27 | squirrel 28 | rhinoceros 29 | rabbit 30 | bat 31 | giraffe 32 | wolf 33 | chihuahua 34 | rat 35 | weasel 36 | otter 37 | buffalo 38 | zebra 39 | giant panda 40 | deer 41 | bobcat 42 | pig 43 | lion 44 | mouse 45 | polar bear 46 | collie 47 | walrus 48 | raccoon 49 | cow 50 | dolphin 51 | -------------------------------------------------------------------------------- /grammar.py: -------------------------------------------------------------------------------- 1 | 2 | import parsing 3 | 4 | START = 'g' 5 | 6 | PRODUCTION_RULES = {'low-rank': [('g', ('+', ('*', 'g', 'g'), 'g'))], 7 | 8 | 'clustering': [('g', ('+', ('*', 'm', 'g'), 'g')), 9 | ('g', ('+', ('*', 'g', 'M'), 'g'))], 10 | 11 | 'binary': [('g', ('+', ('*', 'b', 'g'), 'g')), 12 | ('g', ('+', ('*', 'g', 'B'), 'g'))], 13 | 14 | 
'chain': [('g', ('+', ('*', 'c', 'g'), 'g')), 15 | ('g', ('+', ('*', 'g', 'C'), 'g'))], 16 | 17 | 'sparsity': [('g', ('s', 'g'))], 18 | 19 | 'expand-disc': [('m', ('+', ('*', 'm', 'g'), 'g')), 20 | ('M', ('+', ('*', 'g', 'M'), 'g')), 21 | ('b', ('+', ('*', 'b', 'g'), 'g')), 22 | ('B', ('+', ('*', 'g', 'B'), 'g'))], 23 | 24 | 'm-to-b': [('m', 'b')], 25 | } 26 | 27 | 28 | def is_valid(structure): 29 | if type(structure) == str and structure != 'g': 30 | return False 31 | if type(structure) == tuple and structure[0] == 's': 32 | return False 33 | return True 34 | 35 | def list_successors_helper(structure, rule_names, is_noise, expand_noise=True): 36 | rules = reduce(list.__add__, [PRODUCTION_RULES[rn] for rn in rule_names]) 37 | 38 | if is_noise and not expand_noise: 39 | return [] 40 | 41 | if type(structure) == str: 42 | return [rhs for lhs, rhs in rules if lhs == structure] 43 | 44 | successors = [] 45 | for pos in range(len(structure)): 46 | is_noise = (structure[0] == '+' and pos == len(structure) - 1) 47 | for child_succ in list_successors_helper(structure[pos], rule_names, is_noise, expand_noise): 48 | if is_noise and type(child_succ) == tuple and child_succ[0] == 's': 49 | continue 50 | successors.append(structure[:pos] + (child_succ,) + structure[pos+1:]) 51 | return successors 52 | 53 | def list_successors(structure, rules, expand_noise=True): 54 | successors = list_successors_helper(structure, rules, False, expand_noise) 55 | return filter(is_valid, successors) 56 | 57 | def collapse_sums(structure): 58 | if type(structure) == str: 59 | return structure 60 | elif structure[0] == '+': 61 | new_structure = ('+',) 62 | for s_ in structure[1:]: 63 | s = collapse_sums(s_) 64 | if type(s) == tuple and s[0] == '+': 65 | new_structure = new_structure + s[1:] 66 | else: 67 | new_structure = new_structure + (s,) 68 | return new_structure 69 | else: 70 | return tuple([collapse_sums(s) for s in structure]) 71 | 72 | def list_collapsed_successors(structure, rule_names, expand_noise=True): 73 | return [collapse_sums(s) for s in list_successors_helper(structure, rule_names, False, expand_noise) 74 | if is_valid(collapse_sums(s))] 75 | 76 | def pretty_print(structure, spaces=True, quotes=True): 77 | if spaces: 78 | PLUS = ' + ' 79 | else: 80 | PLUS = '+' 81 | 82 | if type(structure) == str: 83 | if structure.isupper() and quotes: 84 | return structure.lower() + "'" 85 | else: 86 | return structure 87 | elif structure[0] == '+': 88 | parts = [pretty_print(s, spaces, quotes) for s in structure[1:]] 89 | return PLUS.join(parts) 90 | elif structure[0] == 's': 91 | return 's(%s)' % pretty_print(structure[1], spaces, quotes) 92 | else: 93 | assert structure[0] == '*' 94 | parts = [] 95 | for s in structure[1:]: 96 | if type(s) == str or s[0] == '*' or s[0] == 's': 97 | parts.append(pretty_print(s, spaces, quotes)) 98 | else: 99 | parts.append('(' + pretty_print(s, spaces, quotes) + ')') 100 | return ''.join(parts) 101 | 102 | def list_derivations(depth, do_print=False): 103 | derivations = [['g']] 104 | for i in range(depth): 105 | new_derivations = [] 106 | for d in derivations: 107 | new_derivations += [d + [s] for s in list_successors(d[-1])] 108 | derivations = new_derivations 109 | 110 | for d in derivations: 111 | if do_print: 112 | print [pretty_print(s) for s in d] 113 | 114 | return derivations 115 | 116 | def list_structures(depth): 117 | full = set() 118 | for i in range(1, depth+1): 119 | derivations = list_derivations(depth, False) 120 | full.update(set([d[-1] for d in derivations])) 121 | 
return full 122 | 123 | 124 | 125 | 126 | def parse(string): 127 | structure = parsing.parse(string) 128 | return collapse_sums(structure) 129 | 130 | 131 | -------------------------------------------------------------------------------- /initialization.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | from algorithms import low_rank_poisson, crp, ibp, sparse_coding, chains 5 | import grammar 6 | import observations 7 | import recursive 8 | from utils import misc 9 | 10 | debugger = None 11 | 12 | 13 | def init_low_rank(data_matrix, num_iter=200): 14 | m, n = data_matrix.m, data_matrix.n 15 | state, X = low_rank_poisson.fit_model(data_matrix, 2, num_iter=num_iter) 16 | U, V, ssq_U, ssq_N = state.U, state.V, state.ssq_U, state.ssq_N 17 | 18 | U /= ssq_U[nax, :] ** 0.25 19 | V *= ssq_U[:, nax] ** 0.25 20 | 21 | left = recursive.GaussianNode(U, 'col', np.sqrt(ssq_U)) 22 | 23 | right = recursive.GaussianNode(V, 'row', np.sqrt(ssq_U)) 24 | 25 | pred = np.dot(U, V) 26 | X = data_matrix.sample_latent_values(pred, ssq_N) 27 | noise = recursive.GaussianNode(X - pred, 'scalar', ssq_N) 28 | 29 | return recursive.SumNode([recursive.ProductNode([left, right]), noise]) 30 | 31 | def init_row_clustering(data_matrix, isotropic, num_iter=200): 32 | m, n = data_matrix.m, data_matrix.n 33 | state = crp.fit_model(data_matrix, isotropic_w=isotropic, isotropic_b=isotropic, num_iter=num_iter) 34 | 35 | U = np.zeros((m, state.assignments.max() + 1), dtype=int) 36 | U[np.arange(m), state.assignments] = 1 37 | left = recursive.MultinomialNode(U) 38 | 39 | if isotropic: 40 | right = recursive.GaussianNode(state.centers, 'scalar', state.sigma_sq_b) 41 | else: 42 | right = recursive.GaussianNode(state.centers, 'col', state.sigma_sq_b) 43 | 44 | pred = state.centers[state.assignments, :] 45 | X = data_matrix.sample_latent_values(pred, state.sigma_sq_w * np.ones((m, n))) 46 | if isotropic: 47 | noise = recursive.GaussianNode(X - pred, 'scalar', state.sigma_sq_w) 48 | else: 49 | noise = recursive.GaussianNode(X - pred, 'col', state.sigma_sq_w) 50 | 51 | return recursive.SumNode([recursive.ProductNode([left, right]), noise]) 52 | 53 | def init_col_clustering(data_matrix, isotropic, num_iter=200): 54 | return init_row_clustering(data_matrix.transpose(), isotropic, num_iter=num_iter).transpose() 55 | 56 | def init_row_binary(data_matrix, num_iter=200): 57 | state = ibp.fit_model(data_matrix, num_iter=num_iter) 58 | 59 | left = recursive.BernoulliNode(state.Z) 60 | 61 | right = recursive.GaussianNode(state.A, 'scalar', state.sigma_sq_f) 62 | 63 | pred = np.dot(state.Z, state.A) 64 | X = data_matrix.sample_latent_values(pred, state.sigma_sq_n) 65 | noise = recursive.GaussianNode(X - pred, 'scalar', state.sigma_sq_n) 66 | 67 | return recursive.SumNode([recursive.ProductNode([left, right]), noise]) 68 | 69 | def init_col_binary(data_matrix, num_iter=200): 70 | return init_row_binary(data_matrix.transpose(), num_iter=num_iter).transpose() 71 | 72 | def init_row_chain(data_matrix, num_iter=200): 73 | states, sigma_sq_D, sigma_sq_N = chains.fit_model(data_matrix, num_iter=num_iter) 74 | 75 | integ = chains.integration_matrix(data_matrix.m_orig)[data_matrix.row_ids, :] 76 | left = recursive.IntegrationNode(integ) 77 | 78 | temp = np.vstack([states[0, :][nax, :], 79 | states[1:, :] - states[:-1, :]]) 80 | right = recursive.GaussianNode(temp, 'scalar', sigma_sq_D) 81 | 82 | pred = states[data_matrix.row_ids, :] 83 | X = 
data_matrix.sample_latent_values(pred, sigma_sq_N) 84 | noise = recursive.GaussianNode(X - pred, 'scalar', sigma_sq_N) 85 | 86 | return recursive.SumNode([recursive.ProductNode([left, right]), noise]) 87 | 88 | def init_col_chain(data_matrix, num_iter=200): 89 | return init_row_chain(data_matrix.transpose(), num_iter=num_iter).transpose() 90 | 91 | def init_sparsity(data_matrix, mu_Z_mode, num_iter=200): 92 | if mu_Z_mode == 'row': 93 | return init_sparsity(data_matrix.transpose(), 'col', num_iter).transpose() 94 | elif mu_Z_mode == 'col': 95 | by_column = True 96 | elif mu_Z_mode == 'scalar': 97 | by_column = False 98 | 99 | # currently, data_matrix should always be real-valued with no missing values, so this just 100 | # passes on data_matrix.observations.values; we may want to replace it with interval observations 101 | # obtained from slice sampling 102 | S = data_matrix.sample_latent_values(np.zeros((data_matrix.m, data_matrix.n)), 103 | np.ones((data_matrix.m, data_matrix.n))) 104 | 105 | Z = np.random.normal(-1., 1., size=S.shape) 106 | 107 | # sparse_coding.py wants a full sparse coding problem, so pass in None for the things 108 | # that aren't relevant here 109 | state = sparse_coding.SparseCodingState(S, None, Z, None, -1., 1., None) 110 | 111 | pbar = misc.pbar(num_iter) 112 | for i in range(num_iter): 113 | sparse_coding.sample_Z(state) 114 | state.mu_Z = sparse_coding.cond_mu_Z(state, by_column).sample() 115 | state.sigma_sq_Z = sparse_coding.cond_sigma_sq_Z(state).sample() 116 | 117 | if hasattr(debugger, 'after_init_sparsity_iter'): 118 | debugger.after_init_sparsity_iter(locals()) 119 | 120 | pbar.update(i) 121 | pbar.finish() 122 | 123 | scale_node = recursive.GaussianNode(state.Z, 'scalar', state.sigma_sq_Z) 124 | return recursive.GSMNode(state.S, scale_node, mu_Z_mode, state.mu_Z) 125 | 126 | 127 | 128 | def initialize(data_matrix, root, old_structure, new_structure, num_iter=200): 129 | root = root.copy() 130 | if old_structure == new_structure: 131 | return root 132 | node, old_dist, rule = recursive.find_changed_node(root, old_structure, new_structure) 133 | 134 | old = root.value() 135 | 136 | # if we're replacing the root, pass on the observation model; otherwise, treat 137 | # the node we're factorizing as exact real-valued observations 138 | if node is root: 139 | inner_data_matrix = data_matrix 140 | else: 141 | row_ids = recursive.row_ids_for(data_matrix, node) 142 | col_ids = recursive.col_ids_for(data_matrix, node) 143 | m_orig, n_orig = recursive.orig_shape_for(data_matrix, node) 144 | frv = observations.DataMatrix.from_real_values 145 | inner_data_matrix = frv(node.value(), row_ids=row_ids, col_ids=col_ids, 146 | m_orig=m_orig, n_orig=n_orig) 147 | 148 | print 'Initializing %s from %s...' 
% (grammar.pretty_print(new_structure), grammar.pretty_print(old_structure)) 149 | 150 | if rule == grammar.parse("gg+g"): 151 | new_node = init_low_rank(inner_data_matrix, num_iter=num_iter) 152 | elif rule == grammar.parse("mg+g"): 153 | isotropic = (node is root) 154 | new_node = init_row_clustering(inner_data_matrix, isotropic, num_iter=num_iter) 155 | elif rule == grammar.parse("gM+g"): 156 | isotropic = (node is root) 157 | new_node = init_col_clustering(inner_data_matrix, isotropic, num_iter=num_iter) 158 | elif rule == grammar.parse("bg+g"): 159 | new_node = init_row_binary(inner_data_matrix, num_iter=num_iter) 160 | elif rule == grammar.parse("gB+g"): 161 | new_node = init_col_binary(inner_data_matrix, num_iter=num_iter) 162 | elif rule == grammar.parse("cg+g"): 163 | new_node = init_row_chain(inner_data_matrix, num_iter=num_iter) 164 | elif rule == grammar.parse("gC+g"): 165 | new_node = init_col_chain(inner_data_matrix, num_iter=num_iter) 166 | elif rule == grammar.parse("s(g)"): 167 | new_node = init_sparsity(inner_data_matrix, node.variance_type, num_iter=num_iter) 168 | else: 169 | raise RuntimeError('Unknown production rule: %s ==> %s' % (grammar.pretty_print(old_dist), 170 | grammar.pretty_print(rule))) 171 | 172 | root = recursive.splice(root, node, new_node) 173 | 174 | if isinstance(data_matrix.observations, observations.RealObservations): 175 | assert np.allclose(root.value()[data_matrix.observations.mask], old[data_matrix.observations.mask]) 176 | 177 | return root 178 | 179 | 180 | 181 | 182 | -------------------------------------------------------------------------------- /models.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | import recursive 5 | 6 | class Leaf: 7 | def __init__(self, left_side, right_side, fixed): 8 | self.left_side = left_side 9 | self.right_side = right_side 10 | self.fixed = fixed 11 | self.children = [] 12 | 13 | def structure(self): 14 | return self.distribution() 15 | 16 | def transpose(self): 17 | return self.transpose_class()(self.right_side, self.left_side) 18 | 19 | def display(self, indent=0): 20 | s = self.__class__.__name__ 21 | if self.fixed: 22 | s += ', fixed' 23 | s = ' ' * indent + s 24 | if hasattr(self, 'id'): 25 | s = '(%2d) ' % self.id + s 26 | print s 27 | 28 | def dummy(self): 29 | return self.node_class().dummy() 30 | 31 | class Gaussian(Leaf): 32 | def __init__(self, variance_type, fixed_variance, left_side, right_side, fixed): 33 | Leaf.__init__(self, left_side, right_side, fixed) 34 | if variance_type not in ['row', 'col', 'scalar']: 35 | raise RuntimeError('Unknown variance type: %s' % variance_type) 36 | self.variance_type = variance_type 37 | self.fixed_variance = fixed_variance 38 | 39 | def distribution(self): 40 | return 'g' 41 | 42 | def transpose_class(self): 43 | return Gaussian 44 | 45 | def node_class(self): 46 | return recursive.GaussianNode 47 | 48 | def transpose(self): 49 | if self.variance_type == 'row': 50 | variance_type = 'col' 51 | elif self.variance_type == 'col': 52 | variance_type = 'row' 53 | elif self.variance_type == 'scalar': 54 | variance_type = 'scalar' 55 | return Gaussian(variance_type, self.fixed_variance, self.right_side, self.left_side) 56 | 57 | def display(self, indent=0): 58 | s = 'Gaussian, %s' % self.variance_type 59 | if self.fixed_variance: 60 | s += ', fixed_variance' 61 | if self.fixed: 62 | s += ', fixed' 63 | s = ' ' * indent + s 64 | if hasattr(self, 'id'): 65 | s = '(%2d) ' % self.id + s 
66 | print s 67 | 68 | def dummy(self): 69 | return recursive.GaussianNode.dummy(self.variance_type) 70 | 71 | class Multinomial(Leaf): 72 | def distribution(self): 73 | return 'm' 74 | 75 | def transpose_class(self): 76 | return MultinomialT 77 | 78 | def node_class(self): 79 | return recursive.MultinomialNode 80 | 81 | class MultinomialT(Leaf): 82 | def distribution(self): 83 | return 'M' 84 | 85 | def transpose_class(self): 86 | return Multinomial 87 | 88 | def node_class(self): 89 | return recursive.MultinomialTNode 90 | 91 | class Bernoulli(Leaf): 92 | def distribution(self): 93 | return 'b' 94 | 95 | def transpose_class(self): 96 | return BernoulliT 97 | 98 | def node_class(self): 99 | return recursive.BernoulliNode 100 | 101 | class BernoulliT(Leaf): 102 | def distribution(self): 103 | return 'B' 104 | 105 | def transpose_class(self): 106 | return Bernoulli 107 | 108 | def node_class(self): 109 | return recursive.BernoulliTNode 110 | 111 | class Integration(Leaf): 112 | def distribution(self): 113 | return 'c' 114 | 115 | def transpose_class(self): 116 | return IntegrationT 117 | 118 | def node_class(self): 119 | return recursive.IntegrationNode 120 | 121 | 122 | 123 | class IntegrationT(Leaf): 124 | def distribution(self): 125 | return 'C' 126 | 127 | def transpose_class(self): 128 | return Integration 129 | 130 | def node_class(self): 131 | return recursive.IntegrationTNode 132 | 133 | class GSM: 134 | def __init__(self, left_side, right_side, fixed, scale_node, bias_type): 135 | self.left_side = left_side 136 | self.right_side = right_side 137 | self.fixed = fixed 138 | self.scale_node = scale_node 139 | if bias_type not in ['row', 'col', 'scalar']: 140 | raise RuntimeError('Unknown bias type: %s' % bias_type) 141 | self.bias_type = bias_type 142 | self.children = [self.scale_node] 143 | 144 | def structure(self): 145 | return ('s', self.scale_node.structure()) 146 | 147 | def transpose(self): 148 | return GSM(self.scale_node.transpose()) 149 | 150 | def display(self, indent=0): 151 | s = ' ' * indent + 'GSM' 152 | if hasattr(self, 'id'): 153 | s = '(%2d) ' % self.id + s 154 | print s 155 | 156 | self.scale_node.display(indent + 4) 157 | 158 | def dummy(self): 159 | value = np.zeros((5, 5)) 160 | if self.bias_type in ['row', 'col']: 161 | bias = np.zeros(5) 162 | else: 163 | bias = 0. 
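        # dummy() returns a small placeholder node (5 x 5 zeros) so that models.align() and
        # list_samplers() can walk the model structure without any real data attached.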
164 | return recursive.GSMNode(value, self.scale_node.dummy(), self.bias_type, bias) 165 | 166 | 167 | class Sum: 168 | def __init__(self, children, left_side, right_side, fixed): 169 | self.children = children 170 | self.left_side = left_side 171 | self.right_side = right_side 172 | self.fixed = fixed 173 | 174 | def structure(self): 175 | return ('+',) + tuple([c.structure() for c in self.children]) 176 | 177 | def transpose(self): 178 | return Sum([c.transpose() for c in self.children], self.right_side, self.left_side) 179 | 180 | def display(self, indent=0): 181 | s = ' ' * indent + 'Sum' 182 | if hasattr(self, 'id'): 183 | s = '(%2d) ' % self.id + s 184 | print s 185 | 186 | for c in self.children: 187 | c.display(indent + 4) 188 | 189 | def dummy(self): 190 | return recursive.SumNode([c.dummy() for c in self.children]) 191 | 192 | 193 | class Product: 194 | def __init__(self, left, right, left_side, right_side, fixed): 195 | self.left = left 196 | self.right = right 197 | self.children = [left, right] 198 | self.left_side = left_side 199 | self.right_side = right_side 200 | self.fixed = fixed 201 | 202 | def structure(self): 203 | return ('*',) + tuple([self.left.structure(), self.right.structure()]) 204 | 205 | def transpose(self): 206 | return Product(self.right.transpose(), self.left.transpose(), self.obs.T.copy()) 207 | 208 | def display(self, indent=0): 209 | s = ' ' * indent + 'Product' 210 | if hasattr(self, 'id'): 211 | s = '(%2d) ' % self.id + s 212 | print s 213 | 214 | for c in [self.left, self.right]: 215 | c.display(indent + 4) 216 | 217 | def dummy(self): 218 | return recursive.ProductNode([self.left.dummy(), self.right.dummy()]) 219 | 220 | 221 | def continuous_left(structure): 222 | if type(structure) == str: 223 | return structure in ['g', 's', 'k'] 224 | elif type(structure) == tuple and structure[0] == '+': 225 | return any([continuous_left(c) for c in structure[1:]]) 226 | elif type(structure) == tuple and structure[0] == '*': 227 | assert len(structure) == 3 228 | return continuous_left(structure[1]) 229 | elif type(structure) == tuple and structure[0] == 's': 230 | return True 231 | else: 232 | raise RuntimeError('Invalid structure: %s' % structure) 233 | 234 | def continuous_right(structure): 235 | if type(structure) == str: 236 | return structure == 'g' 237 | elif type(structure) == tuple and structure[0] == '+': 238 | return any([continuous_right(c) for c in structure[1:]]) 239 | elif type(structure) == tuple and structure[0] == '*': 240 | assert len(structure) == 3 241 | return continuous_right(structure[2]) 242 | elif type(structure) == tuple and structure[0] == 's': 243 | return True 244 | else: 245 | raise RuntimeError('Invalid structure: %s' % str(structure)) 246 | 247 | 248 | 249 | 250 | dist2class = {'g': Gaussian, 251 | 'm': Multinomial, 252 | 'M': MultinomialT, 253 | 'b': Bernoulli, 254 | 'B': BernoulliT, 255 | 'c': Integration, 256 | 'C': IntegrationT, 257 | } 258 | 259 | def get_model_helper(structure, left_side, right_side, fixed, fixed_variance, variance_type): 260 | if type(structure) == str: 261 | if structure == 'g': 262 | return Gaussian(variance_type, fixed_variance, left_side, right_side, fixed) 263 | else: 264 | return dist2class[structure](left_side, right_side, fixed) 265 | 266 | elif type(structure) == tuple and structure[0] == '+': 267 | child_models = [get_model_helper(s, left_side, right_side, False, fixed_variance, variance_type) 268 | for s in structure[1:]] 269 | return Sum(child_models, left_side, right_side, fixed) 270 | 271 | 
elif type(structure) == tuple and structure[0] == '*': 272 | assert len(structure) == 3 273 | 274 | iv = continuous_right(structure[1]) and continuous_left(structure[2]) 275 | if iv: 276 | left_variance_type = 'col' 277 | right_variance_type = 'row' 278 | else: 279 | left_variance_type = right_variance_type = 'scalar' 280 | 281 | left_fixed = (structure[2] == 'C') 282 | right_fixed = (structure[1] == 'c') 283 | 284 | left = get_model_helper(structure[1], left_side, False, left_fixed, left_fixed, left_variance_type) 285 | right = get_model_helper(structure[2], False, right_side, right_fixed, right_fixed, right_variance_type) 286 | return Product(left, right, left_side, right_side, fixed) 287 | 288 | elif type(structure) == tuple and structure[0] == 's': 289 | assert len(structure) == 2 290 | 291 | scale_node = get_model_helper(structure[1], left_side, right_side, False, False, 'scalar') 292 | return GSM(left_side, right_side, fixed, scale_node, variance_type) 293 | 294 | else: 295 | raise RuntimeError('Invalid structure: %s' % structure) 296 | 297 | 298 | def assign_ids(model_node, next_id=1): 299 | model_node.id = next_id 300 | next_id += 1 301 | for child in model_node.children: 302 | next_id = assign_ids(child, next_id) 303 | return next_id 304 | 305 | 306 | def get_model(structure, fixed_noise_variance=False): 307 | model = get_model_helper(structure, True, True, False, fixed_noise_variance, 'scalar') 308 | assign_ids(model) 309 | return model 310 | 311 | def align(node, model_node): 312 | assert node.model is None 313 | node.model = model_node 314 | for nchild, mchild in zip(node.children, model_node.children): 315 | align(nchild, mchild) 316 | 317 | -------------------------------------------------------------------------------- /observations.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | from utils import distributions, misc 5 | 6 | class DataMatrix: 7 | def __init__(self, observations, row_ids=None, col_ids=None, row_labels=None, col_labels=None, 8 | m_orig=None, n_orig=None): 9 | self.m, self.n = observations.shape 10 | self.observations = observations 11 | 12 | if row_ids is None: # indices from the original matrix (used for chain models) 13 | row_ids = np.arange(self.m) 14 | self.row_ids = np.array(row_ids) 15 | if col_ids is None: 16 | col_ids = np.arange(self.n) 17 | self.col_ids = np.array(col_ids) 18 | 19 | if row_labels is None: # e.g. 
entity or attribute names 20 | row_labels = range(self.m) 21 | self.row_labels = list(row_labels) # make sure it's not an array 22 | if col_labels is None: 23 | col_labels = range(self.n) 24 | self.col_labels = list(col_labels) 25 | 26 | if m_orig is None: # size of the original matrix (used for chain models) 27 | m_orig = self.m 28 | self.m_orig = m_orig 29 | if n_orig is None: 30 | n_orig = self.n 31 | self.n_orig = n_orig 32 | 33 | def transpose(self): 34 | return DataMatrix(self.observations.transpose(), self.col_ids, self.row_ids, self.col_labels, self.row_labels, 35 | self.n_orig, self.m_orig) 36 | 37 | def copy(self): 38 | return DataMatrix(self.observations.copy(), self.row_ids.copy(), self.col_ids.copy(), list(self.row_labels), 39 | list(self.col_labels), self.m_orig, self.n_orig) # a copy keeps the original dimensions in their original order 40 | 41 | def __getitem__(self, slc): 42 | rslc, cslc = misc.extract_slices(slc) 43 | return DataMatrix(self.observations[slc], self.row_ids[rslc], self.col_ids[cslc], 44 | misc.slice_list(self.row_labels, rslc), misc.slice_list(self.col_labels, cslc), 45 | self.m_orig, self.n_orig) 46 | 47 | def sample_latent_values(self, predictions, noise): 48 | return self.observations.sample_latent_values(predictions, noise) 49 | 50 | def loglik(self, predictions, noise): 51 | return self.observations.loglik(predictions, noise) 52 | 53 | def fixed_variance(self): 54 | return self.observations.fixed_variance() 55 | 56 | @staticmethod 57 | def from_decomp(decomp): 58 | obs = RealObservations(decomp.root.value(), decomp.obs) 59 | return DataMatrix(obs, decomp.row_ids, decomp.col_ids, decomp.row_labels, decomp.col_labels) 60 | 61 | @staticmethod 62 | def from_real_values(values, mask=None, **kwargs): 63 | if mask is None: 64 | mask = np.ones(values.shape, dtype=bool) 65 | observations = RealObservations(values, mask) 66 | return DataMatrix(observations, **kwargs) 67 | 68 | class RealObservations: 69 | def __init__(self, values, mask): 70 | self.values = values 71 | self.mask = mask 72 | self.shape = values.shape 73 | assert isinstance(self.values, np.ndarray) and self.values.dtype == float 74 | assert isinstance(self.mask, np.ndarray) and self.mask.dtype == bool 75 | 76 | def sample_latent_values(self, predictions, noise): 77 | missing_values = np.random.normal(predictions, np.sqrt(noise)) 78 | return np.where(self.mask, self.values, missing_values) 79 | 80 | def copy(self): 81 | return RealObservations(self.values.copy(), self.mask.copy()) 82 | 83 | def transpose(self): 84 | return RealObservations(self.values.T, self.mask.T) 85 | 86 | def loglik(self, predictions, noise): 87 | if not np.isscalar(noise): 88 | noise = noise[self.mask] 89 | return distributions.gauss_loglik(self.values[self.mask], predictions[self.mask], noise).sum() 90 | 91 | def loglik_each(self, predictions, noise): 92 | return np.where(self.mask, 93 | distributions.gauss_loglik(self.values, predictions, noise), 94 | 0.)
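# note: loglik() sums the Gaussian log-density over observed (mask == True) entries only,
# while loglik_each() returns a per-entry array with 0. at unobserved positions.
# Hypothetical example (not part of the original file):
#   obs = RealObservations(np.array([[1., 2.]]), np.array([[True, False]]))
#   obs.loglik(np.zeros((1, 2)), 1.)   # only the first entry contributes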
95 | 96 | def fixed_variance(self): 97 | return False 98 | 99 | def variance_estimate(self): 100 | return (self.values[self.mask] ** 2).mean() 101 | 102 | def __getitem__(self, slc): 103 | return RealObservations(self.values[slc], self.mask[slc]) 104 | 105 | -------------------------------------------------------------------------------- /parallel.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import os 3 | import re 4 | import smtplib 5 | import socket 6 | import subprocess 7 | import sys 8 | 9 | import config 10 | 11 | def _status_path(key): 12 | return os.path.join(config.JOBS_PATH, key) 13 | 14 | def _status_file(key, host=None): 15 | if host is not None: 16 | return os.path.join(_status_path(key), 'status-%s.txt' % host) 17 | else: 18 | return os.path.join(_status_path(key), 'status.txt') 19 | 20 | def _run_job(script, key, args): 21 | if key != 'None': 22 | outstr = open(_status_file(key, socket.gethostname()), 'a') 23 | print >> outstr, 'running:', args 24 | outstr.close() 25 | 26 | ret = subprocess.call('python %s %s' % (script, args), shell=True) 27 | 28 | if key != 'None': 29 | outstr = open(_status_file(key, socket.gethostname()), 'a') 30 | if ret == 0: 31 | print >> outstr, 'finished:', args 32 | else: 33 | print >> outstr, 'failed:', args 34 | outstr.close() 35 | 36 | def _executable_exists(command): 37 | # taken from stackoverflow.com/questions/377017/test-if-executable-exists-in-python 38 | def is_exe(fpath): 39 | return os.path.isfile(fpath) and os.access(fpath, os.X_OK) 40 | 41 | for path in os.environ['PATH'].split(os.pathsep): 42 | path = path.strip('"') 43 | exe_file = os.path.join(path, command) 44 | if is_exe(exe_file): 45 | return True 46 | 47 | return False 48 | 49 | def _remove_status_files(key): 50 | fnames = os.listdir(_status_path(key)) 51 | for fname in fnames: 52 | if re.match(r'status-.*.txt', fname): 53 | full_path = os.path.join(_status_path(key), fname) 54 | os.remove(full_path) 55 | 56 | def escape(job): 57 | return ' '.join(["'" + arg.replace("'", r"\'") + "'" 58 | for arg in job]) 59 | 60 | def run_command(command, jobs, machines=None, chdir=None): 61 | args = ['parallel', '--gnu'] 62 | if machines is not None: 63 | for m in machines: 64 | args += ['--sshlogin', m] 65 | 66 | if chdir is not None: 67 | command = 'cd %s; %s' % (chdir, command) 68 | args += [command] 69 | 70 | p = subprocess.Popen(args, shell=False, stdin=subprocess.PIPE) 71 | p.communicate('\n'.join(map(escape, jobs))) 72 | 73 | def run(script, jobs, machines=None, key=None, email=False, rm_status=True): 74 | if not _executable_exists('parallel'): 75 | raise RuntimeError('GNU Parallel executable not found.') 76 | if not hasattr(config, 'JOBS_PATH'): 77 | raise RuntimeError('Need to specify JOBS_PATH in config.py') 78 | if not os.path.exists(config.JOBS_PATH): 79 | raise RuntimeError('Path chosen for config.JOBS_PATH does not exist: %s' % config.JOBS_PATH) 80 | 81 | if key is not None: 82 | if not os.path.exists(_status_path(key)): 83 | os.mkdir(_status_path(key)) 84 | 85 | outstr = open(_status_file(key), 'w') 86 | for job in jobs: 87 | print >> outstr, 'queued:', job 88 | outstr.close() 89 | 90 | if rm_status: 91 | _remove_status_files(key) 92 | 93 | command = 'python parallel.py %s %s' % (key, script) 94 | run_command(command, jobs, machines=machines, chdir=os.getcwd()) 95 | 96 | if email: 97 | if key is not None: 98 | subject = '%s jobs finished' % key 99 | p = subprocess.Popen(['check_status', key], stdout=subprocess.PIPE) 100 
| body, _ = p.communicate() 101 | else: 102 | subject = 'jobs finished' 103 | body = '' 104 | 105 | msg = '\r\n'.join(['From: %s' % config.EMAIL, 106 | 'To: %s' % config.EMAIL, 107 | 'Subject: %s' % subject, 108 | '', 109 | body]) 110 | 111 | s = smtplib.SMTP('localhost') 112 | s.sendmail(config.EMAIL, [config.EMAIL], msg) 113 | s.quit() 114 | 115 | def isint(p): 116 | try: 117 | int(p) 118 | return True 119 | except: 120 | return False 121 | 122 | def parse_machines(s, njobs): 123 | if s is None: 124 | return s 125 | parts = s.split(',') 126 | return ['%d/%s' % (njobs, p) for p in parts] 127 | 128 | def list_jobs(key, status_val): 129 | status_files = [os.path.join(_status_path(key), 'status.txt')] 130 | status_files += glob.glob('%s/status-*.txt' % _status_path(key)) 131 | 132 | status = {} 133 | for fname in status_files: 134 | for line_ in open(fname).readlines(): 135 | line = line_.strip() 136 | sv, args = line.split(':') 137 | args = args.strip() 138 | status[args] = sv 139 | 140 | return [k for k, v in status.items() if v == status_val] 141 | 142 | 143 | if __name__ == '__main__': 144 | assert len(sys.argv) == 4 145 | key = sys.argv[1] 146 | script = sys.argv[2] 147 | args = sys.argv[3] 148 | _run_job(script, key, args) 149 | -------------------------------------------------------------------------------- /parsing.py: -------------------------------------------------------------------------------- 1 | 2 | tokens = ('LETTER', 'PLUS', 'LPAREN', 'RPAREN', 'GSM') 3 | 4 | t_LETTER = r'[gmbcMBC]' 5 | t_PLUS = r'\+' 6 | t_LPAREN = r'\(' 7 | t_RPAREN = r'\)' 8 | t_GSM = r's' 9 | 10 | t_ignore = ' ' 11 | 12 | def t_error(t): 13 | raise RuntimeError("Illegal character: '%s'" % t.value[0]) 14 | 15 | import ply.lex as lex 16 | lex.lex() 17 | 18 | def p_expression_plus(t): 19 | """expression : expression PLUS term""" 20 | t[0] = ('+', t[1], t[3]) 21 | 22 | def p_expression_term(t): 23 | """expression : term""" 24 | t[0] = t[1] 25 | 26 | def p_term_times(t): 27 | """term : factor factor""" 28 | t[0] = ('*', t[1], t[2]) 29 | 30 | def p_term_factor(t): 31 | """term : factor""" 32 | t[0] = t[1] 33 | 34 | def p_factor_gsm(t): 35 | """factor : GSM LPAREN expression RPAREN""" 36 | t[0] = ('s', t[3]) 37 | 38 | def p_factor_group(t): 39 | """factor : LPAREN expression RPAREN""" 40 | t[0] = t[2] 41 | 42 | def p_factor_letter(t): 43 | """factor : LETTER""" 44 | t[0] = t[1] 45 | 46 | def p_error(t): 47 | raise RuntimeError("Syntax error at '%s'" % t[1]) 48 | 49 | import ply.yacc as yacc 50 | yacc.yacc() 51 | 52 | def parse(s): 53 | return yacc.parse(s) 54 | 55 | 56 | -------------------------------------------------------------------------------- /predictive_distributions.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | from utils import misc 5 | 6 | class PredictiveDistribution: 7 | def __slice__(self, slc): 8 | return self.__getitem__(slc) 9 | 10 | 11 | class GaussianPredictiveDistribution(PredictiveDistribution): 12 | def __init__(self, mu, Sigma): 13 | self.mu = mu.copy() 14 | self.Sigma = Sigma.copy() 15 | 16 | def __getitem__(self, slc): 17 | return GaussianPredictiveDistribution(self.mu, self.Sigma) 18 | 19 | def generate_data(self, N): 20 | return np.array([np.random.multivariate_normal(self.mu, self.Sigma) 21 | for i in range(N)]) 22 | 23 | class MultinomialPredictiveDistribution(PredictiveDistribution): 24 | def __init__(self, pi, centers): 25 | self.pi = pi.copy() 26 | self.centers = centers.copy() 27 | 28 | 
def __getitem__(self, slc): 29 | return MultinomialPredictiveDistribution(self.pi, self.centers[:, slc]) 30 | 31 | @staticmethod 32 | def random(K, N): 33 | pi = np.random.uniform(0., 1., size=K) 34 | pi /= pi.sum() 35 | centers = np.random.normal(size=(K, N)) 36 | return MultinomialPredictiveDistribution(pi, centers) 37 | 38 | def generate_data(self, N): 39 | Z = np.random.multinomial(1, self.pi, size=N) 40 | return np.dot(Z, self.centers) 41 | 42 | class BernoulliPredictiveDistribution(PredictiveDistribution): 43 | def __init__(self, pi, A): 44 | self.pi = pi.copy() 45 | self.A = A.copy() 46 | 47 | def __getitem__(self, slc): 48 | return BernoulliPredictiveDistribution(self.pi, self.A[:, slc]) 49 | 50 | @staticmethod 51 | def random(K, N): 52 | pi = np.random.uniform(0., 1., size=K) 53 | A = np.random.normal(size=(K, N)) 54 | return BernoulliPredictiveDistribution(pi, A) 55 | 56 | def generate_data(self, N): 57 | Z = np.random.binomial(1, self.pi[nax, :], size=(N, self.pi.size)) 58 | return np.dot(Z, self.A) 59 | 60 | 61 | class PredictiveInfo: 62 | def __init__(self, components, mu, Sigma): 63 | self.components = components 64 | self.mu = mu 65 | self.Sigma = Sigma 66 | 67 | def predictive_for_row(self, i, idxs): 68 | components = [c[idxs] for c in self.components] 69 | if self.mu.ndim == 2: 70 | return components, self.mu[i, idxs], self.Sigma[i, :, :][idxs[:, nax], idxs[nax, :]] 71 | else: 72 | assert self.mu.ndim == 1 73 | return components, self.mu[idxs], self.Sigma[idxs[:, nax], idxs[nax, :]] 74 | 75 | def predictive_for_rows(self, rows): 76 | if self.mu.ndim == 1: 77 | N, D = rows.size, self.mu.size 78 | return self.components, np.tile(self.mu[nax, :], (N, 1)), np.tile(self.Sigma[nax, :, :], (N, 1, 1)) 79 | else: 80 | return self.components, self.mu[rows], self.Sigma[rows, :, :] 81 | 82 | def generate_data(self, N): 83 | D = self.Sigma.shape[0] 84 | X = np.zeros((N, D)) 85 | for c in self.components: 86 | X += c.generate_data(N) 87 | X += np.array([np.random.multivariate_normal(self.mu, self.Sigma) 88 | for i in range(N)]) 89 | return X 90 | 91 | class GSMPredictiveDistribution(PredictiveDistribution): 92 | def __init__(self, scale_components, scale_mu, scale_Sigma, sigma_sq_approx, A): 93 | self.scale_components = scale_components 94 | self.scale_mu = scale_mu 95 | self.scale_Sigma = scale_Sigma 96 | self.sigma_sq_approx = sigma_sq_approx 97 | self.A = A.copy() 98 | 99 | def __getitem__(self, slc): 100 | return GSMPredictiveDistribution(self.scale_components, self.scale_mu, self.scale_Sigma, 101 | self.sigma_sq_approx, self.A[:, slc]) 102 | 103 | def generate_data(self, N): 104 | K, D = self.A.shape 105 | Z = np.zeros((N, K)) 106 | for sc in self.scale_components: 107 | Z += sc.generate_data(N) 108 | Z += np.array([np.random.multivariate_normal(self.scale_mu, self.scale_Sigma) 109 | for i in range(N)]) 110 | S = np.random.normal(0., np.exp(0.5 * Z)) 111 | return np.dot(S, self.A) 112 | 113 | 114 | 115 | 116 | 117 | ######################## computing the predictive distributions ################ 118 | 119 | class FixedTerm: 120 | def __init__(self, values): 121 | self.values = values 122 | 123 | class GaussianTerm: 124 | def __init__(self, values, mu, Sigma): 125 | self.values = values 126 | self.mu = mu 127 | self.Sigma = Sigma 128 | 129 | class ChainTerm: 130 | def __init__(self, values, mu_delta, Sigma_delta): 131 | self.values = values 132 | self.mu_delta = mu_delta 133 | self.Sigma_delta = Sigma_delta 134 | 135 | def extract_terms(node): 136 | if node.isleaf(): 137 | assert 
node.distribution() in ['g', 'm', 'b'] 138 | if node.distribution() == 'g': 139 | mu = np.zeros(node.n) 140 | sigma_sq_row, sigma_sq_col = node.row_col_variance() 141 | Sigma = np.diag(sigma_sq_row.mean() * sigma_sq_col) 142 | return [GaussianTerm(node.value(), mu, Sigma)] 143 | else: 144 | return [FixedTerm(node.value())] 145 | 146 | elif node.issum(): 147 | child_terms = [extract_terms(child) for child in node.children] 148 | return reduce(list.__add__, child_terms) 149 | 150 | elif node.isgsm(): 151 | return [FixedTerm(node.value())] 152 | 153 | elif node.isproduct(): 154 | left, right = node.children 155 | 156 | if left.isleaf() and left.distribution() == 'c': 157 | child_terms = extract_terms(right) 158 | terms = [] 159 | for ct in child_terms: 160 | if isinstance(ct, FixedTerm): 161 | # fixed terms inside chains remain fixed 162 | terms.append(FixedTerm(ct.values.cumsum(0))) 163 | elif isinstance(ct, GaussianTerm): 164 | # Gaussians become chains 165 | terms.append(ChainTerm(ct.values.cumsum(0), ct.mu, ct.Sigma)) 166 | elif isinstance(ct, ChainTerm): 167 | # freeze nested chains since these are annoying 168 | terms.append(FixedTerm(ct.values.cumsum(0))) 169 | else: 170 | raise RuntimeError('Unknown term') 171 | return terms 172 | 173 | else: 174 | child_terms = extract_terms(left) 175 | V = right.value() 176 | terms = [] 177 | for ct in child_terms: 178 | # same distribution, but multiplied by V on the right 179 | if isinstance(ct, FixedTerm): 180 | terms.append(FixedTerm(np.dot(ct.values, V))) 181 | elif isinstance(ct, GaussianTerm): 182 | mu = np.dot(ct.mu, V) 183 | Sigma = np.dot(V.T, np.dot(ct.Sigma, V)) 184 | terms.append(GaussianTerm(np.dot(ct.values, V), mu, Sigma)) 185 | elif isinstance(ct, ChainTerm): 186 | mu = np.dot(ct.mu_delta, V) 187 | Sigma = np.dot(V.T, np.dot(ct.Sigma_delta, V)) 188 | terms.append(ChainTerm(np.dot(ct.values, V), mu, Sigma)) 189 | else: 190 | raise RuntimeError('Unknown term') 191 | return terms 192 | 193 | def collect_terms(terms): 194 | fixed_values = 0. 195 | gaussian_values = 0. 196 | gaussian_mu = 0. 197 | gaussian_Sigma = 0. 198 | chain_values = 0. 199 | chain_mu = 0. 200 | chain_Sigma = 0. 
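# accumulate the values (and means/covariances) of all terms of each type, and track which
# types actually occur so that absent ones can be returned as None below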
201 | has_fixed = has_gaussian = has_chain = False 202 | 203 | for term in terms: 204 | if isinstance(term, FixedTerm): 205 | fixed_values += term.values 206 | has_fixed = True 207 | elif isinstance(term, GaussianTerm): 208 | gaussian_values += term.values 209 | gaussian_mu += term.mu 210 | gaussian_Sigma += term.Sigma 211 | has_gaussian = True 212 | elif isinstance(term, ChainTerm): 213 | chain_values += term.values 214 | chain_mu += term.mu_delta 215 | chain_Sigma += term.Sigma_delta 216 | has_chain = True 217 | else: 218 | raise RuntimeError('Unknown term') 219 | 220 | if has_fixed: 221 | fixed_term = FixedTerm(fixed_values) 222 | else: 223 | fixed_term = None 224 | 225 | if has_gaussian: 226 | gaussian_term = GaussianTerm(gaussian_values, gaussian_mu, gaussian_Sigma) 227 | else: 228 | gaussian_term = None 229 | 230 | if has_chain: 231 | chain_term = ChainTerm(chain_values, chain_mu, chain_Sigma) 232 | else: 233 | chain_term = None 234 | 235 | return fixed_term, gaussian_term, chain_term 236 | 237 | 238 | def compute_gaussian_part(training_data_matrix, root, N): 239 | fixed_term, gaussian_term, chain_term = collect_terms(extract_terms(root)) 240 | assert gaussian_term is not None 241 | D = gaussian_term.values.shape[1] 242 | 243 | if chain_term is None: 244 | return gaussian_term.mu, gaussian_term.Sigma 245 | 246 | X = training_data_matrix.sample_latent_values(root.predictions(), root.children[-1].sigma_sq) 247 | 248 | mu_0 = np.zeros(D) 249 | Sigma_v = chain_term.Sigma_delta 250 | 251 | y = np.zeros((D, N)) 252 | for i, row in enumerate(training_data_matrix.row_ids): 253 | if fixed_term is not None: 254 | y[:, row] = X[i, :] - gaussian_term.mu - fixed_term.values[i, :] 255 | else: 256 | y[:, row] = X[i, :] - gaussian_term.mu 257 | 258 | mask = np.zeros(N, dtype=bool) 259 | mask[training_data_matrix.row_ids] = True 260 | 261 | mu_chains, Sigma_chains = misc.kalman_filter_codiag2( 262 | mu_0, Sigma_v, np.linalg.inv(gaussian_term.Sigma), y, mask) 263 | mu_total = mu_chains.T + gaussian_term.mu[nax, :] 264 | Sigma_total = np.zeros((N, D, D)) 265 | for i in range(N): 266 | Sigma_total[i, :, :] = Sigma_chains[:, :, i] + gaussian_term.Sigma 267 | 268 | return mu_total, Sigma_total 269 | 270 | 271 | 272 | def extract_non_gaussian_part(node): 273 | if node.isleaf(): 274 | assert node.distribution() in ['g', 'm', 'b'] 275 | if node.distribution() == 'g': 276 | return [] 277 | elif node.distribution() == 'm': 278 | pi = (1. + node.value().sum(0)) / (node.n + node.m) 279 | return [MultinomialPredictiveDistribution(pi, np.eye(node.n))] 280 | elif node.distribution() == 'b': 281 | pi = (1. + node.value().sum(0)) / (2. 
+ node.m) 282 | return [BernoulliPredictiveDistribution(pi, np.eye(node.n))] 283 | 284 | elif node.issum(): 285 | child_components = [extract_non_gaussian_part(child) for child in node.children] 286 | return reduce(list.__add__, child_components) 287 | 288 | elif node.isproduct(): 289 | left, right = node.children 290 | 291 | if left.isleaf() and left.distribution() == 'c': 292 | return [] 293 | 294 | else: 295 | child_components = extract_non_gaussian_part(left) 296 | components = [] 297 | for cp in child_components: 298 | if isinstance(cp, MultinomialPredictiveDistribution): 299 | components.append(MultinomialPredictiveDistribution(cp.pi, np.dot(cp.centers, right.value()))) 300 | elif isinstance(cp, BernoulliPredictiveDistribution): 301 | components.append(BernoulliPredictiveDistribution(cp.pi, np.dot(cp.A, right.value()))) 302 | elif isinstance(cp, GSMPredictiveDistribution): 303 | components.append(GSMPredictiveDistribution(cp.scale_components, cp.scale_mu, cp.scale_Sigma, 304 | cp.sigma_sq_approx, np.dot(cp.A, right.value()))) 305 | return components 306 | 307 | elif node.isgsm(): 308 | scale_node = node.scale_node 309 | scale_components = extract_non_gaussian_part(scale_node) 310 | fixed_term, gaussian_term, chain_term = collect_terms(extract_terms(scale_node)) 311 | assert chain_term is None 312 | scale_mu, scale_Sigma = gaussian_term.mu, gaussian_term.Sigma 313 | if node.bias_type == 'col': 314 | scale_mu += node.bias.ravel() 315 | elif node.bias_type == 'scalar': 316 | scale_mu += node.bias 317 | else: 318 | raise RuntimeError('Invalid bias type: %s' % node.bias_type) 319 | sigma_sq_approx = (node.value() ** 2).mean(0) 320 | return [GSMPredictiveDistribution(scale_components, scale_mu, scale_Sigma, 321 | sigma_sq_approx, np.eye(node.n))] 322 | 323 | 324 | 325 | 326 | def compute_predictive_info(train_data_matrix, root, N): 327 | components = extract_non_gaussian_part(root) 328 | mu, Sigma = compute_gaussian_part(train_data_matrix, root, N) 329 | return PredictiveInfo(components, mu, Sigma) 330 | 331 | 332 | def remove_gsm(predictive_info): 333 | new_components = [] 334 | new_mu, new_Sigma = predictive_info.mu.copy(), predictive_info.Sigma.copy() 335 | for c in predictive_info.components: 336 | if isinstance(c, GSMPredictiveDistribution): 337 | #new_Sigma += np.diag(c.sigma_sq_approx) 338 | new_Sigma += np.dot(c.A.T, np.dot(np.diag(c.sigma_sq_approx), c.A)) 339 | else: 340 | new_components.append(c) 341 | return PredictiveInfo(new_components, new_mu, new_Sigma) 342 | 343 | def has_gsm(predictive_info): 344 | for c in predictive_info.components: 345 | if isinstance(c, GSMPredictiveDistribution): 346 | return True 347 | return False 348 | 349 | -------------------------------------------------------------------------------- /presentation.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import numpy as np 3 | import sys 4 | 5 | import grammar 6 | 7 | 8 | 9 | 10 | def format_table(table, sep=' '): 11 | num_cols = len(table[0]) 12 | if any([len(row) != num_cols for row in table]): 13 | raise RuntimeError('Number of columns must match.') 14 | 15 | widths = [max([len(row[i]) for row in table]) 16 | for i in range(num_cols)] 17 | format_string = sep.join(['%' + str(w) + 's' for w in widths]) 18 | return [format_string % tuple(row) for row in table] 19 | 20 | def format_table_latex(table): 21 | return [l + ' \\\\' for l in format_table(table, ' & ')] 22 | 23 | class Failure: 24 | def __init__(self, structure, level, all_failed, 
name=None): 25 | self.structure = structure 26 | self.level = level 27 | self.all_failed = all_failed 28 | self.name = name 29 | 30 | def print_failed_structures(failures, outfile=sys.stdout): 31 | if failures: 32 | print >> outfile, 'The inference algorithms failed for the following structures:' 33 | print >> outfile 34 | print >> outfile, '%30s%8s %s' % \ 35 | ('structure', 'level', 'notes') 36 | print >> outfile 37 | for f in failures: 38 | line = '%30s%8d ' % (grammar.pretty_print(f.structure), f.level) 39 | if f.name: 40 | line += '(for %s) ' % f.name 41 | if not f.all_failed: 42 | line += '(only some jobs failed) ' 43 | print >> outfile, line 44 | print >> outfile 45 | print >> outfile 46 | 47 | 48 | class ModelScore: 49 | def __init__(self, structure, row_score, col_score, total, row_improvement, col_improvement, 50 | z_score_row, z_score_col): 51 | self.structure = structure 52 | self.row_score = row_score 53 | self.col_score = col_score 54 | self.total = total 55 | self.row_improvement = row_improvement 56 | self.col_improvement = col_improvement 57 | self.z_score_row = z_score_row 58 | self.z_score_col = z_score_col 59 | 60 | def print_scores(level, model_scores, outfile=sys.stdout): 61 | print >> outfile, 'The following are the top-scoring structures for level %d:' % level 62 | print >> outfile 63 | print >> outfile, '%30s%10s%10s%13s%13s%13s%10s%10s' % \ 64 | ('structure', 'row', 'col', 'total', 'row impvt.', 'col impvt.', 'z (row)', 'z (col)') 65 | print >> outfile 66 | for ms in model_scores: 67 | print >> outfile, '%30s%10.2f%10.2f%13.2f%13.2f%13.2f%10.2f%10.2f' % \ 68 | (grammar.pretty_print(ms.structure), ms.row_score, ms.col_score, ms.total, 69 | ms.row_improvement, ms.col_improvement, ms.z_score_row, ms.z_score_col) 70 | print >> outfile 71 | print >> outfile 72 | 73 | 74 | def print_model_sequence(model_scores, outfile=sys.stdout): 75 | print >> outfile, "Here are the best-performing structures in each level of the search:" 76 | print >> outfile 77 | print >> outfile, '%10s%25s%13s%13s%10s%10s' % \ 78 | ('level', 'structure', 'row impvt.', 'col impvt.', 'z (row)', 'z (col)') 79 | print >> outfile 80 | for i, ms in enumerate(model_scores): 81 | print >> outfile, '%10d%25s%13.2f%13.2f%10.2f%10.2f' % \ 82 | (i+1, grammar.pretty_print(ms.structure), ms.row_improvement, ms.col_improvement, 83 | ms.z_score_row, ms.z_score_col) 84 | print >> outfile 85 | print >> outfile 86 | 87 | 88 | class RunningTime: 89 | def __init__(self, level, structure, num_samples, total_time): 90 | self.level = level 91 | self.structure = structure 92 | self.num_samples = num_samples 93 | self.total_time = total_time 94 | 95 | def format_time(t): 96 | if t < 60.: 97 | return '%1.1f seconds' % t 98 | elif t < 3600.: 99 | return '%1.1f minutes' % (t / 60.) 100 | else: 101 | return '%1.1f hours' % (t / 3600.) 102 | 103 | def print_running_times(running_times, outfile=sys.stdout): 104 | total = sum([rt.total_time for rt in running_times]) 105 | print >> outfile, 'Total CPU time was %s. 
Here is the breakdown:' % format_time(total) 106 | print >> outfile 107 | print >> outfile, '%30s%8s %s' % \ 108 | ('structure', 'level', 'time') 109 | print >> outfile 110 | running_times = sorted(running_times, key=lambda rt: rt.total_time, reverse=True) 111 | for rt in running_times: 112 | time_str = '%d x %s' % (rt.num_samples, format_time(rt.total_time / rt.num_samples)) 113 | print >> outfile, '%30s%8d %s' % (grammar.pretty_print(rt.structure), rt.level, time_str) 114 | print >> outfile 115 | print >> outfile 116 | 117 | 118 | class FinalResult: 119 | def __init__(self, expt_name, structure): 120 | self.expt_name = expt_name 121 | self.structure = structure 122 | 123 | def print_learned_structures(results, outfile=sys.stdout): 124 | def sortkey(result): 125 | return result.expt_name.split('_')[-1] 126 | results = sorted(results, key=sortkey) 127 | 128 | print >> outfile, 'The learned structures:' 129 | print >> outfile 130 | print >> outfile, '%25s%25s' % ('experiment', 'structure') 131 | print >> outfile 132 | for r in results: 133 | print >> outfile, '%25s%25s' % (r.expt_name, grammar.pretty_print(r.structure)) 134 | print >> outfile 135 | print >> outfile 136 | 137 | 138 | 139 | class LatentVariables: 140 | def __init__(self, label, z): 141 | self.label = label 142 | self.z = z 143 | 144 | def print_components(model, structure, row_or_col, items, outfile=sys.stdout): 145 | cluster_members = collections.defaultdict(list) 146 | if model == 'clustering': 147 | for item in items: 148 | z = item.z if np.isscalar(item.z) else item.z.argmax() 149 | cluster_members[z].append(item.label) 150 | 151 | component_type, component_type_pl = 'Cluster', 'clusters' 152 | elif model == 'binary': 153 | for item in items: 154 | for i, zi in enumerate(item.z): 155 | if zi: 156 | cluster_members[i].append(item.label) 157 | component_type, component_type_pl = 'Component', 'components' 158 | 159 | cluster_ids = sorted(cluster_members.keys(), key=lambda k: len(cluster_members[k]), reverse=True) 160 | 161 | row_col_str = {'row': 'row', 'col': 'column'}[row_or_col] 162 | print >> outfile, 'For structure %s, the following %s %s were found:' % \ 163 | (grammar.pretty_print(structure), row_col_str, component_type_pl) 164 | print >> outfile 165 | 166 | for i, cid in enumerate(cluster_ids): 167 | print >> outfile, ' %s %d:' % (component_type, i+1) 168 | print >> outfile 169 | for label in cluster_members[cid]: 170 | print >> outfile, ' %s' % label 171 | print >> outfile 172 | print >> outfile 173 | 174 | 175 | 176 | 177 | -------------------------------------------------------------------------------- /scoring.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | from algorithms import ais_gsm, variational 5 | import observations 6 | import predictive_distributions 7 | from utils import misc 8 | 9 | 10 | CACHE = False 11 | cached_pi = None 12 | 13 | def score_row_predictive_variational(train_data_matrix, root, test_data_matrix, num_steps_ais=2000): 14 | N = test_data_matrix.m_orig 15 | predictive_info_orig = predictive_distributions.compute_predictive_info(train_data_matrix, root, N) 16 | predictive_info = predictive_distributions.remove_gsm(predictive_info_orig) 17 | 18 | result = np.zeros(test_data_matrix.m) 19 | pbar = misc.pbar(test_data_matrix.m) 20 | for i, row in enumerate(test_data_matrix.row_ids): 21 | idxs = np.where(test_data_matrix.observations.mask[i, :])[0] 22 | 23 | components, mu, Sigma = 
predictive_info.predictive_for_row(row, idxs) 24 | 25 | estimators = [] 26 | for comp in components: 27 | if isinstance(comp, predictive_distributions.MultinomialPredictiveDistribution): 28 | estimators.append(variational.MultinomialEstimator(comp.pi, comp.centers)) 29 | elif isinstance(comp, predictive_distributions.BernoulliPredictiveDistribution): 30 | estimators.append(variational.BernoulliEstimator(comp.pi, comp.A)) 31 | else: 32 | raise RuntimeError('Unknown predictive distribution') 33 | 34 | assert isinstance(test_data_matrix.observations, observations.RealObservations) 35 | 36 | problem = variational.VariationalProblem(estimators, test_data_matrix.observations.values[i, idxs] - mu, 37 | Sigma) 38 | reps = problem.solve() 39 | result[i] = problem.objective_function(reps) 40 | 41 | if predictive_distributions.has_gsm(predictive_info_orig): 42 | components, mu, Sigma = predictive_info_orig.predictive_for_row(row, idxs) 43 | assert np.allclose(mu, 0.) # can't do chains yet 44 | X = test_data_matrix.observations.values[i, idxs] 45 | X = X[nax, :] 46 | result[i] = ais_gsm.compute_likelihood(X, components, Sigma, [reps], np.array([result[i]]), 47 | num_steps=num_steps_ais)[0] 48 | 49 | pbar.update(i) 50 | pbar.finish() 51 | 52 | 53 | return result 54 | 55 | def score_col_predictive_variational(train_data_matrix, root, test_data_matrix, num_steps_ais=2000): 56 | return score_row_predictive_variational(train_data_matrix.transpose(), root.transpose(), 57 | test_data_matrix.transpose(), num_steps_ais=num_steps_ais) 58 | 59 | 60 | 61 | def no_structure_row_loglik(train_data, row_test_data): 62 | sigma_sq = train_data.observations.variance_estimate() 63 | return np.array([row_test_data.observations[i, :].loglik(np.zeros(row_test_data.n), sigma_sq) 64 | for i in range(row_test_data.m)]) 65 | 66 | 67 | def no_structure_col_loglik(train_data, col_test_data): 68 | return no_structure_row_loglik(train_data.transpose(), col_test_data.transpose()) 69 | 70 | def evaluate_model(train_data, root, row_test_data, col_test_data, label='', avg_col_mean=True, 71 | init_row_loglik=None, init_col_loglik=None, num_steps_ais=2000, max_dim=None): 72 | 73 | print 'Scoring row predictive likelihood...' 74 | row_loglik_all = score_row_predictive_variational( 75 | train_data[:, :max_dim], root[:, :max_dim], row_test_data[:, :max_dim], num_steps_ais=num_steps_ais) 76 | if avg_col_mean: 77 | if init_row_loglik is None: 78 | init_row_loglik = no_structure_row_loglik(train_data[:, :max_dim], row_test_data[:, :max_dim]) 79 | row_loglik_all = np.logaddexp(row_loglik_all + np.log(0.99), 80 | init_row_loglik + np.log(0.01)) 81 | 82 | print 'Scoring column predictive likelihood...' 
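# column scores reuse the row scoring routine on the transposed data and model
# (see score_col_predictive_variational above)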
83 | col_loglik_all = score_col_predictive_variational( 84 | train_data[:max_dim, :], root[:max_dim, :], col_test_data[:max_dim, :], num_steps_ais=num_steps_ais) 85 | if avg_col_mean: 86 | if init_col_loglik is None: 87 | init_col_loglik = no_structure_col_loglik(train_data[:max_dim, :], col_test_data[:max_dim, :]) 88 | col_loglik_all = np.logaddexp(col_loglik_all + np.log(0.99), 89 | init_col_loglik + np.log(0.01)) 90 | 91 | return row_loglik_all, col_loglik_all 92 | 93 | 94 | 95 | -------------------------------------------------------------------------------- /single_process.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | 3 | 4 | def run(script, jobs): 5 | for job in jobs: 6 | subprocess.call(['python', script] + list(job)) 7 | 8 | 9 | -------------------------------------------------------------------------------- /synthetic_experiments.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import collections 3 | import numpy as np 4 | nax = np.newaxis 5 | import os 6 | import StringIO 7 | import sys 8 | 9 | import config 10 | import experiments 11 | import observations 12 | import presentation 13 | from utils import misc, storage 14 | 15 | 16 | NUM_ROWS = 200 17 | NUM_COLS = 200 18 | NUM_COMPONENTS = 10 19 | 20 | DEFAULT_SEARCH_DEPTH = 3 21 | DEFAULT_PREFIX = 'synthetic' 22 | 23 | def generate_ar(nrows, ncols, a): 24 | X = np.zeros((nrows, ncols)) 25 | X[0,:] = np.random.normal(size=ncols) 26 | for i in range(1, nrows): 27 | X[i,:] = a * X[i-1,:] + np.random.normal(0., np.sqrt(1-a**2), size=ncols) 28 | return X 29 | 30 | def generate_data(data_str, nrows, ncols, ncomp, return_components=False): 31 | IBP_ALPHA = 2. 32 | pi_crp = np.ones(ncomp) / ncomp 33 | pi_ibp = np.ones(ncomp) * IBP_ALPHA / ncomp 34 | 35 | if data_str[-1] == 'T': 36 | data_str = data_str[:-1] 37 | transpose = True 38 | nrows, ncols = ncols, nrows 39 | else: 40 | transpose = False 41 | 42 | if data_str == 'pmf': 43 | U = np.random.normal(0., 1., size=(nrows, ncomp)) 44 | V = np.random.normal(0., 1., size=(ncomp, ncols)) 45 | data = np.dot(U, V) 46 | components = (U, V) 47 | 48 | elif data_str == 'mog': 49 | U = np.random.multinomial(1, pi_crp, size=nrows) 50 | V = np.random.normal(0., 1., size=(ncomp, ncols)) 51 | data = np.dot(U, V) 52 | components = (U, V) 53 | 54 | elif data_str == 'ibp': 55 | U = np.random.binomial(1, pi_ibp[nax,:], size=(nrows, ncomp)) 56 | V = np.random.normal(0., 1., size=(ncomp, ncols)) 57 | data = np.dot(U, V) 58 | components = (U, V) 59 | 60 | elif data_str == 'sparse': 61 | Z = np.random.normal(0., 1., size=(nrows, ncomp)) 62 | U = np.random.normal(0., np.exp(Z)) 63 | V = np.random.normal(0., 1., size=(ncomp, ncols)) 64 | data = np.dot(U, V) 65 | components = (U, V) 66 | 67 | 68 | elif data_str == 'gsm': 69 | U_inner = np.random.normal(0., 1., size=(nrows, 1)) 70 | V_inner = np.random.normal(0., 1., size=(1, ncomp)) 71 | Z = np.random.normal(U_inner * V_inner, 1.) 72 | #Z = 2. 
* Z / np.sqrt(np.mean(Z**2)) 73 | 74 | U = np.random.normal(0., np.exp(Z)) 75 | V = np.random.normal(0., 1., size=(ncomp, ncols)) 76 | data = np.dot(U, V) 77 | components = (U, V) 78 | 79 | elif data_str == 'irm': 80 | U = np.random.multinomial(1, pi_crp, size=nrows) 81 | R = np.random.normal(0., 1., size=(ncomp, ncomp)) 82 | V = np.random.multinomial(1, pi_crp, size=ncols).T 83 | data = np.dot(np.dot(U, R), V) 84 | components = (U, R, V) 85 | 86 | elif data_str == 'bmf': 87 | U = np.random.binomial(1, pi_ibp[nax,:], size=(nrows, ncomp)) 88 | R = np.random.normal(0., 1., size=(ncomp, ncomp)) 89 | V = np.random.binomial(1, pi_ibp[nax,:], size=(ncols, ncomp)).T 90 | data = np.dot(np.dot(U, R), V) 91 | components = (U, R, V) 92 | 93 | elif data_str == 'mgb': 94 | U = np.random.multinomial(1, pi_crp, size=nrows) 95 | R = np.random.normal(0., 1., size=(ncomp, ncomp)) 96 | V = np.random.binomial(1, pi_ibp[nax,:], size=(ncols, ncomp)).T 97 | data = np.dot(np.dot(U, R), V) 98 | components = (U, R, V) 99 | 100 | elif data_str == 'chain': 101 | data = generate_ar(nrows, ncols, 0.9) 102 | components = (data) 103 | 104 | elif data_str == 'kf': 105 | U = generate_ar(nrows, ncomp, 0.9) 106 | V = np.random.normal(size=(ncomp, ncols)) 107 | data = np.dot(U, V) 108 | components = (U, V) 109 | 110 | elif data_str == 'bctf': 111 | temp1, (U1, V1) = generate_data('mog', nrows, ncols, ncomp, True) 112 | F1 = np.random.normal(temp1, 1.) 113 | temp2, (U2, V2) = generate_data('mog', nrows, ncols, ncomp, True) 114 | F2 = np.random.normal(temp2, 1.) 115 | data = np.dot(F1, F2.T) 116 | components = (U1, V1, F1, U2, V2, F2) 117 | 118 | 119 | data /= np.std(data) 120 | 121 | if transpose: 122 | data = data.T 123 | 124 | if return_components: 125 | return data, components 126 | else: 127 | return data 128 | 129 | 130 | NOISE_STR_VALUES = ['0.1', '1.0', '3.0', '10.0'] 131 | ALL_MODELS = ['pmf', 'mog', 'ibp', 'chain', 'irm', 'bmf', 'kf', 'bctf', 'sparse', 'gsm'] 132 | 133 | 134 | def experiment_name(prefix, noise_str, model): 135 | return '%s_%s_%s' % (prefix, noise_str, model) 136 | 137 | def all_experiment_names(prefix): 138 | return [experiment_name(prefix, noise_str, model) 139 | for noise_str in NOISE_STR_VALUES 140 | for model in ALL_MODELS 141 | ] 142 | 143 | def load_params(prefix): 144 | expt_name = all_experiment_names(prefix)[0] 145 | return storage.load(experiments.params_file(expt_name)) 146 | 147 | def initial_samples_jobs(prefix, level): 148 | return reduce(list.__add__, [experiments.initial_samples_jobs(name, level) 149 | for name in all_experiment_names(prefix)]) 150 | 151 | def initial_samples_key(prefix, level): 152 | return '%s_init_%d' % (prefix, level) 153 | 154 | def evaluation_jobs(prefix, level): 155 | return reduce(list.__add__, [experiments.evaluation_jobs(name, level) 156 | for name in all_experiment_names(prefix)]) 157 | 158 | def evaluation_key(prefix, level): 159 | return '%s_eval_%d' % (prefix, level) 160 | 161 | def final_model_jobs(prefix): 162 | return reduce(list.__add__, [experiments.final_model_jobs(name) 163 | for name in all_experiment_names(prefix)]) 164 | 165 | def final_model_key(prefix): 166 | return '%s_final' % prefix 167 | 168 | def report_dir(prefix): 169 | return os.path.join(config.REPORT_PATH, prefix) 170 | 171 | def report_file(prefix): 172 | return os.path.join(report_dir(prefix), 'results.txt') 173 | 174 | 175 | def init_experiment(prefix, debug, search_depth=3): 176 | experiments.check_required_directories() 177 | 178 | for noise_str in NOISE_STR_VALUES: 179 | for 
model in ALL_MODELS: 180 | name = experiment_name(prefix, noise_str, model) 181 | if debug: 182 | params = experiments.QuickParams(search_depth=search_depth) 183 | else: 184 | params = experiments.SmallParams(search_depth=search_depth) 185 | data, components = generate_data(model, NUM_ROWS, NUM_COLS, NUM_COMPONENTS, True) 186 | clean_data_matrix = observations.DataMatrix.from_real_values(data) 187 | noise_var = float(noise_str) 188 | noisy_data = np.random.normal(data, np.sqrt(noise_var)) 189 | data_matrix = observations.DataMatrix.from_real_values(noisy_data) 190 | experiments.init_experiment(name, data_matrix, params, components, 191 | clean_data_matrix=clean_data_matrix) 192 | 193 | 194 | def init_level(prefix, level): 195 | for name in all_experiment_names(prefix): 196 | experiments.init_level(name, level) 197 | 198 | def collect_scores_for_level(prefix, level): 199 | for name in all_experiment_names(prefix): 200 | experiments.collect_scores_for_level(name, level) 201 | 202 | def run_everything(prefix, args): 203 | params = load_params(prefix) 204 | init_level(prefix, 1) 205 | experiments.run_jobs(evaluation_jobs(prefix, 1), args, evaluation_key(prefix, 1)) 206 | collect_scores_for_level(prefix, 1) 207 | for level in range(2, params.search_depth + 1): 208 | init_level(prefix, level) 209 | experiments.run_jobs(initial_samples_jobs(prefix, level), args, initial_samples_key(prefix, level)) 210 | experiments.run_jobs(evaluation_jobs(prefix, level), args, evaluation_key(prefix, level)) 211 | collect_scores_for_level(prefix, level) 212 | experiments.run_jobs(final_model_jobs(prefix), args, final_model_key(prefix)) 213 | 214 | 215 | def print_failures(prefix, outfile=sys.stdout): 216 | params = load_params(prefix) 217 | failures = [] 218 | for level in range(1, params.search_depth + 1): 219 | ok_counts = collections.defaultdict(int) 220 | fail_counts = collections.defaultdict(int) 221 | for expt_name in all_experiment_names(prefix): 222 | for _, structure in storage.load(experiments.structures_file(expt_name, level)): 223 | for split_id in range(params.num_splits): 224 | for sample_id in range(params.num_samples): 225 | ok = False 226 | fname = experiments.scores_file(expt_name, level, structure, split_id, sample_id) 227 | if storage.exists(fname): 228 | row_loglik, col_loglik = storage.load(fname) 229 | if np.all(np.isfinite(row_loglik)) and np.all(np.isfinite(col_loglik)): 230 | ok = True 231 | 232 | if ok: 233 | ok_counts[structure] += 1 234 | else: 235 | fail_counts[structure] += 1 236 | 237 | for structure in fail_counts: 238 | if ok_counts[structure] > 0: 239 | failures.append(presentation.Failure(structure, level, False)) 240 | else: 241 | failures.append(presentation.Failure(structure, level, True)) 242 | 243 | presentation.print_failed_structures(failures, outfile) 244 | 245 | def print_learned_structures(prefix, outfile=sys.stdout): 246 | results = [] 247 | for expt_name in all_experiment_names(prefix): 248 | structure, _ = experiments.final_structure(expt_name) 249 | results.append(presentation.FinalResult(expt_name, structure)) 250 | presentation.print_learned_structures(results, outfile) 251 | 252 | def summarize_results(prefix, outfile=sys.stdout): 253 | print_learned_structures(prefix, outfile) 254 | print_failures(prefix, outfile) 255 | 256 | def save_report(name, email=None): 257 | # write to stdout 258 | summarize_results(name) 259 | 260 | # write to report file 261 | if not os.path.exists(report_dir(name)): 262 | os.mkdir(report_dir(name)) 263 | summarize_results(name, 
open(report_file(name), 'w')) 264 | 265 | if email is not None and email.find('@') != -1: 266 | header = 'experiment %s finished' % name 267 | buff = StringIO.StringIO() 268 | print >> buff, 'These results are best viewed in a monospace font.' 269 | print >> buff 270 | summarize_results(name, buff) 271 | body = buff.getvalue() 272 | buff.close() 273 | misc.send_email(header, body, email) 274 | 275 | 276 | 277 | if __name__ == '__main__': 278 | command = sys.argv[1] 279 | parser = argparse.ArgumentParser() 280 | parser.add_argument('command') 281 | 282 | if command == 'generate': 283 | parser.add_argument('--debug', action='store_true', default=False) 284 | parser.add_argument('--search_depth', type=int, default=DEFAULT_SEARCH_DEPTH) 285 | parser.add_argument('--prefix', type=str, default=DEFAULT_PREFIX) 286 | args = parser.parse_args() 287 | init_experiment(args.prefix, args.debug, args.search_depth) 288 | 289 | elif command == 'init': 290 | parser.add_argument('level', type=int) 291 | parser.add_argument('--prefix', type=str, default=DEFAULT_PREFIX) 292 | experiments.add_scheduler_args(parser) 293 | args = parser.parse_args() 294 | init_level(args.prefix, args.level) 295 | if args.level > 1: 296 | experiments.run_jobs(initial_samples_jobs(args.prefix, args.level), args, 297 | initial_samples_key(args.prefix, args.level)) 298 | 299 | elif command == 'eval': 300 | parser.add_argument('level', type=int) 301 | parser.add_argument('--prefix', type=str, default=DEFAULT_PREFIX) 302 | experiments.add_scheduler_args(parser) 303 | args = parser.parse_args() 304 | experiments.run_jobs(evaluation_jobs(args.prefix, args.level), args, 305 | evaluation_key(args.prefix, args.level)) 306 | collect_scores_for_level(args.prefix, args.level) 307 | 308 | elif command == 'final': 309 | # parser.add_argument('level', type=int)  # final-model jobs are not per-level, so no level argument is needed 310 | parser.add_argument('--prefix', type=str, default=DEFAULT_PREFIX) 311 | experiments.add_scheduler_args(parser) 312 | args = parser.parse_args() 313 | experiments.run_jobs(final_model_jobs(args.prefix), args, 314 | final_model_key(args.prefix)) 315 | 316 | elif command == 'everything': 317 | parser.add_argument('--prefix', type=str, default=DEFAULT_PREFIX) 318 | experiments.add_scheduler_args(parser) 319 | args = parser.parse_args() 320 | run_everything(args.prefix, args) 321 | 322 | else: 323 | raise RuntimeError('Unknown command: %s' % command) 324 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | import distributions 2 | import gaussians 3 | import misc 4 | import profiler 5 | import psd_matrices 6 | import storage 7 | -------------------------------------------------------------------------------- /utils/distributions.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | import scipy.special 4 | 5 | # temporary 6 | ALPHA_CRP = 5 7 | 8 | 9 | gammaln = scipy.special.gammaln 10 | 11 | def uni_gauss_information_to_expectation(lam, J): 12 | sigma_sq = 1. / lam 13 | mu = -sigma_sq * J 14 | return sigma_sq, mu 15 | 16 | def uni_gauss_expectation_to_information(sigma_sq, mu): 17 | lam = 1. 
/ sigma_sq 18 | J = -lam * mu 19 | return lam, J 20 | 21 | def gauss_loglik(x, mu, sigma_sq): 22 | return -0.5 * np.log(2*np.pi) - 0.5 * np.log(sigma_sq) \ 23 | - 0.5 * (x - mu)**2 / sigma_sq 24 | 25 | def sample_dirichlet(alpha): 26 | temp = np.random.gamma(alpha) 27 | return temp / np.sum(temp) 28 | 29 | def dirichlet_loglik(alpha, U): 30 | norm = gammaln(alpha.sum(-1)) - gammaln(alpha).sum(-1) 31 | return norm + ((alpha - 1.) * np.log(U)).sum(-1) # Dirichlet log-density: normalizer plus sum_k (alpha_k - 1) log u_k 32 | 33 | def dirichlet_multinomial_loglik(alpha, U): 34 | c = U.sum(0) 35 | assert alpha.ndim == 1 and alpha.shape == c.shape 36 | return gammaln(alpha + c).sum(-1) - gammaln(alpha).sum(-1) + \ 37 | gammaln(alpha.sum()) - gammaln(alpha.sum() + c.sum()) 38 | 39 | 40 | def check_dirichlet_multinomial_loglik(): 41 | U = np.array([[1, 0], 42 | [1, 0], 43 | [0, 1], 44 | [1, 0]]) 45 | alpha = np.array([1., 1.]) 46 | assert np.allclose(dirichlet_multinomial_loglik(alpha, U), np.log(1./2 * 2./3 * 1./4 * 3./5)) 47 | 48 | def beta_bernoulli_loglik(alpha0, alpha1, U): 49 | M = U.shape[0] 50 | c = U.sum(0) 51 | assert alpha0.ndim == 1 and alpha0.shape == alpha1.shape == c.shape 52 | temp = gammaln(alpha0 + M - c) - gammaln(alpha0) + \ 53 | gammaln(alpha1 + c) - gammaln(alpha1) + \ 54 | gammaln(alpha0 + alpha1 ) - gammaln(alpha0 + alpha1 + M) 55 | return temp.sum() 56 | 57 | def check_beta_bernoulli_loglik(): 58 | U = np.array([[1, 0], 59 | [1, 1], 60 | [0, 1], 61 | [0, 1]]) 62 | alpha0 = np.array([2., 2.]) 63 | alpha1 = np.array([1., 1.]) 64 | result = beta_bernoulli_loglik(alpha0, alpha1, U) 65 | assert np.allclose(result, np.log(1./3) + np.log(2./4) + np.log(2./5) + np.log(3./6) + 66 | np.log(2./3) + np.log(1./4) + np.log(2./5) + np.log(3./6)) 67 | 68 | 69 | 70 | class GammaDistribution: 71 | def __init__(self, a, b): 72 | if np.shape(a) != np.shape(b): 73 | raise RuntimeError('a and b should be the same shape') 74 | self.a = a 75 | self.b = b 76 | 77 | def expectation(self): 78 | return self.a / self.b 79 | 80 | def variance(self): 81 | return self.a / self.b**2 82 | 83 | def expectation_log(self): 84 | return scipy.special.basic.digamma(self.a) - np.log(self.b) 85 | 86 | def entropy(self): 87 | return scipy.special.gammaln(self.a) - (self.a - 1.) * scipy.special.basic.digamma(self.a) - np.log(self.b) + self.a 88 | 89 | def sample(self): 90 | return np.random.gamma(self.a, 1./self.b) 91 | 92 | def loglik(self, tau): 93 | return self.a * np.log(self.b) - scipy.special.gammaln(self.a) + (self.a - 1.) * np.log(tau) - self.b * tau 94 | 95 | def perturb(self, eps=1e-5): 96 | a = self.a * np.exp(np.random.normal(0., eps, size=self.a.shape)) 97 | b = self.b * np.exp(np.random.normal(0., eps, size=self.b.shape)) 98 | return GammaDistribution(a, b) 99 | 100 | def copy(self): 101 | try: 102 | return GammaDistribution(self.a.copy(), self.b.copy()) 103 | except: # not arrays 104 | return GammaDistribution(self.a, self.b) 105 | 106 | class InverseGammaDistribution: 107 | def __init__(self, a, b): 108 | self.a = a 109 | self.b = b 110 | 111 | def sample(self): 112 | return 1. / np.random.gamma(self.a, 1. / self.b) 113 | 114 | def loglik(self, tau): 115 | return GammaDistribution(self.a, self.b).loglik(1. 
/ tau) - 2 * np.log(tau) 116 | 117 | 118 | class MultinomialDistribution: 119 | def __init__(self, log_p): 120 | # take log_p rather than p as an argument because of underflow 121 | self.log_p = log_p 122 | self.p = np.exp(log_p) 123 | self.p /= self.p.sum(-1)[..., nax] # should already be normalized, but sometimes numerical error causes problems 124 | 125 | def expectation(self): 126 | return self.p 127 | 128 | def sample(self): 129 | #return np.random.multinomial(1, self.p) 130 | shape = self.p.shape[:-1] 131 | pr = int(np.prod(shape)) 132 | p = self.p.reshape((pr, self.p.shape[-1])) 133 | temp = np.array([np.random.multinomial(1, p[i, :]) 134 | for i in range(pr)]) 135 | return temp.reshape(shape + (self.p.shape[-1],)) 136 | 137 | def loglik(self, a): 138 | a = np.array(a) 139 | if not np.issubdtype(a.dtype, int): 140 | raise RuntimeError('a must be an integer array') 141 | if np.shape(a) != np.shape(self.p)[:a.ndim]: 142 | raise RuntimeError('sizes do not match') 143 | 144 | if a.ndim == self.p.ndim: 145 | if not (np.all((a == 0) + (a == 1)) and a.sum(-1) == 1): 146 | raise RuntimeError('a must be 1-of-n') 147 | return np.sum(a * self.log_p) 148 | elif a.ndim == self.p.ndim - 1: 149 | shp = np.shape(self.log_p)[:-1] 150 | size = np.prod(shp).astype(int) 151 | log_p_ = self.log_p.reshape((size, np.shape(self.log_p)[-1])) 152 | a_ = a.ravel() 153 | result = log_p_[np.arange(size), a_] 154 | return result.reshape(shp) 155 | else: 156 | raise RuntimeError('sizes do not match') 157 | 158 | def __slice__(self, slc): 159 | return MultinomialDistribution(self.log_p[slc]) 160 | 161 | @staticmethod 162 | def from_odds(odds): 163 | return MultinomialDistribution(odds - np.logaddexp.reduce(odds, axis=-1)[..., nax]) 164 | 165 | class BernoulliDistribution: 166 | def __init__(self, odds): 167 | self.odds = odds 168 | 169 | def _p(self): 170 | return 1. / (1 + np.exp(-self.odds)) 171 | 172 | def expectation(self): 173 | return self._p() 174 | 175 | def variance(self): 176 | p = self._p() 177 | return p * (1. 
- p) 178 | 179 | def sample(self): 180 | return np.random.binomial(1, self._p()) 181 | 182 | def loglik(self, a): 183 | if not np.issubdtype(a.dtype, int): 184 | raise RuntimeError('a must be an integer array') 185 | if not np.all((a==0) + (a==1)): 186 | raise RuntimeError('a must be a binary array') 187 | 188 | log_p = -np.logaddexp(0., -self.odds) 189 | log_1_minus_p = -np.logaddexp(0., self.odds) 190 | return a * log_p + (1-a) * log_1_minus_p 191 | 192 | @staticmethod 193 | def from_odds(odds): 194 | return BernoulliDistribution(odds) 195 | 196 | 197 | 198 | class GaussianDistribution: 199 | def __init__(self, mu, sigma_sq): 200 | self.mu = mu 201 | self.sigma_sq = sigma_sq 202 | 203 | def loglik(self, x): 204 | return -0.5 * np.log(2*np.pi) + \ 205 | -0.5 * np.log(self.sigma_sq) + \ 206 | -0.5 * (x - self.mu) ** 2 / self.sigma_sq 207 | 208 | def sample(self): 209 | return np.random.normal(self.mu, self.sigma_sq) 210 | 211 | def maximize(self): 212 | return self.mu 213 | -------------------------------------------------------------------------------- /utils/gaussians.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | nax = np.newaxis 3 | 4 | import psd_matrices 5 | 6 | #from profiler import profiled 7 | import profiler 8 | profiled = profiler.profiled('gaussians') 9 | from misc import _err_string, process_slice, my_sum, match_shapes, dot, full_shape, broadcast, set_err_info, transp 10 | 11 | 12 | class Potential(): 13 | def __init__(self, J, Lambda, Z): 14 | J, Lambda, Z = match_shapes([('J', J, 1), ('Lambda', Lambda, 0), ('Z', Z, 0)]) 15 | self._J = J 16 | self._Lambda = Lambda 17 | self._Z = Z 18 | self.shape = full_shape([J.shape[:-1], Lambda.shape, Z.shape]) 19 | self.ndim = J.ndim - 1 20 | self.dim = J.shape[-1] 21 | self.shape_str = '%s J=%s Z=%s %s' % (Lambda.__class__, J.shape, Z.shape, Lambda.shape_str) 22 | self.mutable = False 23 | 24 | def set_mutable(self, m): 25 | # copy everything, just in case 26 | self._J = self._J.copy() 27 | self._Lambda = self._Lambda.copy() 28 | self._Z = self._Z.copy() 29 | 30 | self.mutable = m 31 | self._Lambda.set_mutable(m) 32 | 33 | 34 | @profiled 35 | def full(self): 36 | return Potential(self._J, self._Lambda.full(), self._Z) 37 | 38 | @profiled 39 | def copy(self): 40 | return Potential(self._J.copy(), self._Lambda.copy(), self._Z.copy()) 41 | 42 | @profiled 43 | def score(self, x): 44 | return -0.5 * self._Lambda.qform(x) + (self._J * x).sum(-1) + self._Z 45 | 46 | @profiled 47 | def flip(self): 48 | return Potential(-self._J, self._Lambda, self._Z) 49 | 50 | @profiled 51 | def translate(self, dmu): 52 | new_J = self._J + self._Lambda.dot(dmu) 53 | linv = self._Lambda.pinv() 54 | new_Z = self._Z + 0.5 * linv.qform(self._J) - 0.5 * linv.qform(new_J) 55 | return Potential(new_J, self._Lambda, new_Z) 56 | 57 | def __getitem__(self, slc): 58 | return self.__slice__(slc) 59 | 60 | @profiled 61 | def __slice__(self, slc): 62 | J_slc = process_slice(slc, self._J.shape, 1) 63 | Lambda_slc = process_slice(slc, self._Lambda.shape, 0) 64 | Z_slc = process_slice(slc, self._Z.shape, 0) 65 | return Potential(self._J[J_slc], self._Lambda[Lambda_slc], self._Z[Z_slc]) 66 | 67 | def __setitem__(self, slc, other): 68 | return self.__setslice__(slc, other) 69 | 70 | @profiled 71 | def __setslice__(self, slc, other): 72 | if not self.mutable: 73 | raise RuntimeError('Attempt to modify immutable potential') 74 | J_slc = process_slice(slc, self._J.shape, 1) 75 | Lambda_slc = process_slice(slc, 
self._Lambda.shape, 0) 76 | Z_slc = process_slice(slc, self._Z.shape, 0) 77 | self._J[J_slc] = other._J 78 | self._Lambda[Lambda_slc] = other._Lambda 79 | self._Z[Z_slc] = other._Z 80 | 81 | @profiled 82 | def __add__(self, other): 83 | return Potential(self._J + other._J, self._Lambda + other._Lambda, self._Z + other._Z) 84 | 85 | @profiled 86 | def __sub__(self, other): 87 | return Potential(self._J - other._J, self._Lambda - other._Lambda, self._Z - other._Z) 88 | 89 | @profiled 90 | def __mul__(self, other): 91 | other = np.asarray(other) 92 | return Potential(self._J * other[..., nax], self._Lambda * other, self._Z * other) 93 | 94 | @profiled 95 | def __rmul__(self, other): 96 | return self * other 97 | 98 | @profiled 99 | def sum(self, axis): 100 | assert type(axis) == int and 0 <= axis < self.ndim 101 | return Potential(my_sum(self._J, axis, self.shape[axis]), 102 | my_sum(self._Lambda, axis, self.shape[axis]), 103 | my_sum(self._Z, axis, self.shape[axis])) 104 | 105 | @profiled 106 | def conv(self, other): 107 | J1, J2, Lambda1, Lambda2, Z1, Z2 = self._J, other._J, self._Lambda, other._Lambda, self._Z, other._Z 108 | LL = Lambda1 + Lambda2 109 | P = LL.pinv() 110 | Lambda_c = Lambda1.conv(Lambda2) 111 | J_c = Lambda1.dot(P.dot(J2)) + Lambda2.dot(P.dot(J1)) 112 | Z_c = 0.5 * P.qform(J1 - J2) + 0.5 * self.dim * np.log(2*np.pi) - 0.5 * LL.logdet() + Z1 + Z2 113 | return Potential(J_c, Lambda_c, Z_c) 114 | 115 | @profiled 116 | def transform(self, A): 117 | J = dot(transp(A), self._J) 118 | Lambda = self._Lambda.alat(transp(A)) 119 | return Potential(J, Lambda, self._Z) 120 | 121 | @profiled 122 | def rescale(self, a): 123 | a = np.array(a) 124 | J = a[..., nax] * self._J 125 | Lambda = self._Lambda.rescale(a) 126 | return Potential(J, Lambda, self._Z) 127 | 128 | 129 | 130 | @profiled 131 | def integral(self): 132 | J, Lambda, Z = self._J, self._Lambda, self._Z 133 | linv = Lambda.pinv() 134 | return 0.5 * self.dim * np.log(2*np.pi) - 0.5 * Lambda.logdet() + 0.5 * linv.qform(J) + Z 135 | 136 | @profiled 137 | def renorm(self): 138 | return Potential(self._J, self._Lambda, self._Z - self.integral()) 139 | 140 | @profiled 141 | def add_dummy_dimension(self): 142 | J = np.zeros(self._J.shape[:-1] + (self.dim + 1,)) 143 | J[..., 1:] = self._J 144 | Lambda = self._Lambda.add_dummy_dimension() 145 | return Potential(J, Lambda, self._Z) 146 | 147 | @profiled 148 | def to_eig(self): 149 | return Potential(self._J, self._Lambda.to_eig(), self._Z) 150 | 151 | @staticmethod 152 | @profiled 153 | def from_moments(mu, Sigma): 154 | return Distribution(mu, Sigma).to_potential() 155 | 156 | @staticmethod 157 | @profiled 158 | def from_moments_full(mu, Sigma): 159 | return Distribution(mu, psd_matrices.FullMatrix(Sigma)).to_potential() 160 | 161 | @staticmethod 162 | @profiled 163 | def from_moments_diag(mu, sigma_sq): 164 | return Distribution(mu, psd_matrices.DiagonalMatrix(sigma_sq)).to_potential() 165 | 166 | @staticmethod 167 | @profiled 168 | def from_moments_iso(mu, sigma_sq): 169 | sigma_sq = np.asarray(sigma_sq) 170 | return Distribution(mu, psd_matrices.EyeMatrix(sigma_sq, mu.shape[-1])).to_potential() 171 | 172 | @staticmethod 173 | @profiled 174 | def from_moments_eig(mu, d, Q, s_perp): 175 | return Distribution(mu, psd_matrices.FixedEigMatrix(d, Q, s_perp)).to_potential() 176 | 177 | @profiled 178 | def allclose(self, other): 179 | J_err = _err_string(self._J, other._J) 180 | Lambda_err = _err_string(self._Lambda.full()._S, other._Lambda.full()._S) 181 | Z_err = _err_string(self._Z, 
other._Z) 182 | set_err_info('gaussians', [('J', J_err), ('Lambda', Lambda_err), ('Z', Z_err)]) 183 | 184 | return np.allclose(self._J, other._J) and \ 185 | self._Lambda.allclose(other._Lambda) and \ 186 | np.allclose(self._Z, other._Z) 187 | 188 | @profiled 189 | def to_distribution(self): 190 | Sigma = self._Lambda.inv() 191 | mu = Sigma.dot(self._J) 192 | Z = self._Z + 0.5 * self.dim * np.log(2*np.pi) + 0.5 * Sigma.logdet() + 0.5 * Sigma.qform(self._J) 193 | return Distribution(mu, Sigma, Z) 194 | 195 | @staticmethod 196 | def random(J_shape, Z_shape, Lambda, dim): 197 | J = np.random.normal(size=J_shape + (dim,)) 198 | Z = np.random.normal(size=Z_shape) 199 | return Potential(J, Lambda, Z) 200 | 201 | @profiled 202 | def conditionals(self, X): 203 | return Conditionals.from_potential(self, X) 204 | 205 | @profiled 206 | def mu(self): 207 | return self._Lambda.pinv().dot(self._J) 208 | 209 | 210 | class Distribution: 211 | def __init__(self, mu, Sigma, Z=0.): 212 | mu, Sigma, Z = match_shapes([('mu', mu, 1), ('Sigma', Sigma, 0), ('Z', Z, 0)]) 213 | self._mu = mu 214 | self._Sigma = Sigma 215 | self._Z = Z 216 | self.dim = mu.shape[-1] 217 | self.ndim = mu.ndim - 1 218 | self.shape = full_shape([Sigma.shape, mu.shape[:-1], Z.shape]) 219 | self.shape_str = '%s mu=%s Z=%s %s' % (Sigma.__class__, mu.shape, Z.shape, Sigma.shape_str) 220 | 221 | def allclose(self, other): 222 | return np.allclose(self._mu, other._mu) and \ 223 | np.allclose(self._Z, other._Z) and \ 224 | self._Sigma.allclose(other._Sigma) 225 | 226 | @profiled 227 | def full(self): 228 | return Distribution(self._mu, self._Sigma.full(), self._Z) 229 | 230 | @profiled 231 | def __add__(self, other): 232 | return Distribution(self._mu + other._mu, self._Sigma + other._Sigma, self._Z + other._Z) 233 | 234 | @profiled 235 | def translate(self, dmu): 236 | return Distribution(self._mu + dmu, self._Sigma, self._Z) 237 | 238 | @profiled 239 | def to_potential(self): 240 | Lambda = self._Sigma.inv() 241 | J = Lambda.dot(self._mu) 242 | Z = -0.5 * self.dim * np.log(2*np.pi) - 0.5 * self._Sigma.logdet() - 0.5 * self._Sigma.qform(J) + self._Z 243 | return Potential(J, Lambda, Z) 244 | 245 | @profiled 246 | def sample(self): 247 | return self._mu + self._Sigma.sqrt_dot(np.random.normal(size=self.shape + (self.dim,))) 248 | 249 | @profiled 250 | def transform(self, A): 251 | return Distribution(dot(A, self._mu), self._Sigma.alat(A), self._Z) 252 | 253 | @profiled 254 | def __slice__(self, slc): 255 | mu_slc = process_slice(slc, self._mu.shape, 1) 256 | Sigma_slc = process_slice(slc, self._Sigma.shape, 0) 257 | Z_slc = process_slice(slc, self._Z.shape, 0) 258 | return Distribution(self._mu[mu_slc], self._Sigma[Sigma_slc], self._Z[Z_slc]) 259 | 260 | @profiled 261 | def loglik(self, x): 262 | return self.to_potential().score(x) 263 | 264 | @staticmethod 265 | def from_moments_full(mu, Sigma, Z=0.): 266 | return Distribution(mu, psd_matrices.FullMatrix(Sigma), Z) 267 | 268 | @staticmethod 269 | def from_moments_diag(mu, sigma_sq, Z=0.): 270 | return Distribution(mu, psd_matrices.FullMatrix(np.diag(sigma_sq)), Z) 271 | 272 | @staticmethod 273 | def from_moments_iso(mu, sigma_sq, Z=0.): 274 | dim = mu.shape[-1] 275 | return Distribution(mu, psd_matrices.EyeMatrix(sigma_sq, dim), Z) 276 | 277 | def mu(self): 278 | return self._mu 279 | 280 | def Sigma(self): 281 | return self._Sigma.full()._S 282 | 283 | def Z(self): 284 | return self._Z 285 | 286 | 287 | class Conditionals: 288 | def __init__(self, Lambda, J_diff, Z_diff, X): 289 | Lambda, 
J_diff, X = match_shapes([('Lambda', Lambda, 0), ('J_diff', J_diff, 1), ('X', X, 1)]) 290 | self._Lambda = Lambda 291 | self._J_diff = J_diff.copy() 292 | self._Z_diff = Z_diff.copy() 293 | self._X = X.copy() 294 | self.dim = self._J_diff.shape[-1] 295 | self.ndim = self._J_diff.ndim - 1 296 | self.shape = full_shape([Lambda.shape, J_diff.shape[:-1], X.shape[:-1]]) 297 | self.shape_str = '%s J_diff=%s X=%s %s' % (Lambda.__class__, J_diff.shape, X.shape, Lambda.shape_str) 298 | 299 | ## can't have EigMatrix of zero dimensions, since NumPy doesn't like zero-dimensional object arrays 300 | #if self.shape == () and isinstance(Lambda, psd_matrices.EigMatrix): 301 | # self._Lambda = self._Lambda.full() 302 | 303 | def allclose(self, other): 304 | return self._Lambda.allclose(other._Lambda) and \ 305 | np.allclose(self._J_diff, other._J_diff) and \ 306 | np.allclose(self._Z_diff, other._Z_diff) and \ 307 | np.allclose(self._X, other._X) 308 | 309 | @profiled 310 | def __slice__(self, slc): 311 | Lambda_slc = process_slice(slc, self._Lambda.shape, 0) 312 | J_slc = process_slice(slc, self._J_diff.shape, 1) 313 | Z_slc = process_slice(slc, self._Z_diff.shape, 0) 314 | X_slc = process_slice(slc, self._X.shape, 1) 315 | return Conditionals(self._Lambda[Lambda_slc], self._J_diff[J_slc], self._Z_diff[Z_slc], self._X[X_slc]) 316 | 317 | @profiled 318 | def conditional_for(self, i): 319 | Lambda = psd_matrices.EyeMatrix(self._Lambda.elt(i, i), 1) 320 | return Potential(self._J_diff[..., i:i+1].copy(), Lambda, self._Z_diff).translate(self._X[..., i:i+1]) 321 | 322 | @profiled 323 | def assign(self, j, x_new): 324 | diff = x_new - self._X[..., j] 325 | self._X[..., j] = x_new 326 | self._Z_diff += self._J_diff[..., j] * diff + \ 327 | -0.5 * self._Lambda.elt(j, j) * diff ** 2 328 | self._J_diff -= diff[..., nax] * self._Lambda.col(j) 329 | 330 | 331 | @profiled 332 | def assign_one(self, idx, j, x_new): 333 | if type(idx) == int: 334 | idx = (idx,) 335 | diff = x_new - self._X[idx + (j,)] 336 | self._X[idx + (j,)] = x_new 337 | Lambda_idx = broadcast(idx, self._Lambda.shape) 338 | self._Z_diff[idx] += self._J_diff[idx + (j,)] * diff + \ 339 | -0.5 * self._Lambda[Lambda_idx].elt(j, j) * diff ** 2 340 | self._J_diff[idx + (slice(None),)] -= diff * self._Lambda[Lambda_idx].col(j) 341 | 342 | @staticmethod 343 | @profiled 344 | def from_potential(pot, X): 345 | J_diff = pot._J - pot._Lambda.dot(X) 346 | Z_diff = pot.score(X) 347 | return Conditionals(pot._Lambda, J_diff, Z_diff, X) 348 | 349 | 350 | -------------------------------------------------------------------------------- /utils/misc.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import itertools 3 | import numpy as np 4 | nax = np.newaxis 5 | import progressbar 6 | import scipy.linalg, scipy.integrate 7 | import smtplib 8 | import sys 9 | import termcolor 10 | 11 | 12 | def is_diag(A): 13 | return A.shape[0] == A.shape[1] and np.all(A == np.diag(np.diag(A))) 14 | 15 | def my_svd(A): 16 | m, n = A.shape 17 | if is_diag(A): 18 | return np.eye(m), np.diag(A), np.eye(m) 19 | else: 20 | return scipy.linalg.svd(A, full_matrices=False) 21 | 22 | def map_gaussian_matrix(A, B, C, d_1, d_2, d_3, d_4): 23 | """sample X, where P(X) \propto e^{-J(X)} and 24 | J(X) = 1/2 \|D_1(AXB - C)D_2\|^2 + 1/2 \|D_3 X D_4\|^2.""" 25 | A_tilde = d_1[:, nax] * A / d_3[nax, :] 26 | B_tilde = (1. 
/ d_4[:, nax]) * B * d_2[nax, :] 27 | C_tilde = d_1[:, nax] * C * d_2[nax, :] 28 | 29 | U_A, lambda_A, Vt_A = my_svd(A_tilde) 30 | V_A = Vt_A.T 31 | 32 | U_B, lambda_B, Vt_B = my_svd(B_tilde) 33 | V_B = Vt_B.T 34 | 35 | Lambda = lambda_A[:, nax] * lambda_B[nax, :] 36 | Y = Lambda * np.dot(np.dot(U_A.T, C_tilde), V_B) / (1. + Lambda**2) 37 | X_tilde = np.dot(np.dot(V_A, Y), U_B.T) 38 | X = (1. / d_3[:, nax]) * X_tilde * (1. / d_4[nax, :]) 39 | 40 | return X 41 | 42 | def map_gaussian_matrix_em(A, B, C, d_1, d_2, d_3, d_4, obs, X): 43 | C_ = np.where(obs, C, np.dot(np.dot(A, X), B)) 44 | return map_gaussian_matrix(A, B, C_, d_1, d_2, d_3, d_4) 45 | 46 | 47 | def sample_gaussian_matrix(A, B, C, d_1, d_2, d_3, d_4): 48 | """sample X, where P(X) \propto e^{-J(X)} and 49 | J(X) = 1/2 \|D_1(AXB - C)D_2\|^2 + 1/2 \|D_3 X D_4\|^2.""" 50 | A_tilde = d_1[:, nax] * A / d_3[nax, :] 51 | B_tilde = (1. / d_4[:, nax]) * B * d_2[nax, :] 52 | C_tilde = d_1[:, nax] * C * d_2[nax, :] 53 | 54 | U_A, lambda_A, Vt_A = my_svd(A_tilde) 55 | V_A = Vt_A.T 56 | 57 | U_B, lambda_B, Vt_B = my_svd(B_tilde) 58 | V_B = Vt_B.T 59 | 60 | Lambda = lambda_A[:, nax] * lambda_B[nax, :] 61 | Y_mean = Lambda * np.dot(np.dot(U_A.T, C_tilde), V_B) / (1. + Lambda**2) 62 | Y_var = 1. / (1. + Lambda**2) 63 | Y = np.random.normal(Y_mean, np.sqrt(Y_var)) 64 | X_tilde = np.dot(np.dot(V_A, Y), U_B.T) 65 | X = (1. / d_3[:, nax]) * X_tilde * (1. / d_4[nax, :]) 66 | 67 | return X 68 | 69 | def sample_gaussian_matrix_em(A, B, C, d_1, d_2, d_3, d_4, obs, X): 70 | C_ = np.where(obs, C, np.dot(np.dot(A, X), B)) 71 | return sample_gaussian_matrix(A, B, C_, d_1, d_2, d_3, d_4) 72 | 73 | 74 | 75 | def sample_gaussian_matrix2(A, B, W_X, W_N): 76 | nrows, ncols = A.shape[1], B.shape[1] 77 | X = np.zeros((nrows, ncols)) 78 | for j in range(ncols): 79 | Lambda = np.dot(np.dot(A.T, np.diag(W_N[:,j])), A) + np.diag(W_X[:,j]) 80 | Sigma = np.linalg.inv(Lambda) 81 | mu = mult([Sigma, A.T, W_N[:,j] * B[:,j]]) 82 | X[:,j] = np.random.multivariate_normal(mu, Sigma) 83 | return X 84 | 85 | def map_gaussian_matrix2(A, B, W_X, W_N): 86 | nrows, ncols = A.shape[1], B.shape[1] 87 | X = np.zeros((nrows, ncols)) 88 | for j in range(ncols): 89 | Lambda = np.dot(np.dot(A.T, np.diag(W_N[:,j])), A) + np.diag(W_X[:,j]) 90 | Sigma = np.linalg.inv(Lambda) 91 | mu = mult([Sigma, A.T, W_N[:,j] * B[:,j]]) 92 | X[:,j] = mu 93 | return X 94 | 95 | 96 | 97 | def mult(matrices): 98 | """Matrix multiplication""" 99 | prod = matrices[0] 100 | for mat in matrices[1:]: 101 | prod = np.dot(prod, mat) 102 | return prod 103 | 104 | 105 | 106 | def mean_field(J, Lambda, z_init=None): 107 | n = J.size 108 | assert J.shape == (n,) and Lambda.shape == (n, n) 109 | if z_init is not None: 110 | z = z_init.copy() 111 | else: 112 | z = np.zeros(n) 113 | 114 | # move quadratic potentials for one variable to unary terms 115 | J = J + 0.5 * Lambda[range(n), range(n)] 116 | Lambda[range(n), range(n)] = 0. 117 | 118 | for tr in range(100): 119 | for j in range(n): 120 | Lambda_term = np.dot(Lambda, z) 121 | odds = -J - Lambda_term 122 | odds = odds.clip(-100., 100.) # to avoid the overflow warnings 123 | z_new = 1. / (1. 
+ np.exp(-odds)) 124 | z[j] = 0.8*z[j] + 0.2*z_new[j] 125 | 126 | return z 127 | 128 | NEWLINE_EVERY = 50 129 | dummy_count = [0] 130 | def print_dot(count=None, max=None): 131 | print_count = (count is not None) 132 | if count is None: 133 | dummy_count[0] += 1 134 | count = dummy_count[0] 135 | sys.stdout.write('.') 136 | sys.stdout.flush() 137 | if count % NEWLINE_EVERY == 0: 138 | if print_count: 139 | if max is not None: 140 | sys.stdout.write(' [%d/%d]' % (count, max)) 141 | else: 142 | sys.stdout.write(' [%d]' % count) 143 | sys.stdout.write('\n') 144 | elif count == max: 145 | sys.stdout.write('\n') 146 | sys.stdout.flush() 147 | 148 | 149 | 150 | def sample_noise(N, obs=None, b0=1.): 151 | if obs is None: 152 | obs = np.ones(N.shape, dtype=bool) 153 | 154 | nrows, ncols = N.shape 155 | ssq_rows, ssq_cols = sample_noise_tied(N, obs, b0) 156 | lambda_rows = 1. / ssq_rows 157 | lambda_cols = 1. / ssq_cols 158 | 159 | a0 = 1. 160 | 161 | for tr in range(10): 162 | a = a0 + 0.5 * obs.sum(1) 163 | b = b0 + 0.5 * np.sum(obs * N**2 * lambda_cols[nax,:], axis=1) 164 | lambda_rows = np.random.gamma(a, 1. / b) 165 | 166 | if np.isscalar(lambda_rows): # np.random.gamma converts singleton arrays into scalars 167 | lambda_rows = np.array([lambda_rows]) 168 | 169 | a = a0 + 0.5 * obs.sum(0) 170 | b = b0 + 0.5 * np.sum(obs * N**2 * lambda_rows[:,nax], axis=0) 171 | lambda_cols = np.random.gamma(a, 1. / b) 172 | 173 | if np.isscalar(lambda_cols): 174 | lambda_cols = np.array([lambda_cols]) 175 | 176 | return 1. / lambda_rows, 1. / lambda_cols 177 | 178 | def sample_noise_tied(N, obs=None, b0=1.): 179 | if obs is None: 180 | obs = np.ones(N.shape, dtype=bool) 181 | nrows, ncols = N.shape 182 | a0 = 1. 183 | a = a0 + 0.5 * obs.sum() 184 | b = b0 + 0.5 * np.sum(obs * N**2) 185 | prec = np.random.gamma(a, 1. / b) 186 | 187 | return np.ones(nrows) / np.sqrt(prec), np.ones(ncols) / np.sqrt(prec) 188 | 189 | def sample_col_noise(N): 190 | nrows, ncols = N.shape 191 | A0 = 1. 192 | B0 = 1. 193 | B0 = np.mean(N**2) # UNDO 194 | a = A0 + 0.5 * nrows 195 | b = B0 + 0.5 * np.sum(N**2, axis=0) 196 | return 1. / np.random.gamma(a, 1. / b) 197 | 198 | 199 | 200 | 201 | def kalman_filter_diag(mu_0, sigma_sq_0, sigma_sq_v, lam, y): 202 | ndim, ntime = y.shape 203 | mu_forward = np.zeros((ndim, ntime)) 204 | sigma_sq_forward = np.zeros((ndim, ntime)) 205 | mu_forward[:, 0] = mu_0 206 | sigma_sq_forward[:, 0] = sigma_sq_0 207 | 208 | a = b = 1. 209 | 210 | mu = np.zeros((ndim, ntime)) 211 | sigma_sq = np.zeros((ndim, ntime)) 212 | 213 | # forward propagation 214 | for t in range(ntime): 215 | # execute dynamics 216 | if t > 0: 217 | mu_forward[:, t] = a * mu[:, t-1] 218 | sigma_sq_forward[:, t] = a**2 * sigma_sq[:, t-1] + sigma_sq_v 219 | 220 | # account for observations 221 | lambda_post = 1. / sigma_sq_forward[:, t] + b**2 * lam[:, t] 222 | h_post = mu_forward[:, t] / sigma_sq_forward[:, t] + \ 223 | b * y[:, t] * lam[:, t] 224 | mu[:, t] = h_post / lambda_post 225 | sigma_sq[:, t] = 1. / lambda_post 226 | 227 | h_backward = np.zeros((ndim, ntime)) 228 | lambda_backward = np.zeros((ndim, ntime)) 229 | 230 | # backward_propagation 231 | for t in range(ntime-1)[::-1]: 232 | lambda_post = lambda_backward[:, t+1] + b**2 * lam[:, t+1] 233 | h_post = h_backward[:, t+1] + b * lam[:, t+1] * y[:, t+1] 234 | 235 | lambda_backward[:, t] = a**2 / (sigma_sq_v + 1. / lambda_post) 236 | h_backward[:, t] = a * h_post / (sigma_sq_v * lambda_post + 1.) 237 | 238 | # combine both directions 239 | lambda_forward = 1. 
/ sigma_sq_forward 240 | h_forward = mu_forward / sigma_sq_forward 241 | lambda_post = lambda_forward + lambda_backward + b**2 * lam 242 | h_post = h_forward + h_backward + b * lam * y 243 | sigma_sq_post = 1. / lambda_post 244 | mu_post = h_post / lambda_post 245 | 246 | assert np.all(np.isfinite(mu_post)) 247 | 248 | return mu_post, sigma_sq_post 249 | 250 | def kalman_filter_codiag(mu_0, sigma_sq_0, sigma_sq_v, Lambda, y, mask): 251 | assert np.isscalar(sigma_sq_0) and np.isscalar(sigma_sq_v) 252 | ndim, ntime = y.shape 253 | d, Q = scipy.linalg.eigh(Lambda) 254 | mu_0_proj = np.dot(Q.T, mu_0) 255 | y_proj = np.dot(Q.T, y) 256 | lam = d[:, nax] * mask[nax, :] 257 | mu_post_proj, sigma_sq_post_proj = kalman_filter_diag( 258 | mu_0_proj, sigma_sq_0, sigma_sq_v, lam, y_proj) 259 | mu_post = np.dot(Q, mu_post_proj) 260 | Sigma_post = np.array([np.dot(Q, np.dot(np.diag(sigma_sq_post_proj[:, t]), Q.T)) 261 | for t in range(ntime)]).T 262 | return mu_post, Sigma_post 263 | 264 | def kalman_filter_codiag2(mu_0, Sigma_v, Lambda, y, mask): 265 | ndim, ntime = y.shape 266 | d, Q = scipy.linalg.eigh(Sigma_v) 267 | idxs = np.where(d > 1e-6)[0] 268 | d, Q = d[idxs], Q[:, idxs] 269 | sqrt_d = d ** 0.5 270 | S = np.dot(Q, np.diag(sqrt_d)) 271 | 272 | mu_0_trans = np.dot(Q.T, mu_0) / sqrt_d 273 | Lambda_trans = np.dot(S.T, np.dot(Lambda, S)) 274 | y_trans = np.dot(Q.T, y) / sqrt_d[:, nax] 275 | mu_trans, Sigma_trans = kalman_filter_codiag(mu_0_trans, 1e5, 1., Lambda_trans, y_trans, mask) 276 | mu = np.dot(S, mu_trans) 277 | Sigma = np.array([np.dot(S, np.dot(Sigma_trans[:, :, t], S.T)) 278 | for t in range(ntime)]).T 279 | return mu, Sigma 280 | 281 | 282 | 283 | def logdet(A): 284 | """Compute the log-determinant of a symmetric positive definite matrix A using the Cholesky factorization.""" 285 | L = np.linalg.cholesky(A) 286 | return 2 * np.sum(np.log(np.diag(L))) 287 | 288 | def slice_list(lst, slc): 289 | """Slice a Python list as if it were an array.""" 290 | if isinstance(slc, np.ndarray): 291 | slc = slc.ravel() 292 | idxs = np.arange(len(lst))[slc] 293 | return [lst[i] for i in idxs] 294 | 295 | def extract_slices(slc): 296 | if type(slc) == tuple: 297 | result = [] 298 | for s in slc: 299 | if isinstance(s, np.ndarray): 300 | result.append(s.ravel()) 301 | else: 302 | result.append(s) 303 | return tuple(result) 304 | else: 305 | return slc 306 | 307 | def _err_string(arr1, arr2): 308 | try: 309 | if np.allclose(arr1, arr2): 310 | return 'OK' 311 | elif arr1.shape == arr2.shape: 312 | return 'off by %s' % np.abs(arr1 - arr2).max() 313 | else: 314 | return 'incorrect shapes: %s and %s' % (arr1.shape, arr2.shape) 315 | except: 316 | return 'error comparing' 317 | 318 | err_info = collections.defaultdict(list) 319 | def set_err_info(key, info): 320 | err_info[key] = info 321 | 322 | def summarize_error(key): 323 | """Print a helpful description of the reason a condition was not satisfied. 
Intended usage: 324 | assert pot1.allclose(pot2), summarize_error()""" 325 | if type(err_info[key]) == str: 326 | return ' ' + err_info[key] 327 | else: 328 | return '\n' + '\n'.join([' %s: %s' % (name, err) for name, err in err_info[key]]) + '\n' 329 | 330 | 331 | def broadcast(idx, shape): 332 | result = [] 333 | for i, d in zip(idx, shape): 334 | if d == 1: 335 | result.append(0) 336 | else: 337 | result.append(i) 338 | return tuple(result) 339 | 340 | def full_shape(shapes): 341 | """The shape of the full array that results from broadcasting the arrays of the given shapes.""" 342 | return tuple(np.array(shapes).max(0)) 343 | 344 | 345 | def array_map(fn, arrs, n): 346 | """Takes a list of arrays a_1, ..., a_n where the elements of the first n dimensions line up. For every possible 347 | index into the first n dimensions, apply fn to the corresponding slices, and combine the results into 348 | an n-dimensional array. Supports broadcasting but does not prepend 1's to the shapes.""" 349 | # we shouldn't need a special case for n == 0, but NumPy complains about indexing into a zero-dimensional 350 | # array a using a[(Ellipsis,)]. 351 | if n == 0: 352 | return fn(*arrs) 353 | 354 | full_shape = tuple(np.array([a.shape[:n] for a in arrs]).max(0)) 355 | result = None 356 | for full_idx in itertools.product(*map(range, full_shape)): 357 | inputs = [a[broadcast(full_idx, a.shape[:n]) + (Ellipsis,)] for a in arrs] 358 | curr = fn(*inputs) 359 | 360 | if result is None: 361 | if type(curr) == tuple: 362 | result = tuple(np.zeros(full_shape + np.asarray(c).shape) for c in curr) 363 | else: 364 | result = np.zeros(full_shape + np.asarray(curr).shape) 365 | 366 | if type(curr) == tuple: 367 | for i, c in enumerate(curr): 368 | result[i][full_idx + (Ellipsis,)] = c 369 | else: 370 | result[full_idx + (Ellipsis,)] = curr 371 | return result 372 | 373 | def extend_slice(slc, n): 374 | if not isinstance(slc, tuple): 375 | slc = (slc,) 376 | if any([isinstance(s, np.ndarray) for s in slc]): 377 | raise NotImplementedError('Advanced slicing not implemented yet') 378 | return slc + (slice(None),) * n 379 | 380 | def process_slice(slc, shape, n): 381 | """Takes a slice and returns the appropriate slice into an array that's being broadcast (i.e. 
by 382 | converting the appropriate entries to 0's and :'s.""" 383 | if not isinstance(slc, tuple): 384 | slc = (slc,) 385 | slc = list(slc) 386 | ndim = len(shape) - n 387 | assert ndim >= 0 388 | shape_idx = 0 389 | for slice_idx, s in enumerate(slc): 390 | if s == nax: 391 | continue 392 | if shape[shape_idx] == 1: 393 | if type(s) == int: 394 | slc[slice_idx] = 0 395 | else: 396 | slc[slice_idx] = slice(None) 397 | shape_idx += 1 398 | if shape_idx != ndim: 399 | raise IndexError('Must have %d terms in the slice object' % ndim) 400 | return extend_slice(tuple(slc), n) 401 | 402 | def my_sum(a, axis, count): 403 | """For an array a which might be broadcast, return the value of a.sum() were a to be expanded out in full.""" 404 | if a.shape[axis] == count: 405 | return a.sum(axis) 406 | elif a.shape[axis] == 1: 407 | return count * a.sum(axis) 408 | else: 409 | raise IndexError('Cannot be broadcast: a.shape=%s, axis=%d, count=%d' % (a.shape, axis, count)) 410 | 411 | 412 | 413 | def match_shapes(arrs): 414 | """Prepend 1's to the shapes so that the dimensions line up.""" 415 | #temp = [(name, np.asarray(a), deg) for name, a, deg in arrs] 416 | #ndim = max([a.ndim - deg for _, a, deg in arrs]) 417 | 418 | temp = [a for name, a, deg in arrs] 419 | for i in range(len(temp)): 420 | if np.isscalar(temp[i]): 421 | temp[i] = np.array(temp[i]) 422 | ndim = max([a.ndim - deg for a, (_, _, deg) in zip(temp, arrs)]) 423 | 424 | prep_arrs = [] 425 | for name, a, deg in arrs: 426 | if np.isscalar(a): 427 | a = np.asarray(a) 428 | if a.ndim < deg: 429 | raise RuntimeError('%s.ndim must be at least %d' % (name, deg)) 430 | if a.ndim < ndim + deg: 431 | #a = a.reshape((1,) * (ndim + deg - a.ndim) + a.shape) 432 | slc = (nax,) * (ndim + deg - a.ndim) + (Ellipsis,) 433 | a = a[slc] 434 | prep_arrs.append(a) 435 | 436 | return prep_arrs 437 | 438 | def lstsq(A, b): 439 | # do this rather than call lstsq to support efficient broadcasting 440 | P = array_map(np.linalg.pinv, [A], A.ndim - 2) 441 | return array_map(np.dot, [P, b], A.ndim - 2) 442 | 443 | def dot(A, b): 444 | return array_map(np.dot, [A, b], A.ndim - 2) 445 | 446 | def vdot(x, y): 447 | return (x*y).sum(-1) 448 | 449 | def transp(A): 450 | return A.swapaxes(-2, -1) 451 | 452 | 453 | def get_counts(array, n): 454 | result = np.zeros(n, dtype=int) 455 | ans = np.bincount(array) 456 | result[:ans.size] = ans 457 | return result 458 | 459 | 460 | def log_erfc_helper(x): 461 | p = 0.47047 462 | a1 = 0.3480242 463 | a2 = -0.0958798 464 | a3 = 0.7478556 465 | t = 1. / (1 + p*x) 466 | return np.log(a1 * t + a2 * t**2 + a3 * t**3) - x ** 2 467 | 468 | def log_erfc(x): 469 | return np.where(x > 0., log_erfc_helper(x), np.log(2. - np.exp(log_erfc_helper(-x)))) 470 | 471 | def log_inv_probit(x): 472 | return log_erfc(-x / np.sqrt(2.)) - np.log(2.) 473 | 474 | def inv_probit(x): 475 | return 0.5 * scipy.special.erfc(-x / np.sqrt(2.)) 476 | 477 | def log_erfcinv(log_y): 478 | a = 0.140012 479 | log_term = log_y + np.log(2 - np.exp(log_y)) 480 | 481 | temp1 = 2 / (np.pi * a) + 0.5 * log_term 482 | temp2 = temp1 ** 2 - log_term / a 483 | temp3 = np.sqrt(temp2) - temp1 484 | return np.sign(1. 
- np.exp(log_y)) * np.sqrt(temp3) 485 | 486 | def log_probit(log_p): 487 | return -np.sqrt(2) * log_erfcinv(log_p + np.log(2)) 488 | 489 | def probit(p): 490 | return -np.sqrt(2) * scipy.special.erfcinv(2 * p) 491 | 492 | 493 | def check_close(a, b): 494 | if not np.allclose([a], [b]): # array brackets to avoid an error comparing inf and inf 495 | if np.isscalar(a) and np.isscalar(b): 496 | raise RuntimeError('a=%f, b=%f' % (a, b)) 497 | else: 498 | raise RuntimeError('Off by %f' % np.max(np.abs(a - b))) 499 | 500 | COLORS = ['red', 'green', 'yellow', 'blue', 'magenta', 'cyan'] 501 | 502 | def print_integers_colored(a): 503 | print '[', 504 | for ai in a: 505 | color = COLORS[ai % len(COLORS)] 506 | print termcolor.colored(str(ai), color, attrs=['bold']), 507 | print ']' 508 | 509 | def pbar(maxval): 510 | widgets = [progressbar.Percentage(), ' ', progressbar.Bar(), progressbar.ETA()] 511 | return progressbar.ProgressBar(widgets=widgets, maxval=maxval).start() 512 | 513 | 514 | def send_email(header, body, address): 515 | msg = '\r\n'.join(['From: %s' % address, 516 | 'To: %s' % address, 517 | 'Subject: %s' % header, 518 | '', 519 | body]) 520 | 521 | s = smtplib.SMTP('localhost') 522 | s.sendmail(address, [address], msg) 523 | s.quit() 524 | 525 | 526 | -------------------------------------------------------------------------------- /utils/profiler.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import functools 3 | import sys 4 | import time 5 | 6 | ENABLE_PROFILER = False 7 | TOP_ONLY = True 8 | 9 | depth = collections.defaultdict(int) 10 | counts = collections.defaultdict(lambda: collections.defaultdict(int)) 11 | total_time = collections.defaultdict(lambda: collections.defaultdict(float)) 12 | 13 | def reset(category=None): 14 | global counts, total_time 15 | if category is None: 16 | counts = collections.defaultdict(lambda: collections.defaultdict(int)) 17 | total_time = collections.defaultdict(lambda: collections.defaultdict(float)) 18 | else: 19 | counts[category] = collections.defaultdict(int) 20 | total_time[category] = collections.defaultdict(float) 21 | 22 | def get_key(name, args): 23 | k = [] 24 | for arg in args: 25 | if hasattr(arg, 'shape_str'): 26 | k.append((str(arg.__class__), arg.shape_str)) 27 | elif hasattr(arg, 'shape'): 28 | k.append((str(arg.__class__), arg.shape)) 29 | return (name,) + tuple(k) 30 | 31 | 32 | class profiled: 33 | def __init__(self, category): 34 | self.category = category 35 | 36 | def __call__(self, fn): 37 | if not ENABLE_PROFILER: 38 | return fn 39 | 40 | name = fn.__name__ 41 | 42 | @functools.wraps(fn) 43 | def profiled_fn(*args, **kwargs): 44 | global depth 45 | t0 = time.clock() 46 | depth[self.category] += 1 47 | ans = fn(*args, **kwargs) 48 | depth[self.category] -= 1 49 | if depth[self.category] == 0 or not TOP_ONLY: 50 | key = get_key(name, args) 51 | counts[self.category][key] += 1 52 | total_time[self.category][key] += time.clock() - t0 53 | return ans 54 | 55 | return profiled_fn 56 | 57 | 58 | def summarize(category, cutoff=0.5, outstr=sys.stdout): 59 | tt = total_time[category] 60 | c = counts[category] 61 | srtd = sorted(tt.keys(), key=lambda k: tt[k], reverse=True) 62 | for k in srtd: 63 | if tt[k] < cutoff: 64 | continue 65 | print >> outstr, '%1.2f seconds for %d calls' % (tt[k], c[k]) 66 | print >> outstr, k[0] 67 | for tp, sz in k[1:]: 68 | print >> outstr, ' %s %s' % (tp, sz) 69 | print >> outstr 70 | 71 | 72 | 73 | 
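For reference, here is a minimal usage sketch for the profiler module above. It is illustrative only: the category name `'demo'` and the function `gram` are made up, and `ENABLE_PROFILER` has to be switched on before any functions are decorated, since the decorator otherwise returns the function unchanged.

    import numpy as np
    import profiler

    profiler.ENABLE_PROFILER = True        # must be True at decoration time
    profiled = profiler.profiled('demo')   # 'demo' is an arbitrary category name

    @profiled
    def gram(A):
        # arguments exposing .shape (or .shape_str) become part of the timing key
        return np.dot(A.T, A)

    for _ in range(100):
        gram(np.random.normal(size=(200, 50)))

    profiler.summarize('demo', cutoff=0.)  # total time and call counts per (name, shapes) key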
-------------------------------------------------------------------------------- /utils/storage.py: -------------------------------------------------------------------------------- 1 | import cPickle 2 | import os 3 | 4 | def ensure_directory(d, trial=False): 5 | parts = d.split('/') 6 | for i in range(2, len(parts)+1): 7 | fname = '/'.join(parts[:i]) 8 | if not os.path.exists(fname): 9 | print 'Creating', fname 10 | if not trial: 11 | try: 12 | os.mkdir(fname) 13 | except: 14 | pass 15 | 16 | def load(fname): 17 | return cPickle.load(open(fname, 'rb')) 18 | 19 | def dump(obj, fname): 20 | d, f = os.path.split(fname) 21 | ensure_directory(d) 22 | cPickle.dump(obj, open(fname, 'wb'), protocol=2)   # protocol 2 is a binary format, so open in binary mode 23 | 24 | 25 | def exists(fname): 26 | return os.path.exists(fname) 27 | 28 | def mkdir(dirname): 29 | os.mkdir(dirname) 30 | 31 | def join(*args): 32 | return os.path.join(*args) 33 | 34 | --------------------------------------------------------------------------------
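As a quick illustration of how these storage helpers fit together, here is a small round-trip sketch. The directory `/tmp/storage_demo` and the file name are made up for the example; in the actual experiments the files would live under the directories configured in `config.py`.

    import numpy as np
    import storage

    X = np.random.normal(size=(5, 3))
    fname = storage.join('/tmp/storage_demo', 'level1', 'state.pk')  # hypothetical cache file
    storage.dump(X, fname)      # creates the intermediate directories, then pickles X
    assert storage.exists(fname)
    assert np.allclose(storage.load(fname), X)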