├── .gitignore
├── LICENSE.txt
├── README.org
├── csnanalysis
│   ├── __init__.py
│   ├── csn.py
│   └── matrix.py
├── examples
│   ├── committor_net_3state.png
│   ├── examples.ipynb
│   ├── matrix.npz
│   └── state_U.dat
├── requirements.in
└── setup.py


/.gitignore:
--------------------------------------------------------------------------------
  1 | # Created by https://www.gitignore.io/api/python
  2 | # Edit at https://www.gitignore.io/?templates=python
  3 | 
  4 | ### Python ###
  5 | # Byte-compiled / optimized / DLL files
  6 | __pycache__/
  7 | *.py[cod]
  8 | *$py.class
  9 | 
 10 | # C extensions
 11 | *.so
 12 | 
 13 | # Distribution / packaging
 14 | .Python
 15 | build/
 16 | develop-eggs/
 17 | dist/
 18 | downloads/
 19 | eggs/
 20 | .eggs/
 21 | lib/
 22 | lib64/
 23 | parts/
 24 | sdist/
 25 | var/
 26 | wheels/
 27 | pip-wheel-metadata/
 28 | share/python-wheels/
 29 | *.egg-info/
 30 | .installed.cfg
 31 | *.egg
 32 | MANIFEST
 33 | 
 34 | # PyInstaller
 35 | #  Usually these files are written by a python script from a template
 36 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 37 | *.manifest
 38 | *.spec
 39 | 
 40 | # Installer logs
 41 | pip-log.txt
 42 | pip-delete-this-directory.txt
 43 | 
 44 | # Unit test / coverage reports
 45 | htmlcov/
 46 | .tox/
 47 | .nox/
 48 | .coverage
 49 | .coverage.*
 50 | .cache
 51 | nosetests.xml
 52 | coverage.xml
 53 | *.cover
 54 | .hypothesis/
 55 | .pytest_cache/
 56 | 
 57 | # Translations
 58 | *.mo
 59 | *.pot
 60 | 
 61 | # Scrapy stuff:
 62 | .scrapy
 63 | 
 64 | # Sphinx documentation
 65 | docs/_build/
 66 | 
 67 | # PyBuilder
 68 | target/
 69 | 
 70 | # pyenv
 71 | .python-version
 72 | 
 73 | # pipenv
 74 | #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
 75 | #   However, in case of collaboration, if having platform-specific dependencies or dependencies
 76 | #   having no cross-platform support, pipenv may install dependencies that don't work, or not
 77 | #   install all needed dependencies.
 78 | #Pipfile.lock
 79 | 
 80 | # celery beat schedule file
 81 | celerybeat-schedule
 82 | 
 83 | # SageMath parsed files
 84 | *.sage.py
 85 | 
 86 | # Spyder project settings
 87 | .spyderproject
 88 | .spyproject
 89 | 
 90 | # Rope project settings
 91 | .ropeproject
 92 | 
 93 | # Mr Developer
 94 | .mr.developer.cfg
 95 | .project
 96 | .pydevproject
 97 | 
 98 | # mkdocs documentation
 99 | /site
100 | 
101 | # mypy
102 | .mypy_cache/
103 | .dmypy.json
104 | dmypy.json
105 | 
106 | # Pyre type checker
107 | .pyre/
108 | 
109 | 


--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2016 Samuel D. Lotz
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.org:
--------------------------------------------------------------------------------
 1 | * CSNAnalysis: Tools for creating, analyzing and visualizing Conformation Space Networks.
 2 | 
 3 | CSNAnalysis is a set of tools for network-based analysis of molecular dynamics trajectories.
 4 | To use it, initialize a ~CSN~ object with a matrix of transition counts, indexed [to][from].
 5 | The "killer app" of CSNAnalysis is an easy interface between enhanced sampling algorithms
 6 | (e.g. WExplore), molecular clustering programs (e.g. MSMBuilder), graph analysis packages (e.g. NetworkX),
 7 | and graph visualization programs (e.g. Gephi).
 8 | 
 9 | CSNAnalysis is currently in beta.
10 | 
11 | * Installation
12 | 
13 | To install the latest version of CSNAnalysis from GitHub:
14 | 
15 | #+begin_src bash
16 |   pip install git+https://github.com/ADicksonLab/CSNAnalysis
17 | #+end_src
18 | 
19 | Or install a specific release by its tag:
20 | 
21 | #+begin_src bash
22 |   pip install git+https://github.com/ADicksonLab/CSNAnalysis@v0.3
23 | #+end_src
24 | 
25 | * Dependencies
26 | - numpy
27 | - scipy
28 | - networkx
29 | 
30 | * Features
31 | CSNAnalysis has the following capabilities:
32 | 
33 | - constructing transition probability matrices
34 | - trimming CSNs using a variety of criteria
35 | - computing committor probabilities with an arbitrary number of basins
36 | - export gexf files with custom node colorings
37 | 
38 | * Tutorial
39 | See the Jupyter notebook examples/examples.ipynb.
40 | 
41 | * Misc
42 | ** Versioning
43 | 
44 | See [[http://semver.org/]] for version number meanings.
45 | 
46 | Version 1.0.0 will be released once the abstract layer API is stable. Subsequent 1.X.y releases will be made as applied and porcelain layer features are added.
47 | 

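Both the README and the CSN docstring describe the input as a count matrix indexed [to][from], and matrix.count_to_trans turns it into a transition matrix by dividing each column by its sum. The following is a minimal pure-Python sketch of that convention. It assumes plain nested lists in place of numpy/scipy, and counts_to_transitions and symmetrize are hypothetical names, not part of the CSNAnalysis API.

```python
# Hypothetical helpers (not part of CSNAnalysis) illustrating the [to][from] convention:
# counts[i][j] is the number of observed j -> i transitions, so column j holds
# everything that left state j.


def counts_to_transitions(counts):
    """Column-normalize a square [to][from] count matrix (cf. matrix.count_to_trans)."""
    n = len(counts)
    if any(len(row) != n for row in counts):
        raise ValueError("count matrix must be square")
    trans = [[float(c) for c in row] for row in counts]
    for j in range(n):
        col_sum = sum(trans[i][j] for i in range(n))
        if col_sum > 0:  # an all-zero column is left as zeros instead of dividing by 0
            for i in range(n):
                trans[i][j] /= col_sum
    return trans


def symmetrize(counts):
    """Average a count matrix with its transpose (cf. matrix.symmetrize_matrix)."""
    n = len(counts)
    return [[0.5 * (counts[i][j] + counts[j][i]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    counts = [
        [8, 1, 0],  # counts[0][1] = 1: one observed 1 -> 0 transition
        [2, 6, 3],  # counts[1][0] = 2: two observed 0 -> 1 transitions
        [0, 3, 7],
    ]
    for row in counts_to_transitions(counts):
        print([round(p, 3) for p in row])
    # [0.8, 0.1, 0.0]
    # [0.2, 0.6, 0.3]
    # [0.0, 0.3, 0.7]
```

Each column of the result sums to 1 (or stays all-zero), matching the "Columns should sum to 1" requirement stated in the matrix.py docstrings.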

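The README lists committors "with an arbitrary number of basins". For basins B_0, ..., B_(K-1), the committor q_k(i) is the probability that a trajectory started in state i reaches B_k before any other basin. That means q_k = 1 on B_k, q_k = 0 on the other basins, and q_k(i) = sum_j T[j][i] q_k(j) elsewhere, with T indexed [to][from]. The sketch below solves that fixed-point equation by plain Jacobi iteration in pure Python. It is only an illustration under those assumptions: it is not the committor() routine in matrix.py, whose body is not shown here, and committors is a hypothetical name.

```python
def committors(trans, basins, tol=1e-10, max_iter=100_000):
    """Committor probabilities q[i][k] for a column-stochastic [to][from] matrix.

    Hypothetical helper: q[i][k] is the probability that a walker starting in
    state i enters basins[k] before any other basin.
    """
    n = len(trans)
    owner = {}  # state -> index of the basin it belongs to
    for k, basin in enumerate(basins):
        for s in basin:
            if owner.get(s, k) != k:
                raise ValueError(f"state {s} appears in more than one basin")
            owner[s] = k
    nb = len(basins)
    # boundary conditions: 1 inside basin k, 0 in the other basins; other states start at 0
    q = [[1.0 if owner.get(i) == k else 0.0 for k in range(nb)] for i in range(n)]
    for _ in range(max_iter):
        new_q = [row[:] for row in q]
        delta = 0.0
        for i in range(n):
            if i in owner:
                continue  # basin states keep their fixed 0/1 values
            for k in range(nb):
                # trans[j][i] is the probability of a single i -> j step
                val = sum(trans[j][i] * q[j][k] for j in range(n))
                delta = max(delta, abs(val - q[i][k]))
                new_q[i][k] = val
        q = new_q
        if delta < tol:
            return q
    raise RuntimeError("committor iteration did not converge")


if __name__ == "__main__":
    # From state 1: go to basin state 0 w.p. 0.3, stay w.p. 0.2, go to basin state 2 w.p. 0.5
    T = [
        [0.9, 0.3, 0.0],
        [0.1, 0.2, 0.1],
        [0.0, 0.5, 0.9],
    ]
    q = committors(T, [[0], [2]])
    print([round(x, 6) for x in q[1]])  # [0.375, 0.625]
```

Here q_0(1) = 0.3 + 0.2 q_0(1), so q_0(1) = 0.3 / 0.8 = 0.375. For states that reach some basin with probability 1, each row of q sums to 1.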
--------------------------------------------------------------------------------
/csnanalysis/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ADicksonLab/CSNAnalysis/7700653374937c179a441c656f0783f80e44c26d/csnanalysis/__init__.py


--------------------------------------------------------------------------------
/csnanalysis/csn.py:
--------------------------------------------------------------------------------
  1 | import itertools
  2 | from copy import deepcopy
  3 | 
  4 | import scipy.sparse
  5 | import networkx as nx
  6 | import numpy as np
  7 | 
  8 | from csnanalysis.matrix import (
  9 |     count_to_trans,
 10 |     symmetrize_matrix,
 11 |     eig_weights,
 12 |     mult_weights,
 13 |     committor,
 14 |     committor_linalg,
 15 |     get_eigenvectors,
 16 |     well_conditioned,
 17 |     fptd,
 18 | )
 19 | 
 20 | class CSN(object):
 21 | 
 22 |     def __init__(self, counts, symmetrize=False):
 23 |         """
 24 |         Initializes a CSN object from a count matrix, given as a numpy array, a scipy
 25 |         sparse matrix, or a list of lists.  Indices: [to][from] (i.e. [row][column]).
 26 |         """
 27 |         if type(counts) is list:
 28 |             self.countmat = scipy.sparse.coo_matrix(counts)
 29 |         elif type(counts) is np.ndarray:
 30 |             self.countmat = scipy.sparse.coo_matrix(counts)
 31 |         elif isinstance(counts, scipy.sparse.coo_matrix):
 32 |             self.countmat = counts
 33 |         else:
 34 |             try:
 35 |                 self.countmat = counts.tocoo()
 36 |             except AttributeError:
 37 |                 raise TypeError(f"Count matrix is of unsupported type: {type(counts)}")
 38 | 
 39 |         if self.countmat.shape[0] != self.countmat.shape[1]:
 40 |             raise ValueError(f"Count matrix is not square: {self.countmat.shape}")
 41 | 
 42 |         totcounts = self.countmat.sum(axis=1).tolist()
 43 | 
 44 |         self.symmetrize = symmetrize
 45 |         if self.symmetrize:
 46 |             self.countmat = symmetrize_matrix(self.countmat)
 47 | 
 48 |         self.nnodes = self.countmat.shape[0]
 49 |         self.transmat = count_to_trans(self.countmat)
 50 | 
 51 |         self.trim_transmat = None
 52 | 
 53 |         # initialize networkX directed graph
 54 |         self.graph = nx.DiGraph()
 55 |         labels = [{'label' : i, 'count' : int(totcounts[i][0])} for i in range(self.nnodes)]
 56 |         self.graph.add_nodes_from(zip(range(self.nnodes),labels))
 57 |         self.graph.add_weighted_edges_from(zip(self.transmat.col,self.transmat.row,100*self.transmat.data))
 58 | 
 59 |         # remove self edges from graph
 60 |         self_edges = [(i,i) for i in range(self.nnodes)]
 61 |         self.graph.remove_edges_from(self_edges)
 62 | 
 63 |     def to_gephi_csv(self, cols='all', node_name='node.csv', edge_name='edge.csv', directed=False):
 64 |         """
 65 |         Writes space-delimited node and edge files for import into the Gephi network visualization program.
 66 | 
 67 |         cols  --  A list of columns that should be written to the node file.  ID and label are
 68 |                   included by default.  'all' will include every attribute attached to the
 69 |                   nodes in self.graph.
 70 |         directed -- If False, reciprocal edges are merged into one undirected edge with their mean weight.
 71 |         """
 72 |         if cols == 'all':
 73 |             cols = ['ID'] + list(self.graph.nodes[0].keys())
 74 |         else:
 75 |             if 'label' not in cols:
 76 |                 cols = ['label'] + cols
 77 |             if 'ID' not in cols:
 78 |                 cols = ['ID'] + cols
 79 | 
 80 |         with open(node_name,mode='w') as f:
 81 |             f.write(" ".join(cols)+"\n")
 82 |             for i in range(self.nnodes):
 83 |                 data = [str(i) if c == 'ID' else str(self.graph.nodes[i][c]) for c in cols]  # ID is the node index
 84 |                 f.write(' '.join(data)+"\n")
 85 | 
 86 |         # compute edge weights
 87 |         if directed:
 88 |             with open(edge_name,mode='w') as f:
 89 |                 f.write("source target type prob i_weight\n")
 90 |                 for (from_ind, to_ind, weight_dict) in self.graph.edges.data():
 91 |                     wt = weight_dict['weight']
 92 |                     f.write("{0:d} {1:d} {2:s} {3:f} {4:d}\n".format(from_ind,to_ind,'Directed',wt,int(wt*100)))
 93 |         else:
 94 |             with open(edge_name,mode='w') as f:
 95 |                 f.write("source target type prob i_weight\n")
 96 |                 for (from_ind, to_ind, weight_dict) in self.graph.edges.data():
 97 |                     has_back = self.graph.has_edge(to_ind,from_ind)
 98 |                     if has_back and from_ind > to_ind:
 99 |                         continue  # reciprocal pair: written once, when from_ind < to_ind
100 |                     # a one-way edge is always written, using 0 for its missing reverse weight
101 |                     back_wt = self.graph.edges[to_ind,from_ind]['weight'] if has_back else 0
102 |                     edge_weight = 0.5*(back_wt + weight_dict['weight'])
103 |                     f.write("{0:d} {1:d} {2:s} {3:f} {4:d}\n".format(from_ind,to_ind,'Undirected',edge_weight,int(edge_weight*100)))
104 | 
105 |     def add_attr(self, name, values):
106 |         """
107 |         Adds an attribute to the nodes in the CSN; values holds one value per node, in node order.
108 |         """
109 |         attr = {}
110 |         for i, v in enumerate(values):
111 |             attr[i] = v
112 | 
113 |         nx.set_node_attributes(self.graph,values=attr,name=name)
114 | 
115 |     def add_trim_attr(self, name, values, default=0):
116 |         """
117 |         Adds an attribute to the nodes in the CSN from values defined on the trimmed set.
118 |         values should have the same length as self.trim_indices; trimmed nodes get default.
119 |         """
120 |         attr = {}
121 |         for i in range(self.nnodes):
122 |             if i in self.trim_indices:
123 |                 trim_idx = self.trim_indices.index(i)
124 |                 attr[i] = values[trim_idx]
125 |             else:
126 |                 attr[i] = default
127 | 
128 |         nx.set_node_attributes(self.graph,values=attr,name=name)
129 |         
130 |     def set_colors(self, rgb):
131 |         """
132 |         Adds colors to each node for gexf export of the graph.
133 | 
134 |         rgb -- A dict, keyed by node index, of {'r', 'g', 'b'} values (0-255) for each node.
135 | 
136 |         Example: rgb[0]['r'] = 255
137 |                  rgb[0]['g'] = 0
138 |                  rgb[0]['b'] = 0
139 |         """
140 |         for node in rgb:
141 |             if 'viz' not in self.graph.nodes[node]:
142 |                 self.graph.nodes[node]['viz'] = {}
143 |             self.graph.nodes[node]['viz']['color'] = {'r': rgb[node]['r'], 'g': rgb[node]['g'], 'b': rgb[node]['b'], 'a': 0}
144 | 
145 |     def set_positions(self, xy):
146 |         """
147 |         Adds x,y positions to each node for gexf export of the graph.
148 | 
149 |         xy: A dict that stores the xy positions of each node.
150 | 
151 |         Example: xy[0]['x'] = 0.5
152 |                  xy[0]['y'] = 1.6
153 |         """
154 |         for node in xy:
155 |             if 'viz' not in self.graph.nodes[node]:
156 |                 self.graph.nodes[node]['viz'] = {}
157 |             self.graph.nodes[node]['viz']['position'] = {'x': float(xy[node]['x']), 'y': float(xy[node]['y']), 'z': float(0)}
158 | 
159 | 
160 |     def colors_from_committors(self,comm):
161 |         """
162 |         Returns an rgb dict (see set_colors) built from committor probabilities.  Basins 0, 1
163 |         and 2 map to the r, g and b channels, so this is most useful for 3-basin committors.
164 | 
165 |         comm:  Numpy array of committors, as returned by self.calc_committors
166 |         """
167 |         highc = 255
168 |         nbasin = comm.shape[1]
169 |         rgb = {}
170 |         colors = ['r','g','b']
171 |         for node in range(self.nnodes):
172 |             maxc = comm[node,:].max()
173 |             # start all channels at 0 so inputs with fewer than 3 basins still give complete colors
174 |             rgb[node] = {'r': 0, 'g': 0, 'b': 0}
175 |             if maxc == 0:
176 |                 continue
177 |             for i in range(min(3,nbasin)):
178 |                 # scale so the channel of the most likely basin is at full intensity
179 |                 rgb[node][colors[i]] = int(highc*comm[node,i]/maxc)
180 | 
181 |         return rgb
182 | 
183 | 
184 |     def trim(self, by_inflow=True, by_outflow=True, min_count=None):
185 |         """
186 |         Trims a graph to delete nodes that are not connected to the main
187 |         component, which is the component containing the most-sampled node (MSN)
188 |         by counts.
189 | 
190 |         by_inflow: whether to delete nodes that are not connected to the MSN by inflow
191 | 
192 |         by_outflow: whether to delete nodes that are not connected to the MSN by outflow
193 | 
194 |         min_count: nodes with a total count below min_count will be deleted (default: only zero-count nodes)
195 | 
196 |         Trimmed graph is saved as self.trim_graph. The trimmed transition matrix
197 |         is saved as self.trim_transmat, and the count matrix is saved as
198 |         self.trim_countmat.
199 | 
200 |         The mapping from the nodes in the trimmed set to the full set is given by
201 |         self.trim_indices.
202 |         """
203 | 
204 |         totcounts = self.countmat.toarray().sum(axis=0)
205 |         msn = totcounts.argmax()
206 | 
207 |         mask = np.ones(self.nnodes,dtype=bool)
208 |         oldmask = np.zeros(self.nnodes,dtype=bool)
209 | 
210 |         if min_count is not None:
211 |             mask[[i for i in range(self.nnodes) if totcounts[i] < min_count]] = False
212 |         else:
213 |             mask[[i for i in range(self.nnodes) if totcounts[i] == 0]] = False
214 | 
215 |         itercount = 0
216 |         diff = []
217 |         while (mask != oldmask).any():
218 | 
219 |             oldmask = mask.copy()
220 |             self.trim_indices = [i for i in range(self.nnodes) if mask[i] == True]
221 |             self.trim_graph = self.graph.subgraph(self.trim_indices)
222 | 
223 |             print(f"Iteration {itercount}:",diff)
224 |             itercount += 1
225 |             
226 |             if by_outflow:
227 |                 downstream = [i for i in self.trim_indices if nx.has_path(self.trim_graph,msn,i)]
228 |                 mask[[i for i in range(self.nnodes) if i not in downstream]] = False
229 | 
230 |             if by_inflow:
231 |                 upstream = [i for i in self.trim_indices if nx.has_path(self.trim_graph,i,msn)]
232 |                 mask[[i for i in range(self.nnodes) if i not in upstream]] = False
233 | 
234 |             diff = [i for i in range(self.nnodes) if mask[i] != oldmask[i]]
235 | 
236 |         # count all transitions to masked states and add these as self-transitions
237 |         # rows = to, cols = from
238 |         to_add = {}
239 |         rows = self.countmat.row
240 |         cols = self.countmat.col
241 |         data = self.countmat.data
242 | 
243 |         for i in range(len(data)):
244 |             if mask[rows[i]] == False and mask[cols[i]] == True:
245 |                 if cols[i] in to_add:
246 |                     to_add[cols[i]] += data[i]
247 |                 else:
248 |                     to_add[cols[i]] = data[i]
249 | 
250 |         tmp_arr = self.countmat.toarray()[mask,...][...,mask]
251 | 
252 |         for ind,full_ind in enumerate(self.trim_indices):
253 |             if full_ind in to_add:
254 |                 tmp_arr[ind][ind] += to_add[full_ind]
255 | 
256 |         assert tmp_arr.sum(axis=0).min() > 0, 'Error! A state in the trimmed countmat has no transitions'
257 |         self.trim_countmat = scipy.sparse.coo_matrix(tmp_arr)
258 | 
259 |         if self.symmetrize:
260 |             self.trim_countmat = symmetrize_matrix(self.trim_countmat)
261 | 
262 |         self.trim_nnodes = self.trim_countmat.shape[0]
263 |         self.trim_transmat = count_to_trans(self.trim_countmat)
264 | 
265 |         is_trim = np.zeros((self.nnodes))
266 |         for i in range(self.nnodes):
267 |             if i not in self.trim_indices:
268 |                 is_trim[i] = 1
269 |         self.add_attr('trim',is_trim)
270 | 
271 |         if not well_conditioned(self.trim_transmat.toarray()):
272 |             print("Warning: trimmed transition matrix is not well-conditioned.")
273 |                 
274 |     def calc_eig_weights(self,label='eig_weights'):
275 |         """
276 |         Calculates steady-state weights of states from the top eigenvector (highest
277 |         eigenvalue) of the transition matrix.  By default it uses self.trim_transmat, but will
278 |         use self.transmat if no trimming has been done.
279 | 
280 |         The weights are stored as node attributes in self.graph with the label
281 |         'label', and are also returned from the function.
282 |         """
283 | 
284 |         if self.trim_transmat is None:
285 |             # use full transition matrix
286 |             full_wts = eig_weights(self.transmat)
287 |         else:
288 |             # use trimmed transition matrix
289 |             wts = eig_weights(self.trim_transmat)
290 |             full_wts = np.zeros(self.nnodes,dtype=float)
291 |             for i,ind in enumerate(self.trim_indices):
292 |                 full_wts[ind] = wts[i]
293 | 
294 |         fw_float = [float(i) for i in full_wts]
295 |         self.add_attr(label, fw_float)
296 | 
297 |         return full_wts
298 | 
299 |     def calc_mult_weights(self,label='mult_weights',tol=1e-6):
300 |         """
301 |         Calculates weights of states using iterative multiplication of the
302 |         transition matrix.  By default it uses self.trim_transmat, but will
303 |         use self.transmat if no trimming has been done.
304 | 
305 |         The weights are stored as node attributes in self.graph with the label
306 |         'label' (unless label is None), and are also returned from the function.
307 |         """
308 | 
309 |         if self.trim_transmat is None:
310 |             # use full transition matrix
311 |             full_wts = mult_weights(self.transmat,tol)
312 |         else:
313 |             # use trimmed transition matrix
314 |             wts = mult_weights(self.trim_transmat,tol)
315 |             full_wts = np.zeros(self.nnodes,dtype=float)
316 |             for i,ind in enumerate(self.trim_indices):
317 |                 full_wts[ind] = wts[i]
318 | 
319 |         fw_float = [float(i) for i in full_wts]
320 |         if label is not None:
321 |             self.add_attr(label, fw_float)
322 | 
323 |         return full_wts
324 | 
325 |     def calc_committors(self, basins,
326 |                         labels=None,
327 |                         basin_labels=None,
328 |                         add_basins=False,
329 |                         tol=1e-6,
330 |                         maxstep=20,
331 |                         method='iter'):
332 |         """
333 |         Calculates committor probabilities between an arbitrary set of N basins.
334 | 
335 |         basins     -- A list of lists, describing which states make up the
336 |                       basins of attraction.  There can be any number of basins.
337 |                       e.g. [[basin1_a,basin1_b,...],[basin2_a,basin2_b,...]]
338 |         labels     -- A list of labels given to the committors (one for each
339 |                       basin) in the attribute list.
340 |         add_basins -- Whether to add basin vectors to attribute list.
341 |         basin_labels -- List of names of the basins.
342 |         tol        -- Tolerance of iterative multiplication process
343 |                       (see matrix.trans_mult_iter)
344 |         maxstep    -- Maximum number of iterations of the multiplication process.
345 |         method     -- 'iter' for iterative multiplication, 'linalg' for 
346 |                       linear algebra solve (two-basin only)
347 | 
348 |         The committors are also returned from the function as a numpy array.
349 |         """
350 | 
351 |         assert method in ['iter','linalg'], 'Error! method must be either iter or linalg'
352 | 
353 |         if self.trim_transmat is None:
354 |             assert well_conditioned(self.transmat.toarray()), "Error: cannot calculate committors from transition matrix. Try trimming first."
355 | 
356 |             # use full transition matrix
357 |             if method == 'iter':
358 |                 full_comm = committor(self.transmat,basins,tol=tol,maxstep=maxstep)
359 |             elif method == 'linalg':
360 |                 full_comm = committor_linalg(self.transmat,basins)
361 |                     
362 |         else:
363 |             # use trimmed transition matrix
364 |             trim_basins = []
365 |             for i,b in enumerate(basins):
366 |                 trim_basins.append([])
367 |                 for state in b:
368 |                     if state in self.trim_indices:
369 |                         trim_basins[i].append(self.trim_indices.index(state))
370 | 
371 |             if method == 'iter':
372 |                 comm = committor(self.trim_transmat,trim_basins,tol=tol,maxstep=maxstep)
373 |             elif method == 'linalg':
374 |                 comm = committor_linalg(self.trim_transmat,trim_basins)
375 | 
376 |             full_comm = np.zeros((self.transmat.shape[0],len(basins)),dtype=float)
377 |             for i,ind in enumerate(self.trim_indices):
378 |                 full_comm[ind] = comm[i]
379 | 
380 |         if labels is None:
381 |             labels = ['p' + str(i) for i in range(len(basins))]
382 | 
383 |         for i in range(len(basins)):
384 |             fc_float = [float(v) for v in full_comm[:,i]]
385 |             self.add_attr(labels[i], fc_float)
386 | 
387 |         if add_basins:
388 |             if basin_labels is None:
389 |                 basin_labels = [str(i) for i in range(len(basins))]
390 |             for i,b in enumerate(basins):
391 |                 bvec = np.zeros(self.nnodes,dtype=int)
392 |                 bvec[b] = 1
393 |                 bv_int = [int(v) for v in bvec]
394 |                 self.add_attr(basin_labels[i],bv_int)
395 | 
396 |         return full_comm
397 | 
398 |     def calc_mfpt(self,sinks,maxsteps=None,tol=1e-3,sources=None):
399 |         """
400 |         Calculates the mean first passage time (MFPT) and the first passage time distribution (FPTD)
401 |         from every state in the matrix to a set of "sinks".  Times are in units of the lag time.
402 | 
403 |         sinks -- (list of int) States (indices in the full matrix) that will be used as sinks.
404 | 
405 |         maxsteps -- (int) The maximum number of steps used to compute the FPTD.  If None, no step
406 |                           limit is applied and tol alone controls termination.
407 | 
408 |         tol -- (float) The stopping criterion for the FPTD calculation.  The calculation stops once
409 |                        the largest "un-sunk" probability is below tol.  Either tol or maxsteps
410 |                        must be set.
411 | 
412 |         sources -- (None or list of int) Source states to average over, weighted by their steady-state
413 |                    weights.  If None, the MFPT and FPTD of every state are returned.
414 | 
415 |         """
416 | 
417 |         assert tol is not None or maxsteps is not None, "Error: either maxsteps or tol must be defined!"
418 | 
419 |         if tol is None:
420 |             tol = 0.0
421 |         if maxsteps is None:
422 |             maxsteps = np.inf
423 |         
424 |         if self.trim_transmat is None:
425 |             assert well_conditioned(self.transmat.toarray()), "Error: cannot calculate mfpt from transition matrix. Try trimming first."
426 | 
427 |             # use full transition matrix
428 |             full_fptd = fptd(self.transmat,sinks,maxsteps=maxsteps,tol=tol)
429 |             full_mfpt = np.zeros((self.transmat.shape[0]),dtype=float)
430 |             for i in range(full_fptd.shape[0]):
431 |                 # loop over exponentially placed timepoints
432 |                 # this entry is the flux between lag*(2**[i]) and lag*(2**[i+1])
433 |                 # avg. of endpoints is (2**(i-1) + 2**(i))
434 |                 full_mfpt += full_fptd[i,:]*(2**(i-1) + 2**(i)) # in units of lagtime
435 | 
436 |         else:
437 |             # use trimmed transition matrix
438 |             trim_sinks = []
439 |             for state in sinks:
440 |                 if state in self.trim_indices:
441 |                     trim_sinks.append(self.trim_indices.index(state))
442 | 
443 |             trim_fptd = fptd(self.trim_transmat,trim_sinks,maxsteps=maxsteps,tol=tol)
444 |             trim_mfpt = np.zeros((trim_fptd.shape[1]),dtype=float)
445 |             for i in range(trim_fptd.shape[0]):
446 |                 # loop over exponentially placed timepoints
447 |                 # this entry is the flux between lag*(2**[i]) and lag*(2**[i+1])
448 |                 trim_mfpt += trim_fptd[i,:]*(2**(i-1) + 2**(i)) # in units of lagtime
449 | 
450 |             full_fptd = np.zeros((trim_fptd.shape[0],self.transmat.shape[0]),dtype=float)
451 |             full_mfpt = np.zeros((self.transmat.shape[0]),dtype=float)
452 |             for i,ind in enumerate(self.trim_indices):
453 |                 full_fptd[:,ind] = trim_fptd[:,i]
454 |                 full_mfpt[ind] = trim_mfpt[i]
455 | 
456 |         if sources is not None:
457 |             wts = self.calc_mult_weights(label=None,tol=1e-6)
458 |             wt_sum = wts[sources].sum()
459 | 
460 |             avg_mfpt = 0
461 |             avg_fptd = np.zeros((full_fptd.shape[0]))
462 |             for s in sources:
463 |                 avg_mfpt += full_mfpt[s]*wts[s]
464 |                 avg_fptd += full_fptd[:,s]*wts[s]
465 | 
466 |             return np.array([avg_mfpt/wt_sum]), np.array([avg_fptd/wt_sum])
467 |         else:
468 |             return full_mfpt, full_fptd
469 | 
470 |         
471 | 
472 |     def idxs_to_trim(self,idxs):
473 |         """
474 |         Converts indices in the full transition matrix to indices in the trimmed matrix.
475 | 
476 |         idxs -- List of states in the full transition matrix (integers from 0 to
477 |                 self.nnodes - 1).  States that were removed by trimming are dropped.
478 |         """
479 | 
480 |         return [self.trim_indices.index(i) for i in idxs if i in self.trim_indices]
481 | 
482 |     def calc_eigvectors(self, n_eig=3,
483 |                         include_wt_vec=False,
484 |                         save_to_graph=True,
485 |                         save_imag_to_graph=False,
486 |                         save_label='eig'):
487 |         """
488 |         Calculates eigenvectors and eigenvalues of the transition matrix (self.trim_transmat if trimming has been done, otherwise self.transmat).
489 | 
490 |         n_eig    -- The number of eigenvectors to return
491 | 
492 |         include_wt_vec -- Whether or not to include the eigenvector with 
493 |                           eigenvalue = 1.  Note that this vector is proportional to the
494 |                           steady-state weights.
495 | 
496 |         save_to_graph -- Whether or not to save the eigenvectors to the graph
497 |                          (real part).
498 | 
499 |         save_imag_to_graph -- Whether or not to save the eigenvectors to the 
500 |                               graph (imaginary part).
501 | 
502 |         save_label -- Prefix for the eigenvector labels when saving to the graph.
503 |                       Zero-based indices are appended (e.g. eig0, eig1, ...); imaginary
504 |                       parts, if saved, are labeled eig0_imag, eig1_imag, ...
505 | 
506 |         Output:
507 |         eig_vecs -- A numpy array (N, n_eig) of eigenvector elements (real part only)
508 | 
509 |         eig_vals -- A numpy array of the n_eig eigenvalues (real part only)
510 |         
511 |         eig_vecs_imag -- A numpy array (N, n_eig) of eigenvector elements (imaginary part only)
512 |         
513 |         eig_vals_imag -- A numpy array of the n_eig eigenvalues (imaginary part only)
514 | 
515 |         """
516 | 
517 |         if self.trim_transmat is None:
518 |             # use full transition matrix
519 |             vec_real, val_real, vec_imag, val_imag = get_eigenvectors(self.transmat.toarray(), n_eig=n_eig, return_wt_vec=include_wt_vec)
520 |         else:
521 |             # use trimmed transition matrix
522 |             trim_vec_real, val_real, trim_vec_imag, val_imag = get_eigenvectors(self.trim_transmat.toarray(), n_eig=n_eig, return_wt_vec=include_wt_vec)
523 | 
524 |             vec_real = np.zeros((self.transmat.shape[0],n_eig),dtype=float)
525 |             vec_imag = np.zeros((self.transmat.shape[0],n_eig),dtype=float)
526 |             for i,ind in enumerate(self.trim_indices):
527 |                 vec_real[ind] = trim_vec_real[i]
528 |                 vec_imag[ind] = trim_vec_imag[i]
529 | 
530 |         # add eigenvectors as attributes
531 |         if save_to_graph:
532 |             for idx in range(n_eig):
533 |                 label = f'{save_label}{idx}'
534 |                 self.add_attr(label, [float(v) for v in vec_real[:,idx]])
535 | 
536 |         if save_imag_to_graph:
537 |             for idx in range(n_eig):
538 |                 label = f'{save_label}{idx}_imag'
539 |                 self.add_attr(label, [float(v) for v in vec_imag[:,idx]])
540 | 
541 |         return vec_real, val_real, vec_imag, val_imag
542 | 
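CSN.trim first drops states whose total count (column sum) is zero or below min_count. It then keeps the states connected to the most-sampled node (MSN) by outflow (reachable from it) and/or by inflow (able to reach it). Transitions from a kept state into a removed state are added back as self-transitions. A minimal pure-Python sketch of those steps follows, assuming a dense list-of-lists count matrix. main_component and trimmed_counts are hypothetical helper names, and breadth-first search stands in for the per-node nx.has_path calls in csn.py. The library repeats the reachability check until its mask stops changing; this sketch does a single pass.

```python
from collections import deque


def _reachable(adj, start):
    """All nodes reachable from start by breadth-first search (start included)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def main_component(counts, by_inflow=True, by_outflow=True, min_count=None):
    """Sorted indices kept when trimming a [to][from] count matrix around the MSN."""
    n = len(counts)
    totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]  # column sums
    msn = max(range(n), key=lambda j: totals[j])  # first most-sampled node, like argmax
    if min_count is None:
        alive = {j for j in range(n) if totals[j] > 0}
    else:
        alive = {j for j in range(n) if totals[j] >= min_count}
    if msn not in alive:
        return []
    out_adj = {j: [] for j in alive}  # j -> states entered from j
    in_adj = {j: [] for j in alive}   # i -> states that enter i
    for i in alive:
        for j in alive:
            if i != j and counts[i][j] > 0:  # observed j -> i; self-transitions ignored
                out_adj[j].append(i)
                in_adj[i].append(j)
    keep = set(alive)
    if by_outflow:
        keep &= _reachable(out_adj, msn)
    if by_inflow:
        keep &= _reachable(in_adj, msn)
    return sorted(keep)


def trimmed_counts(counts, keep):
    """Restrict counts to keep; transitions into removed states become self-transitions."""
    keep_set = set(keep)
    n = len(counts)
    out = [[counts[i][j] for j in keep] for i in keep]
    for b, j in enumerate(keep):
        out[b][b] += sum(counts[i][j] for i in range(n) if i not in keep_set)
    return out


if __name__ == "__main__":
    C = [
        [20, 5, 0, 2],
        [5, 5, 0, 0],
        [0, 1, 3, 0],
        [0, 0, 0, 1],
    ]
    keep = main_component(C)
    print(keep)                     # [0, 1]
    print(trimmed_counts(C, keep))  # [[20, 5], [5, 6]]
```

In this example state 2 is entered from state 1 but never leaves (downstream only) and state 3 is never entered (upstream only). Only states 0 and 1 survive, and the single 1 -> 2 count becomes an extra 1 -> 1 count.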

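calc_mult_weights computes steady-state weights by repeatedly multiplying by the transition matrix. When sources are given, calc_mfpt uses those weights to average the per-state results. The following is a minimal pure-Python sketch of both ideas. It assumes a column-stochastic [to][from] matrix that is irreducible and aperiodic; otherwise power iteration need not converge. stationary_weights and weighted_source_average are hypothetical names, and the convergence test is an assumption rather than the one used in matrix.mult_weights.

```python
def stationary_weights(trans, tol=1e-12, max_iter=100_000):
    """Steady-state weights of a column-stochastic [to][from] matrix by power iteration."""
    n = len(trans)
    w = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # one propagation step: new_w[i] = sum_j trans[i][j] * w[j]
        new_w = [sum(trans[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(new_w)
        new_w = [x / total for x in new_w]  # renormalize against round-off
        if max(abs(a - b) for a, b in zip(new_w, w)) < tol:
            return new_w
        w = new_w
    raise RuntimeError("power iteration did not converge")


def weighted_source_average(values, weights, sources):
    """Average values[s] over sources, weighted by weights[s] (as in calc_mfpt)."""
    wt_sum = sum(weights[s] for s in sources)
    return sum(values[s] * weights[s] for s in sources) / wt_sum


if __name__ == "__main__":
    # 0 -> 1 w.p. 0.1 and 1 -> 0 w.p. 0.2, so detailed balance gives w0 = 2 * w1
    T = [
        [0.9, 0.2],
        [0.1, 0.8],
    ]
    w = stationary_weights(T)
    print([round(x, 6) for x in w])  # [0.666667, 0.333333]
    print(round(weighted_source_average([3.0, 6.0], w, [0, 1]), 6))  # 4.0
```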

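As a cross-check for calc_mfpt, mean first passage times to a set of sinks satisfy m_i = 0 for sink states and m_i = 1 + sum_j T[j][i] m_j otherwise, in units of the lag time and with T indexed [to][from]. The sketch below solves this by fixed-point iteration in pure Python. This is a different technique from the library's. calc_mfpt sums an exponentially binned FPTD, and per its code comments it weights the bin covering 2**i to 2**(i+1) lag times by the midpoint 2**(i-1) + 2**i, so the two results should agree only approximately. mfpt_to_sinks is a hypothetical name.

```python
def mfpt_to_sinks(trans, sinks, tol=1e-10, max_iter=100_000):
    """Mean first passage times (in lag times) to sinks for a [to][from] matrix.

    Solves m[i] = 0 for sinks and m[i] = 1 + sum_j trans[j][i] * m[j] otherwise
    by fixed-point iteration, starting from m = 0.
    """
    n = len(trans)
    sink_set = set(sinks)
    m = [0.0] * n
    for _ in range(max_iter):
        new_m = [
            0.0 if i in sink_set else 1.0 + sum(trans[j][i] * m[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_m, m)) < tol:
            return new_m
        m = new_m
    raise RuntimeError("MFPT iteration did not converge (can every state reach a sink?)")


if __name__ == "__main__":
    # State 0 stays put w.p. 0.9 and enters the sink (state 1) w.p. 0.1: MFPT = 1 / 0.1
    T = [
        [0.9, 0.0],
        [0.1, 1.0],
    ]
    print(round(mfpt_to_sinks(T, [1])[0], 6))  # 10.0
```

The iterates increase monotonically toward the solution, so the stopping rule (change below tol) only bounds the remaining error loosely when the slowest relaxation is slow.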
--------------------------------------------------------------------------------
/csnanalysis/matrix.py:
--------------------------------------------------------------------------------
  1 | import scipy.sparse
  2 | import numpy as np
  3 | from itertools import compress
  4 | 
  5 | def count_to_trans(countmat):
  6 |     """
  7 |     Converts a count matrix (in scipy sparse format) to a transition
  8 |     matrix.
  9 |     """
 10 |     tmp = np.array(countmat.toarray(),dtype=float)
 11 |     colsums = tmp.sum(axis=0)
 12 |     for i,c in enumerate(colsums):
 13 |         if c > 0:
 14 |             tmp[:,i] /= c
 15 | 
 16 |     return(scipy.sparse.coo_matrix(tmp))
 17 | 
 18 | def symmetrize_matrix(countmat):
 19 |     """
 20 |     Symmetrizes a count matrix (in scipy sparse format).
 21 |     """
 22 |     return scipy.sparse.coo_matrix(0.5*(countmat + countmat.transpose()))
 23 | 
 24 | def _make_sink(transmat,sink_states):
 25 |     """
 26 |     Constructs a transition matrix with "sink states", where the columns are
 27 |     replaced with identity vectors (diagonal element = 1, off-diagonals = 0).
 28 | 
 29 |     Input:
 30 | 
 31 |     transmat -- An N x N transition matrix in scipy sparse coo format.
 32 |                 Columns should sum to 1. Indices: [to][from].
 33 | 
 34 |     sink_states -- A list of integers denoting sinks.
 35 | 
 36 |     Output:     A transition matrix in scipy sparse coo format.
 37 |     """
 38 |     sink_mat = transmat.copy()
 39 | 
 40 |     # remove redundant elements in sink_states
 41 |     sink_states = list(set(sink_states))
 42 | 
 43 |     set_to_one = np.zeros(len(sink_states),dtype=bool)
 44 |     for i in range(len(sink_mat.data)):
 45 |         if sink_mat.col[i] in sink_states:
 46 |             if sink_mat.col[i] != sink_mat.row[i]:
 47 |                 sink_mat.data[i] = 0.
 48 |             else:
 49 |                 sink_mat.data[i] = 1.
 50 |                 set_to_one[sink_states.index(sink_mat.col[i])] = True
 51 | 
 52 |     # set diagonal elements to 1 that haven't been set to one already
 53 |     statelist = np.asarray(list(compress(sink_states, np.logical_not(set_to_one))),
 54 |                            dtype=int)
 55 | 
 56 |     if statelist.shape[0] > 0:
 57 |         sink_mat.row = np.append(sink_mat.row, statelist)
 58 |         sink_mat.col = np.append(sink_mat.col,statelist)
 59 |         sink_mat.data = np.append(sink_mat.data, np.ones_like(statelist, dtype=int))
 60 | 
 61 |     # remove zeros
 62 |     sink_mat.eliminate_zeros()
 63 | 
 64 |     # check if sink_mat is well-conditioned
 65 |     if not well_conditioned(sink_mat.toarray()):
 66 |         raise ValueError("Error! sink matrix is no longer well-conditioned in make_sink!")
 67 | 
 68 |     return sink_mat
 69 | 
 70 | def eig_weights(transmat):
 71 |     """
 72 |     Calculates steady-state weights as the top eigenvector, normalized to sum to 1.
 73 | 
 74 |     Input:
 75 | 
 76 |     transmat -- An N x N transition matrix as a numpy array or in
 77 |                 scipy sparse coo format.  Columns should sum to 1.
 78 |                 Indices: [to][from]
 79 | 
 80 |     Output:     An array of weights of size N.
 81 |     """
 82 | 
 83 |     vals, vecs = scipy.sparse.linalg.eigs(transmat,k=1)
 84 |     return np.real(vecs[:,0])/np.real(vecs[:,0].sum())
 85 | 
 86 | def mult_weights(transmat,tol=1e-6):
 87 |     """
 88 |     Calculates the steady-state weights as a column of transmat^infinity
 89 |     (all columns are identical at convergence). transmat^infinity is approximated
 90 |     by successively squaring transmat until no element changes by more than tol.
 91 | 
 92 |     Input:
 93 | 
 94 |     transmat -- An N x N transition matrix as a numpy array or in
 95 |                 scipy sparse coo format.  Columns should sum to 1.
 96 |                 Indices: [to][from]
 97 | 
 98 |     tol      -- Threshold for stopping the iterative multiplication.
 99 | 
100 |     Output:     An array of weights of size N.
101 |     """
102 | 
103 |     banded_mat = _trans_mult_iter(transmat,tol)
104 |     return banded_mat[:,0]
105 | 
106 | def _renorm(mat,tol=1e-7):
107 |     """
108 |     Renormalizes a matrix (to,from) so that its columns sum to one.
109 |     This is meant to encourage numerical stability during long
110 |     matrix multiplication chains.
111 |     """
112 |     for i in range(mat.shape[1]):
113 |         col_sum = mat[:,i].sum()
114 |         assert np.abs(1.0-col_sum) < tol, f"Error! 1 - column sum ({1.0-col_sum}) is greater than tolerance ({tol}) in _renorm!"
115 |         mat[:,i] /= col_sum
116 | 
117 |     return mat
118 | 
119 | def fptd(transmat,sinks,maxsteps=200,tol=0.0):
120 |     """
121 |     Calculates the first passage time distribution for transmat with a set
122 |     of sink states (sinks). The FPTD is evaluated at times lag*2^i (i = 1, 2, ...).
123 |     It will run for at most maxsteps squarings, or until the maximum "un-sunk"
124 |     probability of a state falls below tol.
125 |     """
126 | 
127 |     sm = _renorm(_make_sink(transmat,sinks).toarray())
128 | 
129 |     step = 0
130 |     non_sinks = [i for i in range(transmat.shape[0]) if i not in sinks]
131 |     max_prob_to = sm.sum(axis=1)[non_sinks].max()
132 | 
133 |     fptd = []
134 |     last_step_warped = np.zeros((transmat.shape[0]))
135 |     while step < maxsteps and max_prob_to > tol:
136 |         newmat = _renorm(np.matmul(sm,sm))
137 |         
138 |         warped = newmat[sinks,:].sum(axis=0)
139 |         warped[sinks] = 0
140 |         fptd.append(warped-last_step_warped)
141 |         last_step_warped = warped
142 |         
143 |         sm = newmat.copy()
144 |         max_prob_to = sm.sum(axis=1)[non_sinks].max()
145 |         step += 1
146 | 
147 |     return np.array(fptd)
148 |         
149 | 
150 | def _trans_mult_iter(transmat,tol,maxstep=200):
151 |     """
152 |     Performs iterative squaring of transmat until the largest element-wise
153 |     change between successive squarings is less than tol.
154 |     """
155 |     if type(transmat) is np.ndarray:
156 |         t = transmat.copy()
157 |     else:
158 |         t = transmat.toarray()
159 | 
160 |     var = 1
161 |     step = 0
162 |     while (var > tol) and (step < maxstep):
163 |         newmat = np.matmul(t,t)
164 |         var = np.abs(newmat-t).max()
165 |         t = newmat.copy()
166 |         step += 1
167 | 
168 |     if step == maxstep and var > tol:
169 |         print("Warning: iterative multiplication not converged after",step,"steps: (var = ",var,"), (tol = ",tol,")")
170 | 
171 |     return t
172 | 
173 | def committor(transmat,basins,tol=1e-6,maxstep=20):
174 |     """
175 |     This function computes committor probabilities, given a transition matrix
176 |     and a list of states that comprise the basins. It uses iterative multiplication of
177 |     a modified transition matrix, with identity vectors for each basin state.
178 | 
179 |     Note that this method works regardless of the number of basins.
180 | 
181 |     Input:
182 | 
183 |     transmat -- An N x N transition matrix in scipy sparse coo format.
184 |                 Columns should sum to 1. Indices: [to][from]
185 | 
186 |     basins -- A list of lists, describing which states make up the
187 |               basins of attraction.  There can be any number of basins.
188 |               e.g. [[basin1_a,basin1_b,...],[basin2_a,basin2_b,...]]
189 | 
190 |     Output:   An array of committor probabilities of size N x B, where B
191 |               is the number of basins. Committors will sum to 1 for each state.
192 |     """
193 | 
194 |     # make sink_matrix
195 | 
196 |     flat_sink = [i for b in basins for i in b]
197 |     sink_mat = _make_sink(transmat,flat_sink)
198 |     sink_results = _trans_mult_iter(sink_mat,tol,maxstep)
199 | 
200 |     committor = np.zeros((transmat.shape[0],len(basins)),dtype=float)
201 | 
202 |     for i in range(transmat.shape[0]):
203 |         comm_done = False
204 |         for j,b in enumerate(basins):
205 |             if i in b:
206 |                 committor[i][j] = 1
207 |                 comm_done = True
208 |                 break
209 |         if not comm_done:
210 |             for j,b in enumerate(basins):
211 |                 committor[i][j] = 0.
212 |                 for bstate in b:
213 |                     committor[i][j] += sink_results[bstate][i]
214 | 
215 |     return committor
216 | 
217 | def committor_linalg(transmat,basins):
218 |     """
219 |     This function computes committor probabilities, given a transition matrix
220 |     and a list of states that comprise the basins, by solving the system
221 |     of equations:
222 | 
223 |     0 = q_i - sum_j T_ij * q_j      for i not in a basin
224 | 
225 |     by solving the equation AQ = B.
226 | 
227 |     Note: this requires that the number of basins is 2, and q_i is the 
228 |     probability that a trajectory in state i commits to the SECOND basin.
229 | 
230 |     Input:
231 |     
232 |     transmat -- An N x N transition matrix in scipy sparse coo format.  
233 |                 Columns should sum to 1. Indices: [to][from]
234 | 
235 |     basins -- A list of two lists, describing which states make up the
236 |               two basins of attraction.
237 |               e.g. [[basin1_a,basin1_b,...],[basin2_a,basin2_b,...]]
238 | 
239 |     Output:   An array of committor probabilities of size N x 2, where 2
240 |               is the number of basins. Committors will sum to 1 for each state.
241 |     """
242 | 
243 |     assert len(basins) == 2, 'Error! linalg method only works with two basins.'
244 | 
245 |     trans_arr = transmat.toarray()
246 |     n = trans_arr.shape[0]
247 |     A_mat = np.zeros((n,n))
248 |     B_vec = np.zeros((n))
249 | 
250 |     for i in range(n):
251 |         A_mat[i,i] = 1
252 |         if i in basins[0]:
253 |             B_vec[i] = 0
254 |         elif i in basins[1]:
255 |             B_vec[i] = 1
256 |         else:
257 |             B_vec[i] = 0
258 |             for j in range(n):
259 |                 if i != j:
260 |                     A_mat[i,j] = -trans_arr[j,i]
261 |                 else:
262 |                     A_mat[i,i] = 1-trans_arr[j,i]
263 | 
264 |     Q_vec = np.linalg.solve(A_mat,B_vec)
265 | 
266 |     return np.array([1-Q_vec,Q_vec]).T
267 | 
268 | def _extend(transmat,hubstates):
269 |     """
270 |     This function returns an extended transition matrix (2N x 2N)
271 |     where one set of states (0..N-1) have NOT yet visited hubstates,
272 |     and states (N..2N-1) HAVE visited the hubstates.
273 |     """
274 |     n = transmat.shape[0]
275 | 
276 |     # data, rows and cols of the future extended matrix
277 |     data = []
278 |     rows = []
279 |     cols = []
280 | 
281 |     for i in range(len(transmat.data)):
282 |         if transmat.row[i] in hubstates:
283 |             # transition TO a hubstate, add to lower left and lower right
284 |             # lower left
285 |             data.append(transmat.data[i])
286 |             rows.append(transmat.row[i] + n)
287 |             cols.append(transmat.col[i])
288 |             # lower right
289 |             data.append(transmat.data[i])
290 |             rows.append(transmat.row[i] + n)
291 |             cols.append(transmat.col[i] + n)
292 |         else:
293 |             # transition not to a hubstate, add to upper left and lower right
294 |             # upper left
295 |             data.append(transmat.data[i])
296 |             rows.append(transmat.row[i])
297 |             cols.append(transmat.col[i])
298 |             # lower right
299 |             data.append(transmat.data[i])
300 |             rows.append(transmat.row[i] + n)
301 |             cols.append(transmat.col[i] + n)
302 | 
303 |     ext_mat = scipy.sparse.coo_matrix((data, (rows, cols)), shape=(2*n,2*n))
304 |     return ext_mat
305 | 
306 | def _getring(transmat,basin,wts,tol,maxstep):
307 |     """
308 |     Given a transition matrix, and a set of states that form a basin,
309 |     this returns a vector describing how probability exits that basin.
310 |     """
311 |     # make a matrix with sink states in every non-basin state
312 |     n = transmat.shape[0]
313 |     flat_sink = [i for i in range(n) if i not in basin]
314 |     sink_mat = _make_sink(transmat,flat_sink)
315 | 
316 |     # see where the probability goes
317 |     sink_results = _trans_mult_iter(sink_mat,tol,maxstep)
318 | 
319 |     ringprob = np.zeros((n))
320 |     for b in basin:
321 |         for i in range(n):
322 |             if i not in basin:
323 |                 ringprob[i] += wts[b]*sink_results[i][b]
324 | 
325 |     return ringprob/wts[basin].sum()
326 | 
327 | def hubscores(transmat,hubstates,basins,tol=1e-6,maxstep=30,wts=None):
328 |     """
329 |     This function computes hub scores, which are the probabilities that
330 |     transitions between a set of communities will use a given community as
331 |     an intermediate.  e.g. h_a,b,c is the probability that transitions from
332 |     basin a to basin b will use c as an intermediate.
333 | 
334 |     For more information see:
335 |     Dickson, A and Brooks III, CL. JCTC, 8, 3044-3052 (2012).
336 | 
337 |     Input:
338 | 
339 |     transmat -- An N x N transition matrix in scipy sparse coo format.
340 |                 Columns should sum to 1. Indices: [to][from]
341 | 
342 |     hubstates -- A list describing the states in transmat that make up
343 |               the hub being measured.
344 | 
345 |     basins -- A list of two lists, describing which states make up the
346 |               two basins of attraction.
347 |               e.g. [[basin_a_1,basin_a_2,...],[basin_b_1,basin_b_2,...]].
348 | 
349 |     wts    -- The equilibrium weights of all states in transmat.  If this is not
350 |               given then the function will compute them from eig_weights.
351 | 
352 |     Output:   [h_a,b,c , h_b,a,c]
353 |     """
354 | 
355 |     # make extended sink_matrix
356 |     n = transmat.shape[0]
357 |     ext_transmat = _extend(transmat,hubstates)
358 | 
359 |     flat_sink = [i for b in basins for i in b]
360 |     flat_sink_ext = flat_sink + [i + n for i in flat_sink]
361 | 
362 |     sink_mat = _make_sink(ext_transmat,flat_sink_ext)
363 | 
364 |     sink_results = _trans_mult_iter(sink_mat,tol,maxstep)
365 | 
366 |     if wts is None:
367 |         wts = eig_weights(transmat)
368 | 
369 | 
370 |     h = np.zeros((2,2),dtype=float)
371 |     ring = [_getring(transmat,b,wts,tol,maxstep) for b in basins]
372 | 
373 |     for source,sink in [[0,1],[1,0]]:
374 |         for i,p in enumerate(ring[source]):
375 |             if p > 0:
376 |                 # i is a ring state of source basin, with probability p
377 |                 if i in hubstates:
378 |                     testi = i + n
379 |                 else:
380 |                     testi = i
381 |                 c_no = 0
382 |                 c_yes = 0
383 |                 for b in basins[sink]:
384 |                     c_no += sink_results[b][testi]
385 |                     c_yes += sink_results[b+n][testi]
386 |                 if (c_no + c_yes) > 0:
387 |                     h[source][sink] += p*c_yes/(c_no+c_yes)
388 | 
389 |     return [h[0,1],h[1,0]]
390 | 
391 | def get_eigenvectors(transmat, n_eig=3, return_wt_vec=False):
392 |     """
393 |     This function returns the eigenvectors with the highest eigenvalues
394 |     (by real part), ordered high to low. It wraps scipy.linalg.eig.
395 | 
396 |     Input:
397 |     
398 |     transmat -- An N x N transition matrix as a dense numpy array.
399 |                 Columns should sum to 1. Indices: [to][from]
400 | 
401 |     n_eig    -- The number of eigenvectors to return
402 | 
403 |     return_wt_vec -- Whether or not to include the eigenvector with
404 |                      eigenvalue = 1.  Note that this is proportional to
405 |                      the steady state weights.
406 | 
407 |     Output:
408 | 
409 |     eig_vecs -- A numpy array (N, n_eig) of eigenvector elements (real part only)
410 | 
411 |     eig_vals -- A numpy array of the n_eig eigenvalues (real part only)
412 | 
413 |     eig_vecs_imag -- A numpy array (N, n_eig) of eigenvector elements (imaginary part only)
414 | 
415 |     eig_vals_imag -- A numpy array of the n_eig eigenvalues (imaginary part only)
416 |     """
417 | 
418 |     e_vals_complex, e_vecs_complex = scipy.linalg.eig(transmat)
419 | 
420 |     e_vals_real = np.real(e_vals_complex)
421 |     e_vals_imag = np.imag(e_vals_complex)
422 | 
423 |     sort_idxs = list(np.argsort(e_vals_real))
424 | 
425 |     if return_wt_vec:
426 |         idxs_to_return = sort_idxs[-n_eig:]
427 |     else:
428 |         idxs_to_return = sort_idxs[-(n_eig+1):-1]
429 | 
430 |     # change order to highest to lowest
431 |     idxs_to_return.reverse()
432 | 
433 |     return np.real(e_vecs_complex)[:,idxs_to_return], e_vals_real[idxs_to_return], \
434 |         np.imag(e_vecs_complex)[:,idxs_to_return], e_vals_imag[idxs_to_return]
435 | 
436 | def well_conditioned(transmat):
437 |     tol = 1e-5
438 |     minval = transmat.sum(axis=0).min()
439 |     maxval = transmat.sum(axis=0).max()
440 |     if 1 - minval > tol or maxval - 1 > tol:
441 |         return False
442 |     else:
443 |         return True
444 | 
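`count_to_trans`, `eig_weights` and `mult_weights` describe two routes to the same steady state. Both start by column-normalizing the counts (indices [to][from]). One route then takes the eigenvector with eigenvalue 1; the other takes any column of T^infinity.

The sketch below reproduces both routes in plain NumPy on a made-up 3-state count matrix, so it runs without CSNAnalysis installed. A few details differ from the package:

- It uses `np.linalg.eig` instead of the `scipy.sparse.linalg.eigs` call in `eig_weights`.
- The convergence tolerance and step cap are illustrative choices.

```python
import numpy as np
import scipy.sparse

# Made-up 3-state count matrix, indexed [to][from]:
# counts[i, j] = number of observed j -> i transitions
counts = scipy.sparse.coo_matrix(np.array([[80.0, 10.0, 0.0],
                                           [20.0, 60.0, 30.0],
                                           [0.0, 30.0, 70.0]]))


def column_normalize(countmat):
    """Return a dense transition matrix whose columns sum to one.
    Columns with no counts are left as zeros, as in count_to_trans."""
    dense = countmat.toarray().astype(float)
    colsums = dense.sum(axis=0)
    nonzero = colsums > 0
    dense[:, nonzero] /= colsums[nonzero]
    return dense


T = column_normalize(counts)

# Route 1: eigenvector of T with the largest eigenvalue (which is 1)
vals, vecs = np.linalg.eig(T)
top = np.argmax(vals.real)
w_eig = np.real(vecs[:, top])
w_eig = w_eig / w_eig.sum()          # fixes both sign and scale

# Route 2: a column of T^infinity, approximated by repeated squaring
T_inf = T.copy()
for _ in range(100):
    T_next = T_inf @ T_inf
    converged = np.abs(T_next - T_inf).max() < 1e-12
    T_inf = T_next
    if converged:
        break
w_mult = T_inf[:, 0]                 # every column converges to the same vector

print(w_eig)
print(w_mult)
```

For this matrix, both routes should give weights of (0.2, 0.4, 0.4). You can check this by hand from w = Tw.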

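The `fptd` docstring describes evaluating the first passage time distribution at doubling times lag*2^i by repeatedly squaring a sink matrix. The sketch below illustrates that idea under simplifying assumptions:

- It uses a dense toy matrix.
- It skips the `_renorm` step.
- It runs a fixed number of doublings instead of using the `tol` criterion.
- `absorbed_by_doubling` is a hypothetical name.

After each squaring it records the probability of having reached a sink, along with the increment since the previous sample time.

```python
import numpy as np

# Hypothetical 3-state network, indexed [to][from]; state 2 is the sink.
T = np.array([[0.5, 0.25, 0.0],
              [0.5, 0.50, 0.0],
              [0.0, 0.25, 1.0]])


def absorbed_by_doubling(T, sinks, n_doublings):
    """For times 2, 4, 8, ... (in units of the lag), return the probability
    that each start state has reached a sink, plus the increments between
    successive sample times."""
    S = np.array(T, dtype=float)
    for s in sinks:
        S[:, s] = 0.0          # a sink state never leaves...
        S[s, s] = 1.0          # ...it only maps onto itself
    absorbed = []
    for _ in range(n_doublings):
        S = S @ S                          # after k squarings, S = T_sink^(2^k)
        frac = S[sinks, :].sum(axis=0)     # probability of being in any sink
        frac[sinks] = 0.0                  # ignore trajectories starting in a sink
        absorbed.append(frac)
    absorbed = np.array(absorbed)
    increments = np.diff(absorbed, axis=0, prepend=0.0)
    return absorbed, increments


absorbed, increments = absorbed_by_doubling(T, [2], 10)
print(absorbed[0])   # probability of reaching the sink within 2 steps
```

The sample times double, so each increment is the probability of first arrival within a successively wider time window. These are not per-step probabilities.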

--------------------------------------------------------------------------------
/examples/committor_net_3state.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ADicksonLab/CSNAnalysis/7700653374937c179a441c656f0783f80e44c26d/examples/committor_net_3state.png
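`committor` (sink matrix plus iterative squaring) and `committor_linalg` (a linear solve) in `matrix.py` should agree for a two-basin problem. The sketch below is a minimal dense re-implementation of both ideas on a hypothetical 4-state chain. The function names `committor_linear` and `committor_sink` are illustrative and are not the package's API.

With [to][from] indexing, the probability of moving from i to j is T[j, i]. That is why the linear system uses the column `T[:, i]`.

```python
import numpy as np

# Hypothetical 4-state chain, indexed [to][from]; each column sums to 1.
T = np.array([[0.9, 0.3, 0.0, 0.0],
              [0.1, 0.4, 0.2, 0.0],
              [0.0, 0.3, 0.5, 0.1],
              [0.0, 0.0, 0.3, 0.9]])


def committor_linear(T, basin_a, basin_b):
    """Solve q_i - sum_j T[j, i] q_j = 0 for states outside both basins,
    with q = 0 in basin_a and q = 1 in basin_b."""
    n = T.shape[0]
    A = np.eye(n)
    b = np.zeros(n)
    for i in range(n):
        if i in basin_b:
            b[i] = 1.0
        elif i not in basin_a:
            A[i, :] -= T[:, i]   # column i holds the probabilities of leaving i
    return np.linalg.solve(A, b)


def committor_sink(T, basin_a, basin_b, n_squarings=30):
    """Make every basin state a sink, square repeatedly, and sum the
    probability from each start state that ends up in basin_b."""
    S = np.array(T, dtype=float)
    for s in list(basin_a) + list(basin_b):
        S[:, s] = 0.0
        S[s, s] = 1.0
    for _ in range(n_squarings):
        S = S @ S
    return S[list(basin_b), :].sum(axis=0)


q_lin = committor_linear(T, [0], [3])
q_sink = committor_sink(T, [0], [3])
print(q_lin)
print(q_sink)
```

Solving by hand gives q = (0, 0.375, 0.75, 1) for commitment to the second basin. The second column of `committor_linalg`'s output holds the same quantity.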


--------------------------------------------------------------------------------
/examples/examples.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# CSNAnalysis Tutorial\n",
  8 |     "### A brief introduction to the use of the CSNAnalysis package\n",
  9 |     "---\n",
 10 |     "**Updated Aug 19, 2020**\n",
 11 |     "*Dickson Lab, Michigan State University*"
 12 |    ]
 13 |   },
 14 |   {
 15 |    "cell_type": "markdown",
 16 |    "metadata": {},
 17 |    "source": [
 18 |     "## Overview\n",
 19 |     "\n",
 20 |     "The CSNAnalysis package is a set of tools for network-based analysis of molecular dynamics trajectories.\n",
 21 |     "  CSNAnalysis is an easy interface between enhanced sampling algorithms\n",
 22 |     "  (e.g. WExplore implemented in `wepy`), molecular clustering programs (e.g. `MSMBuilder`), graph analysis packages (e.g. `networkX`) and graph visualization programs (e.g. `Gephi`).\n",
 23 |     "\n",
 24 |     "### What are conformation space networks?\n",
 25 |     "\n",
 26 |     "A conformation space network is a visualization of a free energy landscape, where each node is a cluster of molecular conformations, and the edges show which conformations can directly interconvert during a molecular dynamics simulation. A CSN can be thought of as a visual representation of a transition matrix, where the nodes represent the row / column indices and the edges show the off-diagonal elements. `CSNAnalysis` offers a concise set of tools for the creation, analysis and visualization of CSNs.\n",
 27 |     "\n",
 28 |     "**This tutorial will give quick examples for the following use cases:**\n",
 29 |     "\n",
 30 |     "1. Initializing CSN objects from count matrices\n",
 31 |     "2. Trimming CSNs\n",
 32 |     "3. Obtaining steady-state weights from a transition matrix\n",
 33 |     "  * By eigenvector\n",
 34 |     "  * By iterative multiplication\n",
 35 |     "4. Computing committor probabilities to an arbitrary set of basins\n",
 36 |     "5. Exporting gexf files for visualization with the Gephi program"
 37 |    ]
 38 |   },
 39 |   {
 40 |    "cell_type": "markdown",
 41 |    "metadata": {},
 42 |    "source": [
 43 |     "## Getting started\n",
 44 |     "\n",
 45 |     "Clone the CSNAnalysis repository:\n",
 46 |     "\n",
 47 |     "```\n",
 48 |     "git clone https://github.com/ADicksonLab/CSNAnalysis.git\n```\n",
 49 |     "\n",
 50 |     "Navigate into the cloned repository and install using pip:\n",
 51 |     "\n",
 52 |     "```\n",
 53 |     "cd CSNAnalysis\n",
 54 |     "pip install --user -e .\n",
 55 |     "```\n",
 56 |     "\n",
 57 |     "Go to the examples directory and open this notebook (`examples.ipynb`):\n",
 58 |     "\n",
 59 |     "```\n",
 60 |     "cd examples; jupyter notebook\n```"
 61 |    ]
 62 |   },
 63 |   {
 64 |    "cell_type": "markdown",
 65 |    "metadata": {},
 66 |    "source": [
 67 |     "## Dependencies\n",
 68 |     "\n",
 69 |     "I highly recommend using Anaconda and working in a `python3` environment. CSNAnalysis uses the packages `numpy`, `scipy` and `networkx`.  If these are installed then the following lines of code should run without error:"
 70 |    ]
 71 |   },
 72 |   {
 73 |    "cell_type": "code",
 74 |    "execution_count": 1,
 75 |    "metadata": {},
 76 |    "outputs": [],
 77 |    "source": [
 78 |     "import numpy as np\n",
 79 |     "import networkx as nx\n",
 80 |     "import scipy"
 81 |    ]
 82 |   },
 83 |   {
 84 |    "cell_type": "markdown",
 85 |    "metadata": {},
 86 |    "source": [
 87 |     "If `CSNAnalysis` was installed (i.e. added to your `sys.path`), then this should also work:"
 88 |    ]
 89 |   },
 90 |   {
 91 |    "cell_type": "code",
 92 |    "execution_count": 2,
 93 |    "metadata": {},
 94 |    "outputs": [],
 95 |    "source": [
 96 |     "from csnanalysis.csn import CSN\n",
 97 |     "from csnanalysis.matrix import *"
 98 |    ]
 99 |   },
100 |   {
101 |    "cell_type": "markdown",
102 |    "metadata": {},
103 |    "source": [
104 |     "This notebook also uses `matplotlib` to visualize output."
105 |    ]
106 |   },
107 |   {
108 |    "cell_type": "code",
109 |    "execution_count": 3,
110 |    "metadata": {},
111 |    "outputs": [],
112 |    "source": [
113 |     "import matplotlib"
114 |    ]
115 |   },
116 |   {
117 |    "cell_type": "markdown",
118 |    "metadata": {},
119 |    "source": [
120 |     "Great!  Now let's load in the count matrix that we'll use for all the examples here:"
121 |    ]
122 |   },
123 |   {
124 |    "cell_type": "code",
125 |    "execution_count": 4,
126 |    "metadata": {},
127 |    "outputs": [],
128 |    "source": [
129 |     "count_mat = scipy.sparse.load_npz('matrix.npz')"
130 |    ]
131 |   },
132 |   {
133 |    "cell_type": "markdown",
134 |    "metadata": {},
135 |    "source": [
136 |     "## Background: Sparse matrices"
137 |    ]
138 |   },
139 |   {
140 |    "cell_type": "markdown",
141 |    "metadata": {
142 |     "collapsed": true
143 |    },
144 |    "source": [
145 |     "It's worth knowing a little about sparse matrices before we start. If we have a huge $N$ by $N$ matrix, where $N > 1000$, but most of the elements are zero, it is more efficient to store the data as a sparse matrix."
146 |    ]
147 |   },
148 |   {
149 |    "cell_type": "code",
150 |    "execution_count": 5,
151 |    "metadata": {},
152 |    "outputs": [
153 |     {
154 |      "data": {
155 |       "text/plain": [
156 |        "scipy.sparse.coo.coo_matrix"
157 |       ]
158 |      },
159 |      "execution_count": 5,
160 |      "metadata": {},
161 |      "output_type": "execute_result"
162 |     }
163 |    ],
164 |    "source": [
165 |     "type(count_mat)"
166 |    ]
167 |   },
168 |   {
169 |    "cell_type": "markdown",
170 |    "metadata": {},
171 |    "source": [
172 |     "`coo_matrix` refers to \"coordinate format\", where the matrix is essentially a set of lists of matrix \"coordinates\" (rows, columns) and data:"
173 |    ]
174 |   },
175 |   {
176 |    "cell_type": "code",
177 |    "execution_count": 6,
178 |    "metadata": {},
179 |    "outputs": [
180 |     {
181 |      "name": "stdout",
182 |      "output_type": "stream",
183 |      "text": [
184 |       "0 0 382.0\n",
185 |       "0 651 2.0\n",
186 |       "0 909 2.0\n",
187 |       "0 920 2.0\n",
188 |       "0 1363 1.0\n",
189 |       "0 1445 2.0\n",
190 |       "0 2021 5.0\n",
191 |       "0 2022 7.0\n",
192 |       "0 2085 4.0\n",
193 |       "0 2131 1.0\n"
194 |      ]
195 |     }
196 |    ],
197 |    "source": [
198 |     "rows = count_mat.row\n",
199 |     "cols = count_mat.col\n",
200 |     "data = count_mat.data\n",
201 |     "\n",
202 |     "for r,c,d in zip(rows[0:10],cols[0:10],data[0:10]):\n",
203 |     "    print(r,c,d)"
204 |    ]
205 |   },
206 |   {
207 |    "cell_type": "markdown",
208 |    "metadata": {},
209 |    "source": [
210 |     "It can still be treated like a normal matrix ($4000$ by $4000$ in this case):"
211 |    ]
212 |   },
213 |   {
214 |    "cell_type": "code",
215 |    "execution_count": 7,
216 |    "metadata": {},
217 |    "outputs": [
218 |     {
219 |      "data": {
220 |       "text/plain": [
221 |        "(4000, 4000)"
222 |       ]
223 |      },
224 |      "execution_count": 7,
225 |      "metadata": {},
226 |      "output_type": "execute_result"
227 |     }
228 |    ],
229 |    "source": [
230 |     "count_mat.shape"
231 |    ]
232 |   },
233 |   {
234 |    "cell_type": "markdown",
235 |    "metadata": {},
236 |    "source": [
237 |     "However, it only needs to store the non-zero elements, of which there are far fewer than $4000^2$:"
238 |    ]
239 |   },
240 |   {
241 |    "cell_type": "code",
242 |    "execution_count": 8,
243 |    "metadata": {},
244 |    "outputs": [
245 |     {
246 |      "data": {
247 |       "text/plain": [
248 |        "44163"
249 |       ]
250 |      },
251 |      "execution_count": 8,
252 |      "metadata": {},
253 |      "output_type": "execute_result"
254 |     }
255 |    ],
256 |    "source": [
257 |     "len(rows)"
258 |    ]
259 |   },
260 |   {
261 |    "cell_type": "markdown",
262 |    "metadata": {},
263 |    "source": [
264 |     "**OK, let's get started building a Conformation Space Network!**\n",
265 |     "\n",
266 |     "---"
267 |    ]
268 |   },
269 |   {
270 |    "cell_type": "markdown",
271 |    "metadata": {},
272 |    "source": [
273 |     "## 1) Initializing CSN objects from count matrices\n",
274 |     "\n",
275 |     "To get started we need a count matrix, which can be a `numpy` array, or a `scipy.sparse` matrix, or a list of lists:"
276 |    ]
277 |   },
278 |   {
279 |    "cell_type": "code",
280 |    "execution_count": 9,
281 |    "metadata": {},
282 |    "outputs": [],
283 |    "source": [
284 |     "our_csn = CSN(count_mat,symmetrize=True)"
285 |    ]
286 |   },
287 |   {
288 |    "cell_type": "markdown",
289 |    "metadata": {},
290 |    "source": [
291 |     "The documentation for any `CSNAnalysis` class or function can be viewed by appending \"?\" to its name:"
292 |    ]
293 |   },
294 |   {
295 |    "cell_type": "code",
296 |    "execution_count": 10,
297 |    "metadata": {},
298 |    "outputs": [],
299 |    "source": [
300 |     "CSN?"
301 |    ]
302 |   },
303 |   {
304 |    "cell_type": "markdown",
305 |    "metadata": {},
306 |    "source": [
307 |     "The `our_csn` object now holds three different representations of our data.  The original counts can now be found in `scipy.sparse` format:"
308 |    ]
309 |   },
310 |   {
311 |    "cell_type": "code",
312 |    "execution_count": 11,
313 |    "metadata": {},
314 |    "outputs": [
315 |     {
316 |      "data": {
317 |       "text/plain": [
318 |        "<4000x4000 sparse matrix of type '<class 'numpy.float64'>'\n",
319 |        "\twith 62280 stored elements in COOrdinate format>"
320 |       ]
321 |      },
322 |      "execution_count": 11,
323 |      "metadata": {},
324 |      "output_type": "execute_result"
325 |     }
326 |    ],
327 |    "source": [
328 |     "our_csn.countmat"
329 |    ]
330 |   },
331 |   {
332 |    "cell_type": "markdown",
333 |    "metadata": {},
334 |    "source": [
335 |     "A transition matrix has been computed from this count matrix by normalizing each column (indices are `[to][from]`):\n",
336 |     "\\begin{equation}\n",
337 |     "t_{ij} = \\frac{c_{ij}}{\\sum_k c_{kj}}\n",
338 |     "\\end{equation}"
339 |    ]
340 |   },
341 |   {
342 |    "cell_type": "code",
343 |    "execution_count": 12,
344 |    "metadata": {},
345 |    "outputs": [
346 |     {
347 |      "data": {
348 |       "text/plain": [
349 |        "<4000x4000 sparse matrix of type '<class 'numpy.float64'>'\n",
350 |        "\twith 62280 stored elements in COOrdinate format>"
351 |       ]
352 |      },
353 |      "execution_count": 12,
354 |      "metadata": {},
355 |      "output_type": "execute_result"
356 |     }
357 |    ],
358 |    "source": [
359 |     "our_csn.transmat"
360 |    ]
361 |   },
362 |   {
363 |    "cell_type": "markdown",
364 |    "metadata": {},
365 |    "source": [
366 |     "where the elements in each column sum to one:"
367 |    ]
368 |   },
369 |   {
370 |    "cell_type": "code",
371 |    "execution_count": 13,
372 |    "metadata": {},
373 |    "outputs": [
374 |     {
375 |      "data": {
376 |       "text/plain": [
377 |        "matrix([[1., 1., 1., ..., 1., 1., 1.]])"
378 |       ]
379 |      },
380 |      "execution_count": 13,
381 |      "metadata": {},
382 |      "output_type": "execute_result"
383 |     }
384 |    ],
385 |    "source": [
386 |     "our_csn.transmat.sum(axis=0)"
387 |    ]
388 |   },
389 |   {
390 |    "cell_type": "markdown",
391 |    "metadata": {},
392 |    "source": [
393 |     "Lastly, the data has been stored in a `networkx` directed graph:"
394 |    ]
395 |   },
396 |   {
397 |    "cell_type": "code",
398 |    "execution_count": 14,
399 |    "metadata": {},
400 |    "outputs": [
401 |     {
402 |      "data": {
403 |       "text/plain": [
404 |        "<networkx.classes.digraph.DiGraph at 0x81f64e050>"
405 |       ]
406 |      },
407 |      "execution_count": 14,
408 |      "metadata": {},
409 |      "output_type": "execute_result"
410 |     }
411 |    ],
412 |    "source": [
413 |     "our_csn.graph"
414 |    ]
415 |   },
416 |   {
417 |    "cell_type": "markdown",
418 |    "metadata": {},
419 |    "source": [
420 |     "that holds the nodes and edges of our CSN and can be passed to other `networkx` functions.  For example, we can calculate the shortest path between nodes 0 and 10:"
421 |    ]
422 |   },
423 |   {
424 |    "cell_type": "code",
425 |    "execution_count": 15,
426 |    "metadata": {},
427 |    "outputs": [
428 |     {
429 |      "data": {
430 |       "text/plain": [
431 |        "[0, 1445, 2125, 2043, 247, 1780, 10]"
432 |       ]
433 |      },
434 |      "execution_count": 15,
435 |      "metadata": {},
436 |      "output_type": "execute_result"
437 |     }
438 |    ],
439 |    "source": [
440 |     "nx.shortest_path(our_csn.graph,0,10)"
441 |    ]
442 |   },
443 |   {
444 |    "cell_type": "markdown",
445 |    "metadata": {},
446 |    "source": [
447 |     "---\n",
448 |     "## 2) Trimming CSNs\n",
449 |     "\n",
450 |     "A big benefit of coupling the count matrix, transition matrix and graph representations is that elements can be \"trimmed\" from all three simultaneously.  The `trim` function will eliminate nodes that are not connected to the main component (by inflow, outflow, or both), and can also eliminate nodes that do not meet a minimum count requirement:"
451 |    ]
452 |   },
453 |   {
454 |    "cell_type": "code",
455 |    "execution_count": 16,
456 |    "metadata": {},
457 |    "outputs": [],
458 |    "source": [
459 |     "our_csn.trim(by_inflow=True, by_outflow=True, min_count=20)"
460 |    ]
461 |   },
462 |   {
463 |    "cell_type": "markdown",
464 |    "metadata": {},
465 |    "source": [
466 |     "The trimmed graph, count matrix and transition matrix are stored as `our_csn.trim_graph`, `our_csn.trim_countmat` and `our_csn.trim_transmat`, respectively."
467 |    ]
468 |   },
469 |   {
470 |    "cell_type": "code",
471 |    "execution_count": 17,
472 |    "metadata": {},
473 |    "outputs": [
474 |     {
475 |      "data": {
476 |       "text/plain": [
477 |        "2282"
478 |       ]
479 |      },
480 |      "execution_count": 17,
481 |      "metadata": {},
482 |      "output_type": "execute_result"
483 |     }
484 |    ],
485 |    "source": [
486 |     "our_csn.trim_graph.number_of_nodes()"
487 |    ]
488 |   },
489 |   {
490 |    "cell_type": "code",
491 |    "execution_count": 18,
492 |    "metadata": {},
493 |    "outputs": [
494 |     {
495 |      "data": {
496 |       "text/plain": [
497 |        "(2282, 2282)"
498 |       ]
499 |      },
500 |      "execution_count": 18,
501 |      "metadata": {},
502 |      "output_type": "execute_result"
503 |     }
504 |    ],
505 |    "source": [
506 |     "our_csn.trim_countmat.shape"
507 |    ]
508 |   },
509 |   {
510 |    "cell_type": "code",
511 |    "execution_count": 19,
512 |    "metadata": {},
513 |    "outputs": [
514 |     {
515 |      "data": {
516 |       "text/plain": [
517 |        "(2282, 2282)"
518 |       ]
519 |      },
520 |      "execution_count": 19,
521 |      "metadata": {},
522 |      "output_type": "execute_result"
523 |     }
524 |    ],
525 |    "source": [
526 |     "our_csn.trim_transmat.shape"
527 |    ]
528 |   },
529 |   {
530 |    "cell_type": "markdown",
531 |    "metadata": {},
532 |    "source": [
533 |     "## 3) Obtaining steady-state weights from the transition matrix\n",
534 |     "\n",
535 |     "Now that we've ensured that our transition matrix is fully connected, we can compute its steady-state (equilibrium) weights.  `CSNAnalysis` implements this in two ways.\n",
536 |     "\n",
537 |     "First, we can compute the eigenvector of the transition matrix with eigenvalue one:"
538 |    ]
539 |   },
540 |   {
541 |    "cell_type": "code",
542 |    "execution_count": 20,
543 |    "metadata": {},
544 |    "outputs": [],
545 |    "source": [
546 |     "wt_eig = our_csn.calc_eig_weights()"
547 |    ]
548 |   },
549 |   {
550 |    "cell_type": "markdown",
551 |    "metadata": {},
552 |    "source": [
553 |     "The eigenvector calculation can be numerically unstable, especially for low-weight states, so weights can also be calculated by iterative multiplication with the transition matrix.  This can take a little longer:"
554 |    ]
555 |   },
556 |   {
557 |    "cell_type": "code",
558 |    "execution_count": 21,
559 |    "metadata": {},
560 |    "outputs": [],
561 |    "source": [
562 |     "wt_mult = our_csn.calc_mult_weights()"
563 |    ]
564 |   },
565 |   {
566 |    "cell_type": "code",
567 |    "execution_count": 22,
568 |    "metadata": {},
569 |    "outputs": [
570 |     {
571 |      "data": {
572 |       "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEGCAYAAABy53LJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nO3dd3xUVfrH8c9DaFEpClgoCioWUJcSQeyrorAWUFlBWcv+QGzYRcGK2EBcXQEbinVtrAqyKkYUBBVBQhcQBUQpKiAdIiTh+f1xb3AYQzKRTGYm+b5fr3kxc+65N89hNA/nnnPPMXdHREQkVhUSHYCIiKQWJQ4RESkWJQ4RESkWJQ4RESkWJQ4RESmWiokOoDTUrl3bGzZsmOgwRERSytSpU1e5e53o8nKROBo2bEhWVlaiwxARSSlm9kNB5bpVJSIixaLEISIixaLEISIixaLEISIixaLEISIixVIuZlWJiJQnI6cvY2DmfJavzaZuzXR6nXEoHZvXK7HrK3GIiJQhI6cvo887s8nOyQNg2dps+rwzG6DEkkdcb1WZWTszm29mC8ysdwHHq5jZm+HxyWbWMCxvZWYzwtdMMzs31muKiJRnAzPnk52TxyErF9Nr/EvgTnZOHgMz55fYz4hb4jCzNOAJoD3QBLjQzJpEVesGrHH3g4HHgAFh+ddAhrs3A9oBz5hZxRivKSJSbq38dT3Xf/4a7714A11mZrLfhlUALF+bXWI/I563qloBC9x9EYCZvQF0AOZG1OkA9A3fvwUMMTNz980RdaoC+btNxXJNEZHyacoURr9yEwf98j0jm5xEv1N7sHq3GgDUrZleYj8mnreq6gFLIj4vDcsKrOPuucA6oBaAmbU2sznAbODK8Hgs1yQ8v4eZZZlZ1sqVK0ugOSIiSWrzZrjlFjjmGOp6Nld17ssNZ/fanjTSK6XR64xDS+zHJe10XHef7O5NgaOBPmZWtZjnD3X3DHfPqFPnD2t0iYiUDePGwZFHwr/+BZdfTvq333DGbd2pVzMdA+rVTOeh845MmVlVy4AGEZ/rh2UF1VlqZhWBGsCvkRXcfZ6ZbQSOiPGaIiJl37p1cOutMHQoHHRQkEBOPhmAjs1rlGiiiBbPHscUoLGZNTKzykAXYFRUnVHApeH7TsBYd/fwnIoAZnYAcBiwOMZrioiUbf/7HzRpAs89F9yimjVre9IoDXHrcbh7rpn1BDKBNOB5d59jZv2ALHcfBQwDXjGzBcBqgkQAcDzQ28xygG3A1e6+CqCga8arDSIiSWXlSrj+enj99eD21MiRcPTRpR6GuXvRtVJcRkaGaz8OEUlZ7kGyuO46WL8e7roLbrsNKleO6481s6nunhFdrifHRUSS2ZIlcNVV8P770Lo1DBsGTZsmNKSknVUlIlKubdsGzzwTJIlx4+Cxx+CLLxKeNEA9DhGR5PPdd3D55TB+PJx6ajBz6sADEx3VdupxiIgki9xceOQROOoomDEjmDU1ZkxSJQ1Qj0NEJDnMmgXdukFWFnToAE8+CXXrJjqqAqnHISKSSFu2wN13Q8uW8OOPMHw4jBiRtEkD1OMQEUmcSZOCXsbcuXDxxcEAeK1aiY6qSOpxiIiUtk2b4MYb4dhjYcMG+OADePnllEgaoB6HiEjp+uSTYMbU99/D1VfDQw9B9eqJjqpY1OMQESkNa9dC9+5w2mlQsWIw1faJJ1IuaYASh4hI/L37brAo4YsvBkuFzJwJJ56Y6Kj+NN2qEhGJl19+CdaXGj4c/vKXYFXbli0THdUuU+IQEdlFI6cvY2DmfJavzaZuzXR6nX4IHeeMgxtugI0b4YEHoFcvqFQp0aGWCCUOEZFdMHL6Mvq8M5vsnDwA/Mcf2OuC22BhFrRpEyxKePjhCY6yZClxiIjsgoGZ88nOycN8G12nj6b3+Bcxdx47uyc3jvg3pKUlOsQSp8QhIlJMkb
emHGi0ehn9Rw+i9dI5TGjYnNvb9WRZjX24sQwmDVDiEBEplshbU2nb8rj8qxHc+Pmr/FaxMrf87QbeOuJUMKNezfREhxo3ShwiIsWQf2uqyS+LGDD6cY78ZSGjDzmWu9texco99gQgvVIavc44NMGRxo8Sh4hIMaxatY5bJr7BlZPeYs1u1bmyYx8+PPQ4AAyCWVVnHErH5vUSG2gcKXGIiMRq4kQyX76Bhit/5K0jTuW+U7qzLr0aAPVqpvNF71MSHGDpUOIQESnKxo1w++0wZAh771OX7hfez8f7N9t+uKzfmoqmJUdERArz0UdwxBEwZAj07Mlu387jrF6XUa9mOkbQ03jovCPL9K2paOpxiIgUZPVquPnmYH2pQw+Fzz6D44KxjI7Nq5WrRBFNPQ4RkWhvvx0sSvjKK8EtqhkzticNiXPiMLN2ZjbfzBaYWe8CjlcxszfD45PNrGFY3tbMpprZ7PDPUyLO+TS85ozwtXc82yAi5cjPP0OnTsGrbt1g/+8HHoCqVRMdWVKJW+IwszTgCaA90AS40MyaRFXrBqxx94OBx4ABYfkq4Gx3PxK4FHgl6ryu7t4sfK2IVxtEpJxwD25JNWkC770XbK40eTI0a1bkqeVRPHscrYAF7r7I3bcCbwAdoup0AF4K378FnGpm5u7T3X15WD4HSDezKnGMVUTKq8WLoV07+Oc/oWnTYK+M3r3LzEq28RDPxFEPWBLxeWlYVmAdd88F1gHRm+6eD0xz9y0RZS+Et6nuMjMr6IebWQ8zyzKzrJUrV+5KO0SkLNq2DQYPDmZMTZwY7MY3fnwwEC6FSurBcTNrSnD76oqI4q7hLawTwtfFBZ3r7kPdPcPdM+rUqRP/YEUkdcybByecEGyydMIJ8PXXwf7fFZL6V2LSiOff0jKgQcTn+mFZgXXMrCJQA/g1/FwfGAFc4u4L809w92XhnxuA1whuiYmIFC0nBx58MBi7+OYbePll+OADOOCAREeWUuKZOKYAjc2skZlVBroAo6LqjCIY/AboBIx1dzezmsD7QG93/yK/splVNLPa4ftKwFnA13Fsg4iUFdOmQatWcMcd0KEDzJ0LF18MBd/tlkLELXGEYxY9gUxgHjDc3eeYWT8zOyesNgyoZWYLgJuA/Cm7PYGDgbujpt1WATLNbBYwg6DH8my82iAiZUB2NvTpEySNn3+Gd94J9gDfZ59ER5ayzN0THUPcZWRkeFZWVqLDEJHS9vnn0K0bfPst/N//wSOPwJ57JjqqlGFmU909I7pcI0EiUvZs2AA9ewYD31u3wpgxwd7fSholQolDRMqW0aOD5zGefBJuuCGYMXXaaYmOqkzRIocikrIi9/4+rNJWnpn+Kvu//zYcfjh88QW0aZPoEMskJQ4RSUnb9/7emsvf5n/BvWOepuZvG/jm8us5bPAAqKLFJuJFiUNEUtLAzPnssWYl//7oSc74bhKz9j2YSzr3Y/2BTflCSSOulDhEJPW4c9yEd7lz7DAq5+Xw4Mn/ZNjRHcmrkIatzU50dGWeEoeIpJZFi6BHDx7+5BMmNziC29pdy+K9fl8Gr27N9AQGVz4ocYhIasjLCxYlvOMOSEtjxu0P8U87is25vz+LVt72/k4UTccVkeQ3dy4cfzzceCOcfDLMmUOzB3rz4Pl/Kdd7fyeKehwikry2boUBA+D++6FaNfjPf+Cii7avL9WxeT0ligRQ4hCR5JSVFSwXMmsWdOkCjz8Oe2un6GSgW1Uiklw2b4Zbb4XWrWHVKnj3XXj9dSWNJKIeh4gkj/HjoXt3WLAALr8cBg6EGjUSHZVEUY9DRBJv/Xq46qpg4HvbNvjkExg6VEkjSanHISKlKnJ9qbo103lkt6W0GXgHLF8ON90E990Hu+2W6DClEEocIlJqtq8vlZPHnpvXccv/HqHN3E9Zf9ChVJ84MRjXkKSnxCEipWZg5nyyt+Zy9rwJ9P34Gapt2cy/j7uQEe0uZbySRspQ4hCRUpO3ZA
nPfvQUbRdMZsZ+jbmt/fXMr9MQ25ib6NCkGJQ4RCT+3OG55/h42I2k5eVy31+78ULGOWyrkAZofalUo8QhIvG1cGEwtXbcOLIzjqVrq258W22f7Ye1vlTq0XRcEYmPvDx49FE48kiYOhWGDqXOV59zdfcztL5UilOPQ0RK3tdfB8uFfPUVnH02PPUU1AuSg9aXSn3qcYhIydm6Ffr2hRYtgn0zXn89WDKknhJFWRLXxGFm7cxsvpktMLPeBRyvYmZvhscnm1nDsLytmU01s9nhn6dEnNMyLF9gZoPMwmUyRSSxvvoqSBj33gsXXADz5gWLE+p/0TInbonDzNKAJ4D2QBPgQjNrElWtG7DG3Q8GHgMGhOWrgLPd/UjgUuCViHOeAi4HGoevdvFqg4jEYPNmuPlmaNMG1q2D994Llj+vXTvRkUmcxLPH0QpY4O6L3H0r8AbQIapOB+Cl8P1bwKlmZu4+3d2Xh+VzgPSwd7IfUN3dJ7m7Ay8DHePYBhEpzLhxweD3o49Cjx4wZw6ceWaio5I4i2fiqAcsifi8NCwrsI675wLrgFpRdc4Hprn7lrD+0iKuKSLxtm5dkChOOQUqVIBPPw0GwKtXT3RkUgqSenDczJoS3L664k+c28PMsswsa+XKlSUfnEh59b//QZMmMGwY9OoFM2fCSSclOiopRfFMHMuABhGf64dlBdYxs4pADeDX8HN9YARwibsvjKhfv4hrAuDuQ909w90z6tSps4tNERFWrIALL4RzzoFatWDyZHj4Ya1kWw7FM3FMARqbWSMzqwx0AUZF1RlFMPgN0AkY6+5uZjWB94He7v5FfmV3/wlYb2bHhLOpLgHejWMbRMQdXn016GW8/Tb06xds65qRkejIJEGKTBxm9vdYyqKFYxY9gUxgHjDc3eeYWT8zOyesNgyoZWYLgJuA/Cm7PYGDgbvNbEb4yt838mrgOWABsBAYXVQsIvInLVkSPMD3j39A48YwYwbcdRdUrpzoyCSBLJicVEgFs2nu3qKosmSWkZHhWVlZiQ5DJHVs2xbswHfrrcHSIQ8+CD17QlpaoiOTUmRmU939D13LnS45Ymbtgb8B9cxsUMSh6oDWQBYpq777LliUcPx4OPXUIIEceGCio5IkUtitquVAFvAbMDXiNQo4I/6hiUipys2FgQPhqKOCW1LDhsGYMUoa8gc77XG4+0xgppm95u45pRiTiJS2mTODRQmnToWOHeGJJ6Bu3URHJUkqlllVrcxsjJl9a2aLzOx7M1sU98hEJP62bAkGuzMygoHw4cPhnXeUNKRQsSyrPgy4keA2VV58wxGRUvPll0EvY948uOSSYNmQWtELN4j8USyJY527a8qrSFmxaRPccQcMGgT168MHH0D79omOSlJIYbOq8qfbjjOzgcA7wJb84+4+Lc6xiUhJ+/jjYMbU4sVwzTXw0ENQrVqio5IUU1iP419RnyPn8jpwCiKSGtasgVtugeefDx7kmzABTjgh0VFJiipsVtVfSzMQEYmTESPg6qth5Uro3RvuvhvS0xMdlaSwIsc4zOymAorXAVPdfUbJhyQiJeKXX+Daa+G//4VmzeD994Md+kR2USzTcTOAKwn2vahHsMR5O+BZM7s1jrGJyJ/hDi+/DIcfHuz3/cADv2/rKlICYplVVR9o4e4bAczsHoKVa08kmKL7cPzCE5Fi+fFHuOIK+PBDOPbY4Onvww5LdFRSxsTS49ibiNlUQA6wj7tnR5WLSKJs2xY87d20KXz2WTDV9rPPlDQkLmLpcbwKTDaz/H0vzgZeM7Pdgblxi0xEYjN/PnTvDp9/Dm3bBosSNmyY6KikDCsycbj7fWY2GjguLLrS3fPXKO8at8hEZAcjpy9jYOZ8lq/Npm7NdG495UA6jH0D+vYNduF78cXgCXCzRIcqZVxhDwBWd/f1ZrYXsCh85R/by91Xl0aAIhIkjT7vzCY7J1j1p+b8r2n8eA/4eSGcfz4MGQL77pvgKKW8KK
zH8RpwFsEAuAMW9afWWhYpJQMz55Odk0eV3K1cO/ENrpz0Fmt2q84dXe/hgf/0TXB0Ut4U9gDgWeGfjUovHBEpyPK12bRcOpeHRw/ioNVL+e8Rp3H/Kd1Yn16NBxIdnJQ7sTwAaARjGY3C8Y79gX3d/au4RycisHEjD08YxvlfjmR59TpcfEE/PmsUPJNRr6aeAJfSF8usqieBbQRrU90HbADeBo6OY1wiApCZCT160GnJEv5z9Nk8dPzFbK4cJIv0Smn0OuPQBAco5VEsz3G0dvdrCLaQxd3XAJXjGpVIebd6NVx2GbRrB7vthn32GdWeeZI9994LI+hpPHTekXRsXi/RkUo5FEuPI8fM0ggGxDGzOgQ9EBGJh7ffDpY8X7Uq2DfjzjuhalU6ghKFJIVYEscgYASwt5k9AHQC7oxrVCLl0U8/Qc+ewdatzZsHy4Y0a5boqET+IJYHAF81s6nAqQRTcTu6+7y4RyZSxm1/oG/NZrovmkCvzKFU3vob9O8PN98MFWP5d51I6YtlVtV9wATgRXffVJyLm1k74HEgDXjO3ftHHa8CvAy0BH4FOrv7YjOrBbxFMAD/orv3jDjnU2A/IDssOt3dVxQnLpFEy3+gr9aq5bz04RBOXDydrAZNWfv4k5x27omJDk+kULH8k2YRcCEwyMw2AJ8BE9z93cJOCsdFngDaAkuBKWY2yt0j17fqBqxx94PNrAswAOhMMBB/F3BE+IrWNWLZE5GU86/Rc+n85Qh6TXgZN+POtlfxavP21J2fy2mJDk6kCLHcqnoBeMHM9gUuAG4BegBFbVTcCljg7osAzOwNoAM7LozYAegbvn8LGGJmFvZsPjezg4vRFpHUMG8ejz15PRnL5vFpo5bc3u4allffGwge9BNJdkVOxzWz58xsIvAUQaLpBOwZw7XrAUsiPi8Nywqs4+65BDsL1orh2i+Y2Qwzuyt8QLGguHuYWZaZZa1cuTKGS4rEWU5OsKlSs2Y0Xr2UG8+8icv+3nd70gCoqwf6JAXE8hxHLYIxirXAamBV+Es+Ubq6+5HACeHr4oIquftQd89w94w6deqUaoAifzBtGhx9dDC1tmNHJo6awIfN2+6wkq0e6JNUUWTicPdz3b01wU5/NYFxZrY0hmsvAxpEfK4flhVYx8wqAjUIBskLi2dZ+OcGgoUYW8UQi0hiZGdD797QqlWwB/iIEfDmm7Q/rRkPnXck9Wqm64E+STmxzKo6i+Bf9icSJI6xBAPkRZkCNDazRgQJogtwUVSdUcClwJcEt8DGursXEktFoKa7rzKzSgSr934cQywipW/ChGCDpe++g27dYOBA2PP3u7wdm9dTopCUFMusqnYEieJxd18e64XdPdfMegKZBLe6nnf3OWbWD8hy91HAMOAVM1tAcBusS/75ZrYYqA5UNrOOwOnAD0BmmDTSCJLGs7HGJFIq1q+HPn3gySehUSMYMwZO01wpKTuskH/glxkZGRmelaXZu1IKRo+GK66ApUvh+uvh/vth990THZXIn2JmU909I7pcj6aKlIRff4Ubb4RXXoEmTWDiRDjmmERHJRIXscyqEpGdcYfhw+Hww+H11+Guu4IZVEoaUobF8hzH9bGUiZQ7y5fDuedC586w//4wdSr06wdVqiQ6MpG4iqXHcWkBZZeVcBwiqcMdhg0LbkllZsLDD8OkSXDUUYmOTKRU7HSMw8wuJJg+28jMRkUcqkYwA0qk/Fm0CC6/HMaOhZNOgmefhcaNEx2VSKkqbHB8IvATUBv4V0T5BmBWPIMSSTp5eTB4cLCxUloaPP10kEAqaJhQyp+dJg53/4HguYk2pReOSBKaMyd4gG/yZDjzzCBp1K+f6KhEEqawW1UbCLeLjT4EuLtXj1tUIslg69ZgU6X774fq1eHVV+HCC3dYX0qkPCqsx1HUsukiZdeUKUEvY/Zs6NIFBg0CLZYpAsS2VtX+BZW7+48lH45I6dq+fevabOrWTKf3iftz9ohn4N
FHYd994d134ZxzEh2mSFKJ5cnx9yPeVwUaAfOBpnGJSKSU5G/fmp2TB0CDWZM56uGLYc1y6NEjmGZbo0aCoxRJPrHsAHhk5GczawFcHbeIRErByOnLuHn4TPLcqbZlE70/fYGuMz5kcc39uLb7Iwx+5uZEhyiStIq9VpW7TzOz1vEIRqQ05Pc08tz568IpPPjhEPbetIahR5/Loyd0ZUulqgxOdJAiSSyWMY6bIj5WAFoAMS+vLpJsBmbOJ33dah76ZCgd547nm9oHcOW5tzOzbrD7Xj1t3ypSqFh6HJGzq3IJxjzejk84InHmTsuJH3LPx89QbctmHjvuIp5s83dy0ioB2r5VJBaxjHHcWxqBiMTd0qVw1VUMeu89Zux3CLe2v45v6zTcfjjNTNu3isSgsAcAR+3sGIC7a46iJK3Iabb1qldh8OapNB/8IOTkMPume+ha9Wg25f1eP71SmpKGSIwK63G0AZYArwOTCZ4YF0l6kdNsD1iznP6vD6b5j7NZmXEsdd54mSMPOogHop7f6HXGoUoaIjEqLHHsC7QF8lfJfR943d3nlEZgIn/WwMz5bN2yle5Z73LzZ6+SUyGN29pdy+cnduCLgw4CoGPzekoUIn9SYUuO5AEfAh+aWRWCBPKpmd3r7kNKK0CR4qr23TyeGP04zX76jjEHt+LO06/ml2q1sXW/JTo0kTKh0MHxMGGcSZA0GgKDgBHxD0vkT9iyBR58kPdefIC1Vfeg5zm38t5hJ2xflLCuptmKlIjCBsdfBo4APgDudfevSy0qkeKaPDlYlHDOHH5qfy4XHN6Znyrtsf2wptmKlJzCdqH5B9AYuB6YaGbrw9cGM1tfOuGJFGHTJrjpJmjTBtatg/feo8EH73DbP46nXs10jOCBPs2YEik5hY1xaGszSW5jxwa78C1aBFddFeydUT3YJkaD3yLxE9fkYGbtzGy+mS0ws94FHK9iZm+GxyebWcOwvJaZjTOzjWY2JOqclmY2OzxnkJl21Sl31q4NEsappwZbt376KTz55PakISLxFbfEYWZpwBNAe6AJcKGZNYmq1g1Y4+4HA48BA8Ly34C7gFsKuPRTwOUEt9EaA+1KPnpJWu++C02awPPPw623wqxZcNJJiY5KpFyJZ4+jFbDA3Re5+1bgDaBDVJ0OwEvh+7eAU83M3H2Tu39OkEC2M7P9gOruPsndHXgZ6BjHNkiyWLEi2ImvY0eoXTsYDB8wANI1U0qktMUzcdQjePI839KwrMA67p4LrANqFXHNpUVcEwAz62FmWWaWtXLlymKGLknDHf7zHzj8cBgxAu67D7KyICMj0ZGJlFtldgDc3Ye6e4a7Z9TRXtGpackSOOssuPhiOOQQmD4d7rwTKldOdGQi5VqxN3IqhmVAg4jP9cOyguosNbOKQA3g1yKuWb+Ia0oKil6UcMjGKTQb8hDk5cG//w09e0JaWqLDFBHimzimAI3NrBHBL/cuBGteRRoFXAp8CXQCxoZjFwVy95/CZ0mOIVh48RLQZm2pLnJRwkarl9H/tcE0W/I1K1qfwN6vvwSNGiU6RBGJELfE4e65ZtYTyATSgOfdfY6Z9QOy3H0UMAx4xcwWAKsJkgsAZrYYqA5UNrOOwOnuPpdgv/MXgXRgdPiSFJa/KOEVU0Zw4+evsTWtEr3aX8fEE87hCyUNkaQTzx4H7v4BwZIlkWV3R7z/Dfj7Ts5tuJPyLIKlUKSMqPHtHJ7+4HGO/GUhmY2P4a62V7GiWi0tSiiSpOKaOESiRY5lHLBHGs/88CGjXhrCmqrVuKpDb0YfepwWJRRJckocUmoixzJaLJvHgNGDaPzrEqaffDZXt/gHP1XafXtdLUookryUOKTUDMycj23ayN0TXuGyqf9jefXaXPr3e1nQ4nhuO+NQ7cgnkiKUOKTUHDh9Ig9mDqHBul94qcWZPHzipWyqshu2NluLEoqkECUOib81a+Dmm3ll+Ass3Ksef7+oP1Ma/D6/QWMZIq
lFiUNKVOTgd92a6TxaaRGtH7kLVq5k/j97csE+bVnnvz/Ip7EMkdSjxCElJnLwu87GNdw+8iFaz/+CtYc2peb773NoixbcG5VYNJYhknqUOGSX5fcylq3NBnfOmzOWuz95lvScLTx84iW8d3pXJrRoAWiDJZGyQIlDdklkL6PeuhU8mDmEk76fRla9w7mt/XUsrNUA25CT6DBFpAQpccguGZg5n9+25nDJtPe5bXywtcrdp13BKy3OxC1YfFmD3yJlixKH7JKqC79j+OhBHL1sLhMaNuf2dj1ZWmOf7cc1+C1S9ihxyJ+TkwOPPMIHL95DdsUq3Py3G3n7iFO2LxcCUE+D3yJlkhKHFN/06dCtG0yfzqrTzqTLERexpEqN7YfTK6Xx0HlHKmGIlFFKHFKoO0fO5vXJS8hzJz0vh6e/f4+TRr4Y7Pv99tvUO+88btYUW5FyRYlDdurOkbP5z6QfAchYOocBowdx0OplTD2lIy3feh723BPQFFuR8kaJQ3YQ+eS3A7tv2cytE17i0mnvs7T63lx8QT8mHtiShWHSEJHyR4lDtrtz5GxenfQj+Xv3nrhoKg9mDqHu+lW80PJsBp54CZsrp8POd/cVkXJAiUOAoKeRnzRqZG/g7rHPcv7XY1mwV306dX2YafUP3143LWLmlIiUP0ocAgQP8jnQ/pvP6TfmaWr+toHBbToz5NjObKlYeYe6F7ZukJggRSQpKHEIADlLl/H0mKdo9+2XzN7nIC69oB9z9zlwhzppZlzYugH3dzwyQVGKSDJQ4ijv3OHFF/l42HVUydlC/5Mu49lW55JXIVj63IDHOjfTrCkR2U6Jo5yJnDXVcttanpzwDHtP/oytzVvT+ZjuzKu+3/a6BnQ9Zn8lDRHZgRJHOTFy+jL6jprD2uwcKmzL49Jp73PrhJdwq8CMPg/S7P7buGLmT3qQT0SKFNfEYWbtgMeBNOA5d+8fdbwK8DLQEvgV6Ozui8NjfYBuQB5wnbtnhuWLgQ1hea67Z8SzDWVB5NLnB61awsOjH6fl8m8Yd2BL7jjjGqz6AXxRoYIe5BORmMQtcZhZGvAE0BZYCkwxs1HuPjeiWjdgjbsfbGZdgAFAZzNrAnQBmgJ1gY/N7BB3zwvP+6u7r4pX7HvuPSoAABBDSURBVGXNwMz55Py2hZ6T3+LaiW+wuVI6N5x1MyObnAxm2NrsRIcoIikknj2OVsACd18EYGZvAB2AyMTRAegbvn8LGGJmFpa/4e5bgO/NbEF4vS/jGG+Ztdc3s3nug39z+MrFvHfYCdxz2hX8unvN7ce1X4aIFEc8E0c9YEnE56VA653VcfdcM1sH1ArLJ0Wdm38PxYGPzMyBZ9x9aEE/3Mx6AD0A9t9//11rSarKzoa+fRn58iOs2r0mPc69g48OabNDFe2XISLFlYqD48e7+zIz2xsYY2bfuPuE6EphQhkKkJGRUf7WyJgwAbp3h+++Y0nHLlxw8PmsSNuxZ7HnbpW45+ymGtcQkWKpEMdrLwMiHzGuH5YVWMfMKgI1CAbJd3quu+f/uQIYQXALS/KtXw9XXw0nnQS5ufDxxzQc8Tq3X9SGejXTMYINlv7duRnT7z5dSUNEii2ePY4pQGMza0TwS78LcFFUnVHApQRjF52Ase7uZjYKeM3MHiUYHG8MfGVmuwMV3H1D+P50oF8c25BaPvgArrwSli6FG2+E++6D3XcHtPS5iJScuCWOcMyiJ5BJMB33eXefY2b9gCx3HwUMA14JB79XEyQXwnrDCQbSc4Fr3D3PzPYBRgTj51QEXnP3D+PVhmQW+SDf4ZW28sy0/9Dgg3egSROYOBGOOSbRIYpIGWVeDpbIzsjI8KysrESHUWK2P5exNZezvvmMvh8/Q43fNrKw27UcNrg/VKmS6BBFpAwws6kFPSuXioPj5VJkD6OCGbXWr2LQR0/SdsFkZu7bmH90vp8NBzbhCyUNEYkzJY4U0PXZL/li4erggzudZmZyx7jnqZ
yXwwMn/x/PH92BvAppepBPREqFEkcSi9zzG6DB2p/p/+EgjvthFpMaHMFt7a/jhz3rbj+uB/lEpDQocSSpto9+yncrNgFQYVse/5z6P26Z8Aq5FSrQ54yevPGX03H7fTa1HuQTkdKixJFkgoHvWWTnbAPgkJWLGTB6MM1/ms8nBx3NHadfw8/VawPBxkrb3LWSrYiUKiWOJHLnyNnb9/2ulJfD1V/+l2u+HM6GKrtx3dm9GHX4iRCx3/e/LviLkoWIlDoljiQxcvqy7UnjqJ++5eEPHuewVT/w7uEnce9pPVi9W40d6h930F5KGiKSEEocCTRy+jLuGDGbTVuD1eKr5vzGTZ+9Sresd1mx+550O/8uPjl4x3Uh83fl077fIpIoShwJssMUW+CYH2fRf/RgGq79iVebtaP/yf9kQ5Xddzin8d67M+amk0s5UhGRHSlxlLLoXka1LZvoM+4FLpr5IYtr7seFXR7kywOO2uGcCgYXtVYvQ0SSgxJHKYruZZy6YDIPZD5BnU1reabVeTx2/EX8Vqnq9uO6LSUiyUiJoxSMnL6MG96csf3zXpvXcc/HQ+kwbzzf1D6AK869g5l1d3wGo56m2IpIklLiiLPIB/lw55x54+n78VD22LKZR4/vylPHdCInrdIO5/xDvQwRSWJKHHESfVtq3/WruP+jJzht4RSm73cot7a/ju/qHPCH8447aC8lDRFJakocJSz6tpT5Ni6cmUmfcc9Tcds27julOy+0PJttFdJ2OC+9UgUeOu8o3ZoSkaSnxFGCDu7zPrkR25scsGY5/T8cTJsfZ/PFAUfRu911LKm57w7nHHfQXrx6eZtSjlRE5M9T4igB0avYpm3L4/+mvMvNn/+HrRUqclu7a3nzqNN3WC4EoHqVNCUNEUk5Shy7KLqXcdiK7xkwehB/+fk7xhzcmjtPv4pfqtX+w3kVDWbd264UIxURKRlKHH9SdMKonJvDNV8O5+pJw1lXdQ96nnMr7x12wh96GaAnwEUktSlx/AkNe7+/w+fmy75hwOhBHPLrj7zT9K/cd0p31kQtSgjBA33f9z+zlKIUEYkPJY5iiE4Y6Vt/4+bPXuH/skbxc7VaXNbpHj496OgCz92nWmUm39G2NMIUEYkrJY4YRSeNYxfPoP+Hg9l/3S+80vxvDDjpMjZW2a3AcxerlyEiZYgSRxGiE0b13zZy+7jn6TLrIxbtWZcLLurPVw2OKPBcjWWISFmkxFGI6KTR9rtJ3P/Rk9TetJanW5/PY8ddxJZKVQo8V70MESmrKsTz4mbWzszmm9kCM+tdwPEqZvZmeHyymTWMONYnLJ9vZmfEes2SEpk0am9aw5B3B/DsO/ezOr06HS/+F/1P/meBSWOfapWVNESkTItbj8PM0oAngLbAUmCKmY1y97kR1boBa9z9YDPrAgwAOptZE6AL0BSoC3xsZoeE5xR1zV22PWm403Hup9zz8VB2y8lm4AkX80zr88lNK/ivTQlDRMqDeN6qagUscPdFAGb2BtABiPwl3wHoG75/CxhiZhaWv+HuW4DvzWxBeD1iuGaJqJiXy9B37ueURVlMrXsYt7a/noW1G+y0vpKGiJQX8Uwc9YAlEZ+XAq13Vsfdc81sHVArLJ8UdW7+6n9FXRMAM+sB9ADYf//9ix18blpFFu1VjwmNWvByizP/sChhPiUMESlvyuzguLsPBYYCZGRkeBHVC3T/qZcXelxJQ0TKo3gmjmVA5L2d+mFZQXWWmllFoAbwaxHnFnXNuFPCEJHyLJ6zqqYAjc2skZlVJhjsHhVVZxRwafi+EzDW3T0s7xLOumoENAa+ivGau2xnieHfnZspaYhIuRe3Hkc4ZtETyATSgOfdfY6Z9QOy3H0UMAx4JRz8Xk2QCAjrDScY9M4FrnH3PICCrhmP+JUgREQKZsE/8Mu2jIwMz8rKSnQYIiIpxcymuntGdHlcHwAUEZGyR4lDRESKRYlDRESKRYlDRESKpVwMjpvZSu
CHP3l6bWBVCYaTKGpH8ikrbVE7kktJtuMAd68TXVguEseuMLOsgmYVpBq1I/mUlbaoHcmlNNqhW1UiIlIsShwiIlIsShxFG5roAEqI2pF8ykpb1I7kEvd2aIxDRESKRT0OEREpFiUOEREplnKVOMysnZnNN7MFZta7gONVzOzN8PhkM2sYcaxPWD7fzM6I9ZrxEqe2LDaz2WY2w8xKZVXIP9sOM6tlZuPMbKOZDYk6p2XYjgVmNijcjjgV2/FpeM0Z4WvvJG5HWzObGv69TzWzUyLOSaXvo7B2lPr3sYttaRUR60wzOzfWaxbJ3cvFi2AZ9oXAgUBlYCbQJKrO1cDT4fsuwJvh+yZh/SpAo/A6abFcM1XaEh5bDNROke9kd+B44EpgSNQ5XwHHAAaMBtqnaDs+BTJS5PtoDtQN3x8BLEvR76OwdpTq91ECbdkNqBi+3w9YQbCVxi7/3ipPPY5WwAJ3X+TuW4E3gA5RdToAL4Xv3wJODf911AF4w923uPv3wILwerFcM1Xakgh/uh3uvsndPwd+i6xsZvsB1d19kgf/x7wMdIxrK+LQjgTZlXZMd/flYfkcID38l3CqfR8FtiPO8RZmV9qy2d1zw/KqQP5MqF3+vVWeEkc9YEnE56VhWYF1wr/wdUCtQs6N5ZrxEI+2QPAf1kdhF71HHOKOtivtKOyaS4u4ZkmLRzvyvRDearirFG7xlFQ7zgemufsWUvv7iGxHvtL8PnaIM1SstphZazObA8wGrgyP7/LvrfKUOKRox7t7C6A9cI2ZnZjogMq5ru5+JHBC+Lo4wfEUycyaAgOAKxIdy67YSTtS7vtw98nu3hQ4GuhjZlVL4rrlKXEsAxpEfK4flhVYx8wqAjWAXws5N5ZrxkM82oK75/+5AhhB/G9h7Uo7Crtm/SKuWdLi0Y7I72MD8BpJ/n2YWX2C/24ucfeFEfVT6vvYSTsS8X3sEGfoT/235e7zgI2E4zYxXLNwpTnQk8gXwaDQIoIB4fwBoaZRda5hx0Gm4eH7puw4oLyIYICpyGumUFt2B6qFdXYHJgLtkrUdEccvo+jB8b+lWjvCa9YO31ciuHd9ZbK2A6gZ1j+vgOumzPexs3Yk4vsogbY04vfB8QOA5QQr5+7y7624NjrZXsDfgG8JZhTcEZb1A84J31cF/kswYPwVcGDEuXeE580nYlZIQddMxbYQzLCYGb7mlFZbdrEdi4HVBP+SWko4MwTIAL4OrzmEcIWEVGoHQfKeCswKv4/HCWe/JWM7gDuBTcCMiNfeqfZ97Kwdifo+drEtF4exzgCmAR0Lu2ZxXlpyREREiqU8jXGIiEgJUOIQEZFiUeIQEZFiUeIQEZFiUeIQEZFiUeKQMsPM8iJWA52Rv+qnmT1nZk2SIL6NCf75GWY2qIg6Dc3s650cu8zM6sYnOkklFRMdgEgJynb3ZtGF7t49EcEkG3fPAnZlufzLCJ7HWF5EPSnj1OOQMi/cRyEjfN/NzL41s6/M7Nn8PTDMrI6ZvW1mU8LXcWF5XzN7PrzGIjO7Lizvb2bXRPyMvmZ2i5ntYWafmNm0cE+HP6w6amYnm9l7EZ+HmNll4fuWZjY+XGgyM1xdNvLcNDP73gI1w17WieGxCWbW2Mx2D2P+ysym58cQ+XPD9o4xszlhj+wHM6sd/pi08O9mjpl9ZGbpZtaJ4EG+V8PeXHrJfDuSipQ4pCxJj7pV1TnyYHib5S6C5S+OAw6LOPw48Ji7H02wKupzEccOA84gWJvoHjOrBLwJXBBR54Kw7DfgXA8Wi/wr8K9YV1ENrzsY6OTuLYHngQci67h7HsET/00I9vGYBpwQLv3dwN2/I1gZYKy7twpjGGhmu0f9uHvCOk0Jls/YP+JYY+CJ8Nha4Hx3f4ugt9LV3Zu5e3YsbZKySbeqpCwp8FZVhFbAeHdfDWBm/wUOCY+dBjSJ+B1f3cz2CN+/78HS2lvMbAWwj7tPN7O9w2RUB1jj7kvCX/4Phr2AbQ
TLVe8D/BxD/IcSLEI3JowjDfipgHqfAScSrDX0EHA5MB6YEh4/HTjHzG4JP1dlx8QAQdI5F8DdPzSzNRHHvnf3GeH7qUDDGGKXckSJQyRQATjG3aM3hgKI3I8hj9//v/kv0AnYl6C3AdCVIJG0dPccM1tM8Is7Ui479vbzjxswx93bFBHrBOAqoC5wN9ALOJkgoeRf53x3nx/Vln2KuG6+6PbqtpTsQLeqpDyZApxkZnuGy0+fH3HsI+Da/A9mVljPJd+bBKuRdiJIIhAsab0iTBp/JViVNNoPBL2bKmZWEzg1LJ8P1DGzNmEMlcJ9IaJ9BRwLbAsT3QyCfSMmhMczgWvzb5GZWfMCrvEF4a02Mzsd2DOG9m4AqsVQT8o4JQ4pS6LHOPpHHvRgP4UHCX7xfkGwKu268PB1QIaZzTKzuQR7gBfK3ecQ/CJd5u75t5ReDa8zG7gE+KaA85YAwwlmKA0HpoflWwmS0AAzm0mQEI4t4PwtBDu4TQqLPgvjmB1+vo9g6e9ZFuz+dl8B4d8LnB5Ovf07wa20DUU0+UXgaQ2Oi1bHlXLFzPZw941hj2ME8Ly7j0h0XKUtHEzPc/fcsIfzVBHjQyLbaYxDypu+ZnYawbjCR8DIBMeTKPsDw82sArCVYIBdJCbqcYiISLFojENERIpFiUNERIpFiUNERIpFiUNERIpFiUNERIrl/wF5Y9VTEeC7QgAAAABJRU5ErkJggg==\n",
573 |       "text/plain": [
574 |        "<Figure size 432x288 with 1 Axes>"
575 |       ]
576 |      },
577 |      "metadata": {
578 |       "needs_background": "light"
579 |      },
580 |      "output_type": "display_data"
581 |     }
582 |    ],
583 |    "source": [
584 |     "import matplotlib.pyplot as plt\n",
585 |     "%matplotlib inline\n",
586 |     "\n",
587 |     "plt.scatter(wt_eig,wt_mult)\n",
588 |     "plt.plot([0,wt_mult.max()],[0,wt_mult.max()],'r-')\n",
589 |     "plt.xlabel(\"Eigenvalue weight\")\n",
590 |     "plt.ylabel(\"Mult weight\")\n",
591 |     "plt.show()"
592 |    ]
593 |   },
594 |   {
595 |    "cell_type": "markdown",
596 |    "metadata": {},
597 |    "source": [
598 |     "These weights are automatically added as attributes to the nodes in `our_csn.graph`:"
599 |    ]
600 |   },
601 |   {
602 |    "cell_type": "code",
603 |    "execution_count": 23,
604 |    "metadata": {},
605 |    "outputs": [
606 |     {
607 |      "data": {
608 |       "text/plain": [
609 |        "{'label': 0,\n",
610 |        " 'count': 482,\n",
611 |        " 'trim': 0.0,\n",
612 |        " 'eig_weights': 0.002595528367725156,\n",
613 |        " 'mult_weights': 0.0025955283677248217}"
614 |       ]
615 |      },
616 |      "execution_count": 23,
617 |      "metadata": {},
618 |      "output_type": "execute_result"
619 |     }
620 |    ],
621 |    "source": [
622 |     "our_csn.graph.nodes[0]"
623 |    ]
624 |   },
625 |   {
626 |    "cell_type": "markdown",
627 |    "metadata": {},
628 |    "source": [
629 |     "## 4) Committor probabilities to an arbitrary set of basins\n",
630 |     "\n",
631 |     "We often run simulations in the presence of one or more high-probability \"basins\" of attraction.  When there is more than one basin, it can be useful to find the probability that a simulation started in a given state will visit (or \"commit to\") a given basin before any of the others.\n",
632 |     "\n",
633 |     "`CSNAnalysis` calculates committor probabilities by creating a sink matrix ($S$), in which each column of the transition matrix that corresponds to a sink state is replaced by the corresponding column of the identity matrix. This turns each sink state into a \"black hole\" that probability can enter but never leave.\n",
634 |     "\n",
635 |     "By iteratively multiplying this matrix by itself, we can approximate $S^\\infty$.  Each element $(S^\\infty)_{ki}$ gives the probability of ending up in sink state $k$ after starting in non-sink state $i$.\n",
636 |     "\n",
637 |     "Let's see this in action.  We'll start by defining three basins, $A$, $B$ and $U$; the states in $U$ are read from the file `state_U.dat`."
638 |    ]
639 |   },
640 |   {
641 |    "cell_type": "code",
642 |    "execution_count": 24,
643 |    "metadata": {},
644 |    "outputs": [],
645 |    "source": [
646 |     "Astates = [2031,596,1923,3223,2715]\n",
647 |     "Bstates = [1550,3168,476,1616,2590]\n",
648 |     "Ustates = list(np.loadtxt('state_U.dat',dtype=int))"
649 |    ]
650 |   },
651 |   {
652 |    "cell_type": "markdown",
653 |    "metadata": {},
654 |    "source": [
655 |     "We can then use the `calc_committors` function to calculate committors between these three basins. This gives $p_A$, $p_B$ and $p_U$ for each state, and the three values sum to one."
656 |    ]
657 |   },
658 |   {
659 |    "cell_type": "code",
660 |    "execution_count": 25,
661 |    "metadata": {},
662 |    "outputs": [],
663 |    "source": [
664 |     "basins = [Astates,Bstates,Ustates]\n",
665 |     "labels = ['pA','pB','pU']\n",
666 |     "comms = our_csn.calc_committors(basins,labels=labels)"
667 |    ]
668 |   },
669 |   {
670 |    "cell_type": "markdown",
671 |    "metadata": {},
672 |    "source": [
673 |     "The committors can be interpreted as follows:"
674 |    ]
675 |   },
676 |   {
677 |    "cell_type": "code",
678 |    "execution_count": 26,
679 |    "metadata": {},
680 |    "outputs": [
681 |     {
682 |      "name": "stdout",
683 |      "output_type": "stream",
684 |      "text": [
685 |       "comms[0] =  [0.26406217 0.29477873 0.44115911]\n",
686 |       "\n",
687 |       "In other words, if you start in state 0:\n",
688 |       "You will reach basin A first with probability 0.26, basin B with probability 0.29 and basin U with probability 0.44\n"
689 |      ]
690 |     }
691 |    ],
692 |    "source": [
693 |     "i = our_csn.trim_indices[0]\n",
694 |     "print('comms['+str(i)+'] = ',comms[i])\n",
695 |     "print('\\nIn other words, if you start in state {0:d}:'.format(i))\n",
696 |     "print('You will reach basin A first with probability {0:.2f}, basin B with probability {1:.2f} and basin U with probability {2:.2f}'.format(comms[i,0],comms[i,1],comms[i,2]))"
697 |    ]
698 |   },
699 |   {
700 |    "cell_type": "markdown",
701 |    "metadata": {},
702 |    "source": [
703 |     "## 5) Exporting graph for visualization in Gephi\n",
704 |     "\n",
705 |     "`NetworkX` is great for graph-based analyses, but not stellar at creating layouts for larger networks. However, it has excellent built-in support for exporting graph objects in a variety of formats.\n",
706 |     "\n",
707 |     "Here we'll use the `.gexf` format to save our network, as well as all of the attributes we've calculated, to a file that can be read into [Gephi](https://gephi.org/), a powerful graph visualization program.  While support for Gephi has been spotty in the recent past, it is still one of the best available options for graph visualization.\n",
708 |     "\n",
709 |     "Before exporting to `.gexf`, let's use the committors we've calculated to add colors to the nodes:"
710 |    ]
711 |   },
712 |   {
713 |    "cell_type": "code",
714 |    "execution_count": 27,
715 |    "metadata": {},
716 |    "outputs": [],
717 |    "source": [
718 |     "rgb = our_csn.colors_from_committors(comms)\n",
719 |     "our_csn.set_colors(rgb)"
720 |    ]
721 |   },
722 |   {
723 |    "cell_type": "markdown",
724 |    "metadata": {},
725 |    "source": [
726 |     "Our nodes now carry the committor values (`pA`, `pB`, `pU`) along with a `viz` attribute holding each node's color, which Gephi will interpret:"
727 |    ]
728 |   },
729 |   {
730 |    "cell_type": "code",
731 |    "execution_count": 28,
732 |    "metadata": {},
733 |    "outputs": [
734 |     {
735 |      "data": {
736 |       "text/plain": [
737 |        "{'label': 0,\n",
738 |        " 'count': 482,\n",
739 |        " 'trim': 0.0,\n",
740 |        " 'eig_weights': 0.002595528367725156,\n",
741 |        " 'mult_weights': 0.0025955283677248217,\n",
742 |        " 'pA': 0.26406216543613925,\n",
743 |        " 'pB': 0.2947787254045238,\n",
744 |        " 'pU': 0.4411591091593356,\n",
745 |        " 'viz': {'color': {'r': 152, 'g': 170, 'b': 255, 'a': 0}}}"
746 |       ]
747 |      },
748 |      "execution_count": 28,
749 |      "metadata": {},
750 |      "output_type": "execute_result"
751 |     }
752 |    ],
753 |    "source": [
754 |     "our_csn.graph.nodes[0]"
755 |    ]
756 |   },
757 |   {
758 |    "cell_type": "markdown",
759 |    "metadata": {},
760 |    "source": [
761 |     "Finally, we can use the built-in `networkx` function `write_gexf` to write all of this to a `.gexf` file:"
762 |    ]
763 |   },
764 |   {
765 |    "cell_type": "code",
766 |    "execution_count": 29,
767 |    "metadata": {},
768 |    "outputs": [],
769 |    "source": [
770 |     "nx.write_gexf(our_csn.graph.to_undirected(), 'test.gexf')"
771 |    ]
772 |   },
773 |   {
774 |    "cell_type": "markdown",
775 |    "metadata": {},
776 |    "source": [
777 |     "After opening this file in Gephi, I recommend creating a layout using the \"ForceAtlas 2\" algorithm in the layout panel.  I set the node sizes to the \"eig_weights\" attribute, and after exporting to PDF and adding some labels, I get the following:"
778 |    ]
779 |   },
780 |   {
781 |    "cell_type": "markdown",
782 |    "metadata": {},
783 |    "source": [
784 |     "![Gephi graph export](committor_net_3state.png)"
785 |    ]
786 |   },
787 |   {
788 |    "cell_type": "markdown",
789 |    "metadata": {},
790 |    "source": [
791 |     "**That's the end of our tutorial!**  I hope you enjoyed it and that you find `CSNAnalysis` useful in your research.  If you have difficulties installing or running the software, feel free to open an [issue on the GitHub page](https://github.com/ADicksonLab/CSNAnalysis)."
792 |    ]
793 |   },
794 |   {
795 |    "cell_type": "code",
796 |    "execution_count": null,
797 |    "metadata": {},
798 |    "outputs": [],
799 |    "source": []
800 |   }
801 |  ],
802 |  "metadata": {
803 |   "kernelspec": {
804 |    "display_name": "Python 3",
805 |    "language": "python",
806 |    "name": "python3"
807 |   },
808 |   "language_info": {
809 |    "codemirror_mode": {
810 |     "name": "ipython",
811 |     "version": 3
812 |    },
813 |    "file_extension": ".py",
814 |    "mimetype": "text/x-python",
815 |    "name": "python",
816 |    "nbconvert_exporter": "python",
817 |    "pygments_lexer": "ipython3",
818 |    "version": "3.7.7"
819 |   }
820 |  },
821 |  "nbformat": 4,
822 |  "nbformat_minor": 1
823 | }
824 | 
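
The `trim` call in the notebook above removes states that are poorly connected to the main component or that fall below `min_count`. As a rough illustration of that idea, and not the `CSNAnalysis` implementation, the sketch below builds a boolean keep-mask with NumPy. It assumes that `countmat[i, j]` counts observed transitions from state `j` to state `i` (the column convention implied by the sink-matrix description in section 4), that a state's count is its column sum, and that the "main component" is the one containing the most-visited state. `reachable` and `trim_mask` are hypothetical helper names.

```python
import numpy as np
from collections import deque

def reachable(adj, start):
    """Boolean mask of states reachable from `start`; adj[src, dst] is True for an edge."""
    seen = np.zeros(adj.shape[0], dtype=bool)
    seen[start] = True
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(adj[u] & ~seen):   # unvisited neighbours of u
            seen[v] = True
            queue.append(v)
    return seen

def trim_mask(countmat, by_inflow=True, by_outflow=True, min_count=0):
    """States to keep (assumption: countmat[i, j] counts observed j -> i transitions)."""
    counts = countmat.sum(axis=0)              # transitions observed out of each state
    keep = counts >= min_count
    # Directed edges (src -> dst) between states that pass the count filter.
    adj = (countmat.T > 0) & keep[:, None] & keep[None, :]
    root = int(np.argmax(np.where(keep, counts, -1)))   # most-visited surviving state
    if by_inflow:
        keep &= reachable(adj, root)           # state can be reached from the root
    if by_outflow:
        keep &= reachable(adj.T, root)         # state can get back to the root
    return keep

# Toy count matrix: state 2 is rarely visited and state 3 is never left.
C = np.zeros((4, 4))
C[0, 0], C[1, 0], C[2, 0] = 30, 10, 1   # transitions out of state 0
C[0, 1], C[1, 1], C[3, 1] = 10, 20, 5   # transitions out of state 1
C[0, 2] = 1                             # transitions out of state 2

mask = trim_mask(C, min_count=2)
C_trim = C[np.ix_(mask, mask)]          # same mask on both axes keeps rows/columns aligned
T_trim = C_trim / C_trim.sum(axis=0)    # column-normalize the trimmed counts
print(mask, trim_mask(C))
```

Applying one mask to both axes with `np.ix_` is what keeps the trimmed count matrix, the trimmed transition matrix and the list of surviving nodes consistent with each other. Without a count threshold, only state 3 is dropped here, because flow can enter it but never return to the main component.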

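Both steady-state calculations from section 3 of the notebook can be sketched in a few lines of NumPy. This is a minimal sketch assuming a column-stochastic transition matrix (each column sums to one); `eig_weights` and `mult_weights` are hypothetical names and are not the `calc_eig_weights` and `calc_mult_weights` methods themselves. The first picks the eigenvector whose eigenvalue is closest to one and normalizes it; the second repeatedly applies the matrix to a uniform weight vector until it stops changing.

```python
import numpy as np

def eig_weights(T):
    """Steady-state weights from the eigenvector of T with eigenvalue one.

    Assumes T is column-stochastic: T[i, j] = P(j -> i), columns sum to one.
    """
    vals, vecs = np.linalg.eig(T)
    k = np.argmin(np.abs(vals - 1.0))   # eigenvalue closest to one
    w = np.real(vecs[:, k])
    return w / w.sum()                   # normalize; also removes the arbitrary sign

def mult_weights(T, tol=1e-12, max_iter=100_000):
    """Steady-state weights by repeatedly applying T to a uniform vector."""
    n = T.shape[0]
    w = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        w_new = T @ w
        if np.abs(w_new - w).max() < tol:
            return w_new
        w = w_new
    return w   # not converged within max_iter; return the last iterate

# Toy three-state column-stochastic matrix.
T = np.array([[0.90, 0.05, 0.10],
              [0.05, 0.90, 0.10],
              [0.05, 0.05, 0.80]])
wt_eig = eig_weights(T)
wt_mult = mult_weights(T)
print(wt_eig, wt_mult)   # both approximately [0.4, 0.4, 0.2]
```

For this toy matrix the answer can be checked by hand: states 0 and 1 are symmetric, and balancing the flow into state 2 ($w_2 = 0.05 w_0 + 0.05 w_1 + 0.8 w_2$) gives weights of 0.4, 0.4 and 0.2. The iterative version converges at a rate set by the second-largest eigenvalue (0.85 here), which is why it can take longer on large, slowly mixing networks.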

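The sink-matrix approach from section 4 of the notebook can also be sketched directly. This is a minimal sketch, not the `calc_committors` implementation: it assumes a column-stochastic transition matrix, replaces each basin state's column with the corresponding identity column, and approximates $S^\infty$ by repeated squaring, which is one possible way of "multiplying the matrix by itself". `committors` is a hypothetical helper name.

```python
import numpy as np

def committors(T, basins, tol=1e-12, max_squarings=64):
    """Committor probabilities from a sink matrix.

    T      : column-stochastic transition matrix (T[i, j] = P(j -> i)).
    basins : list of lists of state indices, one list per basin.
    Returns an (n, len(basins)) array; entry [i, b] is the probability
    that a trajectory started in state i reaches basin b first.
    """
    n = T.shape[0]
    S = np.array(T, dtype=float)
    for basin in basins:
        for s in basin:
            S[:, s] = 0.0
            S[s, s] = 1.0   # sink column: probability can enter but never leave
    # Approximate S^infinity: each squaring doubles the power of S.
    for _ in range(max_squarings):
        S_next = S @ S
        converged = np.abs(S_next - S).max() < tol
        S = S_next
        if converged:
            break
    # Column i of S^infinity is the final distribution over sinks after
    # starting in state i; sum the rows belonging to each basin.
    comm = np.zeros((n, len(basins)))
    for b, basin in enumerate(basins):
        comm[:, b] = S[basin, :].sum(axis=0)
    return comm

# Four-state chain: state 0 is basin A, state 3 is basin B.
T = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.5, 0.5]])
comms = committors(T, [[0], [3]])
print(comms)
```

On this chain the sketch reproduces the gambler's-ruin result: starting in state 1, basin A is reached first with probability 2/3, and starting in state 2, with probability 1/3. Each row sums to one because every trajectory eventually falls into one of the sinks.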
--------------------------------------------------------------------------------
/examples/matrix.npz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ADicksonLab/CSNAnalysis/7700653374937c179a441c656f0783f80e44c26d/examples/matrix.npz


--------------------------------------------------------------------------------
/examples/state_U.dat:
--------------------------------------------------------------------------------
1 | 365 2830 1155 1529 3242 2201 1854 3251 2303 3899 806 2952 2322 1154 189 2343 3080 1024 3385 968 2228 1298 2475 2493 615 3918 1394 2472 1734 1787 81 156 593 3668 1412 1965 3215 415 959 1201 3894 2893 1077 158 2651 3176 975 3999 73 1758 1861 3437 1595 329 863 3767 3859 1099 1103 165 1143 3256 1530 3128 1911 1093 1320 3502 1851 711 2156 1130 3335 218 1611 1624 2579 3904 3596 3046 3219 3775 65 2558 1706 180 2489 2887 3644 3930 462 2400 1378 2020 2589 1203 302 2731 1956 632 1435 712 1889 2749 1008 354 2549 1755 986 2784 442 2925 2091 2111 2163 2379 1812 185 1499 1300 140 12 2937 37 3598 1065 1645 2947 3018 1288 2622 1781 1352 2915 1586 3175 934 148 1780 209 3021 45 2846 1133 193 126 746 3225 3791 613 1598 1246 1166 1951 391 2088 1705 548 3858 1564 3280 3579 2413 555 215 1626 1795 2128 974 24 520 3650 401 2093 1351 2743 3054 1377 1756 2504 3069 2756 3951 3177 3141 120 2871 3924 1 3750 375 1170 2066 2458 3980 2710 3092 49 1711 2244 2392 2959 2901 2150 1072 2920 2039 2062 1441 3564 1896 1327 752 2196 1687 257 3680 286 1046 2770 1793 2647 2879 418 721 507 1197 
2 | 


--------------------------------------------------------------------------------
/requirements.in:
--------------------------------------------------------------------------------
1 | --index-url https://pypi.org/simple/
2 | 
3 | numpy
4 | scipy >= 0.19
5 | networkx >= 2.1
6 | 
7 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | from setuptools import setup, find_packages
 2 | 
 3 | setup(
 4 |     name='CSNAnalysis',
 5 |     version='0.1.0-beta',
 6 |     py_modules=['csnanalysis'],
 7 |     author='Alex Dickson',
 8 |     author_email='alexrd@msu.edu',
 9 |     packages=find_packages(),
10 |     include_package_data=True,
11 |     install_requires=[
12 |         'numpy',
13 |         'networkx>=2.1',
14 |         'scipy>=0.19',
15 |     ],
16 | )
17 | 


--------------------------------------------------------------------------------