├── .gitignore ├── LICENSE.txt ├── README.org ├── csnanalysis ├── __init__.py ├── csn.py └── matrix.py ├── examples ├── committor_net_3state.png ├── examples.ipynb ├── matrix.npz └── state_U.dat ├── requirements.in └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Created by https://www.gitignore.io/api/python 2 | # Edit at https://www.gitignore.io/?templates=python 3 | 4 | ### Python ### 5 | # Byte-compiled / optimized / DLL files 6 | __pycache__/ 7 | *.py[cod] 8 | *$py.class 9 | 10 | # C extensions 11 | *.so 12 | 13 | # Distribution / packaging 14 | .Python 15 | build/ 16 | develop-eggs/ 17 | dist/ 18 | downloads/ 19 | eggs/ 20 | .eggs/ 21 | lib/ 22 | lib64/ 23 | parts/ 24 | sdist/ 25 | var/ 26 | wheels/ 27 | pip-wheel-metadata/ 28 | share/python-wheels/ 29 | *.egg-info/ 30 | .installed.cfg 31 | *.egg 32 | MANIFEST 33 | 34 | # PyInstaller 35 | # Usually these files are written by a python script from a template 36 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 37 | *.manifest 38 | *.spec 39 | 40 | # Installer logs 41 | pip-log.txt 42 | pip-delete-this-directory.txt 43 | 44 | # Unit test / coverage reports 45 | htmlcov/ 46 | .tox/ 47 | .nox/ 48 | .coverage 49 | .coverage.* 50 | .cache 51 | nosetests.xml 52 | coverage.xml 53 | *.cover 54 | .hypothesis/ 55 | .pytest_cache/ 56 | 57 | # Translations 58 | *.mo 59 | *.pot 60 | 61 | # Scrapy stuff: 62 | .scrapy 63 | 64 | # Sphinx documentation 65 | docs/_build/ 66 | 67 | # PyBuilder 68 | target/ 69 | 70 | # pyenv 71 | .python-version 72 | 73 | # pipenv 74 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 75 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 76 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 77 | # install all needed dependencies. 
78 | #Pipfile.lock 79 | 80 | # celery beat schedule file 81 | celerybeat-schedule 82 | 83 | # SageMath parsed files 84 | *.sage.py 85 | 86 | # Spyder project settings 87 | .spyderproject 88 | .spyproject 89 | 90 | # Rope project settings 91 | .ropeproject 92 | 93 | # Mr Developer 94 | .mr.developer.cfg 95 | .project 96 | .pydevproject 97 | 98 | # mkdocs documentation 99 | /site 100 | 101 | # mypy 102 | .mypy_cache/ 103 | .dmypy.json 104 | dmypy.json 105 | 106 | # Pyre type checker 107 | .pyre/ 108 | 109 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2016 Samuel D. Lotz 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.org: -------------------------------------------------------------------------------- 1 | * CSNAnalysis: Tools for creating, analyzing and visualizing Conformation Space Networks. 2 | 3 | CSNAnalysis is a set of tools for network-based analysis of molecular dynamics trajectories. 4 | To use, initialize a `CSN` object using a matrix of transition counts. 5 | The "killer app" of CSNAnalysis is an easy interface between enhanced sampling algorithms 6 | (e.g. WExplore), molecular clustering programs (e.g. MSMBuilder), graph analysis packages (e.g. networkX) 7 | and graph visualization programs (e.g. Gephi). 8 | 9 | CSNAnalysis is currently in beta. 10 | 11 | * Installation 12 | 13 | To install CSNAnalysis, you can get the latest: 14 | 15 | #+begin_src bash 16 | pip install git+https://github.com/ADicksonLab/CSNAnalysis 17 | #+end_src 18 | 19 | Or one of the releases: 20 | 21 | #+begin_src bash 22 | pip install git+https://github.com/ADicksonLab/CSNAnalysis@v0.3 23 | #+end_src 24 | 25 | * Dependencies 26 | - numpy 27 | - scipy 28 | - networkx 29 | 30 | * Features 31 | CSNAnalysis has the following capabilities: 32 | 33 | - constructing transition probability matrices 34 | - trimming CSNs using a variety of criteria 35 | - computing committor probabilities with an arbitrary number of basins 36 | - export gexf files with custom node colorings 37 | 38 | * Tutorial 39 | See the Jupyter Notebook in examples/examples.ipynb 40 | 41 | * Misc 42 | ** Versioning 43 | 44 | See [[http://semver.org/]] for version number meanings. 45 | 46 | Version 1.0.0 will be released whenever the abstract layer API is stable. Subsequent 1.X.y releases will be made as applied and porcelain layer features are added. 
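As a rough illustration of the "constructing transition probability matrices" feature listed above, the sketch below column-normalizes a small count matrix indexed as [to][from]. It uses plain Python lists rather than the library itself. The helper name `counts_to_transition` and the example counts are assumptions made for this illustration and are not part of the CSNAnalysis API.

```python
def counts_to_transition(counts):
    """Column-normalize a square count matrix indexed as counts[to][from].

    Hypothetical helper for illustration only; not part of CSNAnalysis.
    """
    n = len(counts)
    if any(len(row) != n for row in counts):
        raise ValueError("Count matrix is not square")

    # Total number of transitions leaving each state (one sum per column).
    col_sums = [sum(counts[to][frm] for to in range(n)) for frm in range(n)]

    # Divide each column by its sum; columns with no counts stay all zeros.
    return [
        [counts[to][frm] / col_sums[frm] if col_sums[frm] > 0 else 0.0
         for frm in range(n)]
        for to in range(n)
    ]


# Made-up counts for a three-state system: counts[to][from].
counts = [
    [8, 1, 0],
    [2, 6, 3],
    [0, 3, 7],
]
transmat = counts_to_transition(counts)
```

Each column of `transmat` then sums to one, which is the convention the package documents for its transition matrices. With the package installed, the same list of lists could presumably be passed straight to `CSN(counts)`, as described at the top of this README; examples/examples.ipynb remains the authoritative tutorial.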
47 | -------------------------------------------------------------------------------- /csnanalysis/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ADicksonLab/CSNAnalysis/7700653374937c179a441c656f0783f80e44c26d/csnanalysis/__init__.py -------------------------------------------------------------------------------- /csnanalysis/csn.py: -------------------------------------------------------------------------------- 1 | import itertools 2 | from copy import deepcopy 3 | 4 | import scipy 5 | import networkx as nx 6 | import numpy as np 7 | 8 | from csnanalysis.matrix import ( 9 | count_to_trans, 10 | symmetrize_matrix, 11 | eig_weights, 12 | mult_weights, 13 | committor, 14 | committor_linalg, 15 | get_eigenvectors, 16 | well_conditioned, 17 | fptd, 18 | ) 19 | 20 | class CSN(object): 21 | 22 | def __init__(self, counts, symmetrize=False): 23 | """ 24 | Initializes a CSN object using a counts matrix. This can either be a numpy array, 25 | a scipy sparse matrix, or a list of lists. Indices: [to][from], (or, [row][column]). 
26 | """ 27 | if type(counts) is list: 28 | self.countmat = scipy.sparse.coo_matrix(counts) 29 | elif type(counts) is np.ndarray: 30 | self.countmat = scipy.sparse.coo_matrix(counts) 31 | elif type(counts) is scipy.sparse.coo_matrix: 32 | self.countmat = counts 33 | else: 34 | try: 35 | self.countmat = counts.tocoo() 36 | except AttributeError: 37 | raise TypeError("Count matrix is of unsupported type: ",type(counts)) 38 | 39 | if self.countmat.shape[0] != self.countmat.shape[1]: 40 | raise ValueError("Count matrix is not square: ",self.countmat.shape) 41 | 42 | totcounts = self.countmat.sum(axis=1).tolist() 43 | 44 | self.symmetrize = symmetrize 45 | if self.symmetrize: 46 | self.countmat = symmetrize_matrix(self.countmat) 47 | 48 | self.nnodes = self.countmat.shape[0] 49 | self.transmat = count_to_trans(self.countmat) 50 | 51 | self.trim_transmat = None 52 | 53 | # initialize networkX directed graph 54 | self.graph = nx.DiGraph() 55 | labels = [{'label' : i, 'count' : int(totcounts[i][0])} for i in range(self.nnodes)] 56 | self.graph.add_nodes_from(zip(range(self.nnodes),labels)) 57 | self.graph.add_weighted_edges_from(zip(self.transmat.col,self.transmat.row,100*self.transmat.data)) 58 | 59 | # remove self edges from graph 60 | self_edges = [(i,i) for i in range(self.nnodes)] 61 | self.graph.remove_edges_from(self_edges) 62 | 63 | def to_gephi_csv(self, cols='all', node_name='node.csv', edge_name='edge.csv', directed=False): 64 | """ 65 | Writes node and edge files for import into the Gephi network visualization program. 66 | 67 | cols -- A list of columns that should be written to the node file. ID and label are 68 | included by default. 'all' will include every attribute attached to the 69 | nodes in self.graph. 
70 | 71 | """ 72 | if cols == 'all': 73 | cols = list(self.graph.nodes[0].keys()) 74 | else: 75 | if 'label' not in cols: 76 | cols = ['label'] + cols 77 | if 'ID' not in cols: 78 | cols = ['ID'] + cols 79 | 80 | with open(node_name,mode='w') as f: 81 | f.write(" ".join(cols)+"\n") 82 | for i in range(self.nnodes): 83 | data = [str(self.graph.nodes[i][c]) for c in cols] 84 | f.write(' '.join(data)+"\n") 85 | 86 | # compute edge weights 87 | if directed: 88 | with open(edge_name,mode='w') as f: 89 | f.write("source target type prob i_weight\n") 90 | for (from_ind, to_ind, weight_dict) in self.graph.edges.data(): 91 | wt = weight_dict['weight'] 92 | f.write("{0:d} {1:d} {2:s} {3:f} {4:d}\n".format(from_ind,to_ind,'Directed',wt,int(wt*100))) 93 | else: 94 | with open(edge_name,mode='w') as f: 95 | f.write("source target type prob i_weight\n") 96 | for (from_ind, to_ind, weight_dict) in self.graph.edges.data(): 97 | if from_ind <= to_ind: 98 | if self.graph.has_edge(to_ind,from_ind): 99 | back_wt = self.graph.edges[to_ind,from_ind]['weight'] 100 | else: 101 | back_wt = 0 102 | edge_weight = 0.5*(back_wt + weight_dict['weight']) 103 | f.write("{0:d} {1:d} {2:s} {3:f} {4:d}\n".format(from_ind,to_ind,'Undirected',edge_weight,int(edge_weight*100))) 104 | 105 | def add_attr(self, name, values): 106 | """ 107 | Adds an attribute to the set of nodes in the CSN. 108 | """ 109 | attr = {} 110 | for i, v in enumerate(values): 111 | attr[i] = v 112 | 113 | nx.set_node_attributes(self.graph,values=attr,name=name) 114 | 115 | def add_trim_attr(self, name, values, default=0): 116 | """ 117 | Adds an attribute to the set of nodes in the CSN. 
118 | Values should be an iterable of the size of csn.trim_indices 119 | """ 120 | attr = {} 121 | for i in range(self.nnodes): 122 | if i in self.trim_indices: 123 | trim_idx = self.trim_indices.index(i) 124 | attr[i] = values[trim_idx] 125 | else: 126 | attr[i] = default 127 | 128 | nx.set_node_attributes(self.graph,values=attr,name=name) 129 | 130 | def set_colors(self, rgb): 131 | """ 132 | Adds colors to each node for gexf export of the graph. 133 | 134 | rgb: A dict that stores the rgb values of each node, keyed by integer node index. 135 | 136 | Example: rgb[0]['r'] = 255 137 | rgb[0]['g'] = 0 138 | rgb[0]['b'] = 0 139 | """ 140 | for node in rgb: 141 | if 'viz' not in self.graph.nodes[node]: 142 | self.graph.nodes[node]['viz'] = {} 143 | self.graph.nodes[node]['viz']['color'] = {'r': rgb[node]['r'], 'g': rgb[node]['g'], 'b': rgb[node]['b'], 'a': 0} 144 | 145 | def set_positions(self, xy): 146 | """ 147 | Adds x,y positions to each node for gexf export of the graph. 148 | 149 | xy: A dict that stores the xy positions of each node. 150 | 151 | Example: xy[0]['x'] = 0.5 152 | xy[0]['y'] = 1.6 153 | """ 154 | for node in xy: 155 | if 'viz' not in self.graph.nodes[node]: 156 | self.graph.nodes[node]['viz'] = {} 157 | self.graph.nodes[node]['viz']['position'] = {'x': float(xy[node]['x']), 'y': float(xy[node]['y']), 'z': float(0)} 158 | 159 | 160 | def colors_from_committors(self,comm): 161 | """ 162 | Returns rgb dict using values of committor probabilities. 163 | Very useful for 3-basin committors! 
164 | 165 | comm: Numpy array of committors, as returned by self.calc_committors 166 | """ 167 | highc = 255 168 | nbasin = comm.shape[1] 169 | rgb = {} 170 | colors = ['r','g','b'] 171 | for node in range(self.nnodes): 172 | maxc = comm[node,:].max() 173 | for i in range(min(3,nbasin)): 174 | if node not in rgb: 175 | rgb[node] = {} 176 | if maxc == 0: 177 | rgb[node][colors[i]] = 0 178 | else: 179 | rgb[node][colors[i]] = int(highc*comm[node,i]/maxc) 180 | 181 | return rgb 182 | 183 | 184 | def trim(self, by_inflow=True, by_outflow=True, min_count=None): 185 | """ 186 | Trims a graph to delete nodes that are not connected to the main 187 | component, which is the component containing the most-sampled node (MSN) 188 | by counts. 189 | 190 | by_inflow: whether to delete nodes that are not connected to the MSN by inflow 191 | 192 | by_outflow: whether to delete nodes that are not connected to the MSN by outflow 193 | 194 | min_count: nodes with a count < min_count will be deleted (if None, only nodes with zero counts are deleted) 195 | 196 | Trimmed graph is saved as self.trim_graph. The trimmed transition matrix 197 | is saved as self.trim_transmat, and the count matrix is saved as 198 | self.trim_countmat. 199 | 200 | The mapping from the nodes in the trimmed set to the full set is given by 201 | self.trim_indices. 
202 | """ 203 | 204 | totcounts = self.countmat.toarray().sum(axis=0) 205 | msn = totcounts.argmax() 206 | 207 | mask = np.ones(self.nnodes,dtype=bool) 208 | oldmask = np.zeros(self.nnodes,dtype=bool) 209 | 210 | if min_count is not None: 211 | mask[[i for i in range(self.nnodes) if totcounts[i] < min_count]] = False 212 | else: 213 | mask[[i for i in range(self.nnodes) if totcounts[i] == 0]] = False 214 | 215 | itercount = 0 216 | diff = [] 217 | while (mask != oldmask).any(): 218 | 219 | oldmask = mask.copy() 220 | self.trim_indices = [i for i in range(self.nnodes) if mask[i] == True] 221 | self.trim_graph = self.graph.subgraph(self.trim_indices) 222 | 223 | print(f"Iteration {itercount}:",diff) 224 | itercount += 1 225 | 226 | if by_outflow: 227 | downstream = [i for i in self.trim_indices if nx.has_path(self.trim_graph,msn,i)] 228 | mask[[i for i in range(self.nnodes) if i not in downstream]] = False 229 | 230 | if by_inflow: 231 | upstream = [i for i in self.trim_indices if nx.has_path(self.trim_graph,i,msn)] 232 | mask[[i for i in range(self.nnodes) if i not in upstream]] = False 233 | 234 | diff = [i for i in range(self.nnodes) if mask[i] != oldmask[i]] 235 | 236 | # count all transitions to masked states and add these as self-transitions 237 | # rows = to, cols = from 238 | to_add = {} 239 | rows = self.countmat.row 240 | cols = self.countmat.col 241 | data = self.countmat.data 242 | 243 | for i in range(len(data)): 244 | if mask[rows[i]] == False and mask[cols[i]] == True: 245 | if cols[i] in to_add: 246 | to_add[cols[i]] += data[i] 247 | else: 248 | to_add[cols[i]] = data[i] 249 | 250 | tmp_arr = self.countmat.toarray()[mask,...][...,mask] 251 | 252 | for ind,full_ind in enumerate(self.trim_indices): 253 | if full_ind in to_add: 254 | tmp_arr[ind][ind] += to_add[full_ind] 255 | 256 | assert tmp_arr.sum(axis=0).min() > 0, 'Error! 
A state in the trimmed countmat has no transitions' 257 | self.trim_countmat = scipy.sparse.coo_matrix(tmp_arr) 258 | 259 | if self.symmetrize: 260 | self.trim_countmat = symmetrize_matrix(self.trim_countmat) 261 | 262 | self.trim_nnodes = self.trim_countmat.shape[0] 263 | self.trim_transmat = count_to_trans(self.trim_countmat) 264 | 265 | is_trim = np.zeros((self.nnodes)) 266 | for i in range(self.nnodes): 267 | if i not in self.trim_indices: 268 | is_trim[i] = 1 269 | self.add_attr('trim',is_trim) 270 | 271 | if not well_conditioned(self.trim_transmat.toarray()): 272 | print("Warning: trimmed transition matrix is not well-conditioned.") 273 | 274 | def calc_eig_weights(self,label='eig_weights'): 275 | """ 276 | Calculates weights of states using the highest Eigenvalue of the 277 | transition matrix. By default it uses self.trim_transmat, but will 278 | use self.transmat if no trimming has been done. 279 | 280 | The weights are stored as node attributes in self.graph with the label 281 | 'label', and are also returned from the function. 282 | """ 283 | 284 | if self.trim_transmat is None: 285 | # use full transition matrix 286 | full_wts = eig_weights(self.transmat) 287 | else: 288 | # use trimmed transition matrix 289 | wts = eig_weights(self.trim_transmat) 290 | full_wts = np.zeros(self.nnodes,dtype=float) 291 | for i,ind in enumerate(self.trim_indices): 292 | full_wts[ind] = wts[i] 293 | 294 | fw_float = [float(i) for i in full_wts] 295 | self.add_attr(label, fw_float) 296 | 297 | return full_wts 298 | 299 | def calc_mult_weights(self,label='mult_weights',tol=1e-6): 300 | """ 301 | Calculates weights of states using iterative multiplication of the 302 | transition matrix. By default it uses self.trim_transmat, but will 303 | use self.transmat if no trimming has been done. 304 | 305 | The weights are stored as node attributes in self.graph with the label 306 | 'label', and are also returned from the function. 
307 | """ 308 | 309 | if self.trim_transmat is None: 310 | # use full transition matrix 311 | full_wts = mult_weights(self.transmat,tol) 312 | else: 313 | # use trimmed transition matrix 314 | wts = mult_weights(self.trim_transmat,tol) 315 | full_wts = np.zeros(self.nnodes,dtype=float) 316 | for i,ind in enumerate(self.trim_indices): 317 | full_wts[ind] = wts[i] 318 | 319 | fw_float = [float(i) for i in full_wts] 320 | if label is not None: 321 | self.add_attr(label, fw_float) 322 | 323 | return full_wts 324 | 325 | def calc_committors(self, basins, 326 | labels=None, 327 | basin_labels=None, 328 | add_basins=False, 329 | tol=1e-6, 330 | maxstep=20, 331 | method='iter'): 332 | """ 333 | Calculates committor probabilities between an arbitrary set of N basins. 334 | 335 | basins -- A list of lists, describing which states make up the 336 | basins of attraction. There can be any number of basins. 337 | e.g. [[basin1_a,basin1_b,...],[basin2_a,basin2_b,...]] 338 | labels -- A list of labels given to the committors (one for each 339 | basin) in the attribute list. 340 | add_basins -- Whether to add basin vectors to attribute list. 341 | basin_labels -- List of names of the basins. 342 | tol -- Tolerance of iterative multiplication process 343 | (see matrix._trans_mult_iter) 344 | maxstep -- Maximum number of iterations of multiplication process. 345 | method -- 'iter' for iterative multiplication, 'linalg' for 346 | linear algebra solve (two-basin only) 347 | 348 | The committors are also returned from the function as a numpy array. 349 | """ 350 | 351 | assert method in ['iter','linalg'], 'Error! method must be either iter or linalg' 352 | 353 | if self.trim_transmat is None: 354 | assert well_conditioned(self.transmat.toarray()), "Error: cannot calculate committors from transition matrix. Try trimming first." 
355 | 356 | # use full transition matrix 357 | if method == 'iter': 358 | full_comm = committor(self.transmat,basins,tol=tol,maxstep=maxstep) 359 | elif method == 'linalg': 360 | full_comm = committor_linalg(self.transmat,basins) 361 | 362 | else: 363 | # use trimmed transition matrix 364 | trim_basins = [] 365 | for i,b in enumerate(basins): 366 | trim_basins.append([]) 367 | for state in b: 368 | if state in self.trim_indices: 369 | trim_basins[i].append(self.trim_indices.index(state)) 370 | 371 | if method == 'iter': 372 | comm = committor(self.trim_transmat,trim_basins,tol=tol,maxstep=maxstep) 373 | elif method == 'linalg': 374 | comm = committor_linalg(self.trim_transmat,trim_basins) 375 | 376 | full_comm = np.zeros((self.transmat.shape[0],len(basins)),dtype=float) 377 | for i,ind in enumerate(self.trim_indices): 378 | full_comm[ind] = comm[i] 379 | 380 | if labels is None: 381 | labels = ['p' + str(i) for i in range(len(basins))] 382 | 383 | for i in range(len(basins)): 384 | fc_float = [float(i) for i in full_comm[:,i]] 385 | self.add_attr(labels[i], fc_float) 386 | 387 | if add_basins: 388 | if basin_labels is None: 389 | basin_labels = [str(i) for i in range(len(basins))] 390 | for i,b in enumerate(basins): 391 | bvec = np.zeros(self.nnodes,dtype=int) 392 | bvec[b] = 1 393 | bv_int = [int(i) for i in bvec] 394 | self.add_attr(basin_labels[i],bv_int) 395 | 396 | return full_comm 397 | 398 | def calc_mfpt(self,sinks,maxsteps=None,tol=1e-3,sources=None): 399 | """ 400 | Calculates the mean first passage time (MFPT) and the first passage time distribution (FPTD) 401 | from every state in the matrix to a set of "sinks". 402 | 403 | sinks -- (list of int) A list of states that will be used as sinks 404 | 405 | stepsize -- (int) The lagtime, in multiples of tau, that is used to compute the MFPT, which is 406 | also the resolution of the FPTD. 407 | 408 | maxsteps -- (int) The maximum number of steps used to compute the FPTD. 
409 | 410 | tol -- (float) The quitting criteria for FPTD calculation. The calculation will stop if the 411 | largest "un-sunk" probability is below tol. 412 | 413 | sources -- (None or list of int) List of source states to average over. If None, will return 414 | MFPT and FPTD of all states. 415 | """ 416 | 417 | assert tol is not None or maxsteps is not None, "Error: either maxsteps or tol must be defined!" 418 | 419 | if tol is None: 420 | tol = 0.0 421 | if maxsteps is None: 422 | maxsteps = np.inf 423 | 424 | if self.trim_transmat is None: 425 | assert well_conditioned(self.transmat.toarray()), "Error: cannot calculate mfpt from transition matrix. Try trimming first." 426 | 427 | # use full transition matrix 428 | full_fptd = fptd(self.transmat,sinks,maxsteps=maxsteps,tol=tol) 429 | full_mfpt = np.zeros((self.transmat.shape[0]),dtype=float) 430 | for i in range(full_fptd.shape[0]): 431 | # loop over exponentially placed timepoints 432 | # this entry is the flux between lag*(2**[i]) and lag*(2**[i+1]) 433 | # avg. 
of endpoints is (2**(i-1) + 2**(i)) 434 | full_mfpt += full_fptd[i,:]*(2**(i-1) + 2**(i)) # in units of lagtime 435 | 436 | else: 437 | # use trimmed transition matrix 438 | trim_sinks = [] 439 | for state in sinks: 440 | if state in self.trim_indices: 441 | trim_sinks.append(self.trim_indices.index(state)) 442 | 443 | trim_fptd = fptd(self.trim_transmat,trim_sinks,maxsteps=maxsteps,tol=tol) 444 | trim_mfpt = np.zeros((trim_fptd.shape[1]),dtype=float) 445 | for i in range(trim_fptd.shape[0]): 446 | # loop over exponentially placed timepoints 447 | # this entry is the flux between lag*(2**[i]) and lag*(2**[i+1]) 448 | trim_mfpt += trim_fptd[i,:]*(2**(i-1) + 2**(i)) # in units of lagtime 449 | 450 | full_fptd = np.zeros((trim_fptd.shape[0],self.transmat.shape[0]),dtype=float) 451 | full_mfpt = np.zeros((self.transmat.shape[0]),dtype=float) 452 | for i,ind in enumerate(self.trim_indices): 453 | full_fptd[:,ind] = trim_fptd[:,i] 454 | full_mfpt[ind] = trim_mfpt[i] 455 | 456 | if sources is not None: 457 | wts = self.calc_mult_weights(label=None,tol=1e-6) 458 | wt_sum = wts[sources].sum() 459 | 460 | avg_mfpt = 0 461 | avg_fptd = np.zeros((full_fptd.shape[0])) 462 | for s in sources: 463 | avg_mfpt += full_mfpt[s]*wts[s] 464 | avg_fptd += full_fptd[:,s]*wts[s] 465 | 466 | return np.array([avg_mfpt/wt_sum]), np.array([avg_fptd/wt_sum]) 467 | else: 468 | return full_mfpt, full_fptd 469 | 470 | 471 | 472 | def idxs_to_trim(self,idxs): 473 | """ 474 | Converts a list of idxs to trim_idxs. 475 | 476 | idxs -- List of states in the transition matrix. Elements should be 477 | integers from 0 to nstates-1. 478 | """ 479 | 480 | return [self.trim_indices.index(i) for i in idxs if i in self.trim_indices] 481 | 482 | def calc_eigvectors(self, n_eig=3, 483 | include_wt_vec=False, 484 | save_to_graph=True, 485 | save_imag_to_graph=False, 486 | save_label='eig'): 487 | """ 488 | Calculates eigenvectors and eigenvalues of the transition matrix. 
489 | 490 | n_eig -- The number of eigenvectors to return 491 | 492 | include_wt_vec -- Whether or not to include the eigenvector with 493 | eigenvalue = 1. Note that this is equal to the 494 | steady state weights. 495 | 496 | save_to_graph -- Whether or not to save the eigenvectors to the graph 497 | (real part). 498 | 499 | save_imag_to_graph -- Whether or not to save the eigenvectors to the 500 | graph (imaginary part). 501 | 502 | save_label -- Labels given to each eigenvector when saving to the graph. 503 | Indices are appended and counting starts at zero (e.g. 504 | eig0, eig1, ..). If imaginary part is saved (eig0_imag, eig1_imag, ...) 505 | 506 | Output: 507 | eig_vecs -- A numpy array (N, n_eig) of eigenvector elements (real part only) 508 | 509 | eig_vals -- A numpy array of the n_eig eigenvalues (real part only) 510 | 511 | eig_vecs_imag -- A numpy array (N, n_eig) of eigenvector elements (imaginary part only) 512 | 513 | eig_vals_imag - A numpy array of the n_eig eigenvalues (imaginary part only) 514 | 515 | """ 516 | 517 | if self.trim_transmat is None: 518 | # use full transition matrix 519 | vec_real, val_real, vec_imag, val_imag = get_eigenvectors(self.transmat.toarray(), n_eig=n_eig, return_wt_vec=include_wt_vec) 520 | else: 521 | # use trimmed transition matrix 522 | trim_vec_real, val_real, trim_vec_imag, val_imag = get_eigenvectors(self.trim_transmat.toarray(), n_eig=n_eig, return_wt_vec=include_wt_vec) 523 | 524 | vec_real = np.zeros((self.transmat.shape[0],n_eig),dtype=float) 525 | vec_imag = np.zeros((self.transmat.shape[0],n_eig),dtype=float) 526 | for i,ind in enumerate(self.trim_indices): 527 | vec_real[ind] = trim_vec_real[i] 528 | vec_imag[ind] = trim_vec_imag[i] 529 | 530 | # add eigenvectors as attributes 531 | if save_to_graph: 532 | for idx in range(n_eig): 533 | label = f'{save_label}{idx}' 534 | self.add_attr(label, vec_real[:,idx]) 535 | 536 | if save_imag_to_graph: 537 | for idx in range(n_eig): 538 | label = 
f'{save_label}{idx}_imag' 539 | self.add_attr(label, vec_imag[:,idx]) 540 | 541 | return vec_real, val_real, vec_imag, val_imag 542 | -------------------------------------------------------------------------------- /csnanalysis/matrix.py: -------------------------------------------------------------------------------- 1 | import scipy 2 | import numpy as np 3 | from itertools import compress 4 | 5 | def count_to_trans(countmat): 6 | """ 7 | Converts a count matrix (in scipy sparse format) to a transition 8 | matrix. 9 | """ 10 | tmp = np.array(countmat.toarray(),dtype=float) 11 | colsums = tmp.sum(axis=0) 12 | for i,c in enumerate(colsums): 13 | if c > 0: 14 | tmp[:,i] /= c 15 | 16 | return(scipy.sparse.coo_matrix(tmp)) 17 | 18 | def symmetrize_matrix(countmat): 19 | """ 20 | Symmetrizes a count matrix (in scipy sparse format). 21 | """ 22 | return scipy.sparse.coo_matrix(0.5*(countmat + countmat.transpose())) 23 | 24 | def _make_sink(transmat,sink_states): 25 | """ 26 | Constructs a transition matrix with "sink states", where the columns are 27 | replaced with identity vectors (diagonal element = 1, off-diagonals = 0). 28 | 29 | Input: 30 | 31 | transmat -- An N x N transition matrix in scipy sparse coo format. 32 | Columns should sum to 1. Indices: [to][from]. 33 | 34 | sink_states: A list of integers denoting sinks. 35 | 36 | Output: A transition matrix in scipy sparse coo format. 37 | """ 38 | sink_mat = transmat.copy() 39 | 40 | # remove redundant elements in sink_states 41 | sink_states = list(set(sink_states)) 42 | 43 | set_to_one = np.zeros(len(sink_states),dtype=bool) 44 | for i in range(len(sink_mat.data)): 45 | if sink_mat.col[i] in sink_states: 46 | if sink_mat.col[i] != sink_mat.row[i]: 47 | sink_mat.data[i] = 0. 48 | else: 49 | sink_mat.data[i] = 1. 
50 | set_to_one[sink_states.index(sink_mat.col[i])] = True 51 | 52 | # set diagonal elements to 1 that haven't been set to one already 53 | statelist = np.asarray(list(compress(sink_states, np.logical_not(set_to_one))), 54 | dtype=int) 55 | 56 | if statelist.shape[0] > 0: 57 | sink_mat.row = np.append(sink_mat.row, statelist) 58 | sink_mat.col = np.append(sink_mat.col,statelist) 59 | sink_mat.data = np.append(sink_mat.data, np.ones_like(statelist, dtype=int)) 60 | 61 | # remove zeros 62 | sink_mat.eliminate_zeros() 63 | 64 | # check if sink_mat is well-conditioned 65 | if not well_conditioned(sink_mat.toarray()): 66 | raise ValueError("Error! sink matrix is no longer well-conditioned in make_sink!") 67 | 68 | return sink_mat 69 | 70 | def eig_weights(transmat): 71 | """ 72 | Calculates the weights as the top eigenvector of the transition matrix. 73 | 74 | Input: 75 | 76 | transmat -- An N x N transition matrix as a numpy array or in 77 | scipy sparse coo format. Columns should sum to 1. 78 | Indices: [to][from] 79 | 80 | Output: An array of weights of size N. 81 | """ 82 | 83 | vals, vecs = scipy.sparse.linalg.eigs(transmat,k=1) 84 | return np.real(vecs[:,0])/np.real(vecs[:,0].sum()) 85 | 86 | def mult_weights(transmat,tol=1e-6): 87 | """ 88 | Calculates the steady state weights as the columns of transmat^infinity. 89 | transmat^infinity is approximated by successively squaring transmat until 90 | the maximum variation in the rows is less than tol. 91 | 92 | Input: 93 | 94 | transmat -- An N x N transition matrix as a numpy array or in 95 | scipy sparse coo format. Columns should sum to 1. 96 | Indices: [to][from] 97 | 98 | tol -- Threshold for stopping the iterative multiplication. 99 | 100 | Output: An array of weights of size N. 101 | """ 102 | 103 | banded_mat = _trans_mult_iter(transmat,tol) 104 | return banded_mat[:,0] 105 | 106 | def _renorm(mat,tol=1e-7): 107 | """ 108 | Renormalizes a matrix (to,from) so that its columns sum to one. 
109 | This is meant to encourage numerical stability during long 110 | matrix multiplication chains. 111 | """ 112 | for i in range(mat.shape[1]): 113 | col_sum = mat[:,i].sum() 114 | assert np.abs(1.0-col_sum) < tol, f"Error! 1 - column sum ({1.0-col_sum}) is greater than tolerance ({tol}) in _renorm!" 115 | mat[:,i] /= col_sum 116 | 117 | return mat 118 | 119 | def fptd(transmat,sinks,maxsteps=200,tol=0.0): 120 | """ 121 | Calculates the first passage time distribution for transmat with a set 122 | of sink states (sinks). The FPTD is evaluated at the set of points lag*2^i. 123 | It will run for a total of maxsteps, or until the maximum "un-sunk" probability 124 | of a state falls below tol. 125 | """ 126 | 127 | sm = _renorm(_make_sink(transmat,sinks).toarray()) 128 | 129 | step = 0 130 | non_sinks = [i for i in range(transmat.shape[0]) if i not in sinks] 131 | max_prob_to = sm.sum(axis=1)[non_sinks].max() 132 | 133 | fptd = [] 134 | last_step_warped = np.zeros((transmat.shape[0])) 135 | while step < maxsteps and max_prob_to > tol: 136 | newmat = _renorm(np.matmul(sm,sm)) 137 | 138 | warped = newmat[sinks,:].sum(axis=0) 139 | warped[sinks] = 0 140 | fptd.append(warped-last_step_warped) 141 | last_step_warped = warped 142 | 143 | sm = newmat.copy() 144 | max_prob_to = sm.sum(axis=1)[non_sinks].max() 145 | step += 1 146 | 147 | return np.array(fptd) 148 | 149 | 150 | def _trans_mult_iter(transmat,tol,maxstep=200): 151 | """ 152 | Performs iterative multiplication of transmat until the maximum variation in 153 | the rows is less than tol. 
154 | """ 155 | if type(transmat) is np.ndarray: 156 | t = transmat.copy() 157 | else: 158 | t = transmat.toarray() 159 | 160 | var = 1 161 | step = 0 162 | while (var > tol) and (step < maxstep): 163 | newmat = np.matmul(t,t) 164 | var = np.abs(newmat-t).max() 165 | t = newmat.copy() 166 | step += 1 167 | 168 | if step == maxstep and var > tol: 169 | print("Warning: iterative multiplication not converged after",step,"steps: (var = ",var,"), (tol = ",tol,")") 170 | 171 | return t 172 | 173 | def committor(transmat,basins,tol=1e-6,maxstep=20): 174 | """ 175 | This function computes committor probabilities, given a transition matrix 176 | and a list of states that comprise the basins. It uses iterative multiplication of 177 | a modified transition matrix, with identity vectors for each basin state. 178 | 179 | Note that this method works regardless of the number of basins. 180 | 181 | Input: 182 | 183 | transmat -- An N x N transition matrix in scipy sparse coo format. 184 | Columns should sum to 1. Indices: [to][from] 185 | 186 | basins -- A list of lists, describing which states make up the 187 | basins of attraction. There can be any number of basins. 188 | e.g. [[basin1_a,basin1_b,...],[basin2_a,basin2_b,...]] 189 | 190 | Output: An array of committor probabilities of size N x B, where B 191 | is the number of basins. Committors will sum to 1 for each state. 192 | """ 193 | 194 | # make sink_matrix 195 | 196 | flat_sink = [i for b in basins for i in b] 197 | sink_mat = _make_sink(transmat,flat_sink) 198 | sink_results = _trans_mult_iter(sink_mat,tol,maxstep) 199 | 200 | committor = np.zeros((transmat.shape[0],len(basins)),dtype=float) 201 | 202 | for i in range(transmat.shape[0]): 203 | comm_done = False 204 | for j,b in enumerate(basins): 205 | if i in b: 206 | committor[i][j] = 1 207 | comm_done = True 208 | break 209 | if not comm_done: 210 | for j,b in enumerate(basins): 211 | committor[i][j] = 0. 
212 | for bstate in b: 213 | committor[i][j] += sink_results[bstate][i] 214 | 215 | return committor 216 | 217 | def committor_linalg(transmat,basins): 218 | """ 219 | This function computes committor probabilities, given a transition matrix 220 | and a list of states that comprise the basins, by solving the system 221 | of equations: 222 | 223 | 0 = q_i - sum_j T_ji * q_j for i not in a basin (T_ji: from i, to j) 224 | 225 | written in matrix form as AQ = B. 226 | 227 | Note: this requires that the number of basins is 2, and q_i is the 228 | probability that a trajectory in state i commits to the SECOND basin. 229 | 230 | Input: 231 | 232 | transmat -- An N x N transition matrix in scipy sparse coo format. 233 | Columns should sum to 1. Indices: [to][from] 234 | 235 | basins -- A list of lists, describing which states make up the 236 | basins of attraction. There must be exactly two basins. 237 | e.g. [[basin1_a,basin1_b,...],[basin2_a,basin2_b,...]] 238 | 239 | Output: An array of committor probabilities of size N x 2, where 2 240 | is the number of basins. Committors will sum to 1 for each state. 241 | """ 242 | 243 | assert len(basins) == 2, 'Error! linalg method only works with two basins.' 244 | 245 | trans_arr = transmat.toarray() 246 | n = trans_arr.shape[0] 247 | A_mat = np.zeros((n,n)) 248 | B_vec = np.zeros((n)) 249 | 250 | for i in range(n): 251 | A_mat[i,i] = 1 252 | if i in basins[0]: 253 | B_vec[i] = 0 254 | elif i in basins[1]: 255 | B_vec[i] = 1 256 | else: 257 | B_vec[i] = 0 258 | for j in range(n): 259 | if i != j: 260 | A_mat[i,j] = -trans_arr[j,i] 261 | else: 262 | A_mat[i,i] = 1-trans_arr[j,i] 263 | 264 | Q_vec = np.linalg.solve(A_mat,B_vec) 265 | 266 | return np.array([1-Q_vec,Q_vec]).T 267 | 268 | def _extend(transmat,hubstates): 269 | """ 270 | This function returns an extended transition matrix (2N x 2N) 271 | where one set of states (0..N-1) have NOT yet visited hubstates, 272 | and states (N..2N-1) HAVE visited the hubstates.
273 | """ 274 | n = transmat.shape[0] 275 | 276 | # data, rows and cols of the future extended matrix 277 | data = [] 278 | rows = [] 279 | cols = [] 280 | 281 | for i in range(len(transmat.data)): 282 | if transmat.row[i] in hubstates: 283 | # transition TO a hubstate, add to lower left and lower right 284 | # lower left 285 | data.append(transmat.data[i]) 286 | rows.append(transmat.row[i] + n) 287 | cols.append(transmat.col[i]) 288 | # lower right 289 | data.append(transmat.data[i]) 290 | rows.append(transmat.row[i] + n) 291 | cols.append(transmat.col[i] + n) 292 | else: 293 | # transition not to a hubstate, add to upper left and lower right 294 | # upper left 295 | data.append(transmat.data[i]) 296 | rows.append(transmat.row[i]) 297 | cols.append(transmat.col[i]) 298 | # lower right 299 | data.append(transmat.data[i]) 300 | rows.append(transmat.row[i] + n) 301 | cols.append(transmat.col[i] + n) 302 | 303 | ext_mat = scipy.sparse.coo_matrix((data, (rows, cols)), shape=(2*n,2*n)) 304 | return ext_mat 305 | 306 | def _getring(transmat,basin,wts,tol,maxstep): 307 | """ 308 | Given a transition matrix, and a set of states that form a basin, 309 | this returns a vector describing how probability exits that basin. 
310 | """ 311 | # make a matrix with sink states in every non-basin state 312 | n = transmat.shape[0] 313 | flat_sink = [i for i in range(n) if i not in basin] 314 | sink_mat = _make_sink(transmat,flat_sink) 315 | 316 | # see where the probability goes 317 | sink_results = _trans_mult_iter(sink_mat,tol,maxstep) 318 | 319 | ringprob = np.zeros((n)) 320 | for b in basin: 321 | for i in range(n): 322 | if i not in basin: 323 | ringprob[i] += wts[b]*sink_results[i][b] 324 | 325 | return ringprob/wts[basin].sum() 326 | 327 | def hubscores(transmat,hubstates,basins,tol=1e-6,maxstep=30,wts=None): 328 | """ 329 | This function computes hub scores, which are the probabilities that 330 | transitions between a set of communities will use a given community as 331 | an intermediate. e.g. h_a,b,c is the probability that transitions from 332 | basin a to basin b will use c as an intermediate. 333 | 334 | For more information see: 335 | Dickson, A and Brooks III, CL. JCTC, 8, 3044-3052 (2012). 336 | 337 | Input: 338 | 339 | transmat -- An N x N transition matrix in scipy sparse coo format. 340 | Columns should sum to 1. Indices: [to][from] 341 | 342 | hubstates -- A list describing the states in transmat that make up 343 | the hub being measured. 344 | 345 | basins -- A list of two lists, describing which states make up each of 346 | the two basins of attraction. 347 | e.g. [[basin_a_1,basin_a_2,...],[basin_b_1,basin_b_2,...]]. 348 | 349 | wts -- The equilibrium weights of all states in transmat. If this is not 350 | given then the function will compute them from eig_weights.
351 | 352 | Output: [h_a,b,c , h_b,a,c] 353 | """ 354 | 355 | # make extended sink_matrix 356 | n = transmat.shape[0] 357 | ext_transmat = _extend(transmat,hubstates) 358 | 359 | flat_sink = [i for b in basins for i in b] 360 | flat_sink_ext = flat_sink + [i + n for i in flat_sink] 361 | 362 | sink_mat = _make_sink(ext_transmat,flat_sink_ext) 363 | 364 | sink_results = _trans_mult_iter(sink_mat,tol,maxstep) 365 | 366 | if wts is None: 367 | wts = eig_weights(transmat) 368 | 369 | 370 | h = np.zeros((2,2),dtype=float) 371 | ring = [_getring(transmat,b,wts,tol,maxstep) for b in basins] 372 | 373 | for source,sink in [[0,1],[1,0]]: 374 | for i,p in enumerate(ring[source]): 375 | if p > 0: 376 | # i is a ring state of source basin, with probability p 377 | if i in hubstates: 378 | testi = i + n 379 | else: 380 | testi = i 381 | c_no = 0 382 | c_yes = 0 383 | for b in basins[sink]: 384 | c_no += sink_results[b][testi] 385 | c_yes += sink_results[b+n][testi] 386 | if (c_no + c_yes) > 0: 387 | h[source][sink] += p*c_yes/(c_no+c_yes) 388 | 389 | return [h[0,1],h[1,0]] 390 | 391 | def get_eigenvectors(transmat, n_eig=3, return_wt_vec=False): 392 | """ 393 | This function returns a set of eigenvectors with the highest 394 | eigenvalues. It wraps the scipy.linalg.eig function. 395 | 396 | Input: 397 | 398 | transmat -- An N x N transition matrix in scipy sparse coo format. 399 | Columns should sum to 1. Indices: [to][from] 400 | 401 | n_eig -- The number of eigenvectors to return 402 | 403 | return_wt_vec -- Whether or not to include the eigenvector with 404 | eigenvalue = 1. Note that this is equal to the 405 | steady state weights. 
406 | 407 | Output: 408 | 409 | eig_vecs -- A numpy array (N, n_eig) of eigenvector elements (real part only) 410 | 411 | eig_vals -- A numpy array of the n_eig eigenvalues (real part only) 412 | 413 | eig_vecs_imag -- A numpy array (N, n_eig) of eigenvector elements (imaginary part only) 414 | 415 | eig_vals_imag - A numpy array of the n_eig eigenvalues (imaginary part only) 416 | """ 417 | 418 | e_vals_complex, e_vecs_complex = scipy.linalg.eig(transmat) 419 | 420 | e_vals_real = np.real(e_vals_complex) 421 | e_vals_imag = np.imag(e_vals_complex) 422 | 423 | sort_idxs = list(np.argsort(e_vals_real)) 424 | 425 | if return_wt_vec: 426 | idxs_to_return = sort_idxs[-n_eig:] 427 | else: 428 | idxs_to_return = sort_idxs[-(n_eig+1):-1] 429 | 430 | # change order to highest to lowest 431 | idxs_to_return.reverse() 432 | 433 | return np.real(e_vecs_complex)[:,idxs_to_return], e_vals_real[idxs_to_return], \ 434 | np.imag(e_vecs_complex)[:,idxs_to_return], e_vals_imag[idxs_to_return] 435 | 436 | def well_conditioned(transmat): 437 | tol = 1e-5 438 | minval = transmat.sum(axis=0).min() 439 | maxval = transmat.sum(axis=0).max() 440 | if 1 - minval > tol or maxval - 1 > tol: 441 | return False 442 | else: 443 | return True 444 | -------------------------------------------------------------------------------- /examples/committor_net_3state.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ADicksonLab/CSNAnalysis/7700653374937c179a441c656f0783f80e44c26d/examples/committor_net_3state.png -------------------------------------------------------------------------------- /examples/examples.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# CSNAnalysis Tutorial\n", 8 | "### A brief introduction to the use of the CSNAnalysis package\n", 9 | "---\n", 10 | "**Updated Aug 19, 2020**\n", 
11 | "*Dickson Lab, Michigan State University*" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Overview\n", 19 | "\n", 20 | "The CSNAnalysis package is a set of tools for network-based analysis of molecular dynamics trajectories.\n", 21 | " CSNAnalysis is an easy interface between enhanced sampling algorithms\n", 22 | " (e.g. WExplore implemented in `wepy`), molecular clustering programs (e.g. `MSMBuilder`), graph analysis packages (e.g. `networkx`) and graph visualization programs (e.g. `Gephi`).\n", 23 | "\n", 24 | "### What are conformation space networks?\n", 25 | "\n", 26 | "A conformation space network is a visualization of a free energy landscape, where each node is a cluster of molecular conformations, and the edges show which conformations can directly interconvert during a molecular dynamics simulation. A CSN can be thought of as a visual representation of a transition matrix, where the nodes represent the row / column indices and the edges show the off-diagonal elements. `CSNAnalysis` offers a concise set of tools for the creation, analysis and visualization of CSNs.\n", 27 | "\n", 28 | "**This tutorial will give quick examples for the following use cases:**\n", 29 | "\n", 30 | "1. Initializing CSN objects from count matrices\n", 31 | "2. Trimming CSNs\n", 32 | "3. Obtaining steady-state weights from a transition matrix\n", 33 | " * By eigenvalue\n", 34 | " * By iterative multiplication\n", 35 | "4. Computing committor probabilities to an arbitrary set of basins\n", 36 | "5.
Exporting gexf files for visualization with the Gephi program" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "## Getting started\n", 44 | "\n", 45 | "Clone the CSNAnalysis repository:\n", 46 | "\n", 47 | "```\n", 48 | "git clone https://github.com/ADicksonLab/CSNAnalysis.git```\n", 49 | "\n", 50 | "Navigate to the repository directory and install using pip:\n", 51 | "\n", 52 | "```\n", 53 | "cd CSNAnalysis\n", 54 | "pip install --user -e .\n", 55 | "```\n", 56 | "\n", 57 | "Go to the examples directory and open this notebook (`examples.ipynb`):\n", 58 | "\n", 59 | "```\n", 60 | "cd examples; jupyter notebook```" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "## Dependencies\n", 68 | "\n", 69 | "I highly recommend using Anaconda and working in a `python3` environment. CSNAnalysis uses the packages `numpy`, `scipy` and `networkx`. If these are installed then the following lines of code should run without error:" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 1, 75 | "metadata": {}, 76 | "outputs": [], 77 | "source": [ 78 | "import numpy as np\n", 79 | "import networkx as nx\n", 80 | "import scipy" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "If `CSNAnalysis` was installed (i.e. added to your `sys.path`), then this should also work:" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": 2, 93 | "metadata": {}, 94 | "outputs": [], 95 | "source": [ 96 | "from csnanalysis.csn import CSN\n", 97 | "from csnanalysis.matrix import *" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "This notebook also uses `matplotlib` to visualize output."
105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 3, 110 | "metadata": {}, 111 | "outputs": [], 112 | "source": [ 113 | "import matplotlib" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "Great! Now let's load in the count matrix that we'll use for all the examples here:" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 4, 126 | "metadata": {}, 127 | "outputs": [], 128 | "source": [ 129 | "count_mat = scipy.sparse.load_npz('matrix.npz')" 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "## Background: Sparse matrices" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": { 142 | "collapsed": true 143 | }, 144 | "source": [ 145 | "It's worth knowing a little about sparse matrices before we start. If we have a huge $N$ by $N$ matrix, where $N > 1000$, but most of the elements are zero, it is more efficient to store the data as a sparse matrix." 
146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 5, 151 | "metadata": {}, 152 | "outputs": [ 153 | { 154 | "data": { 155 | "text/plain": [ 156 | "scipy.sparse.coo.coo_matrix" 157 | ] 158 | }, 159 | "execution_count": 5, 160 | "metadata": {}, 161 | "output_type": "execute_result" 162 | } 163 | ], 164 | "source": [ 165 | "type(count_mat)" 166 | ] 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "metadata": {}, 171 | "source": [ 172 | "`coo_matrix` refers to \"coordinate format\", where the matrix is essentially a set of lists of matrix \"coordinates\" (rows, columns) and data:" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": 6, 178 | "metadata": {}, 179 | "outputs": [ 180 | { 181 | "name": "stdout", 182 | "output_type": "stream", 183 | "text": [ 184 | "0 0 382.0\n", 185 | "0 651 2.0\n", 186 | "0 909 2.0\n", 187 | "0 920 2.0\n", 188 | "0 1363 1.0\n", 189 | "0 1445 2.0\n", 190 | "0 2021 5.0\n", 191 | "0 2022 7.0\n", 192 | "0 2085 4.0\n", 193 | "0 2131 1.0\n" 194 | ] 195 | } 196 | ], 197 | "source": [ 198 | "rows = count_mat.row\n", 199 | "cols = count_mat.col\n", 200 | "data = count_mat.data\n", 201 | "\n", 202 | "for r,c,d in zip(rows[0:10],cols[0:10],data[0:10]):\n", 203 | " print(r,c,d)" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": {}, 209 | "source": [ 210 | "Although it can be treated like a normal matrix ($4000$ by $4000$ in this case):" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": 7, 216 | "metadata": {}, 217 | "outputs": [ 218 | { 219 | "data": { 220 | "text/plain": [ 221 | "(4000, 4000)" 222 | ] 223 | }, 224 | "execution_count": 7, 225 | "metadata": {}, 226 | "output_type": "execute_result" 227 | } 228 | ], 229 | "source": [ 230 | "count_mat.shape" 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": {}, 236 | "source": [ 237 | "It only needs to store non-zero elements, which are much fewer than $4000^2$:" 238 | ] 239 
| }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": 8, 243 | "metadata": {}, 244 | "outputs": [ 245 | { 246 | "data": { 247 | "text/plain": [ 248 | "44163" 249 | ] 250 | }, 251 | "execution_count": 8, 252 | "metadata": {}, 253 | "output_type": "execute_result" 254 | } 255 | ], 256 | "source": [ 257 | "len(rows)" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "**OK, let's get started building a Conformation Space Network!**\n", 265 | "\n", 266 | "---" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "## 1) Initializing CSN objects from count matrices\n", 274 | "\n", 275 | "To get started we need a count matrix, which can be a `numpy` array, or a `scipy.sparse` matrix, or a list of lists:" 276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": 9, 281 | "metadata": {}, 282 | "outputs": [], 283 | "source": [ 284 | "our_csn = CSN(count_mat,symmetrize=True)" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": {}, 290 | "source": [ 291 | "Any of the `CSNAnalysis` functions can be queried using \"?\"" 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": 10, 297 | "metadata": {}, 298 | "outputs": [], 299 | "source": [ 300 | "CSN?" 301 | ] 302 | }, 303 | { 304 | "cell_type": "markdown", 305 | "metadata": {}, 306 | "source": [ 307 | "The `our_csn` object now holds three different representations of our data. 
The original counts can now be found in `scipy.sparse` format:" 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | "execution_count": 11, 313 | "metadata": {}, 314 | "outputs": [ 315 | { 316 | "data": { 317 | "text/plain": [ 318 | "<4000x4000 sparse matrix of type ''\n", 319 | "\twith 62280 stored elements in COOrdinate format>" 320 | ] 321 | }, 322 | "execution_count": 11, 323 | "metadata": {}, 324 | "output_type": "execute_result" 325 | } 326 | ], 327 | "source": [ 328 | "our_csn.countmat" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "A transition matrix has been computed from this count matrix according to: \n", 336 | "\\begin{equation}\n", 337 | "t_{ij} = \\frac{c_{ij}}{\\sum_k c_{kj}}\n", 338 | "\\end{equation}" 339 | ] 340 | }, 341 | { 342 | "cell_type": "code", 343 | "execution_count": 12, 344 | "metadata": {}, 345 | "outputs": [ 346 | { 347 | "data": { 348 | "text/plain": [ 349 | "<4000x4000 sparse matrix of type ''\n", 350 | "\twith 62280 stored elements in COOrdinate format>" 351 | ] 352 | }, 353 | "execution_count": 12, 354 | "metadata": {}, 355 | "output_type": "execute_result" 356 | } 357 | ], 358 | "source": [ 359 | "our_csn.transmat" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": {}, 365 | "source": [ 366 | "where the elements in each column sum to one:" 367 | ] 368 | }, 369 | { 370 | "cell_type": "code", 371 | "execution_count": 13, 372 | "metadata": {}, 373 | "outputs": [ 374 | { 375 | "data": { 376 | "text/plain": [ 377 | "matrix([[1., 1., 1., ..., 1., 1., 1.]])" 378 | ] 379 | }, 380 | "execution_count": 13, 381 | "metadata": {}, 382 | "output_type": "execute_result" 383 | } 384 | ], 385 | "source": [ 386 | "our_csn.transmat.sum(axis=0)" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "Lastly, the data has been stored in a `networkx` directed graph:" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398
| "execution_count": 14, 399 | "metadata": {}, 400 | "outputs": [ 401 | { 402 | "data": { 403 | "text/plain": [ 404 | "" 405 | ] 406 | }, 407 | "execution_count": 14, 408 | "metadata": {}, 409 | "output_type": "execute_result" 410 | } 411 | ], 412 | "source": [ 413 | "our_csn.graph" 414 | ] 415 | }, 416 | { 417 | "cell_type": "markdown", 418 | "metadata": {}, 419 | "source": [ 420 | "that holds the nodes and edges of our CSN and can be passed to other `networkx` functions. For example, we can calculate the shortest path between nodes 0 and 10:" 421 | ] 422 | }, 423 | { 424 | "cell_type": "code", 425 | "execution_count": 15, 426 | "metadata": {}, 427 | "outputs": [ 428 | { 429 | "data": { 430 | "text/plain": [ 431 | "[0, 1445, 2125, 2043, 247, 1780, 10]" 432 | ] 433 | }, 434 | "execution_count": 15, 435 | "metadata": {}, 436 | "output_type": "execute_result" 437 | } 438 | ], 439 | "source": [ 440 | "nx.shortest_path(our_csn.graph,0,10)" 441 | ] 442 | }, 443 | { 444 | "cell_type": "markdown", 445 | "metadata": {}, 446 | "source": [ 447 | "---\n", 448 | "## 2) Trimming CSNs\n", 449 | "\n", 450 | "A big benefit of coupling the count matrix, transition matrix and graph representations is that elements can be \"trimmed\" from all three simultaneously. The `trim` function will eliminate nodes that are not connected to the main component (by inflow, outflow, or both), and can also eliminate nodes that do not meet a minimum count requirement:" 451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": 16, 456 | "metadata": {}, 457 | "outputs": [], 458 | "source": [ 459 | "our_csn.trim(by_inflow=True, by_outflow=True, min_count=20)" 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "The trimmed graph, count matrix and transition matrix are stored as `our_csn.trim_graph`, `our_csn.trim_countmat` and `our_csn.trim_transmat`, respectively."
467 | ] 468 | }, 469 | { 470 | "cell_type": "code", 471 | "execution_count": 17, 472 | "metadata": {}, 473 | "outputs": [ 474 | { 475 | "data": { 476 | "text/plain": [ 477 | "2282" 478 | ] 479 | }, 480 | "execution_count": 17, 481 | "metadata": {}, 482 | "output_type": "execute_result" 483 | } 484 | ], 485 | "source": [ 486 | "our_csn.trim_graph.number_of_nodes()" 487 | ] 488 | }, 489 | { 490 | "cell_type": "code", 491 | "execution_count": 18, 492 | "metadata": {}, 493 | "outputs": [ 494 | { 495 | "data": { 496 | "text/plain": [ 497 | "(2282, 2282)" 498 | ] 499 | }, 500 | "execution_count": 18, 501 | "metadata": {}, 502 | "output_type": "execute_result" 503 | } 504 | ], 505 | "source": [ 506 | "our_csn.trim_countmat.shape" 507 | ] 508 | }, 509 | { 510 | "cell_type": "code", 511 | "execution_count": 19, 512 | "metadata": {}, 513 | "outputs": [ 514 | { 515 | "data": { 516 | "text/plain": [ 517 | "(2282, 2282)" 518 | ] 519 | }, 520 | "execution_count": 19, 521 | "metadata": {}, 522 | "output_type": "execute_result" 523 | } 524 | ], 525 | "source": [ 526 | "our_csn.trim_transmat.shape" 527 | ] 528 | }, 529 | { 530 | "cell_type": "markdown", 531 | "metadata": {}, 532 | "source": [ 533 | "## 3) Obtaining steady-state weights from the transition matrix\n", 534 | "\n", 535 | "Now that we've ensured that our transition matrix is fully-connected, we can compute its equilibrium weights. 
This is implemented in two ways.\n", 536 | "\n", 537 | "First, we can compute the eigenvector of the transition matrix with eigenvalue one:" 538 | ] 539 | }, 540 | { 541 | "cell_type": "code", 542 | "execution_count": 20, 543 | "metadata": {}, 544 | "outputs": [], 545 | "source": [ 546 | "wt_eig = our_csn.calc_eig_weights()" 547 | ] 548 | }, 549 | { 550 | "cell_type": "markdown", 551 | "metadata": {}, 552 | "source": [ 553 | "This can exhibit some instability, especially for low-weight states, so we can also calculate weights by iterative multiplication of the transition matrix, which can take a little longer:" 554 | ] 555 | }, 556 | { 557 | "cell_type": "code", 558 | "execution_count": 21, 559 | "metadata": {}, 560 | "outputs": [], 561 | "source": [ 562 | "wt_mult = our_csn.calc_mult_weights()" 563 | ] 564 | }, 565 | { 566 | "cell_type": "code", 567 | "execution_count": 22, 568 | "metadata": {}, 569 | "outputs": [ 570 | { 571 | "data": { 572 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEGCAYAAABy53LJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nO3dd3xUVfrH8c9DaFEpClgoCioWUJcSQeyrorAWUFlBWcv+QGzYRcGK2EBcXQEbinVtrAqyKkYUBBVBQhcQBUQpKiAdIiTh+f1xb3AYQzKRTGYm+b5fr3kxc+65N89hNA/nnnPPMXdHREQkVhUSHYCIiKQWJQ4RESkWJQ4RESkWJQ4RESkWJQ4RESmWiokOoDTUrl3bGzZsmOgwRERSytSpU1e5e53o8nKROBo2bEhWVlaiwxARSSlm9kNB5bpVJSIixaLEISIixaLEISIixaLEISIixaLEISIixVIuZlWJiJQnI6cvY2DmfJavzaZuzXR6nXEoHZvXK7HrK3GIiJQhI6cvo887s8nOyQNg2dps+rwzG6DEkkdcb1WZWTszm29mC8ysdwHHq5jZm+HxyWbWMCxvZWYzwtdMMzs31muKiJRnAzPnk52TxyErF9Nr/EvgTnZOHgMz55fYz4hb4jCzNOAJoD3QBLjQzJpEVesGrHH3g4HHgAFh+ddAhrs3A9oBz5hZxRivKSJSbq38dT3Xf/4a7714A11mZrLfhlUALF+bXWI/I563qloBC9x9EYCZvQF0AOZG1OkA9A3fvwUMMTNz980RdaoC+btNxXJNEZHyacoURr9yEwf98j0jm5xEv1N7sHq3GgDUrZleYj8mnreq6gFLIj4vDcsKrOPuucA6oBaAmbU2sznAbODK8Hgs1yQ8v4eZZZlZ1sqVK0ugOSIiSWrzZrjlFjjmGOp6Nld17ssNZ/fanjTSK6XR64xDS+zHJe10XHef7O5NgaOBPmZWtZjnD3X3DHfPqFPnD2t0iYiUDePGwZFHwr/
+BZdfTvq333DGbd2pVzMdA+rVTOeh845MmVlVy4AGEZ/rh2UF1VlqZhWBGsCvkRXcfZ6ZbQSOiPGaIiJl37p1cOutMHQoHHRQkEBOPhmAjs1rlGiiiBbPHscUoLGZNTKzykAXYFRUnVHApeH7TsBYd/fwnIoAZnYAcBiwOMZrioiUbf/7HzRpAs89F9yimjVre9IoDXHrcbh7rpn1BDKBNOB5d59jZv2ALHcfBQwDXjGzBcBqgkQAcDzQ28xygG3A1e6+CqCga8arDSIiSWXlSrj+enj99eD21MiRcPTRpR6GuXvRtVJcRkaGaz8OEUlZ7kGyuO46WL8e7roLbrsNKleO6481s6nunhFdrifHRUSS2ZIlcNVV8P770Lo1DBsGTZsmNKSknVUlIlKubdsGzzwTJIlx4+Cxx+CLLxKeNEA9DhGR5PPdd3D55TB+PJx6ajBz6sADEx3VdupxiIgki9xceOQROOoomDEjmDU1ZkxSJQ1Qj0NEJDnMmgXdukFWFnToAE8+CXXrJjqqAqnHISKSSFu2wN13Q8uW8OOPMHw4jBiRtEkD1OMQEUmcSZOCXsbcuXDxxcEAeK1aiY6qSOpxiIiUtk2b4MYb4dhjYcMG+OADePnllEgaoB6HiEjp+uSTYMbU99/D1VfDQw9B9eqJjqpY1OMQESkNa9dC9+5w2mlQsWIw1faJJ1IuaYASh4hI/L37brAo4YsvBkuFzJwJJ56Y6Kj+NN2qEhGJl19+CdaXGj4c/vKXYFXbli0THdUuU+IQEdlFI6cvY2DmfJavzaZuzXR6nX4IHeeMgxtugI0b4YEHoFcvqFQp0aGWCCUOEZFdMHL6Mvq8M5vsnDwA/Mcf2OuC22BhFrRpEyxKePjhCY6yZClxiIjsgoGZ88nOycN8G12nj6b3+Bcxdx47uyc3jvg3pKUlOsQSp8QhIlJMkbemHGi0ehn9Rw+i9dI5TGjYnNvb9WRZjX24sQwmDVDiEBEplshbU2nb8rj8qxHc+Pmr/FaxMrf87QbeOuJUMKNezfREhxo3ShwiIsWQf2uqyS+LGDD6cY78ZSGjDzmWu9texco99gQgvVIavc44NMGRxo8Sh4hIMaxatY5bJr7BlZPeYs1u1bmyYx8+PPQ4AAyCWVVnHErH5vUSG2gcKXGIiMRq4kQyX76Bhit/5K0jTuW+U7qzLr0aAPVqpvNF71MSHGDpUOIQESnKxo1w++0wZAh771OX7hfez8f7N9t+uKzfmoqmJUdERArz0UdwxBEwZAj07Mlu387jrF6XUa9mOkbQ03jovCPL9K2paOpxiIgUZPVquPnmYH2pQw+Fzz6D44KxjI7Nq5WrRBFNPQ4RkWhvvx0sSvjKK8EtqhkzticNiXPiMLN2ZjbfzBaYWe8CjlcxszfD45PNrGFY3tbMpprZ7PDPUyLO+TS85ozwtXc82yAi5cjPP0OnTsGrbt1g/+8HHoCqVRMdWVKJW+IwszTgCaA90AS40MyaRFXrBqxx94OBx4ABYfkq4Gx3PxK4FHgl6ryu7t4sfK2IVxtEpJxwD25JNWkC770XbK40eTI0a1bkqeVRPHscrYAF7r7I3bcCbwAdoup0AF4K378FnGpm5u7T3X15WD4HSDezKnGMVUTKq8WLoV07+Oc/oWnTYK+M3r3LzEq28RDPxFEPWBLxeWlYVmAdd88F1gHRm+6eD0xz9y0RZS+Et6nuMjMr6IebWQ8zyzKzrJUrV+5KO0SkLNq2DQYPDmZMTZwY7MY3fnwwEC6FSurBcTNrSnD76oqI4q7hLawTwtfFBZ3r7kPdPcPdM+rUqRP/YEUkdcybByecEGyydMIJ8PXXwf7fFZL6V2LSiOff0jKgQcTn+mFZgXXMrCJQA/g1/FwfGAFc4u4L809w92XhnxuA1whuiYmIFC0nBx58MBi7+OYbePll+OADOOCAREeWUuKZOKYAjc2skZlVBroAo6LqjCIY/AboBIx1dzezmsD7QG93/yK/spl
VNLPa4ftKwFnA13Fsg4iUFdOmQatWcMcd0KEDzJ0LF18MBd/tlkLELXGEYxY9gUxgHjDc3eeYWT8zOyesNgyoZWYLgJuA/Cm7PYGDgbujpt1WATLNbBYwg6DH8my82iAiZUB2NvTpEySNn3+Gd94J9gDfZ59ER5ayzN0THUPcZWRkeFZWVqLDEJHS9vnn0K0bfPst/N//wSOPwJ57JjqqlGFmU909I7pcI0EiUvZs2AA9ewYD31u3wpgxwd7fSholQolDRMqW0aOD5zGefBJuuCGYMXXaaYmOqkzRIocikrIi9/4+rNJWnpn+Kvu//zYcfjh88QW0aZPoEMskJQ4RSUnb9/7emsvf5n/BvWOepuZvG/jm8us5bPAAqKLFJuJFiUNEUtLAzPnssWYl//7oSc74bhKz9j2YSzr3Y/2BTflCSSOulDhEJPW4c9yEd7lz7DAq5+Xw4Mn/ZNjRHcmrkIatzU50dGWeEoeIpJZFi6BHDx7+5BMmNziC29pdy+K9fl8Gr27N9AQGVz4ocYhIasjLCxYlvOMOSEtjxu0P8U87is25vz+LVt72/k4UTccVkeQ3dy4cfzzceCOcfDLMmUOzB3rz4Pl/Kdd7fyeKehwikry2boUBA+D++6FaNfjPf+Cii7avL9WxeT0ligRQ4hCR5JSVFSwXMmsWdOkCjz8Oe2un6GSgW1Uiklw2b4Zbb4XWrWHVKnj3XXj9dSWNJKIeh4gkj/HjoXt3WLAALr8cBg6EGjUSHZVEUY9DRBJv/Xq46qpg4HvbNvjkExg6VEkjSanHISKlKnJ9qbo103lkt6W0GXgHLF8ON90E990Hu+2W6DClEEocIlJqtq8vlZPHnpvXccv/HqHN3E9Zf9ChVJ84MRjXkKSnxCEipWZg5nyyt+Zy9rwJ9P34Gapt2cy/j7uQEe0uZbySRspQ4hCRUpO3ZAnPfvQUbRdMZsZ+jbmt/fXMr9MQ25ib6NCkGJQ4RCT+3OG55/h42I2k5eVy31+78ULGOWyrkAZofalUo8QhIvG1cGEwtXbcOLIzjqVrq258W22f7Ye1vlTq0XRcEYmPvDx49FE48kiYOhWGDqXOV59zdfcztL5UilOPQ0RK3tdfB8uFfPUVnH02PPUU1AuSg9aXSn3qcYhIydm6Ffr2hRYtgn0zXn89WDKknhJFWRLXxGFm7cxsvpktMLPeBRyvYmZvhscnm1nDsLytmU01s9nhn6dEnNMyLF9gZoPMwmUyRSSxvvoqSBj33gsXXADz5gWLE+p/0TInbonDzNKAJ4D2QBPgQjNrElWtG7DG3Q8GHgMGhOWrgLPd/UjgUuCViHOeAi4HGoevdvFqg4jEYPNmuPlmaNMG1q2D994Llj+vXTvRkUmcxLPH0QpY4O6L3H0r8AbQIapOB+Cl8P1bwKlmZu4+3d2Xh+VzgPSwd7IfUN3dJ7m7Ay8DHePYBhEpzLhxweD3o49Cjx4wZw6ceWaio5I4i2fiqAcsifi8NCwrsI675wLrgFpRdc4Hprn7lrD+0iKuKSLxtm5dkChOOQUqVIBPPw0GwKtXT3RkUgqSenDczJoS3L664k+c28PMsswsa+XKlSUfnEh59b//QZMmMGwY9OoFM2fCSSclOiopRfFMHMuABhGf64dlBdYxs4pADeDX8HN9YARwibsvjKhfv4hrAuDuQ909w90z6tSps4tNERFWrIALL4RzzoFatWDyZHj4Ya1kWw7FM3FMARqbWSMzqwx0AUZF1RlFMPgN0AkY6+5uZjWB94He7v5FfmV3/wlYb2bHhLOpLgHejWMbRMQdXn016GW8/Tb06xds65qRkejIJEGKTBxm9vdYyqKFYxY9gUxgHjDc3eeYWT8zOyesNgyoZWYLgJuA/Cm7PYGDgbvNbEb4yt838mrgOWABsBAYXVQsIvInLVkSPMD3j39A48YwYwbcdRdUrpzoyCSBLJicVEgFs2nu3qKosmSWkZHhWVlZiQ5DJHVs2xbswHfrrcH
SIQ8+CD17QlpaoiOTUmRmU939D13LnS45Ymbtgb8B9cxsUMSh6oDWQBYpq777LliUcPx4OPXUIIEceGCio5IkUtitquVAFvAbMDXiNQo4I/6hiUipys2FgQPhqKOCW1LDhsGYMUoa8gc77XG4+0xgppm95u45pRiTiJS2mTODRQmnToWOHeGJJ6Bu3URHJUkqlllVrcxsjJl9a2aLzOx7M1sU98hEJP62bAkGuzMygoHw4cPhnXeUNKRQsSyrPgy4keA2VV58wxGRUvPll0EvY948uOSSYNmQWtELN4j8USyJY527a8qrSFmxaRPccQcMGgT168MHH0D79omOSlJIYbOq8qfbjjOzgcA7wJb84+4+Lc6xiUhJ+/jjYMbU4sVwzTXw0ENQrVqio5IUU1iP419RnyPn8jpwCiKSGtasgVtugeefDx7kmzABTjgh0VFJiipsVtVfSzMQEYmTESPg6qth5Uro3RvuvhvS0xMdlaSwIsc4zOymAorXAVPdfUbJhyQiJeKXX+Daa+G//4VmzeD994Md+kR2USzTcTOAKwn2vahHsMR5O+BZM7s1jrGJyJ/hDi+/DIcfHuz3/cADv2/rKlICYplVVR9o4e4bAczsHoKVa08kmKL7cPzCE5Fi+fFHuOIK+PBDOPbY4Onvww5LdFRSxsTS49ibiNlUQA6wj7tnR5WLSKJs2xY87d20KXz2WTDV9rPPlDQkLmLpcbwKTDaz/H0vzgZeM7Pdgblxi0xEYjN/PnTvDp9/Dm3bBosSNmyY6KikDCsycbj7fWY2GjguLLrS3fPXKO8at8hEZAcjpy9jYOZ8lq/Npm7NdG495UA6jH0D+vYNduF78cXgCXCzRIcqZVxhDwBWd/f1ZrYXsCh85R/by91Xl0aAIhIkjT7vzCY7J1j1p+b8r2n8eA/4eSGcfz4MGQL77pvgKKW8KKzH8RpwFsEAuAMW9afWWhYpJQMz55Odk0eV3K1cO/ENrpz0Fmt2q84dXe/hgf/0TXB0Ut4U9gDgWeGfjUovHBEpyPK12bRcOpeHRw/ioNVL+e8Rp3H/Kd1Yn16NBxIdnJQ7sTwAaARjGY3C8Y79gX3d/au4RycisHEjD08YxvlfjmR59TpcfEE/PmsUPJNRr6aeAJfSF8usqieBbQRrU90HbADeBo6OY1wiApCZCT160GnJEv5z9Nk8dPzFbK4cJIv0Smn0OuPQBAco5VEsz3G0dvdrCLaQxd3XAJXjGpVIebd6NVx2GbRrB7vthn32GdWeeZI9994LI+hpPHTekXRsXi/RkUo5FEuPI8fM0ggGxDGzOgQ9EBGJh7ffDpY8X7Uq2DfjzjuhalU6ghKFJIVYEscgYASwt5k9AHQC7oxrVCLl0U8/Qc+ewdatzZsHy4Y0a5boqET+IJYHAF81s6nAqQRTcTu6+7y4RyZSxm1/oG/NZrovmkCvzKFU3vob9O8PN98MFWP5d51I6YtlVtV9wATgRXffVJyLm1k74HEgDXjO3ftHHa8CvAy0BH4FOrv7YjOrBbxFMAD/orv3jDjnU2A/IDssOt3dVxQnLpFEy3+gr9aq5bz04RBOXDydrAZNWfv4k5x27omJDk+kULH8k2YRcCEwyMw2AJ8BE9z93cJOCsdFngDaAkuBKWY2yt0j17fqBqxx94PNrAswAOhMMBB/F3BE+IrWNWLZE5GU86/Rc+n85Qh6TXgZN+POtlfxavP21J2fy2mJDk6kCLHcqnoBeMHM9gUuAG4BegBFbVTcCljg7osAzOwNoAM7LozYAegbvn8LGGJmFvZsPjezg4vRFpHUMG8ejz15PRnL5vFpo5bc3u4allffGwge9BNJdkVOxzWz58xsIvAUQaLpBOwZw7XrAUsiPi8Nywqs4+65BDsL1orh2i+Y2Qwzuyt8QLGguHuYWZaZZa1cuTKGS4rEWU5OsKlSs2Y0Xr2UG8+8icv+3nd70gCoqwf6JAXE8hxHLYIxirXAamBV+Es+Ubq6+5H
ACeHr4oIquftQd89w94w6deqUaoAifzBtGhx9dDC1tmNHJo6awIfN2+6wkq0e6JNUUWTicPdz3b01wU5/NYFxZrY0hmsvAxpEfK4flhVYx8wqAjUIBskLi2dZ+OcGgoUYW8UQi0hiZGdD797QqlWwB/iIEfDmm7Q/rRkPnXck9Wqm64E+STmxzKo6i+Bf9icSJI6xBAPkRZkCNDazRgQJogtwUVSdUcClwJcEt8DGursXEktFoKa7rzKzSgSr934cQywipW/ChGCDpe++g27dYOBA2PP3u7wdm9dTopCUFMusqnYEieJxd18e64XdPdfMegKZBLe6nnf3OWbWD8hy91HAMOAVM1tAcBusS/75ZrYYqA5UNrOOwOnAD0BmmDTSCJLGs7HGJFIq1q+HPn3gySehUSMYMwZO01wpKTuskH/glxkZGRmelaXZu1IKRo+GK66ApUvh+uvh/vth990THZXIn2JmU909I7pcj6aKlIRff4Ubb4RXXoEmTWDiRDjmmERHJRIXscyqEpGdcYfhw+Hww+H11+Guu4IZVEoaUobF8hzH9bGUiZQ7y5fDuedC586w//4wdSr06wdVqiQ6MpG4iqXHcWkBZZeVcBwiqcMdhg0LbkllZsLDD8OkSXDUUYmOTKRU7HSMw8wuJJg+28jMRkUcqkYwA0qk/Fm0CC6/HMaOhZNOgmefhcaNEx2VSKkqbHB8IvATUBv4V0T5BmBWPIMSSTp5eTB4cLCxUloaPP10kEAqaJhQyp+dJg53/4HguYk2pReOSBKaMyd4gG/yZDjzzCBp1K+f6KhEEqawW1UbCLeLjT4EuLtXj1tUIslg69ZgU6X774fq1eHVV+HCC3dYX0qkPCqsx1HUsukiZdeUKUEvY/Zs6NIFBg0CLZYpAsS2VtX+BZW7+48lH45I6dq+fevabOrWTKf3iftz9ohn4NFHYd994d134ZxzEh2mSFKJ5cnx9yPeVwUaAfOBpnGJSKSU5G/fmp2TB0CDWZM56uGLYc1y6NEjmGZbo0aCoxRJPrHsAHhk5GczawFcHbeIRErByOnLuHn4TPLcqbZlE70/fYGuMz5kcc39uLb7Iwx+5uZEhyiStIq9VpW7TzOz1vEIRqQ05Pc08tz568IpPPjhEPbetIahR5/Loyd0ZUulqgxOdJAiSSyWMY6bIj5WAFoAMS+vLpJsBmbOJ33dah76ZCgd547nm9oHcOW5tzOzbrD7Xj1t3ypSqFh6HJGzq3IJxjzejk84InHmTsuJH3LPx89QbctmHjvuIp5s83dy0ioB2r5VJBaxjHHcWxqBiMTd0qVw1VUMeu89Zux3CLe2v45v6zTcfjjNTNu3isSgsAcAR+3sGIC7a46iJK3Iabb1qldh8OapNB/8IOTkMPume+ha9Wg25f1eP71SmpKGSIwK63G0AZYArwOTCZ4YF0l6kdNsD1iznP6vD6b5j7NZmXEsdd54mSMPOogHop7f6HXGoUoaIjEqLHHsC7QF8lfJfR943d3nlEZgIn/WwMz5bN2yle5Z73LzZ6+SUyGN29pdy+cnduCLgw4CoGPzekoUIn9SYUuO5AEfAh+aWRWCBPKpmd3r7kNKK0CR4qr23TyeGP04zX76jjEHt+LO06/ml2q1sXW/JTo0kTKh0MHxMGGcSZA0GgKDgBHxD0vkT9iyBR58kPdefIC1Vfeg5zm38t5hJ2xflLCuptmKlIjCBsdfBo4APgDudfevSy0qkeKaPDlYlHDOHH5qfy4XHN6Znyrtsf2wptmKlJzCdqH5B9AYuB6YaGbrw9cGM1tfOuGJFGHTJrjpJmjTBtatg/feo8EH73DbP46nXs10jOCBPs2YEik5hY1xaGszSW5jxwa78C1aBFddFeydUT3YJkaD3yLxE9fkYGbtzGy+mS0ws94FHK9iZm+GxyebWcOwvJaZjTOzjWY2JOqclmY2OzxnkJl21Sl31q4NEsappwZbt376KTz55PakISLxFbfEYWZpwBN
HJgLtkrUdEccvo+jB8b+lWjvCa9YO31ciuHd9ZbK2A6gZ1j+vgOumzPexs3Yk4vsogbY04vfB8QOA5QQr5+7y7624NjrZXsDfgG8JZhTcEZb1A84J31cF/kswYPwVcGDEuXeE580nYlZIQddMxbYQzLCYGb7mlFZbdrEdi4HVBP+SWko4MwTIAL4OrzmEcIWEVGoHQfKeCswKv4/HCWe/JWM7gDuBTcCMiNfeqfZ97Kwdifo+drEtF4exzgCmAR0Lu2ZxXlpyREREiqU8jXGIiEgJUOIQEZFiUeIQEZFiUeIQEZFiUeIQEZFiUeKQMsPM8iJWA52Rv+qnmT1nZk2SIL6NCf75GWY2qIg6Dc3s650cu8zM6sYnOkklFRMdgEgJynb3ZtGF7t49EcEkG3fPAnZlufzLCJ7HWF5EPSnj1OOQMi/cRyEjfN/NzL41s6/M7Nn8PTDMrI6ZvW1mU8LXcWF5XzN7PrzGIjO7Lizvb2bXRPyMvmZ2i5ntYWafmNm0cE+HP6w6amYnm9l7EZ+HmNll4fuWZjY+XGgyM1xdNvLcNDP73gI1w17WieGxCWbW2Mx2D2P+ysym58cQ+XPD9o4xszlhj+wHM6sd/pi08O9mjpl9ZGbpZtaJ4EG+V8PeXHrJfDuSipQ4pCxJj7pV1TnyYHib5S6C5S+OAw6LOPw48Ji7H02wKupzEccOA84gWJvoHjOrBLwJXBBR54Kw7DfgXA8Wi/wr8K9YV1ENrzsY6OTuLYHngQci67h7HsET/00I9vGYBpwQLv3dwN2/I1gZYKy7twpjGGhmu0f9uHvCOk0Jls/YP+JYY+CJ8Nha4Hx3f4ugt9LV3Zu5e3YsbZKySbeqpCwp8FZVhFbAeHdfDWBm/wUOCY+dBjSJ+B1f3cz2CN+/78HS2lvMbAWwj7tPN7O9w2RUB1jj7kvCX/4Phr2AbQTLVe8D/BxD/IcSLEI3JowjDfipgHqfAScSrDX0EHA5MB6YEh4/HTjHzG4JP1dlx8QAQdI5F8DdPzSzNRHHvnf3GeH7qUDDGGKXckSJQyRQATjG3aM3hgKI3I8hj9//v/kv0AnYl6C3AdCVIJG0dPccM1tM8Is7Ui479vbzjxswx93bFBHrBOAqoC5wN9ALOJkgoeRf53x3nx/Vln2KuG6+6PbqtpTsQLeqpDyZApxkZnuGy0+fH3HsI+Da/A9mVljPJd+bBKuRdiJIIhAsab0iTBp/JViVNNoPBL2bKmZWEzg1LJ8P1DGzNmEMlcJ9IaJ9BRwLbAsT3QyCfSMmhMczgWvzb5GZWfMCrvEF4a02Mzsd2DOG9m4AqsVQT8o4JQ4pS6LHOPpHHvRgP4UHCX7xfkGwKu268PB1QIaZzTKzuQR7gBfK3ecQ/CJd5u75t5ReDa8zG7gE+KaA85YAwwlmKA0HpoflWwmS0AAzm0mQEI4t4PwtBDu4TQqLPgvjmB1+vo9g6e9ZFuz+dl8B4d8LnB5Ovf07wa20DUU0+UXgaQ2Oi1bHlXLFzPZw941hj2ME8Ly7j0h0XKUtHEzPc/fcsIfzVBHjQyLbaYxDypu+ZnYawbjCR8DIBMeTKPsDw82sArCVYIBdJCbqcYiISLFojENERIpFiUNERIpFiUNERIpFiUNERIpFiUNERIrl/wF5Y9VTEeC7QgAAAABJRU5ErkJggg==\n", 573 | "text/plain": [ 574 | "
" 575 | ] 576 | }, 577 | "metadata": { 578 | "needs_background": "light" 579 | }, 580 | "output_type": "display_data" 581 | } 582 | ], 583 | "source": [ 584 | "import matplotlib.pyplot as plt\n", 585 | "%matplotlib inline\n", 586 | "\n", 587 | "plt.scatter(wt_eig,wt_mult)\n", 588 | "plt.plot([0,wt_mult.max()],[0,wt_mult.max()],'r-')\n", 589 | "plt.xlabel(\"Eigenvalue weight\")\n", 590 | "plt.ylabel(\"Mult weight\")\n", 591 | "plt.show()" 592 | ] 593 | }, 594 | { 595 | "cell_type": "markdown", 596 | "metadata": {}, 597 | "source": [ 598 | "These weights are automatically added as attributes to the nodes in `our_csn.graph`:" 599 | ] 600 | }, 601 | { 602 | "cell_type": "code", 603 | "execution_count": 23, 604 | "metadata": {}, 605 | "outputs": [ 606 | { 607 | "data": { 608 | "text/plain": [ 609 | "{'label': 0,\n", 610 | " 'count': 482,\n", 611 | " 'trim': 0.0,\n", 612 | " 'eig_weights': 0.002595528367725156,\n", 613 | " 'mult_weights': 0.0025955283677248217}" 614 | ] 615 | }, 616 | "execution_count": 23, 617 | "metadata": {}, 618 | "output_type": "execute_result" 619 | } 620 | ], 621 | "source": [ 622 | "our_csn.graph.node[0]" 623 | ] 624 | }, 625 | { 626 | "cell_type": "markdown", 627 | "metadata": {}, 628 | "source": [ 629 | "## 4) Committor probabilities to an arbitrary set of basins\n", 630 | "\n", 631 | "We are often doing simulations in the presence of one or more high probability \"basins\" of attraction. When there more than one basin, it can be useful to find the probability that a simulation started in a given state will visit (or \"commit to\") a given basin before the others.\n", 632 | "\n", 633 | "`CSNAnalysis` calculates committor probabilities by creating a sink matrix ($S$), where each column in the transition matrix that corresponds to a sink state is replaced by an identity vector. This turns each state into a \"black hole\" where probability can get in, but not out. 
\n", 634 | "\n", 635 | "By iteratively multiplying this matrix by itself, we can approximate $S^\\infty$. The elements of this matrix reveal the probability of transitioning to any of the sink states, upon starting in any non-sink state, $i$.\n", 636 | "\n", 637 | "Let's see this in action. We'll start by reading in a set of three basins: $A$, $B$ and $U$." 638 | ] 639 | }, 640 | { 641 | "cell_type": "code", 642 | "execution_count": 24, 643 | "metadata": {}, 644 | "outputs": [], 645 | "source": [ 646 | "Astates = [2031,596,1923,3223,2715]\n", 647 | "Bstates = [1550,3168,476,1616,2590]\n", 648 | "Ustates = list(np.loadtxt('state_U.dat',dtype=int))" 649 | ] 650 | }, 651 | { 652 | "cell_type": "markdown", 653 | "metadata": {}, 654 | "source": [ 655 | "We can then use the `calc_committors` function to calculate committors between this set of three basins. This will calculate $p_A$, $p_B$, and $p_U$ for each state, which sum to one." 656 | ] 657 | }, 658 | { 659 | "cell_type": "code", 660 | "execution_count": 25, 661 | "metadata": {}, 662 | "outputs": [], 663 | "source": [ 664 | "basins = [Astates,Bstates,Ustates]\n", 665 | "labels = ['pA','pB','pU']\n", 666 | "comms = our_csn.calc_committors(basins,labels=labels)" 667 | ] 668 | }, 669 | { 670 | "cell_type": "markdown", 671 | "metadata": {}, 672 | "source": [ 673 | "The committors can be interpreted as follows:" 674 | ] 675 | }, 676 | { 677 | "cell_type": "code", 678 | "execution_count": 26, 679 | "metadata": {}, 680 | "outputs": [ 681 | { 682 | "name": "stdout", 683 | "output_type": "stream", 684 | "text": [ 685 | "comms[0] = [0.26406217 0.29477873 0.44115911]\n", 686 | "\n", 687 | "In other words, if you start in state 0:\n", 688 | "You will reach basin A first with probability 0.26, basin B with probability 0.29 and basin U with probability 0.44\n" 689 | ] 690 | } 691 | ], 692 | "source": [ 693 | "i = our_csn.trim_indices[0]\n", 694 | "print('comms['+str(i)+'] = ',comms[i])\n", 695 | "print('\\nIn other words, if you 
start in state {0:d}:'.format(i))\n", 696 | "print('You will reach basin A first with probability {0:.2f}, basin B with probability {1:.2f} and basin U with probability {2:.2f}'.format(comms[i,0],comms[i,1],comms[i,2]))" 697 | ] 698 | }, 699 | { 700 | "cell_type": "markdown", 701 | "metadata": {}, 702 | "source": [ 703 | "## 5) Exporting graph for visualization in Gephi\n", 704 | "\n", 705 | "`NetworkX` is great for doing graph-based analyses, but not stellar at creating graph layouts for large(r) networks. However, it does have excellent built-in support for exporting graph objects in a variety of formats. \n", 706 | "\n", 707 | "Here we'll use the `.gexf` format to save our network, as well as all of the attributes we've calculated, to a file that can be read into [Gephi](https://gephi.org/), a powerful graph visualization program. While support for Gephi has been spotty in the recent past, it is still one of the best available options for graph visualization.\n", 708 | "\n", 709 | "Before exporting to `.gexf`, let's use the committors we've calculated to add colors to the nodes:" 710 | ] 711 | }, 712 | { 713 | "cell_type": "code", 714 | "execution_count": 27, 715 | "metadata": {}, 716 | "outputs": [], 717 | "source": [ 718 | "rgb = our_csn.colors_from_committors(comms)\n", 719 | "our_csn.set_colors(rgb)" 720 | ] 721 | }, 722 | { 723 | "cell_type": "markdown", 724 | "metadata": {}, 725 | "source": [ 726 | "Now we have added some properties to our nodes under 'viz', which will be interpreted by Gephi:" 727 | ] 728 | }, 729 | { 730 | "cell_type": "code", 731 | "execution_count": 28, 732 | "metadata": {}, 733 | "outputs": [ 734 | { 735 | "data": { 736 | "text/plain": [ 737 | "{'label': 0,\n", 738 | " 'count': 482,\n", 739 | " 'trim': 0.0,\n", 740 | " 'eig_weights': 0.002595528367725156,\n", 741 | " 'mult_weights': 0.0025955283677248217,\n", 742 | " 'pA': 0.26406216543613925,\n", 743 | " 'pB': 0.2947787254045238,\n", 744 | " 'pU': 0.4411591091593356,\n", 745 | "
 'viz': {'color': {'r': 152, 'g': 170, 'b': 255, 'a': 0}}}" 746 | ] 747 | }, 748 | "execution_count": 28, 749 | "metadata": {}, 750 | "output_type": "execute_result" 751 | } 752 | ], 753 | "source": [ 754 | "our_csn.graph.nodes[0]" 755 | ] 756 | }, 757 | { 758 | "cell_type": "markdown", 759 | "metadata": {}, 760 | "source": [ 761 | "And we can use a built-in `networkx` function to write all of this to a `.gexf` file:" 762 | ] 763 | }, 764 | { 765 | "cell_type": "code", 766 | "execution_count": 29, 767 | "metadata": {}, 768 | "outputs": [], 769 | "source": [ 770 | "nx.readwrite.gexf.write_gexf(our_csn.graph.to_undirected(),'test.gexf')" 771 | ] 772 | }, 773 | { 774 | "cell_type": "markdown", 775 | "metadata": {}, 776 | "source": [ 777 | "After opening this file in Gephi, I recommend creating a layout using the \"Force Atlas 2\" algorithm in the layout panel. I set the node sizes to the \"eig_weights\" variable, and after exporting to PDF and adding some labels, I get the following:" 778 | ] 779 | }, 780 | { 781 | "cell_type": "markdown", 782 | "metadata": {}, 783 | "source": [ 784 | "![Gephi graph export](committor_net_3state.png)" 785 | ] 786 | }, 787 | { 788 | "cell_type": "markdown", 789 | "metadata": {}, 790 | "source": [ 791 | "**That's the end of our tutorial!** I hope you enjoyed it and that you find `CSNAnalysis` useful in your research. If you are having difficulties installing or running the software, feel free to create an [issue on the GitHub page](https://github.com/ADicksonLab/CSNAnalysis)."
792 | ] 793 | }, 794 | { 795 | "cell_type": "code", 796 | "execution_count": null, 797 | "metadata": {}, 798 | "outputs": [], 799 | "source": [] 800 | } 801 | ], 802 | "metadata": { 803 | "kernelspec": { 804 | "display_name": "Python 3", 805 | "language": "python", 806 | "name": "python3" 807 | }, 808 | "language_info": { 809 | "codemirror_mode": { 810 | "name": "ipython", 811 | "version": 3 812 | }, 813 | "file_extension": ".py", 814 | "mimetype": "text/x-python", 815 | "name": "python", 816 | "nbconvert_exporter": "python", 817 | "pygments_lexer": "ipython3", 818 | "version": "3.7.7" 819 | } 820 | }, 821 | "nbformat": 4, 822 | "nbformat_minor": 1 823 | } 824 | -------------------------------------------------------------------------------- /examples/matrix.npz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ADicksonLab/CSNAnalysis/7700653374937c179a441c656f0783f80e44c26d/examples/matrix.npz -------------------------------------------------------------------------------- /examples/state_U.dat: -------------------------------------------------------------------------------- 1 | 365 2830 1155 1529 3242 2201 1854 3251 2303 3899 806 2952 2322 1154 189 2343 3080 1024 3385 968 2228 1298 2475 2493 615 3918 1394 2472 1734 1787 81 156 593 3668 1412 1965 3215 415 959 1201 3894 2893 1077 158 2651 3176 975 3999 73 1758 1861 3437 1595 329 863 3767 3859 1099 1103 165 1143 3256 1530 3128 1911 1093 1320 3502 1851 711 2156 1130 3335 218 1611 1624 2579 3904 3596 3046 3219 3775 65 2558 1706 180 2489 2887 3644 3930 462 2400 1378 2020 2589 1203 302 2731 1956 632 1435 712 1889 2749 1008 354 2549 1755 986 2784 442 2925 2091 2111 2163 2379 1812 185 1499 1300 140 12 2937 37 3598 1065 1645 2947 3018 1288 2622 1781 1352 2915 1586 3175 934 148 1780 209 3021 45 2846 1133 193 126 746 3225 3791 613 1598 1246 1166 1951 391 2088 1705 548 3858 1564 3280 3579 2413 555 215 1626 1795 2128 974 24 520 3650 401 2093 1351 2743 3054 1377 
1756 2504 3069 2756 3951 3177 3141 120 2871 3924 1 3750 375 1170 2066 2458 3980 2710 3092 49 1711 2244 2392 2959 2901 2150 1072 2920 2039 2062 1441 3564 1896 1327 752 2196 1687 257 3680 286 1046 2770 1793 2647 2879 418 721 507 1197 2 | -------------------------------------------------------------------------------- /requirements.in: -------------------------------------------------------------------------------- 1 | --index-url https://pypi.python.org/simple/ 2 | 3 | numpy 4 | scipy >= 0.19 5 | networkx >= 2.1 6 | 7 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | 3 | setup( 4 | name='CSNAnalysis', 5 | version='0.1.0-beta', 6 | py_modules=['csnanalysis'], 7 | author='Alex Dickson', 8 | author_email='alexrd@msu.edu', 9 | packages=find_packages(), 10 | include_package_data=True, 11 | install_requires=[ 12 | 'numpy', 13 | 'networkx>=2.1', 14 | 'scipy>=0.19', 15 | ], 16 | ) 17 | --------------------------------------------------------------------------------