├── LICENSE ├── README.md ├── assets ├── change_in_pseudotime.png ├── divergence_map.png ├── phase_simulations.png └── velocity_confidence.png ├── requirements.txt ├── setup.py └── velodyn ├── __init__.py ├── velocity_ci.py ├── velocity_divergence.py ├── velocity_dpst.py └── velocity_dynsys.py /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | 3 | Version 2.0, January 2004 4 | 5 | http://www.apache.org/licenses/ 6 | 7 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 8 | 9 | 1. Definitions. 10 | 11 | "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. 16 | 17 | "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. 18 | 19 | "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. 20 | 21 | "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. 22 | 23 | "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). 24 | 25 | "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. 26 | 27 | "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." 
28 | 29 | "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 30 | 31 | 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 32 | 33 | 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 34 | 35 | 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: 36 | 37 | You must give any other recipients of the Work or Derivative Works a copy of this License; and 38 | You must cause any modified files to carry prominent notices stating that You changed the files; and 39 | You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and 40 | If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. 
41 | 42 | You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 43 | 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 44 | 45 | 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 46 | 47 | 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 48 | 49 | 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 50 | 51 | 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. 52 | 53 | END OF TERMS AND CONDITIONS -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # VeloDyn -- Quantitative analysis of RNA velocity 2 | 3 | RNA velocity infers a rate of change for each transcript in an RNA-sequencing experiment based on the ratio of intronic to exonic reads. 
4 | This inferred velocity vector serves as a prediction for the *future* transcriptional state of a cell, while the current read counts serve as a measurement of the instantaneous state.
5 | Qualitative analysis of RNA velocity has been used to establish the order of gene expression states in a sequence, but quantitative analysis has generally been lacking.
6 | 
7 | `velodyn` adopts formalisms from dynamical systems to provide a quantitative framework for RNA velocity analysis.
8 | The tools provided by `velodyn` along with their associated usage are described below.
9 | All `velodyn` tools are designed to integrate with the `scanpy` ecosystem and `anndata` structures.
10 | 
11 | We have released `velodyn` in association with a recent paper.
12 | Please cite our paper if you find `velodyn` useful for your work.
13 | 
14 | 
15 | [**Differentiation reveals latent features of aging and an energy barrier in murine myogenesis**](https://pubmed.ncbi.nlm.nih.gov/33910007/)
16 | Jacob C Kimmel, Nelda Yi, Margaret Roy, David G Hendrickson, David R Kelley
17 | *Cell Reports* 2021, 35 (4); doi: https://doi.org/10.1016/j.celrep.2021.109046
18 | 
19 | **BibTeX**
20 | 
21 | ```
22 | @article{kimmel_latent_2021,
23 |     title = {Differentiation reveals latent features of aging and an energy barrier in murine myogenesis},
24 |     volume = {35},
25 |     issn = {2211-1247},
26 |     url = {https://www.cell.com/cell-reports/abstract/S2211-1247(21)00362-4},
27 |     doi = {10.1016/j.celrep.2021.109046},
28 |     language = {English},
29 |     number = {4},
30 |     urldate = {2021-05-19},
31 |     journal = {Cell Reports},
32 |     author = {Kimmel, Jacob C. and Yi, Nelda and Roy, Margaret and Hendrickson, David G. and Kelley, David R.},
33 |     month = apr,
34 |     year = {2021},
35 |     pmid = {33910007},
36 |     note = {Publisher: Elsevier},
37 |     keywords = {aging, dynamical systems, fibro/adipogenic progenitor, muscle stem cell, myogenesis, RNA-seq, single cell, stem cell}
38 | }
39 | ```
40 | 
41 | If you have any questions or comments, please feel free to email me.
42 | 
43 | Jacob C. Kimmel, PhD
44 | [jacobkimmel+velodyn@gmail.com](mailto:jacobkimmel+velodyn@gmail.com)
45 | Calico Life Sciences, LLC
46 | 
47 | 
48 | ## Installation
49 | 
50 | ```bash
51 | git clone https://github.com/calico/velodyn
52 | cd velodyn
53 | pip install .
54 | ```
55 | 
56 | or
57 | 
58 | ```bash
59 | pip install velodyn
60 | ```
61 | 
62 | ## Tutorial
63 | 
64 | We have provided a `velodyn` tutorial using the Colab computing environment from Google.
65 | This notebook allows you to execute a `velodyn` workflow, end-to-end, entirely within your web browser.
66 | 
67 | [velodyn tutorial](https://colab.research.google.com/drive/1JMjw_nJYHmOAEn7ZHL8q2MQbyxmphbni)
68 | 
69 | ## Gene expression state stability measurements
70 | 
71 | `velodyn` can provide a quantitative measure of gene expression state stability based on the divergence of the RNA velocity field.
72 | The divergence measures the net flow of cells into or out of a region of state space and is frequently used to characterize vector fields in physical systems.
73 | Divergence measures can reveal stable attractor states and unstable repulsor states in gene expression space.
74 | For example, we computed the divergence of gene expression states during myogenic differentiation and identified two attractor states, separated by a repulsor state.
75 | This repulsor state is unstable, suggesting it represents a decision point at which cells commit to one of the attractor states.
76 | 
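As a toy illustration of the underlying idea (this is not the `velodyn` API, and all names below are hypothetical), the divergence of a two-dimensional velocity field is the sum of the partial derivatives of each velocity component along its own axis, which can be estimated with finite differences:

```python
import numpy as np

# toy velocity field on a 30 x 30 grid, flowing toward the origin;
# a sink like this acts as an attractor and has negative divergence
xs = np.linspace(-1., 1., 30)
X, Y = np.meshgrid(xs, xs)
Vx, Vy = -X, -Y

# divergence = dVx/dx + dVy/dy, estimated with finite differences
div = np.gradient(Vx, xs, axis=1) + np.gradient(Vy, xs, axis=0)
print(div.mean())  # ~ -2.0 everywhere: a uniform sink (attractor)
```

`velodyn` applies the same computation to a grid of smoothed RNA velocity vectors, as shown below.
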
77 | ![Divergence maps of myogenic differentiation. Two attractor states along a one-dimensional manifold are separated by a repulsor state in the center.](assets/divergence_map.png)
78 | 
79 | 
80 | ### Usage
81 | 
82 | ```python
83 | from velodyn.velocity_divergence import compute_div, plot_div
84 | 
85 | D = compute_div(
86 |     adata=adata,
87 |     use_rep='pca',
88 |     n_grid_points=30,
89 | )
90 | print(D.shape)  # (30, 30)
91 | 
92 | fig, ax = plot_div(D)
93 | ```
94 | 
95 | ## State transition rate comparisons with phase simulations
96 | 
97 | Across experimental conditions, the rates of change in gene expression space may differ significantly.
98 | However, it is difficult to determine where RNA velocity fields differ across conditions, and what impact any differences may have on the transit time between states.
99 | In dynamical systems, phase point analysis is used to quantify the integrated behavior of a vector field.
100 | For a review of phase point simulation methods, we highly recommend *Nonlinear Dynamics & Chaos* by Steven Strogatz.
101 | 
102 | In brief, a phase point simulation instantiates a particle ("phase point") at some position in a vector field.
103 | The position of the particle is updated ("evolved") over a number of timesteps using numerical methods.
104 | 
105 | For `velodyn`, we implement our update step using a stochastic weighted nearest neighbors model.
106 | We have a collection of observed cells and their associated velocity vectors as the source of our vector field.
107 | For each point at each timestep, we estimate the parameters of a Gaussian distribution of possible update steps based on the mean and variance of observed velocity vectors in neighboring cells.
108 | We then draw a sample from this distribution to update the position of the phase point.
109 | The stochastic nature of this evolution mirrors the stochastic nature of gene expression.
110 | 
111 | By applying phase point simulations to RNA velocity fields, `velodyn` allows for comparisons of state transition rates across experimental conditions.
112 | For example, we used phase point simulations to analyze the rate of myogenic differentiation in young and aged muscle stem cells.
113 | These analyses revealed that aged cells progress more slowly toward the differentiated state than their young counterparts.
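To make the update rule concrete, the sketch below evolves a single phase point with a stochastic nearest-neighbor step. It is a simplified stand-in for `velodyn`'s implementation, and the arrays `X` (cell embedding coordinates) and `V` (cell velocity vectors) are hypothetical placeholders:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))             # cell positions (hypothetical)
V = rng.normal(scale=0.1, size=(500, 2))  # cell velocities (hypothetical)

nn = NearestNeighbors(n_neighbors=20).fit(X)

def step(point, step_scale=0.5):
    """One stochastic kNN update: draw a velocity from a Gaussian
    parameterized by the neighboring cells' mean and std. dev."""
    _, idx = nn.kneighbors(point[None, :])
    neighbor_v = V[idx[0]]
    mu, sigma = neighbor_v.mean(axis=0), neighbor_v.std(axis=0)
    return point + step_scale * rng.normal(mu, sigma)

# evolve one phase point for 100 timesteps
trajectory = [X[0]]
for _ in range(100):
    trajectory.append(step(trajectory[-1]))
```

Because each step is sampled rather than deterministic, repeated simulations from the same starting point yield a distribution of trajectories and transit times.
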
114 | 
115 | ![Phase point simulations show the direction and rate of motion in an RNA velocity field.](assets/phase_simulations.png)
116 | 
117 | ### Usage
118 | 
119 | ```python
120 | from velodyn.velocity_dynsys import PhaseSimulation
121 | 
122 | simulator = PhaseSimulation(
123 |     adata=adata,
124 | )
125 | # set the velocity basis to use
126 | simulator.set_velocity_field(basis='pca')
127 | # set starting locations for phase points
128 | # using a categorical variable in `adata.obs`
129 | simulator.set_starting_point(
130 |     method='metadata',
131 |     groupby='starting_points',
132 |     group='forward',
133 | )
134 | # run simulations using the stochastic kNN velocity estimator
135 | trajectories = simulator.simulate_phase_points(
136 |     n_points=n_points_to_simulate,
137 |     n_timesteps=n_timesteps_to_simulate,
138 |     velocity_method='knn',
139 |     velocity_method_attrs={'vknn_method': 'stochastic'},
140 |     step_scale=float(step_scale),
141 |     multiprocess=True,  # use multiple cores
142 | )
143 | 
144 | print(trajectories.shape)
145 | # [
146 | #     n_points_to_simulate,
147 | #     n_timesteps,
148 | #     n_embedding_dims,
149 | #     (position, velocity_mean, velocity_std),
150 | # ]
151 | ```
152 | 
153 | ## Change in pseudotime predictions
154 | 
155 | Dynamic cell state transitions are often parameterized by a pseudotime curve, as introduced by Cole Trapnell in `monocle`.
156 | Given RNA velocity vectors and pseudotime coordinates, `velodyn` can predict a "change in pseudotime" for each individual cell.
157 | The procedure for predicting a change in pseudotime is fairly simple.
158 | `velodyn` trains a machine learning model to predict pseudotime coordinates from gene expression embedding coordinates (e.g. coordinates in principal component space).
159 | The future position of each cell in this embedding is computed as the current position shifted by the RNA velocity vector, and a new pseudotime coordinate is predicted for this future position using the trained model.
160 | The "change in pseudotime" is then returned as the difference between the pseudotime coordinate for the predicted future point and the pseudotime coordinate for the observed point.
161 | 
162 | ![Change in pseudotime is predicted using a machine learning model for each cell.](assets/change_in_pseudotime.png)
163 | 
164 | ### Usage
165 | 
166 | ```python
167 | from velodyn.velocity_dpst import dPseudotime
168 | 
169 | DPST = dPseudotime(
170 |     adata=adata,
171 |     use_rep='pca',
172 |     pseudotime_var='dpt_pseudotime',
173 | )
174 | change_in_pseudotime = DPST.predict_dpst()
175 | ```
176 | 
177 | ## Velocity confidence intervals
178 | 
179 | RNA velocity estimates for each cell are incredibly useful, but there is no notion of variance inherent to the inference procedure.
180 | If we wish to make comparisons between cells moving in different directions in gene expression space, we require confidence intervals on each cell's RNA velocity vector.
181 | `velodyn` introduces a molecular parametric bootstrapping procedure to compute these confidence intervals.
182 | Briefly, we parameterize a multinomial distribution across genes using the mRNA profile for each cell.
183 | We then parameterize a second multinomial distribution for each gene in each cell based on the observed counts of spliced, unspliced, and ambiguous reads.
184 | We sample reads to the observed depth across genes, use the gene-level multinomial to distribute these reads as spliced, unspliced, or ambiguous observations, and repeat this procedure many times for each cell.
185 | We then compute RNA velocity vectors for each bootstrap sample and use these vectors to compute RNA velocity confidence intervals.
186 | 
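For intuition, here is a minimal sketch of the two-stage sampling scheme for a single cell with toy counts (the `VelocityCI` class in `velodyn.velocity_ci` implements this across all cells and fits a velocity model to each sample):

```python
import numpy as np

rng = np.random.default_rng(0)
s = np.array([8, 0, 5])   # spliced counts per gene
u = np.array([2, 3, 0])   # unspliced counts per gene
a = np.array([0, 1, 0])   # ambiguous counts per gene
x = s + u + a             # total counts per gene

# stage 1: resample per-gene totals at the observed sequencing depth
x_hat = rng.multinomial(x.sum(), x / x.sum())

# stage 2: redistribute each gene's resampled total across
# spliced / unspliced / ambiguous using the observed proportions
sua_hat = np.zeros((len(x), 3), dtype=int)
for g in range(len(x)):
    if x[g] > 0:
        p = np.array([s[g], u[g], a[g]]) / x[g]
        sua_hat[g] = rng.multinomial(x_hat[g], p)

assert sua_hat.sum() == x.sum()  # library depth is preserved
```
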
187 | ![RNA velocity confidence intervals for each cell.](assets/velocity_confidence.png)
188 | 
189 | ### Usage
190 | 
191 | ```python
192 | from velodyn.velocity_ci import VelocityCI
193 | 
194 | # initialize velocity CI
195 | vci = VelocityCI(
196 |     adata=adata,
197 | )
198 | # sample velocity vectors
199 | # returns [n_iter, Cells, Genes]
200 | velocity_bootstraps = vci.bootstrap_velocity(
201 |     n_iter=n_iter,
202 |     save_counts=out_path,
203 |     embed=adata_embed,  # AnnData with the genes of interest and a relevant embedding
204 | )
205 | ```
206 | 
--------------------------------------------------------------------------------
/assets/change_in_pseudotime.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/calico/velodyn/b98a1d15a031feff48479dc4e2963c4f62ba07d6/assets/change_in_pseudotime.png
--------------------------------------------------------------------------------
/assets/divergence_map.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/calico/velodyn/b98a1d15a031feff48479dc4e2963c4f62ba07d6/assets/divergence_map.png
--------------------------------------------------------------------------------
/assets/phase_simulations.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/calico/velodyn/b98a1d15a031feff48479dc4e2963c4f62ba07d6/assets/phase_simulations.png
--------------------------------------------------------------------------------
/assets/velocity_confidence.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/calico/velodyn/b98a1d15a031feff48479dc4e2963c4f62ba07d6/assets/velocity_confidence.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | anndata>=0.6.22.post1
2 | h5py>=2.10.0
3 | loompy>=2.0.16
4 | matplotlib>=3.0.2
5 | numpy>=1.17.4
6 | pandas>=0.23.4
7 | scanpy>=1.4
8 | scikit-learn>=0.21.3
9 | scipy>=1.2.0
10 | scvelo>=0.1.16.dev41+74978dd
11 | seaborn>=0.9.0
12 | pathos>=0.2.5
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | import sys
2 | if sys.version_info < (3, 6,):
3 |     sys.exit('velodyn requires Python >= 3.6')
4 | from pathlib import Path
5 | 
6 | from setuptools import setup, find_packages
7 | 
8 | try:
9 |     from velodyn import __author__, __email__
10 | except ImportError:  # Deps not yet installed
11 |     __author__ = __email__ = ''
12 | 
13 | 
14 | long_description = '''
15 | RNA velocity infers a rate of change for each transcript in an RNA-sequencing experiment based on the ratio of intronic to exonic reads. This inferred velocity vector serves as a prediction for the future transcriptional state of a cell, while the current read counts serve as a measurement of the instantaneous state. Qualitative analysis of RNA velocity has been used to establish the order of gene expression states in a sequence, but quantitative analysis has generally been lacking.\n
16 | \n
17 | velodyn adopts formalisms from dynamical systems to provide a quantitative framework for RNA velocity analysis.
The tools provided by velodyn along with their associated usage are described below. All velodyn tools are designed to integrate with the scanpy ecosystem and anndata structures.\n
18 | \n
19 | We have released velodyn in association with a recent pre-print. Please cite our pre-print if you find velodyn useful for your work.\n
20 | \n
21 | Differentiation reveals the plasticity of age-related change in murine muscle progenitors\n
22 | Jacob C Kimmel, David G Hendrickson, David R Kelley\n
23 | bioRxiv 2020.03.05.979112; doi: https://doi.org/10.1101/2020.03.05.979112
24 | '''
25 | 
26 | setup(
27 |     name='velodyn',
28 |     version='0.1.0',
29 |     description='Dynamical systems approaches for RNA velocity analysis',
30 |     long_description=long_description,
31 |     url='http://github.com/calico/velodyn',
32 |     author=__author__,
33 |     author_email=__email__,
34 |     license='Apache',
35 |     python_requires='>=3.6',
36 |     install_requires=[
37 |         l.strip() for l in
38 |         Path('requirements.txt').read_text('utf-8').splitlines()
39 |     ],
40 |     packages=find_packages(),
41 |     classifiers=[
42 |         'Intended Audience :: Science/Research',
43 |         'Topic :: Scientific/Engineering :: Bio-Informatics',
44 |     ],
45 | )
46 | 
--------------------------------------------------------------------------------
/velodyn/__init__.py:
--------------------------------------------------------------------------------
1 | __author__ = 'Jacob C. Kimmel'
2 | __email__ = 'jacobkimmel@gmail.com'
3 | __version__ = '0.1'
4 | 
5 | # populate the namespace so top level imports work
6 | # e.g.
7 | # >> from velodyn.velocity_divergence import compute_div
8 | from . import velocity_ci, velocity_divergence, velocity_dpst, velocity_dynsys
--------------------------------------------------------------------------------
/velodyn/velocity_ci.py:
--------------------------------------------------------------------------------
1 | r"""Generate confidence intervals for RNA velocity models by bootstrapping
2 | across reads.
3 | 
4 | Our bootstrapping procedure is as follows:
5 | 
6 | 1. Given a spliced count matrix ([Cells, Genes]) S and an unspliced matrix U,
7 | create a total counts matrix X = S + U.
8 | 2.1 For each cell X_i \in X, fit a multinomial distribution. Sample D (depth) reads
9 | from each multinomial to create a sampled count distribution across genes \hat X_i.
10 | 2.2 For each gene g in \hat X_i, fit a binomial distribution Binom(n=\hat X_ig, p=\frac{S_ig}{X_ig})
11 | which represents the distribution of spliced vs. unspliced counts.
12 | 2.3 Sample an estimate of the spliced counts for X_ig, \hat S_ig ~ Binom(n=\hat X_ig, p=S_ig/X_ig).
13 | Compute the conjugate unspliced read count \hat U_ig = \hat X_ig - \hat S_ig.
14 | 3. Given the complete bootstrapped samples \hat S, \hat U, estimate a bootstrapped
15 | velocity vector for consideration.
16 | 
17 | Bootstrap samples of cell counts therefore have the same number of counts as the original
18 | cell, preventing any issues due to differing library depths:
19 | 
20 | \sum_i \sum_j X_{ij} \equiv \sum_i \sum_j \hat X_{ij}
21 | 
22 | """
23 | import numpy as np
24 | import anndata
25 | import scvelo as scv
26 | import time
27 | import os.path as osp
28 | import argparse
29 | import multiprocessing
30 | 
31 | 
32 | class VelocityCI(object):
33 |     """Compute confidence intervals for RNA velocity vectors
34 | 
35 |     Attributes
36 |     ----------
37 |     adata : anndata.AnnData
38 |         [Cells, Genes] experiment with spliced and unspliced read
39 |         matrices in `.layers` as "spliced", "unspliced", "ambiguous".
40 |         `.X` should contain raw count values, rather than transformed
41 |         counts.
42 |     S : np.ndarray
43 |         [Cells, Genes] spliced read counts.
44 |     U : np.ndarray
45 |         [Cells, Genes] unspliced read counts.
46 |     A : np.ndarray
47 |         [Cells, Genes] ambiguous read counts.
48 | 
49 |     Methods
50 |     -------
51 |     _sample_abundance_profile(x)
52 |         sample a total read count vector from a multinomial fit
53 |         to the observed count vector `x`.
54 |     _sample_spliced_unspliced(s, u, a, x_hat)
55 |         sample spliced, unspliced, and ambiguous read counts from
56 |         a multinomial given a sample of total read counts `x_hat`
57 |         and observed `s`pliced, `u`nspliced, `a`mbiguous counts.
58 |     _sample_matrices()
59 |         samples a matrix of spliced, unspliced and ambiguous read
60 |         counts for all cells and genes in `.adata`.
61 |     _fit_velocity(SUA_hat,)
62 |         fits a velocity model to sampled spliced, unspliced counts
63 |         in an output from `_sample_matrices()`
64 |     bootstrap_velocity(n_iter, embed)
65 |         generate bootstrap samples of RNA velocity estimates using
66 |         `_sample_matrices` and `_fit_velocity` sequentially.
67 | 
68 |     Notes
69 |     -----
70 |     Parallelization requires use of shared ctypes to avoid copying our
71 |     large data arrays for each child process. See `_sample_matrices` for
72 |     a discussion of the relevant considerations and solutions.
73 |     Due to this issue, we have modified `__getstate__` such that pickling
74 |     this object will not preserve all of the relevant data.
75 |     """
76 | 
77 |     def __init__(
78 |         self,
79 |         adata: anndata.AnnData,
80 |     ) -> None:
81 |         """Compute confidence intervals for RNA velocity vectors
82 | 
83 |         Parameters
84 |         ----------
85 |         adata : anndata.AnnData
86 |             [Cells, Genes] experiment with spliced and unspliced read
87 |             matrices in `.layers` as "spliced", "unspliced", "ambiguous".
88 |             `.X` should contain raw count values, rather than transformed
89 |             counts.
90 | 
91 |         Returns
92 |         -------
93 |         None.
94 |         """
95 |         # check that all necessary layers are present
96 |         if 'spliced' not in adata.layers.keys():
97 |             msg = 'spliced matrix must be available in `adata.layers`.'
98 |             raise ValueError(msg)
99 |         if 'unspliced' not in adata.layers.keys():
100 |             msg = 'unspliced matrix must be available in `adata.layers`.'
101 |             raise ValueError(msg)
102 |         if 'ambiguous' not in adata.layers.keys():
103 |             msg = 'ambiguous matrix must be available in `adata.layers`.'
104 |             raise ValueError(msg)
105 | 
106 |         # copy relevant layers in memory to avoid altering the original
107 |         # input
108 |         self.adata = adata
109 |         self.S = adata.layers['spliced'].copy()
110 |         self.U = adata.layers['unspliced'].copy()
111 |         self.A = adata.layers['ambiguous'].copy()
112 | 
113 |         # convert arrays to dense format if they are sparse
114 |         if type(self.S) != np.ndarray:
115 |             try:
116 |                 self.S = self.S.toarray()
117 |             except (ValueError, AttributeError):
118 |                 msg = 'self.S was not np.ndarray, failed .toarray()'
119 |                 print(msg)
120 | 
121 |         if type(self.U) != np.ndarray:
122 |             try:
123 |                 self.U = self.U.toarray()
124 |             except (ValueError, AttributeError):
125 |                 msg = 'self.U was not np.ndarray, failed .toarray()'
126 |                 print(msg)
127 | 
128 |         if type(self.A) != np.ndarray:
129 |             try:
130 |                 self.A = self.A.toarray()
131 |             except (ValueError, AttributeError):
132 |                 msg = 'self.A was not np.ndarray, failed .toarray()'
133 |                 print(msg)
134 | 
135 |         # here, `X` is the total number of counts per feature regardless
136 |         # of the region where the reads map
137 |         self.X = self.S + self.U + self.A
138 |         self.data_shape = self.X.shape
139 |         assert type(self.X) == np.ndarray
140 | 
141 |         # set normalization scale for velocity fitting
142 |         self.counts_per_cell_after = 1e4
143 | 
144 |         return
145 | 
146 |     def __getstate__(self,) -> dict:
147 |         """
148 |         Override the default `__getstate__` behavior
149 |         so we do not pickle huge arrays.
150 | 
151 |         Returns
152 |         -------
153 |         d : dict
154 |             object state dictionary, with large arrays removed
155 |             to allow pickling and passage to child processes.
156 | 
157 |         Notes
158 |         -----
159 |         When we perform multiprocessing, we pickle the `VelocityCI`
160 |         class to pass to workers. Here, we remove all large memory
161 |         objects from the `__getstate__` method which is used during
162 |         the pickle process to gather all the relevant components of
163 |         an object in memory. We provide access to a shared buffer
164 |         with these objects to each worker to avoid copying them.
165 |         """
166 |         d = dict(self.__dict__)
167 |         for attr in ['X', 'S', 'U', 'A']:
168 |             d.pop(attr, None)
169 |             d.pop(attr + '_batch', None)  # batch arrays may not exist yet
170 |         large_arr = ['adata', 'SUA_hat', 'embed', 'velocity_estimates']
171 |         for k in large_arr:
172 |             if k in d.keys():
173 |                 del d[k]
174 |         return d
175 | 
176 |     def _sample_abundance_profile(
177 |         self,
178 |         x: np.ndarray,
179 |     ) -> np.ndarray:
180 |         """Given an observed mRNA abundance profile, fit a multinomial
181 |         distribution and randomly sample a corresponding profile.
182 | 
183 |         Parameters
184 |         ----------
185 |         x : np.ndarray
186 |             [Genes,] observed mRNA counts vector.
187 | 
188 |         Returns
189 |         -------
190 |         x_hat : np.ndarray
191 |             [Genes,] a randomly sampled abundance profile,
192 |             given the multinomial distribution specified by `x`.
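
        Examples
        --------
        A minimal sketch; `vci` is assumed to be an initialized
        `VelocityCI` object, and the draw is random, so only the
        total count is guaranteed to match.

        >>> x = np.array([5., 3., 2.])
        >>> x_hat = vci._sample_abundance_profile(x)  # doctest: +SKIP
        >>> x_hat.sum() == x.sum()  # doctest: +SKIP
        True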
193 | """ 194 | # we need to instantiate a local random state to ensure 195 | # each multiprocess thread generates true random numbers 196 | local_rnd = np.random.RandomState() 197 | # cast everything to `np.float64` before operations due to a 198 | # `numpy` bug 199 | # https://github.com/numpy/numpy/issues/8317 200 | x = x.astype(np.float64) 201 | # compute relative abundance profile as feature proportions 202 | pvals = x / np.sum(x) 203 | # sample a count distribution from the multinomial 204 | x_hat = local_rnd.multinomial( 205 | n=int(np.sum(x)), 206 | pvals=pvals, 207 | ) 208 | return x_hat 209 | 210 | def _sample_spliced_unspliced( 211 | self, 212 | s: np.ndarray, 213 | u: np.ndarray, 214 | a: np.ndarray, 215 | x_hat: np.ndarray, 216 | ) -> np.ndarray: 217 | """Sample the proportion of spliced/unspliced reads for a 218 | randomly sampled mRNA profile given observed spliced and 219 | unspliced read counts. 220 | 221 | Parameters 222 | ---------- 223 | s : np.ndarray 224 | [Genes,] observed spliced read counts for each gene. 225 | u : np.ndarray 226 | [Genes,] observed unspliced read counts for each gene. 227 | a : np.ndarray 228 | [Genes,] ambiguous read counts for each gene. 229 | x_hat : np.ndarray 230 | [Genes,] sampled total gene counts profile. 231 | 232 | Returns 233 | ------- 234 | sua_hat : np.ndarray 235 | [Genes, (Spliced, Unspliced, Ambiguous)] read counts 236 | randomly sampled from a multinomial. 237 | """ 238 | # we need to instantiate a local random state to ensure 239 | # each multiprocess thread generates true random numbers 240 | local_rnd = np.random.RandomState() 241 | # Genes, (Spliced, Unspliced, Ambiguous) 242 | sua_hat = np.zeros((len(x_hat), 3)) 243 | # compute total reads per feature 244 | x = s + u + a 245 | x = x.astype(np.float64) 246 | 247 | # for each gene, sample the proportion of counts that originate 248 | # from spliced, unspliced, or ambiguous regions using a multinomial 249 | # distribution parameterized with the observed proportions 250 | for g in range(len(x_hat)): 251 | 252 | if x[g] == 0: 253 | sua_hat[g, :] = 0 254 | continue 255 | 256 | pvals = np.array([s[g], u[g], a[g]], dtype=np.float64) / x[g] 257 | sua_hat[g, :] = local_rnd.multinomial( 258 | n=x_hat[g], 259 | pvals=pvals, 260 | ) 261 | 262 | return sua_hat 263 | 264 | def _sample_cell(self, 265 | i: int, 266 | ) -> np.ndarray: 267 | """Draw samples for a single cell. 268 | 269 | Parameters 270 | ---------- 271 | i : int 272 | cell index in `.X, .S, .U, .A` matrices. 273 | 274 | Returns 275 | ------- 276 | sua_hat : np.ndarray 277 | [Genes, (Spliced, Unspliced, Ambig.)] for a single 278 | cell at index `i` in `.X`, ... 279 | 280 | Notes 281 | ----- 282 | This implementation allows for simple parallelization with 283 | a map across the cell indices. 
284 | """ 285 | # gather the count arrays from a shared `RawArray` 286 | # buffer and reshape them from flat [N*M,] to array 287 | # [N, M] format 288 | X = np.frombuffer( 289 | var_args['X_batch'], 290 | dtype=np.float64, 291 | ).reshape(var_args['data_shape_batch']) 292 | S = np.frombuffer( 293 | var_args['S_batch'], 294 | ).reshape(var_args['data_shape_batch']) 295 | U = np.frombuffer( 296 | var_args['U_batch'], 297 | dtype=np.float64, 298 | ).reshape(var_args['data_shape_batch']) 299 | A = np.frombuffer( 300 | var_args['A_batch'], 301 | dtype=np.float64, 302 | ).reshape(var_args['data_shape_batch']) 303 | 304 | # get the read counts of each type for 305 | # a single cell 306 | 307 | x = X[i, :] # total read counts 308 | s = S[i, :] # spliced read counts 309 | u = U[i, :] # unspliced read counts 310 | a = A[i, :] # ambiguous read counts 311 | 312 | # sample the relative abudance across genes 313 | x_hat = self._sample_abundance_profile( 314 | x=x, 315 | ) 316 | # for each gene, sample the proportion of reads 317 | # originating from each type of region 318 | sua_hat = self._sample_spliced_unspliced( 319 | s=s, 320 | u=u, 321 | a=a, 322 | x_hat=x_hat, 323 | ) 324 | return sua_hat 325 | 326 | def _sample_matrices( 327 | self, 328 | batch_size: int = 256, 329 | ) -> np.ndarray: 330 | """Sample a spliced and unspliced counts matrix 331 | for a bootstrapped velocity vector estimation. 332 | 333 | Parameters 334 | ---------- 335 | batch_size : int 336 | number of cells to sample in parallel. 337 | smaller batches use less RAM. 338 | 339 | Returns 340 | ------- 341 | SUA_hat : np.ndarray 342 | [Cells, Genes, (Spliced, Unspliced, Ambiguous)] 343 | randomly sampled array of read counts assigned 344 | to a splicing status. 345 | 346 | Notes 347 | ----- 348 | `_sample_matrices` uses `multiprocessing` to parallelize 349 | bootstrap simulations. We run into a somewhat tricky issue 350 | do to the size of our source data arrays (`.X, .S, .U, .A`). 351 | The usual approach to launching multiple processes is to use 352 | a `multiprocessing.Pool` to launch child processes, then copy 353 | the relevant data to each process by passing it as arguments 354 | or through pickling of object attributes. 355 | 356 | Here, the size of our arrays means that copying the large matrices 357 | to memory for each child process is (1) memory prohibitive and 358 | (2) really, really slow, defeating the whole purpose of parallelization. 359 | 360 | Here, we've implemented a batch processing solution to preserve RAM. 361 | We also use shared ctype arrays to avoid copying memory across workers. 362 | Use of ctype arrays increases the performance by ~5-fold. From this, we 363 | infer that copying even just the minibatch count arrays across all the 364 | child processes is prohibitively expensive. 365 | 366 | We can create shared ctype arrays using `multiprocessing.sharedctypes` 367 | that allow child processes to reference a single copy of each 368 | relevant array in memory. 369 | Because these data are read-only, we can get away with using 370 | `multiprocessing.RawArray` since we don't need process synchronization 371 | locks or any other sophisticated synchronization. 372 | 373 | Using `RawArray` with child processes in a pool is a little strange. 374 | We can't pass the `RawArray` pointer through a pickle, so we have to 375 | declare the pointers as global variables that get inherited by each 376 | child process through use of an `initializer` function in the pool. 
377 |         We also have to ensure that our parent object `__getstate__` function
378 |         doesn't contain any of these large arrays, so that they aren't
379 |         accidentally pickled in with the class methods. To fix that, we modify
380 |         `__getstate__` above to remove large attributes from the object dict.
381 |         """
382 |         # [Cells, Genes, (Spliced, Unspliced, Ambiguous)]
383 |         SUA_hat = np.zeros(
384 |             self.X.shape + (3,)
385 |         )
386 |         # compute the total number of batches to use
387 |         n_batches = int(np.ceil(self.X.shape[0]/batch_size))
388 | 
389 |         batch_idx = 0
390 |         for batch in range(n_batches):
391 |             end_idx = min(batch_idx+batch_size, self.X.shape[0])
392 | 
393 |             # set batch specific count arrays as attributes
394 |             for attr in ['X', 'S', 'U', 'A']:
395 |                 attr_all = getattr(self, attr)
396 |                 attr_batch = attr_all[batch_idx:end_idx, :]
397 |                 setattr(self, attr+'_batch', attr_batch)
398 | 
399 |             # generate shared arrays for child processes
400 |             shared_arrays = {'data_shape_batch': self.X_batch.shape}
401 |             for attr in ['X_batch', 'S_batch', 'U_batch', 'A_batch']:
402 |                 data = getattr(self, attr)
403 |                 # create the shared array
404 |                 # RawArray will only take a flat, 1D array
405 |                 # so we create it with as many elements as
406 |                 # our desired data
407 |                 shared = multiprocessing.RawArray(
408 |                     'd',  # doubles
409 |                     int(np.prod(data.shape)),
410 |                 )
411 |                 # load our new shared array into a numpy frame
412 |                 # and copy data into it after reshaping
413 |                 shared_np = np.frombuffer(
414 |                     shared,
415 |                     dtype=np.float64,
416 |                 )
417 |                 shared_np = shared_np.reshape(data.shape)
418 |                 # copy data into the new shared buffer
419 |                 # this is reflected in `shared`, even though we're
420 |                 # copying to the numpy frame here
421 |                 np.copyto(shared_np, data)
422 | 
423 |                 shared_arrays[attr] = shared
424 | 
425 |             # create a global dictionary to hold arguments
426 |             # we pass to each worker using an initializer.
427 |             # this is necessary because we can't pass `RawArray`
428 |             # in a pickled object (e.g. as an attribute of `self`)
429 |             global var_args
430 |             var_args = {}
431 | 
432 |             # this method is called after each worker is initialized
433 |             # and sets all of the shared arrays as part of the global
434 |             # variable `var_args`
435 |             def init_worker(shared_arrays):
436 |                 for k in shared_arrays:
437 |                     var_args[k] = shared_arrays[k]
438 | 
439 |             start = time.time()
440 |             print(f'Drawing bootstrapped samples, batch {batch:04}...')
441 |             with multiprocessing.Pool(
442 |                     initializer=init_worker,
443 |                     initargs=(shared_arrays,)) as P:
444 |                 results = P.map(
445 |                     self._sample_cell,
446 |                     range(self.X_batch.shape[0]),
447 |                 )
448 | 
449 |             # [Cells, Genes, (Spliced, Unspliced, Ambiguous)]
450 |             batch_SUA_hat = np.stack(results, 0)
451 |             SUA_hat[batch_idx:end_idx, :, :] = batch_SUA_hat
452 |             batch_idx += batch_size
453 | 
454 |             end = time.time()
455 |             print('Duration: ', end-start)
456 | 
457 |         return SUA_hat
458 | 
459 |     def _fit_velocity(
460 |         self,
461 |         SUA_hat: np.ndarray,
462 |         velocity_mode: str = 'deterministic',
463 |     ) -> np.ndarray:
464 |         """Fit a deterministic RNA velocity model to the
465 |         bootstrapped count matrices.
466 | 
467 |         Parameters
468 |         ----------
469 |         SUA_hat : np.ndarray
470 |             [Cells, Genes, (Spliced, Unspliced, Ambiguous)]
471 |             randomly sampled array of read counts assigned
472 |             to a splicing status.
473 |         velocity_mode : str
474 |             mode argument for `scvelo.tl.velocity`.
475 |             one of ("deterministic", "stochastic", "dynamical").
476 | 
477 |         Returns
478 |         -------
479 |         velocity : np.ndarray
480 |             [Cells, Genes] RNA velocity estimates.
481 |         """
482 |         dtype = np.float64
483 |         # create an AnnData object from a bootstrap sample
484 |         # of counts
485 |         boot = anndata.AnnData(
486 |             X=SUA_hat[:, :, 0].astype(dtype).copy(),
487 |             obs=self.adata.obs.copy(),
488 |             var=self.adata.var.copy(),
489 |         )
490 |         for i, k in enumerate(['spliced', 'unspliced', 'ambiguous']):
491 |             boot.layers[k] = SUA_hat[:, :, i].astype(dtype)
492 | 
493 |         if self.velocity_prefilter_genes is not None:
494 |             # filter genes to match a pre-existing velocity computation
495 |             # this is useful for e.g. embedding in a common PC space
496 |             # with the observed velocity
497 |             boot = boot[:, self.velocity_prefilter_genes].copy()
498 | 
499 |         # normalize
500 |         scv.pp.normalize_per_cell(
501 |             boot,
502 |             counts_per_cell_after=self.counts_per_cell_after,
503 |         )
504 | 
505 |         # filter genes as in the embedding
506 |         if hasattr(self, 'embed'):
507 |             # if an embedded AnnData is provided
508 |             # subset to genes used for the original embedding
509 |             cell_bidx = np.array([
510 |                 x in self.embed.obs_names for x in boot.obs_names
511 |             ])
512 | 
513 |             boot = boot[:, self.embed.var_names].copy()
514 |             boot = boot[cell_bidx, :].copy()
515 |             print(
516 |                 'Subset bootstrap samples to embedding dims: ',
517 |                 boot.shape,
518 |             )
519 |         else:
520 |             msg = 'must provide an embedding object containing\n'
521 |             msg += 'cells and genes to use for velocity estimation.'
522 |             raise ValueError(msg)
523 | 
524 |         # log1p only the `.X` layer, leaving `.layers` untouched.
525 |         scv.pp.log1p(boot)
526 | 
527 |         # fit the velocity model deterministically, following the original
528 |         # RNA velocity publication
529 |         scv.pp.pca(boot, use_highly_variable=False)
530 |         scv.pp.moments(boot, n_pcs=30, n_neighbors=100)
531 |         scv.tl.velocity(boot, mode=velocity_mode)
532 | 
533 |         return boot.layers['velocity']
534 | 
535 |     def bootstrap_velocity(
536 |         self,
537 |         n_iter: int = 100,
538 |         embed: anndata.AnnData = None,
539 |         velocity_prefilter_genes: list = None,
540 |         verbose: bool = False,
541 |         save_counts: str = None,
542 |         **kwargs,
543 |     ) -> np.ndarray:
544 |         """
545 |         Generate bootstrap estimates of the RNA velocity for
546 |         each cell and gene.
547 | 
548 |         Parameters
549 |         ----------
550 |         n_iter : int
551 |             number of bootstrap iterations to perform.
552 |         embed : anndata.AnnData, optional
553 |             [Cells, Genes] experiment describing the genes of interest
554 |             and containing a relevant embedding for projection of
555 |             velocity vectors.
556 |         velocity_prefilter_genes : list
557 |             genes selected by `scv.pp.filter_genes` in the embedding object
558 |             before normalization. often selected with `min_shared_counts`.
559 |             it is important to carry over this prefiltering step to ensure
560 |             that normalization is comparable to the original embedding.
561 |         verbose : bool
562 |             use verbose stdout printing.
563 |         save_counts : str, optional
564 |             save sampled count matrices to the specified path as
565 |             `sampled_counts_{_iter:04}.npy` with shape
566 |             [Sample, Cells, Genes, (Spliced, Unspliced, Ambig.)].
567 |         **kwargs passed to `_sample_matrices()`.
568 | 
569 |         Returns
570 |         -------
571 |         velocity : np.ndarray
572 |             [Sample, Cells, Genes] bootstrap estimates of RNA
573 |             velocity for each cell and gene.
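
        Examples
        --------
        A sketch of a typical call, assuming `adata` carries the
        required layers and `adata_embed` is a prepared embedding
        object as described in the README:

        >>> vci = VelocityCI(adata=adata)  # doctest: +SKIP
        >>> V = vci.bootstrap_velocity(
        ...     n_iter=100,
        ...     embed=adata_embed,
        ... )  # doctest: +SKIP
        >>> V.shape[0]  # doctest: +SKIP
        100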
574 | """ 575 | # use genes in an embedding object if provided, otherwise 576 | # get the n_top_genes most variable genes 577 | if embed is not None: 578 | self.embed = embed 579 | embed_genes = self.embed.shape[1] 580 | else: 581 | embed_genes = self.n_top_genes 582 | 583 | if velocity_prefilter_genes is not None: 584 | self.velocity_prefilter_genes = velocity_prefilter_genes 585 | else: 586 | self.velocity_prefilter_genes = None 587 | 588 | # store velocity estimates for each gene 589 | # [Iterations, Cells, Genes] 590 | velocity = np.zeros((n_iter, self.embed.shape[0], embed_genes)) 591 | 592 | for _iter in range(n_iter): 593 | if verbose: 594 | print('Beginning sampling for iteration %03d' % _iter) 595 | 596 | # sample a counts matrix 597 | SUA_hat = self._sample_matrices(**kwargs) 598 | 599 | if save_counts is not None: 600 | # save the raw counts sample to disk 601 | np.save( 602 | osp.join( 603 | save_counts, 604 | f'sampled_counts_{_iter:04}.npy', 605 | ), 606 | SUA_hat, 607 | ) 608 | 609 | if verbose: 610 | print('Sampling complete.') 611 | print('Fitting velocity model...') 612 | # fit a velocity model to the sampled counts matrix 613 | # yielding an estimate of velocity for each gene 614 | iter_velo = self._fit_velocity( 615 | SUA_hat=SUA_hat, 616 | ) 617 | velocity[_iter, :, :] = iter_velo 618 | if verbose: 619 | print('Velocity fit, iteration %03d complete.' % _iter) 620 | 621 | self.velocity_estimates = velocity 622 | return velocity 623 | 624 | def bootstrap_vectors( 625 | self, 626 | embed: anndata.AnnData = None, 627 | ) -> np.ndarray: 628 | """ 629 | Generate embedded velocity vectors for each bootstrapped sample 630 | of spliced/unspliced counts. 631 | 632 | Returns 633 | ------- 634 | velocity_embeddings : np.ndarray 635 | [n_iter, Cells, EmbeddingDims] RNA velocity vectors 636 | for each bootstrap sampled set of counts in the 637 | provided PCA embedding space. 638 | """ 639 | if embed is not None: 640 | self.embed = embed 641 | 642 | if not hasattr(self, 'embed'): 643 | msg = 'must provide an `embed` argument.' 644 | raise AttributeError(msg) 645 | 646 | # copy the embedding object to use for low-rank embedding 647 | project = self.embed.copy() 648 | # remove any extant `velocity_settings` to use defaults. 649 | # in the current `scvelo`, using non-default settings will throw a silly 650 | # error in `scv.tl.velocity_embedding`. 651 | if 'velocity_settings' in project.uns.keys(): 652 | project.uns.pop('velocity_settings') 653 | 654 | # for each velocity profile estimate, compute the corresponding 655 | # PCA embedding of those vectors using "direct_projection", 656 | # aka as standard matrix multiplication. 657 | # 658 | # the `scvelo` nearest neighbor projection method introduces 659 | # several assumptions that we do not wish to inherit here. 
660 | velocity_embeddings = [] 661 | for _iter in range(self.velocity_estimates.shape[0]): 662 | V = self.velocity_estimates[_iter, :, :] 663 | project.layers['velocity'] = V 664 | 665 | scv.tl.velocity_embedding( 666 | project, 667 | basis='pca', 668 | direct_pca_projection=True, 669 | autoscale=False, # do not adjust vectors for aesthetics 670 | ) 671 | velocity_embeddings.append( 672 | project.obsm['velocity_pca'], 673 | ) 674 | velocity_embeddings = np.stack( 675 | velocity_embeddings, 676 | axis=0, 677 | ) 678 | self.velocity_embeddings = velocity_embeddings 679 | return velocity_embeddings 680 | 681 | def compute_ci(self,) -> np.ndarray: 682 | """ 683 | Compute confidence intervals for the velocity vector 684 | on each cell from bootstrap samples of embedded velocity vectors. 685 | 686 | Returns 687 | ------- 688 | velocity_intervals : np.ndarray 689 | [Cells, EmbeddingDims, (Mean, Std, LowerCI, UpperCI)] 690 | estimates of the mean and confidence interval around the 691 | RNA velocity vector computed for each cell. 692 | """ 693 | if not hasattr(self, 'velocity_embeddings'): 694 | msg = 'must run `bootstrap_vectors` first to generate vector samples.' 695 | raise AttributeError(msg) 696 | 697 | # [Cells, Dims, (Mean, SD, Lower, Upper)] 698 | self.velocity_intervals = np.zeros( 699 | self.velocity_embeddings.shape[1:] + (4,) 700 | ) 701 | # for each cell, compute the mean, std, and CI for 702 | # each dimension in the embedding 703 | # this provides a hypersphere of confidence for cell state transitions 704 | # in the embedding space 705 | for j in range(self.velocity_embeddings.shape[1]): 706 | cell = self.velocity_embeddings[:, j, :] # Iter, Dims 707 | mean = np.mean(cell, axis=0) # Dims 708 | std = np.std(cell, axis=0) # Dims 709 | # compute the 95% CI assuming normality 710 | l_ci = mean - 1.96*std 711 | u_ci = mean + 1.96*std 712 | self.velocity_intervals[j, :, 0] = mean 713 | self.velocity_intervals[j, :, 1] = std 714 | self.velocity_intervals[j, :, 2] = l_ci 715 | self.velocity_intervals[j, :, 3] = u_ci 716 | 717 | return self.velocity_intervals 718 | 719 | 720 | ################################################## 721 | # main 722 | ################################################## 723 | 724 | 725 | def add_parser_arguments(parser): 726 | """Add arguments to an `argparse.ArgumentParser`.""" 727 | parser.add_argument( 728 | '--data', 729 | type=str, 730 | help='path to AnnData object with "spliced", "unspliced", "ambiguous" in `.layers`', 731 | ) 732 | parser.add_argument( 733 | '--out_path', 734 | type=str, 735 | help='output path for velocity bootstrap samples.' 736 | ) 737 | parser.add_argument( 738 | '--n_iter', 739 | type=int, 740 | default=100, 741 | help='number of bootstrap iterations to perform.' 
742 |     )
743 |     return parser
744 | 
745 | 
746 | def make_parser():
747 |     """Generate an `argparse.ArgumentParser`."""
748 |     parser = argparse.ArgumentParser(
749 |         description='Compute confidence intervals for RNA velocity by molecular bootstrapping'
750 |     )
751 |     parser = add_parser_arguments(parser)
752 |     return parser
753 | 
754 | 
755 | def main():
756 |     parser = make_parser()
757 |     args = parser.parse_args()
758 | 
759 |     # load anndata
760 |     print('Loading data...')
761 |     adata = anndata.read_h5ad(args.data)
762 |     print(f'{adata.shape[0]} cells and {adata.shape[1]} genes loaded.')
763 | 
764 |     # check for layers
765 |     for k in ['spliced', 'unspliced', 'ambiguous']:
766 |         if k not in adata.layers.keys():
767 |             msg = f'{k} not found in `adata.layers`'
768 |             raise ValueError(msg)
769 | 
770 |     # initialize velocity bootstrap object
771 |     print('\nBootstrap sampling velocity...\n')
772 |     vci = VelocityCI(
773 |         adata=adata,
774 |     )
775 | 
776 |     # sample velocity vectors
777 |     velocity_bootstraps = vci.bootstrap_velocity(
778 |         n_iter=args.n_iter,
779 |         save_counts=args.out_path,
780 |     )
781 | 
782 |     # save bootstrap samples to disk
783 |     np.save(
784 |         osp.join(args.out_path, 'velocity_bootstrap_samples.npy'),
785 |         velocity_bootstraps,
786 |     )
787 |     print('Done.')
788 |     return
789 | 
790 | 
791 | if __name__ == '__main__':
792 |     main()
793 | 
--------------------------------------------------------------------------------
/velodyn/velocity_divergence.py:
--------------------------------------------------------------------------------
1 | """Compute divergence maps from RNA velocity fields"""
2 | import numpy as np
3 | import anndata
4 | 
5 | from sklearn.neighbors import NearestNeighbors
6 | from scipy.stats import norm as normal
7 | 
8 | import matplotlib
9 | import matplotlib.pyplot as plt
10 | import seaborn as sns
11 | 
12 | 
13 | # modified from
14 | # https://github.com/theislab/scvelo/blob/master/scvelo/plotting/velocity_embedding_grid.py
15 | def compute_velocity_on_grid(
16 |     X_emb: np.ndarray,
17 |     V_emb: np.ndarray,
18 |     density: float = None,
19 |     smooth: float = None,
20 |     n_neighbors: int = None,
21 |     min_mass: float = None,
22 |     n_grid_points: int = 50,
23 |     adjust_for_stream: bool = False,
24 |     grid_min_max: tuple = None,
25 | ) -> (np.ndarray, np.ndarray):
26 |     """
27 |     Compute a grid of velocity vectors in gene expression space
28 |     where each vector in the grid is a Gaussian weighted average of
29 |     neighboring observed cell vectors.
30 | 
31 |     Parameters
32 |     ----------
33 |     X_emb : np.ndarray
34 |         [Cells, (embedding0, embedding1)] cell coordinates in the
35 |         embedding.
36 |     V_emb : np.ndarray
37 |         [Cells, (embedding0, embedding1)] cell velocities in the
38 |         embedding.
39 |     density : float
40 |         [0, 1.] proportion of n_grid_points to use.
41 |     smooth : float
42 |         smoothing parameter for the Gaussian kernel.
43 |     n_neighbors : int
44 |         number of neighbors to consider.
45 |     min_mass : float
46 |         minimum probability mass to return a value for a grid cell.
47 |     n_grid_points : int
48 |         number of grid points along each dimension.
49 |     adjust_for_stream : bool
50 |         adjust grid velocities to be compatible with stream plots.
51 |     grid_min_max : tuple
52 |         ((min, max), (min, max)) values for coarse-graining grid
53 |         coordinates. set manually to ensure coarse-grained coordinates
54 |         are consistent across samples passed to `X_emb`.
55 | 
56 |     Returns
57 |     -------
58 |     X_grid : np.ndarray
59 |         [n_grid_points**2, 2] locations of each vector
60 |         in embedding space.
61 |     V_grid : np.ndarray
62 |         [n_grid_points**2, 2] RNA velocity vectors in
63 |         the local neighborhood at a series of grid points.
64 |     """
65 |     # remove invalid cells
66 |     idx_valid = np.isfinite(X_emb.sum(1) + V_emb.sum(1))
67 |     X_emb = X_emb[idx_valid]
68 |     V_emb = V_emb[idx_valid]
69 | 
70 |     # prepare grid
71 |     n_obs, n_dim = X_emb.shape
72 |     density = 1 if density is None else density
73 |     smooth = .5 if smooth is None else smooth
74 | 
75 |     # Generates a linearly spaced grid from the minimum to maximum
76 |     # embedding coordinate along each dimension
77 |     # the number of grid locations is specified with `n_grid_points`
78 |     grs = []
79 |     for dim_i in range(n_dim):
80 |         if grid_min_max is None:
81 |             m, M = np.min(X_emb[:, dim_i]), np.max(X_emb[:, dim_i])
82 |             m = m - .01 * np.abs(M - m)
83 |             M = M + .01 * np.abs(M - m)
84 |         else:
85 |             m, M = grid_min_max[dim_i]
86 |         # `np.linspace` requires an integer number of points
87 |         gr = np.linspace(m, M, int(n_grid_points * density))
88 |         grs.append(gr)
89 | 
90 |     meshes_tuple = np.meshgrid(*grs)
91 |     X_grid = np.vstack([i.flat for i in meshes_tuple]).T
92 | 
93 |     # estimate grid velocities
94 |     # find nearest neighbors to each grid point using `n_neighbors`
95 |     # determine their relative distances
96 |     if n_neighbors is None:
97 |         n_neighbors = int(n_obs/50)
98 |     nn = NearestNeighbors(n_neighbors=n_neighbors, n_jobs=-1)
99 |     nn.fit(X_emb)
100 |     # [GridPoints, Neighbors] distances and array indices of nearest neighbors
101 |     dists, neighs = nn.kneighbors(X_grid)
102 | 
103 |     # weight the contribution of each point with a Gaussian kernel
104 |     # centered on the point of interest
105 | 
106 |     # here, `smooth` is a scaling factor that determines the sigma
107 |     # of the Gaussian, which is the product of the grid spacing (the
108 |     # distance between adjacent grid points) and the scaling parameter
109 |     # defaults to a sigma == 0.5*GridSpacing
110 |     scale = np.mean([(g[1] - g[0]) for g in grs]) * smooth
111 | 
112 |     # here, we evaluate a weight for each point as the PDF of a Gaussian with
113 |     # the specified scale centered at the point, since we feed in distances
114 |     # rather than coordinates
115 |     weight = normal.pdf(x=dists, scale=scale)  # weight is [GridPoints, Neighbors]
116 | 
117 |     # p_mass stores how much probability mass is near a point
118 |     # if all neighbors are very far away, this will be small
119 |     p_mass = weight.sum(1)  # p_mass is [GridPoints,]
120 | 
121 |     V_grid = (V_emb[neighs] * weight[:, :, None]).sum(1) / \
122 |         np.maximum(1, p_mass)[:, None]
123 | 
124 |     if adjust_for_stream:
125 |         X_grid = np.stack([np.unique(X_grid[:, 0]), np.unique(X_grid[:, 1])])
126 |         ns = int(np.sqrt(len(V_grid[:, 0])))
127 |         V_grid = V_grid.T.reshape(2, ns, ns)
128 | 
129 |         mass = np.sqrt((V_grid ** 2).sum(0))
130 |         V_grid[0][mass.reshape(V_grid[0].shape) < 1e-5] = np.nan
131 |     else:
132 |         if min_mass is None:
133 |             min_mass = np.clip(np.percentile(p_mass, 95) / 100, 1e-2, 1)
134 |         # zero out vectors with little support
135 |         V_grid[p_mass < min_mass] = 0.
136 | 
137 |     return X_grid, V_grid
138 | 
139 | 
140 | def divergence(f):
141 |     r"""
142 |     Computes the divergence of the vector field.
143 | 
144 |     Parameters
145 |     ----------
146 |     f : list of ndarrays
147 |         [D,] each array contains values for one dimension of
148 |         the vector field.
149 | 150 | Returns 151 | ------- 152 | D : np.ndarray 153 | divergence values in the same shape as items in `f`. 154 | 155 | Notes 156 | ----- 157 | The divergence of a vector field :math:`V(x, y)` is given by the sum of 158 | partial derivatives of the d-component with respect to d, where d is either 159 | x or y. 160 | 161 | .. math:: 162 | 163 | \nabla \cdot V = \sum_{d \in \{x, y\}} \partial V_d(x, y) / \partial d 164 | 165 | \nabla \cdot V = \partial V_x(x, y)/\partial x + \partial V_y(x, y)/\partial y 166 | """ 167 | num_dims = len(f) 168 | # for each dimension of the vector field `i`, compute the gradient with 169 | # respect to that dimension and add the results 170 | D = np.ufunc.reduce( 171 | np.add, 172 | [np.gradient(f[num_dims - i - 1], axis=i) for i in range(num_dims)] 173 | ) 174 | return D 175 | 176 | 177 | def compute_div( 178 | adata: anndata.AnnData, 179 | use_rep: str = 'pca', 180 | n_grid_points: int = 30, 181 | return_grid: bool = False, 182 | **kwargs, 183 | ) -> np.ndarray: 184 | """ 185 | Compute divergence in gene expression space for a single 186 | cell experiment. 187 | 188 | Parameters 189 | ---------- 190 | adata : anndata.AnnData 191 | [Cells, Genes] single cell experiment containing velocity 192 | vectors for each cell. 193 | use_rep : str 194 | representation to use for divergence field calculation. 195 | `adata.obsm[f'X_{use_rep}']` and `adata.obsm[f'velocity_{use_rep}']` 196 | must be present. 197 | n_grid_points : int 198 | number of grid points along each dimension. 199 | **kwargs passed to `compute_velocity_on_grid`. 200 | 201 | Returns 202 | ------- 203 | D : np.ndarray 204 | [n_grid_points, n_grid_points] divergence values. 205 | X_grid : np.ndarray, optional 206 | [n_grid_points**2, EmbedDims] grid locations in the embedding. 207 | returned if `return_grid=True`. 208 | V_grid : np.ndarray, optional 209 | [n_grid_points**2, EmbedDims] velocity values at grid locations. 210 | returned if `return_grid=True`. 211 | 212 | See Also 213 | -------- 214 | compute_velocity_on_grid 215 | divergence 216 | """ 217 | # compute a grid of positions and their Gaussian 218 | # weighted velocities across the embedding space 219 | X_grid, V_grid = compute_velocity_on_grid( 220 | adata.obsm[f'X_{use_rep}'][:, :2], 221 | adata.obsm[f'velocity_{use_rep}'][:, :2], 222 | n_grid_points=n_grid_points, 223 | **kwargs, 224 | ) 225 | # reshape the grid points into an [X, Y, 2] matrix 226 | V_spatial = V_grid.reshape( 227 | n_grid_points, 228 | n_grid_points, 229 | 2, 230 | ) 231 | # compute the divergence 232 | D_spatial = divergence([V_spatial[:, :, i] 233 | for i in range(V_spatial.shape[2])]) 234 | if return_grid: 235 | return D_spatial, X_grid, V_grid 236 | 237 | return D_spatial 238 | 239 | 240 | def plot_div( 241 | D_spatial, 242 | pal='PRGn', 243 | center: float = 0., 244 | cbar_label='Divergence', 245 | xticklabels: bool = False, 246 | yticklabels: bool = False, 247 | figsize: tuple = (6, 4), 248 | **kwargs, 249 | ) -> (matplotlib.figure.Figure, matplotlib.axes.Axes): 250 | """Plot a heatmap of the divergence values in an RNA velocity field. 251 | 252 | Parameters 253 | ---------- 254 | D_spatial : np.ndarray 255 | [n_grid_points, n_grid_points] divergence values. 256 | pal : Union[str, matplotlib.colors.Colormap] 257 | color map for divergence colors. can be a matplotlib 258 | named colormap. 259 | center : float 260 | value for centering a divergent colormap. 261 | cbar_label : str 262 | label for the colorbar. 263 | xticklabels : bool 264 | use x-axis tick labels.
265 | yticklabels : bool 266 | use y-axis tick labels. 267 | figsize : tuple 268 | (W, H) of the matplotlib figure. 269 | 270 | Returns 271 | ------- 272 | fig : matplotlib.figure.Figure 273 | ax : matplotlib.axes.Axes 274 | """ 275 | fig, ax = plt.subplots(1, 1, figsize=figsize) 276 | sns.heatmap( 277 | D_spatial, 278 | cmap=pal, 279 | ax=ax, 280 | center=center, 281 | cbar_kws={'label': cbar_label}, 282 | xticklabels=xticklabels, 283 | yticklabels=yticklabels, 284 | **kwargs, 285 | ) 286 | ax.invert_yaxis() 287 | ax.set_xlabel('PC1') 288 | ax.set_ylabel('PC2') 289 | return fig, ax 290 | -------------------------------------------------------------------------------- /velodyn/velocity_dpst.py: -------------------------------------------------------------------------------- 1 | """Compute a change in pseudotime for each cell""" 2 | import numpy as np 3 | import anndata 4 | from sklearn.neighbors import KNeighborsRegressor 5 | from sklearn.model_selection import cross_val_score 6 | 7 | 8 | class dPseudotime(object): 9 | """Compute a change in pseudotime value for each cell 10 | in a single cell experiment. 11 | 12 | Attributes 13 | ---------- 14 | adata : anndata.AnnData 15 | [Cells, Genes] single cell experiment. 16 | use_rep : str 17 | representation to use for predicting pseudotime coordinates. 18 | `adata.obsm[f'X_{use_rep}']`, `adata.obsm[f'velocity_{use_rep}']` 19 | must be present. 20 | pseudotime_var : str 21 | scalar variable in `adata.obs` encoding pseudotime coordinates. 22 | model : sklearn.neighbors.KNeighborsRegressor 23 | k-nearest neighbors regression model for pseudotime prediction. 24 | X : np.ndarray 25 | [Cells, Embedding] observed coordinates in embedding space. 26 | V : np.ndarray 27 | [Cells, Embedding] velocity vectors in embedding space. 28 | y : np.ndarray 29 | [Cells,] pseudotime coordinates. 30 | X_pred : np.ndarray 31 | [Cells, Embedding] predicted future coordinates. 32 | pst_pred : np.ndarray 33 | [Cells,] pseudotime coordinates inferred for positions `X_pred`. 34 | dpst : np.ndarray 35 | [Cells,] change in pseudotime coordinate. 36 | 37 | Methods 38 | ------- 39 | _fit_model 40 | predict_dpst 41 | """ 42 | 43 | def __init__( 44 | self, 45 | adata: anndata.AnnData, 46 | use_rep: str = 'pca', 47 | pseudotime_var: str = 'dpt_pseudotime', 48 | ) -> None: 49 | """Compute a change in pseudotime value for each cell 50 | in a single cell experiment. 51 | 52 | Parameters 53 | ---------- 54 | adata : anndata.AnnData 55 | [Cells, Genes] single cell experiment. 56 | use_rep : str 57 | representation to use for predicting pseudotime coordinates. 58 | `adata.obsm[f'X_{use_rep}']`, `adata.obsm[f'velocity_{use_rep}']` 59 | must be present. 60 | pseudotime_var : str 61 | scalar variable in `adata.obs` encoding pseudotime coordinates. 62 | 63 | Returns 64 | ------- 65 | None. 
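# --- usage sketch for the divergence utilities above (illustrative) ---
# assumes an AnnData `adata` with `X_pca` and `velocity_pca` in `.obsm`,
# e.g. as written by scvelo's `tl.velocity_embedding(adata, basis='pca')`
D, X_grid, V_grid = compute_div(
    adata, use_rep='pca', n_grid_points=30, return_grid=True,
)
fig, ax = plot_div(D, cbar_label='Divergence')
fig.savefig('divergence_map.png', dpi=300)
# --- end sketch ---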
66 | """ 67 | self.adata = adata 68 | self.use_rep = use_rep 69 | self.pseudotime_var = pseudotime_var 70 | 71 | # check that necessary matrices are present 72 | if f'X_{use_rep}' in self.adata.obsm.keys(): 73 | self.X = self.adata.obsm[f'X_{use_rep}'] 74 | else: 75 | msg = f'X_{use_rep} is not in `adata.obsm' 76 | raise ValueError(msg) 77 | 78 | if f'velocity_{use_rep}' in self.adata.obsm.keys(): 79 | self.V = self.adata.obsm[f'velocity_{use_rep}'] 80 | else: 81 | msg = f'velocity_{use_rep} is not in `adata.obsm' 82 | raise ValueError(msg) 83 | 84 | if pseudotime_var in self.adata.obs.columns: 85 | self.y = self.adata.obs[pseudotime_var] 86 | else: 87 | msg = f'{pseudotime_var} is not in `adata.obs' 88 | raise ValueError(msg) 89 | 90 | return 91 | 92 | def _fit_model( 93 | self, 94 | n_neighbors: int = 50, 95 | weights: str = 'distance', 96 | ) -> None: 97 | """Fit a regression model to predict pseudotime coordinates 98 | from the specified embedding. 99 | 100 | Parameters 101 | ---------- 102 | n_neighbors : int 103 | number of neighbors to use for regression model. 104 | weights : str 105 | method to weight neighbor contributions. 106 | passed to `sklearn.neighbors.KNeighborsRegressor`. 107 | 108 | Returns 109 | ------- 110 | None. assigns `self.model`, `self.cv_scores`. 111 | """ 112 | # initialize a simple kNN regressor with multiprocessing 113 | self.model = KNeighborsRegressor( 114 | n_neighbors=n_neighbors, 115 | weights=weights, 116 | n_jobs=-1, 117 | ) 118 | 119 | # perform cross-validation scoring 120 | self.cv_scores = cross_val_score( 121 | self.model, 122 | self.X, 123 | self.y, 124 | cv=5, 125 | ) 126 | print('Cross-validation scores for prediction model:') 127 | print(self.cv_scores) 128 | print('Mean : ', np.mean(self.cv_scores)) 129 | print() 130 | 131 | # fit the final model on all the data 132 | self.model.fit(self.X, self.y) 133 | return 134 | 135 | def predict_dpst( 136 | self, 137 | step_size: float = 0.01, 138 | **kwargs, 139 | ) -> np.ndarray: 140 | """Predict a change in pseudotime coordinate for each cell 141 | in the experiment. 142 | 143 | Parameters 144 | ---------- 145 | step_size : float 146 | step size to use for future cell state predictions. 147 | the RNA velocity vector is scaled by this coefficient 148 | before addition to the current position. 149 | we recommend step sizes smaller than `1`. 150 | **kwargs are passed to `self._fit_model`. 151 | 152 | Returns 153 | ------- 154 | dpst : np.ndarray 155 | [Cells,] change in pseudotime value predicted for each 156 | cell. 157 | Also sets `self.pst_pred`, `self.dpst` atttributes. 
158 | 159 | See Also 160 | -------- 161 | self._fit_model 162 | """ 163 | self._fit_model(**kwargs) 164 | 165 | # the predicted new pseudotime coordinate is the current 166 | # coordinate + the velocity vector, scaled by a step size 167 | self.X_pred = self.X + step_size * self.V 168 | # we predict the new coordinate's pseudotime position 169 | self.pst_pred = self.model.predict(self.X_pred) 170 | # the \Delta pseudotime coordinate is the difference between 171 | # predicted and observed coordinates 172 | self.dpst = self.pst_pred - self.y 173 | return self.dpst 174 | -------------------------------------------------------------------------------- /velodyn/velocity_dynsys.py: -------------------------------------------------------------------------------- 1 | """ 2 | Dynamical systems simulations in RNA velocity space 3 | """ 4 | import numpy as np 5 | from scipy import stats 6 | import anndata 7 | import tqdm 8 | import typing 9 | from typing import Collection 10 | import warnings 11 | # multiprocessing tools. pathos uses `dill` rather than `pickle`, 12 | # which provides more robust serialization. 13 | from pathos.multiprocessing import ProcessPool 14 | from sklearn.neighbors import NearestNeighbors 15 | # plotting 16 | import matplotlib 17 | import matplotlib.pyplot as plt 18 | import seaborn as sns 19 | 20 | 21 | class PhaseSimulation(object): 22 | """Perform phase point simulations in velocity fields. 23 | 24 | Attributes 25 | ---------- 26 | adata : anndata.AnnData 27 | [Cells, Genes] object with precomputed attributes 28 | for RNA velocity in `.layers`. 29 | keys: {velocity, spliced, unspliced}. 30 | vadata : anndata.AnnData 31 | view of `.adata` used for velocity field estimation. 32 | pfield : np.ndarray 33 | [Cells, Features] positions of cells in the velocity field. 34 | vfield : np.ndarray 35 | [Cells, Features] velocities of cells in the velocity field. 36 | starting_points : np.ndarray 37 | [Cells, Features] starting points for phase points in the 38 | velocity field. 39 | v_model : Callable 40 | a model of RNA velocity that predicts velocity given a positional 41 | coordinate in the desired basis. 42 | trajectories : np.ndarray 43 | [PhasePoints, Time, Dimensions, (Position, V_mu, V_sig)] 44 | trajectories of phase points in the velocity field. 45 | boundary_fence : dict 46 | {"min", "max"} specifies fence conditions if the boundary constraint 47 | is set to obey a predefined fence. minimum and maximum values for 48 | each dimension are stored as lists. 49 | timesteps : int 50 | [T,] number of timesteps for phase point evolution. 51 | step_scale : float 52 | scaling factor for phase point steps in the chosen basis. 53 | noise_scale : float 54 | scaling factor for noise introduced during phase point evolution. 55 | defaults to a noiseless simulation. 56 | velocity_k : int 57 | number of nearest neighbors to consider when employing a 'knn' 58 | velocity model. 59 | vknn_method : str 60 | method by which the kNN model computes velocity estimates for 61 | phase points. 62 | "deterministic" -- use the mean of kNN RNA velocity vectors. 63 | "stochastic" -- fit a multivar. Gaussian to kNN vectors and sample. 64 | "knn_random_sample" -- randomly sample an observed vector from kNN. 65 | Methods 66 | ------- 67 | boundary_constraint(position, velocity) 68 | impose a boundary constraint by modifying the predicted position 69 | of an evolving phase point. defaults to an identity function.
70 | v_fxn : callable 71 | returns velocity as a function of position in the 72 | embedding space. takes a [D,] np.ndarray as input, returns 73 | a [D,] np.ndarray. 74 | """ 75 | 76 | def __init__( 77 | self, 78 | adata: anndata.AnnData, 79 | **kwargs, 80 | ) -> None: 81 | """Perform phase point simulations in velocity fields. 82 | 83 | Parameters 84 | ---------- 85 | adata : anndata.AnnData 86 | [Cells, Genes] object with precomputed attributes 87 | for RNA velocity in `.layers`. 88 | keys: {velocity, spliced, unspliced}. 89 | 90 | Returns 91 | ------- 92 | None. 93 | """ 94 | self.adata = adata 95 | if self.adata.raw is not None: 96 | print('`adata.raw` is not `None`.') 97 | print('This can cause indexing issues with some anndata versions.') 98 | print('Consider setting `adata.raw = None`.\n') 99 | 100 | # set the number of nearest neighbors to use when inferring 101 | # phase point velocities 102 | self.velocity_k = 100 103 | if 'velocity_k' in kwargs.keys(): 104 | self.velocity_k = kwargs['velocity_k'] 105 | 106 | # set an identity function as our initial boundary constraint 107 | # until we choose a different one 108 | self.boundary_constraint = self._identity_placeholder 109 | return 110 | 111 | def set_velocity_field( 112 | self, 113 | groupby: str = None, 114 | group: typing.Any = None, 115 | basis: str = 'counts', 116 | ) -> None: 117 | """Set a subset of cells to use when defining the 118 | velocity field. 119 | 120 | Parameters 121 | ---------- 122 | groupby : str 123 | column in `.adata.obs` to use for group selection. 124 | group : Any 125 | value in `groupby` to use for selecting cells. 126 | basis : str 127 | basis for setting the velocity field. must be one 128 | of {'counts', 'pca', 'umap', 'tsne'}. 129 | if not 'counts', must have 'velocity_%s'%basis attribute. 130 | 131 | Returns 132 | ------- 133 | None. Sets `.vadata`. 134 | 135 | Notes 136 | ----- 137 | Generates a view of `.adata` with only the selected 138 | cells, `.vadata`. 139 | Sets the `.vfield` and `.pfield` attribute with selected 140 | cells in the desired basis. 141 | """ 142 | # ensure that arguments are valid 143 | if groupby is not None and group is None: 144 | raise ValueError('Must supply a `group` for cell selection.') 145 | if group is not None and groupby is None: 146 | raise ValueError('Must supply a `groupby` for cell selection.') 147 | 148 | # check that the specified basis is supported 149 | bases = ['counts', 'pca', 'umap', 'tsne'] 150 | if basis not in bases: 151 | raise ValueError('%s is not a valid basis.'
% basis) 152 | 153 | # select the specified group if one is provided 154 | # otherwise use all cells as a single "dummy" group 155 | if groupby is not None and group is not None: 156 | bidx = self.adata.obs[groupby] == group 157 | else: 158 | bidx = np.ones(self.adata.shape[0]).astype(bool) 159 | 160 | # get the relevant cells from the grouping 161 | self.vadata = self.adata[bidx, :].copy() 162 | 163 | # set the velocity field and position field using 164 | # cell observations 165 | if basis == 'counts': 166 | self.vfield = self.vadata.layers['velocity'] 167 | self.pfield = self.vadata.X 168 | else: 169 | self.vfield = self.vadata.obsm['velocity_%s' % basis] 170 | self.pfield = self.vadata.obsm['X_%s' % basis] 171 | 172 | # convert to dense arrays if sparse 173 | # TODO: make downstream ops sparse compatible 174 | if not isinstance(self.vfield, np.ndarray): 175 | self.vfield = self.vfield.toarray() 176 | if not isinstance(self.pfield, np.ndarray): 177 | self.pfield = self.pfield.toarray() 178 | 179 | return 180 | 181 | def _set_starting_point_metadata( 182 | self, 183 | groupby: str = None, 184 | group: typing.Any = None, 185 | ) -> None: 186 | """Set starting points for phase point simulations based 187 | on sample annotations. 188 | 189 | Parameters 190 | ---------- 191 | groupby : str 192 | column in `.adata.obs` to use for group selection. 193 | group : Any 194 | value in `groupby` to use for selecting cells. 195 | 196 | Returns 197 | ------- 198 | None. Sets `.starting_points`. 199 | """ 200 | # check that arguments are valid 201 | if groupby is None or group is None: 202 | raise ValueError('must supply both groupby and group.') 203 | 204 | # set starting points as the designated positions in the 205 | # position field 206 | bidx = self.vadata.obs[groupby] == group 207 | print(f'Found {sum(bidx)} points matching starting criteria.') 208 | self.starting_points = self.pfield[bidx, :] 209 | return 210 | 211 | def _set_starting_point_embedding( 212 | self, 213 | basis: str = None, 214 | borders: tuple = None, 215 | ) -> None: 216 | """Set starting points for phase point simulations based 217 | on embedding locations. 218 | 219 | Parameters 220 | ---------- 221 | basis : str 222 | embedding basis to use for selection. 223 | expects `'X_'+basis` in `.obsm.keys()`. 224 | e.g. 'pca', 'umap', 'tsne'. 225 | borders : tuple 226 | [N,] minimum and maximum values in each dimension of the 227 | embedding to use for starting point selection. 228 | e.g. ((-1, 1), (-3, 1)) for a 2D embedding. 229 | 230 | Returns 231 | ------- 232 | None. sets `.starting_points`. 233 | """ 234 | # check that the basis is present 235 | bindices = [] 236 | if 'X_'+basis not in self.vadata.obsm.keys(): 237 | raise ValueError( 238 | 'X_%s is not an embedding in `.vadata.obsm`.' % basis) 239 | embed = self.vadata.obsm['X_'+basis] 240 | 241 | # get all cells within the borders specified along each dimension 242 | for i, min_max in enumerate(borders): 243 | bidx = np.logical_and( 244 | embed[:, i] > min_max[0], 245 | embed[:, i] < min_max[1], 246 | ) 247 | bindices.append(bidx) 248 | 249 | # use cells that meet all border criteria as starting points 250 | bidx = np.logical_and.reduce(bindices) 251 | self.starting_points = self.pfield[bidx, :] 252 | return 253 | 254 | def _set_starting_point_expression( 255 | self, 256 | genes: Collection[str] = None, 257 | min_expr_levels: Collection[float] = None, 258 | use_raw: bool = True, 259 | ) -> None: 260 | """Set starting points for phase point simulations based 261 | on gene expression levels.
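# --- usage sketch: velocity field + starting points (illustrative) ---
# the column names, group label, and borders here are hypothetical;
# `set_starting_point` below dispatches to the `_set_starting_point_*`
# helpers defined in this class
sim = PhaseSimulation(adata)
sim.set_velocity_field(groupby='cell_type', group='progenitor', basis='pca')
sim.set_starting_point(
    method='embedding', basis='pca', borders=((-10., 0.), (-5., 5.)),
)
# --- end sketch ---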
262 | 263 | Parameters 264 | ---------- 265 | genes : Collection[str] 266 | [N,] gene names to use for starting point selection. 267 | min_expr_levels : Collection[float] 268 | [N,] minimum expression level for each gene. 269 | use_raw : bool 270 | use the `.adata.raw.X` attribute for gene expression levels 271 | instead of `.adata.X`. 272 | 273 | Returns 274 | ------- 275 | None. sets `.starting_points`. 276 | """ 277 | # check argument validity 278 | if genes is None or min_expr_levels is None: 279 | raise ValueError('must supply both genes and min_expr_levels') 280 | 281 | if len(genes) != len(min_expr_levels): 282 | ll = (len(genes), len(min_expr_levels)) 283 | raise ValueError( 284 | '%d genes and %d min_expr_levels, must be equal.' % ll) 285 | 286 | # tolerate singleton arguments begrudgingly 287 | if isinstance(genes, str): 288 | warnings.warn( 289 | 'casting `genes` to list in `_set_starting_point_expression`.' 290 | ) 291 | genes = [genes] 292 | if isinstance(min_expr_levels, float): 293 | min_expr_levels = [min_expr_levels] 294 | warnings.warn( 295 | 'casting `min_expr_levels` to list in `_set_starting_point_expression`.' 296 | ) 297 | 298 | if use_raw: 299 | ad = self.vadata.raw 300 | else: 301 | ad = self.vadata 302 | 303 | # get cells that express the relevant genes at the minimum 304 | # levels specified 305 | bindices = [] 306 | for i, g in enumerate(genes): 307 | expr = ad[:, g].X 308 | if not isinstance(expr, np.ndarray): 309 | expr = expr.toarray() 310 | bidx = (expr > min_expr_levels[i]).flatten() 311 | bindices.append(bidx) 312 | 313 | # take only cells meeting all criteria as starting points 314 | bidx = np.logical_and.reduce(bindices) 315 | self.starting_points = self.pfield[bidx, :] 316 | return 317 | 318 | def set_starting_point( 319 | self, 320 | method: str, 321 | **kwargs, 322 | ) -> None: 323 | """Set starting points for phase point simulations. 324 | Uses metadata, embedding locations, or gene expression values. 325 | 326 | Parameters 327 | ---------- 328 | method : str 329 | {'metadata', 'embedding', 'expression'}. 330 | **kwargs : dict 331 | passed to the relevant `._set_starting_point_{method}` function. 332 | 333 | Returns 334 | ------- 335 | None. sets `.starting_points`. 336 | 337 | Notes 338 | ----- 339 | Calls the relevant method for setting starting points based 340 | on the `method` argument and passes remaining keyword arguments. 341 | """ 342 | # check argument validity 343 | acceptable_methods = ['metadata', 'embedding', 'expression'] 344 | if method not in acceptable_methods: 345 | raise ValueError('%s is not an acceptable method.' % method) 346 | 347 | if not hasattr(self, 'pfield'): 348 | raise ValueError( 349 | 'must set a `pfield` with `set_velocity_field` first.') 350 | 351 | f = getattr(self, '_set_starting_point_'+method) 352 | f(**kwargs) 353 | return 354 | 355 | def _identity_placeholder( 356 | self, 357 | x: typing.Any, 358 | ) -> typing.Any: 359 | """An identity function that returns an argument 360 | without modification. Useful as a placeholder.""" 361 | return x 362 | 363 | def _boundary_constraint_fence( 364 | self, 365 | x: np.ndarray, 366 | ) -> np.ndarray: 367 | """Imposes a boundary constraint on phase point position 368 | `x` by forcing each dimension to sit within a pre-defined 369 | fence. 370 | 371 | Parameters 372 | ---------- 373 | x : np.ndarray 374 | [D,] position of a phase point. 375 | 376 | Returns 377 | ------- 378 | x_constrained : np.ndarray 379 | [D,] position of the phase point with dimensions clamped 380 | to a pre-defined fence.
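# --- illustrative fence behavior (synthetic numbers) ---
# `np.clip` is applied elementwise in the body below, so each dimension
# is clamped to its own [min, max] interval, e.g.:
# np.clip([1.5, -7.0], [-5., -5.], [5., 5.]) -> [1.5, -5.0]
# --- end sketch ---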
381 | """ 382 | # clip dimensions to fit within the boundary 383 | x_constrained = np.clip( 384 | x, 385 | self.boundary_fence['min'], 386 | self.boundary_fence['max'], 387 | ) 388 | return x_constrained 389 | 390 | def _boundary_constraint_nn_dist( 391 | self, 392 | x: np.ndarray, 393 | ) -> np.ndarray: 394 | """Imposes a boundary constraint on phase point position 395 | `x` by forcing `x` to the nearest point that is less than 396 | a predefined distance from its nearest neighbors. 397 | 398 | Parameters 399 | ---------- 400 | x : np.ndarray 401 | [D,] position of a phase point. 402 | 403 | Returns 404 | ------- 405 | x_constrained : np.ndarray 406 | [D,] position of the phase point with dimensions clamped 407 | to a pre-defined fence. 408 | 409 | Notes 410 | ----- 411 | Phase points are contrained to a maximum distance from their 412 | nearest neighbor. This distance can be adaptively determined 413 | by taking the median nearest neighbor distance from the data 414 | set and using some multiple of this distance as the boundary 415 | constraint. 416 | 417 | When a phase point passes beyond this distance, a distance 418 | vector is computed between the point and the neighbor, and 419 | the point location is shrunken along the vector to satisfy 420 | the boundary constraint. 421 | 422 | See Also 423 | -------- 424 | `.set_boundaries`. 425 | """ 426 | if len(x.shape) == 1: 427 | # pad to a [1, N] matrix for sklearn 428 | x = np.expand_dims(x, 0) 429 | # compute the distance to the nearest neighbor 430 | distances, indices = self.boundary_nn.kneighbors(x) 431 | if distances[0, 0] < self.max_nn_distance: 432 | x_constrained = x 433 | else: 434 | nn_point = self.pfield[indices[0, 0]:indices[0, 0]+1, :] 435 | d_vec = x - nn_point 436 | # how much larger is the difference vector than what we allow? 437 | scale_factor = self.max_nn_distance / distances[0, 0] 438 | # scale the difference vector and compute x_constrained 439 | # as this scaled vector moving away from the NN 440 | d_vec *= scale_factor 441 | x_constrained = nn_point + d_vec 442 | return x_constrained 443 | 444 | def set_boundaries( 445 | self, 446 | method: str = 'fence', 447 | borders: tuple = None, 448 | max_nn_distance: float = None, 449 | boundary_knn: int = 5, 450 | ) -> None: 451 | """Impose boundaries for phase point simulations. 452 | During evolution, phase points will not move beyond 453 | these boundaries. This can prevent numerical instability 454 | issues where a phase point travels "off the map". 455 | 456 | Parameters 457 | ---------- 458 | method : str 459 | one of {'fence', 'nn'}. 460 | fence - restrict phase points to a "fence" of the basis described 461 | with minimum and maximum values for each dimension. 462 | nn - restrict phase points to a maximum distance away from their 463 | nearest neighbor. this maximum distance is determined either 464 | empirically or by taking the median nearest neighbor distance 465 | from the data set. when points travel beyond this distance, they 466 | are shrunken back toward the neighbor along the distance vector. 467 | borders : tuple 468 | ((min_i, max_i), ...) for each dimension of the basis. 469 | only used if `method` is "fence". 470 | max_nn_distance : float 471 | maximum distance a phase point may travel from the 472 | nearest neighbor. if `None`, set to the median nearest neighbor 473 | distance in the data set. 474 | only used if `method` is "nn". 475 | boundary_knn : int 476 | number of nearest neighbors to use for 'nn' boundary fencing. 
477 | moves cells toward the centroid of this nearest neighbor group. 478 | 479 | Returns 480 | ------- 481 | None. Sets `.boundary_constraint` attribute. 482 | 483 | See Also 484 | -------- 485 | _boundary_constraint_fence 486 | _boundary_constraint_nn_dist 487 | """ 488 | # check argument validity 489 | if method not in ('fence', 'nn'): 490 | raise NotImplementedError( 491 | '%s is not an implemented method.' % method) 492 | 493 | if method.lower() == 'fence': 494 | if borders is None: 495 | raise ValueError('must specify borders if method is fence.') 496 | # unpack border criteria into an attribute 497 | self.boundary_fence = {} 498 | self.boundary_fence['min'] = [x[0] for x in borders] 499 | self.boundary_fence['max'] = [x[1] for x in borders] 500 | # set the boundary constraint function to consider 501 | # the border fence during phase point updates 502 | self.boundary_constraint = self._boundary_constraint_fence 503 | elif method.lower() == 'nn': 504 | if not hasattr(self, 'pfield'): 505 | raise ValueError( 506 | 'must `set_velocity_field` before NN boundaries.') 507 | # the "nearest neighbor" to each point after fitting the NN 508 | # model is the point itself, so we fit k = 2 here and take 509 | # the "second" nearest neighbor for each point when predicting 510 | # on the points themselves. Note that since phase points aren't 511 | # in the training set, we subsequently use only the first neighbor. 512 | self.boundary_nn = NearestNeighbors( 513 | n_neighbors=2, metric='euclidean') 514 | self.boundary_nn.fit(self.pfield) 515 | if max_nn_distance is None: 516 | # compute nearest neighbor distances in the data set 517 | distances, indices = self.boundary_nn.kneighbors( 518 | self.pfield, n_neighbors=boundary_knn + 1) 519 | median_distance = np.median(distances[:, 1:boundary_knn+1]) 520 | self.max_nn_distance = median_distance 521 | else: 522 | self.max_nn_distance = max_nn_distance 523 | self.boundary_constraint = self._boundary_constraint_nn_dist 524 | return 525 | 526 | def _velocity_knn( 527 | self, 528 | x: np.ndarray, 529 | ) -> np.ndarray: 530 | """Calculate the velocity of a given position based 531 | on the average velocity of the k-NN to that position. 532 | 533 | Parameters 534 | ---------- 535 | x : np.ndarray 536 | [D,] position vector in embedding space. 537 | 538 | Returns 539 | ------- 540 | nn_v : np.ndarray 541 | [D, (Mean, Std)] velocity vector in embedding space.
542 | 543 | See Also 544 | -------- 545 | .velocity_k 546 | """ 547 | # find nearest neighbors 548 | nn_dist, nn_idx = self.v_nn.kneighbors( 549 | x.reshape(1, -1), 550 | return_distance=True, 551 | ) 552 | 553 | nn_idx = nn_idx.flatten() 554 | 555 | # calculate the velocity vector 556 | if self.vknn_method == 'deterministic': 557 | nn_v_mu = self.vfield[nn_idx, :].mean(0) 558 | elif self.vknn_method == 'stochastic': 559 | # fit a multivariate Gaussian to the observed 560 | # RNA velocity vectors of the nearest neighbors 561 | 562 | # compute weights for each neighboring cell 563 | weights = stats.norm.pdf(x=nn_dist, scale=self.mean_nn_distance) 564 | 565 | weights_mat = np.tile( 566 | weights.reshape(-1, 1), 567 | (1, self.vfield.shape[1]), 568 | ) 569 | mu = np.sum(weights_mat*self.vfield[nn_idx, :], 0)/np.sum(weights) 570 | # get weighted covariance 571 | # \Sigma = \frac{1}{\sum_{i=1}^{N} w_i - 1} 572 | # \sum_{i=1}^N w_i \left(x_i - \mu^*\right)^T \left(x_i - \mu^*\right) 573 | 574 | cov = np.cov( 575 | self.vfield[nn_idx, :], 576 | aweights=weights.flatten(), 577 | rowvar=False, 578 | ) 579 | 580 | # init a multivariate normal with the weighted 581 | # mean and covariance 582 | norm = stats.multivariate_normal( 583 | mean=mu, 584 | cov=cov, 585 | ) 586 | # sample from the fitted Gaussian 587 | nn_v_mu = norm.rvs() 588 | elif self.vknn_method == 'knn_random_sample': 589 | # randomly sample a velocity vector 590 | # from one of the nearest neighbors 591 | ridx = int(np.random.choice(nn_idx)) 592 | nn_v_mu = self.vfield[ridx, :] 593 | else: 594 | msg = f'{self.vknn_method} is not a valid method for ._velocity_knn' 595 | raise AttributeError(msg) 596 | 597 | nn_v_sd = self.vfield[nn_idx, :].std(0) 598 | nn_v = np.stack([nn_v_mu, nn_v_sd], -1) 599 | return nn_v 600 | 601 | def _evolve( 602 | self, 603 | x0_idx: int, 604 | ) -> np.ndarray: 605 | """ 606 | Place a phase point at `x0` and evolve for `t` timesteps. 607 | 608 | Parameters 609 | ---------- 610 | x0_idx : int 611 | index into `self.starting_points`. 612 | 613 | Returns 614 | ------- 615 | trajectory : np.ndarray 616 | [T, D, (Position, V_mu, V_sig)] trajectory of the 617 | phase point.
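# --- the update rule implemented below, restated for clarity ---
# x_{t+1} = boundary_constraint( x_t + (v_mu(x_t) + noise) * step_scale ),
# noise = noise_scale * v_sig(x_t) * eps, with eps ~ N(0, I).
# worked example: x_t = [1.0, 0.0], v_mu = [0.2, -0.1], step_scale = 1,
# and noise_scale = 0 gives x_{t+1} = [1.2, -0.1].
# --- end sketch ---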
618 | """ 619 | x0 = self.starting_points[x0_idx, :] 620 | if type(x0) != np.ndarray: 621 | x0 = x0.toarray() 622 | x0 = x0.flatten() 623 | # [T, Dims, (Position, Velocity)] 624 | trajectory = np.zeros( 625 | (self.timesteps, x0.shape[0], 3), dtype=np.float32) 626 | 627 | # for each timestep, update the position of the phase point 628 | # based on the velocity of nearest neighbors and obey any 629 | # boundary constraints 630 | x = x0 631 | for t in range(self.timesteps): 632 | trajectory[t, :, 0] = x # match x position to dv/dx 633 | v = self.v_fxn(x=x.reshape(-1),) 634 | # add white noise if desired to better emulate a stochastic process 635 | noise = v[:, 1] * np.random.randn(v.shape[0]) * self.noise_scale 636 | x_new = x + (v[:, 0] + noise)*self.step_scale 637 | # constrain to a set of pre-defined boundaries 638 | # defaults to an identity if not set explicitly 639 | x_new = self.boundary_constraint(x_new) 640 | trajectory[t, :, 1] = v[:, 0] 641 | trajectory[t, :, 2] = v[:, 1] 642 | x = x_new 643 | return trajectory 644 | 645 | def _evolve2disk(self, **kwargs) -> str: 646 | """Performs phase point evolution, but saves results to disk rather 647 | than returning the array.""" 648 | raise NotImplementedError('evolve2disk is not yet implemented.') 649 | 650 | def __getstate__(self) -> dict: 651 | """Redefine __getstate__ to allow serialization of class methods. 652 | `anndata.AnnData` doesnt support serialization. 653 | """ 654 | self_dict = self.__dict__.copy() 655 | # we remove large objects from `__getstate__` to allow 656 | # pickling for `multiprocessing.Pool` workers without 657 | # high memory overhead 658 | del self_dict['adata'] 659 | del self_dict['vadata'] 660 | return self_dict 661 | 662 | def simulate_phase_points( 663 | self, 664 | n_points: int = 1000, 665 | n_timesteps: int = 1000, 666 | velocity_method: str = 'knn', 667 | velocity_method_attrs: dict = { 668 | 'vknn_method': 'deterministic', 669 | }, 670 | step_scale: float = 1., 671 | noise_scale: float = 0., 672 | multiprocess: bool = False, 673 | ) -> np.ndarray: 674 | """Simulate phase points moving through the velocity field. 675 | 676 | Parameters 677 | ---------- 678 | n_points : int 679 | number of points to simulate. 680 | n_timesteps : int 681 | number of timesteps for evolution. 682 | velocity_method : str 683 | method for estimating velocity during phase point evolution. 684 | one of {'knn', 'v_model'}. 685 | if 'v_model', must set the `.v_model` attribute with a Callable 686 | that takes in a position and outputs a velocity. useful if you 687 | want to train a model to map positions to velocities. 688 | velocity_method_attrs: dict 689 | attributes for use in a particular velocity method. 690 | keys are attribute names added to `self` with corresponding 691 | values. 692 | step_scale : float 693 | scaling factor for steps in the embedding space. 694 | noise_scale : float 695 | scaling factor for noise introduced during simulation. 696 | defaults to a noiseless simulation. 697 | multiprocess : bool 698 | use multiprocessing. 699 | 700 | Returns 701 | ------- 702 | trajectories : np.ndarray 703 | [PhasePoints, Time, Dimensions, (Position, V_mu, V_sig)] 704 | trajectories of phase points in the velocity field. 705 | also sets `.trajectories` attribute. 706 | 707 | Notes 708 | ----- 709 | TODO: multithread these operations 710 | """ 711 | # check argument validity 712 | if velocity_method not in ['knn', 'v_model']: 713 | raise ValueError( 714 | '%s is not a valid velocity method.' 
% velocity_method) 715 | 716 | if not hasattr(self, 'vfield') or not hasattr(self, 'pfield'): 717 | raise ValueError( 718 | 'must first set velocity field with `set_velocity_field`.') 719 | 720 | if not hasattr(self, 'starting_points'): 721 | raise ValueError( 722 | 'must first set starting points with `set_starting_point`.') 723 | 724 | if velocity_method == 'knn': 725 | self.v_fxn = self._velocity_knn 726 | if 'vknn_method' not in velocity_method_attrs: 727 | msg = 'velocity_method knn requires a "vknn_method" attribute.' 728 | raise ValueError(msg) 729 | # fit a nearest neighbors model to the data 730 | self.v_nn = NearestNeighbors(n_neighbors=self.velocity_k) 731 | self.v_nn.fit(self.pfield) 732 | 733 | # get the mean distance between nearest neighbors 734 | d, _ = self.v_nn.kneighbors(self.pfield) 735 | self.mean_nn_distance = d[:, 1].mean() 736 | 737 | elif velocity_method == 'v_model': 738 | if not hasattr(self, 'v_model'): 739 | raise ValueError('must set a `v_model` attribute first.') 740 | self.v_fxn = self.v_model 741 | else: 742 | msg = f'{velocity_method} is not a valid velocity method.' 743 | raise ValueError(msg) 744 | 745 | if velocity_method_attrs is not None: 746 | # add the velocity method attrs to self 747 | for k in velocity_method_attrs.keys(): 748 | setattr(self, k, velocity_method_attrs[k]) 749 | 750 | self.timesteps = n_timesteps 751 | self.step_scale = step_scale 752 | self.noise_scale = noise_scale 753 | 754 | if multiprocess: 755 | # get a set of starting locations 756 | ridx = np.random.choice(np.arange(self.starting_points.shape[0]), 757 | size=n_points, 758 | replace=True) 759 | # open a process pool 760 | p = ProcessPool() 761 | # distribute tasks to workers 762 | res = p.map(self._evolve, ridx.tolist()) 763 | p.close() 764 | # aggregate trajectory results 765 | trajectories = np.stack(res, 0) 766 | else: 767 | trajectories = np.zeros( 768 | ( 769 | n_points, 770 | n_timesteps, 771 | self.pfield.shape[1], 772 | 3, 773 | ), 774 | dtype=np.float32, 775 | ) 776 | for i in tqdm.tqdm( 777 | range(n_points), 778 | desc='simulating trajectories' 779 | ): 780 | 781 | # select a random starting point 782 | ridx = np.random.choice( 783 | np.arange(self.starting_points.shape[0]), 784 | size=1, 785 | replace=False, 786 | ) 787 | 788 | # simulate the trajectory! 789 | phase_traj = self._evolve(x0_idx=ridx,) 790 | trajectories[i, :, :, :] = phase_traj 791 | 792 | self.trajectories = trajectories 793 | return trajectories 794 | 795 | 796 | ########################################## 797 | # plotting methods 798 | ########################################## 799 | 800 | def plot_phase_simulations( 801 | adata: anndata.AnnData, 802 | trajectories: np.ndarray, 803 | basis: str = 'pca', 804 | figsize: tuple = (6, 4), 805 | point_color='lightgray', 806 | trajectory_cmap='Purples', 807 | n_colors: int = 40, 808 | **kwargs, 809 | ) -> (matplotlib.figure.Figure, matplotlib.axes.Axes): 810 | """Plot phase simulation trajectories. 811 | 812 | Parameters 813 | ---------- 814 | adata : anndata.AnnData 815 | [Cells, Genes] experiment object. 816 | trajectories : np.ndarray 817 | [PhasePoints, Time, Dimensions, (Position, V_mu, V_sig)] 818 | trajectories of phase points in the velocity field. 819 | basis : str 820 | coordinate basis in `adata.obsm` to use. 821 | retrieves `adata.obsm[f'X_{basis}']`. 822 | figsize : tuple 823 | (W, H) for matplotlib figure. 824 | point_color : str 825 | color to use for observed cell coordinate points.
826 | trajectory_cmap : str 827 | colormap to use for plotting trajectories. 828 | single color maps (e.g. "Purples", "Blues") work well. 829 | n_colors : int 830 | number of steps in the color gradient and number of unique 831 | points to plot for each trajectory. 832 | 833 | Returns 834 | ------- 835 | fig : matplotlib.figure.Figure 836 | ax : matplotlib.axes.Axes 837 | """ 838 | 839 | E = adata.obsm[f'X_{basis}'] 840 | 841 | fig, ax = plt.subplots(1, 1, figsize=figsize) 842 | ax.scatter( 843 | E[:, 0], 844 | E[:, 1], 845 | color=point_color, 846 | alpha=0.5, 847 | ) 848 | 849 | n_steps = trajectories.shape[1] 850 | 851 | gradient = sns.color_palette(trajectory_cmap, n_colors) 852 | for i, t in enumerate( 853 | np.arange(0, n_steps, n_steps//n_colors)[:-1][:n_colors] 854 | ): 855 | T = trajectories[:, t, :, 0] 856 | ax.scatter( 857 | T[:, 0], 858 | T[:, 1], 859 | color=gradient[i], 860 | **kwargs, 861 | ) 862 | ax.set_xlabel(f'{basis} 1') 863 | ax.set_ylabel(f'{basis} 2') 864 | ax.set_title(f'Phase Points - {basis} Basis') 865 | return fig, ax 866 | --------------------------------------------------------------------------------
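# --- end-to-end usage sketch for PhaseSimulation (illustrative) ---
# group names, basis, and parameter values below are hypothetical
sim = PhaseSimulation(adata)
sim.set_velocity_field(groupby='cell_type', group='progenitor', basis='pca')
sim.set_starting_point(method='metadata', groupby='leiden', group='0')
sim.set_boundaries(method='nn')  # clamp phase points near observed cells
trajectories = sim.simulate_phase_points(
    n_points=200,
    n_timesteps=500,
    velocity_method='knn',
    velocity_method_attrs={'vknn_method': 'stochastic'},
    step_scale=0.5,
    noise_scale=0.1,
)
fig, ax = plot_phase_simulations(adata, trajectories, basis='pca', s=2)
fig.savefig('phase_simulations.png', dpi=300)
# --- end sketch ---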