├── .gitignore ├── LICENSE ├── Makefile ├── README.md ├── concordex ├── .DS_Store ├── __init__.py ├── neighbors │ ├── __init__.py │ └── _neighborhood.py ├── tools │ ├── __init__.py │ └── _concordex.py └── utils │ └── _labels.py ├── examples ├── concordex_py_pbmc_demo.ipynb ├── concordex_py_starmap_demo.ipynb └── data │ └── starmap_processed.h5ad ├── pyproject.toml ├── requirements.txt └── setup.cfg /.gitignore: -------------------------------------------------------------------------------- 1 | # Python build files 2 | __pycache__/ 3 | **/__pycache__/ 4 | 5 | *.egg-info/ 6 | 7 | /dist/ 8 | /build/ 9 | /*-env/ 10 | /env-*/ 11 | /environment.yml 12 | 13 | # IDEs and OS things 14 | .DS_Store 15 | /.vscode/ 16 | .ipynb_checkpoints/ -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Pachter Lab 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | .PHONY : clean build upload 2 | 3 | clean: 4 | rm -rf build 5 | rm -rf dist 6 | rm -rf concordex.egg-info 7 | rm -rf docs/_build 8 | rm -rf docs/api 9 | rm -rf .coverage 10 | 11 | build: 12 | python -m build --wheel 13 | 14 | upload: 15 | twine upload dist/* -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # concordex 1.1.1 2 | 3 | The goal of `concordex` is to identify spatial homogeneous regions (SHRs) as defined in the recent manuscript, [“Identification of spatial homogeneous regions in tissues with concordex”](https://doi.org/10.1101/2023.06.28.546949). Briefly, SHRs are domains that are homogeneous with respect to cell type composition. `concordex` relies on the k-nearest-neighbor (kNN) graph to represent similarities between cells and uses common clustering algorithms to identify SHRs. 4 | 5 | ## Installation 6 | 7 | `concordex` can be installed via pip 8 | ```bash 9 | pip install concordex 10 | ``` 11 | ....
and from GitHub 12 | ```bash 13 | pip install git+https://github.com/pachterlab/concordex.git 14 | ``` 15 | 16 | ## Usage 17 | 18 | After installing, `concordex` can be run as follows: 19 | ```python 20 | import scanpy as sc 21 | from concordex.tools import calculate_concordex 22 | 23 | ad = sc.datasets.pbmc68k_reduced() 24 | 25 | # Compute concordex with discrete labels 26 | calculate_concordex(ad, 'louvain', n_neighbors=10) 27 | 28 | # Neighborhood consolidation information is stored in `ad.obsm` 29 | ad.obsm['X_nbc'][:3] 30 | 31 | # The column names are stored in `ad.uns` 32 | ad.uns['nbc_params']['nbc_colnames'] 33 | ``` 34 | 35 | ## Citation 36 | 37 | If you’d like to use the `concordex` package in your research, please 38 | cite our recent bioRxiv preprint: 39 | 40 | > Jackson, K.; Booeshaghi, A. S.; Gálvez-Merchán, Á.; Moses, L.; Chari, 41 | > T.; Kim, A.; Pachter, L. Identification of spatial homogeneous regions in tissues 42 | > with concordex. bioRxiv (Cold Spring Harbor Laboratory) 2023. 43 | > . 44 | 45 | @article {Jackson2023.06.28.546949, 46 | author = {Jackson, Kayla C. and Booeshaghi, A. 
Sina and G{'a}lvez-Merch{'a}n, {'A}ngel and Moses, Lambda and Chari, Tara and Kim, Alexandra and Pachter, Lior}, 47 | title = {Identification of spatial homogeneous regions in tissues with concordex}, 48 | year = {2024}, 49 | doi = {10.1101/2023.06.28.546949}, 50 | publisher = {Cold Spring Harbor Laboratory}, 51 | URL = {}, 52 | journal = {bioRxiv} 53 | } 54 | 55 | ## Maintainer 56 | 57 | [Kayla Jackson](https://github.com/kayla-jackson) 58 | -------------------------------------------------------------------------------- /concordex/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pachterlab/concordex/77ef3bf7382c461ae3c9a9362bf51a3283175347/concordex/.DS_Store -------------------------------------------------------------------------------- /concordex/__init__.py: -------------------------------------------------------------------------------- 1 | """Identification of spatial homogeneous regions""" 2 | 3 | __version__ = '1.1.1' 4 | -------------------------------------------------------------------------------- /concordex/neighbors/__init__.py: -------------------------------------------------------------------------------- 1 | from ._neighborhood import ( 2 | compute_neighbors, 3 | consolidate 4 | ) 5 | 6 | from ..utils._labels import Labels 7 | 8 | __all__ = [ 9 | 'compute_neighbors', 10 | 'consolidate' 11 | ] -------------------------------------------------------------------------------- /concordex/neighbors/_neighborhood.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import warnings 3 | 4 | from anndata import AnnData 5 | from sklearn.neighbors import NearestNeighbors 6 | 7 | from ..utils._labels import Labels 8 | 9 | def consolidate( 10 | adata: AnnData, 11 | labels, 12 | *, 13 | compute_similarity: bool = False, 14 | key_added: str | None = None, 15 | copy: bool = False 16 | ): 17 | """ 18 | Compute the neighborhood 
consolidation matrix. 19 | 20 | adata 21 | The AnnData object 22 | labels 23 | Observation labels used to compute the neighborhood 24 | consolidation matrix. Continuous or discrete labels are allowed, 25 | and typically, integer labels are assumed to be discrete. 26 | compute_similarity 27 | Whether to return the label similarity matrix. Only useful if 28 | discrete labels are provided. 29 | key_added 30 | If not specified, the neighborhood consolidation matrix is stored as 31 | :attr:`~anndata.AnnData.obsm`\\ `['X_nbc']`, and the parameters as 32 | :attr:`~anndata.AnnData.uns`\\ `['nbc_params']`. 33 | The index is assumed to be stored as 34 | :attr:`~anndata.AnnData.obsm`\\`['index']` 35 | 36 | If specified, ``[key_added]`` is prepended to the default keys. 37 | copy 38 | If ``copy=True``, return the neighborhood consolidation matrix instead 39 | of updating adata. 40 | """ 41 | 42 | if key_added is None: 43 | nbc_uns_key = "nbc_params" 44 | nbc_key = "X_nbc" 45 | index_key = "index" 46 | else: 47 | nbc_key = key_added + "_nbc" 48 | nbc_uns_key = key_added + "_nbc_params" 49 | index_key = key_added + "_index" 50 | 51 | if index_key in adata.obsm.keys(): 52 | Index = adata.obsm[index_key] 53 | else: 54 | raise ValueError("Must run ``concordex.neighbors.compute_neighbors()``") 55 | 56 | labels = Labels(labels) 57 | labels.extract(adata) 58 | 59 | if labels.labeltype != "discrete" and compute_similarity: 60 | compute_similarity = False 61 | warnings.warn("Expected discrete labels to compute similarity matrix") 62 | 63 | 64 | if compute_similarity: 65 | print("Computing neighborhood consolidation and similarity matrices...\n") 66 | nbc, sim = _consolidate(Index, labels, compute_similarity) 67 | else: 68 | print("Computing neighborhood consolidation matrix...\n") 69 | nbc = _consolidate(Index, labels, compute_similarity) 70 | 71 | adata.uns[nbc_uns_key] = {} 72 | nbc_index_dict = adata.uns[nbc_uns_key] 73 | 74 | nbc_index_dict = { 75 | "nbc_key": nbc_key, 76 | 
"labels": labels, 77 | "labels_found" : labels.labelnames, 78 | "nbc_colnames" : labels.nbccolumns, 79 | "params": { 80 | "compute_similarity": compute_similarity 81 | }, 82 | } 83 | 84 | if compute_similarity: 85 | nbc_index_dict['similarity'] = sim 86 | nbc_index_dict['labelorder'] = labels.discretelabelsunique 87 | 88 | if copy: 89 | return nbc, nbc_index_dict 90 | 91 | # Update adata 92 | adata.uns[nbc_uns_key] = nbc_index_dict 93 | adata.obsm[nbc_key] = nbc 94 | 95 | 96 | def _consolidate(X, labels, compute_similarity): 97 | 98 | def take_col_means(indices, take_from, take_by='row'): 99 | if take_by == 'row': 100 | axis=0 101 | else: 102 | axis=1 103 | sub = np.take(take_from, indices, axis=axis) 104 | 105 | return sub.mean(axis=0) 106 | 107 | labels_values = labels.values 108 | 109 | Nbc = np.apply_along_axis(take_col_means, 1, X, take_from=labels_values) 110 | 111 | if compute_similarity: 112 | nlab = labels.n_unique_labels 113 | labels_new = labels.discretelabelscollapsed 114 | 115 | labels_uniq = labels.discretelabelsunique 116 | 117 | Sim = np.empty((nlab, nlab), dtype=np.float64) 118 | for i, lab in enumerate(labels_uniq): 119 | m = np.isin(labels_new, lab) 120 | 121 | Sim[i,:] = Nbc[m, :].mean(axis=0) 122 | 123 | return Nbc, Sim 124 | 125 | return Nbc 126 | 127 | 128 | def compute_neighbors( 129 | adata: AnnData, 130 | *, 131 | use_rep: str | None = None, 132 | n_neighbors: int = 30, 133 | metric: str = "euclidean", 134 | metric_params: dict | None = None, 135 | n_jobs: int | None = None, 136 | key_added: str | None = None, 137 | recompute_index: bool = False, 138 | copy: bool = False, 139 | **kwargs 140 | ): 141 | """ 142 | A very thin wrapper around `sklearn.neighbors.NearestNeighbors` 143 | 144 | adata 145 | The adata object 146 | use_rep 147 | Key in adata.obsm to use for constructing the kNN graph 148 | n_neighbors 149 | Number of neighbors used to compute the kNN graph. Defaults to 30. 
150 | metric 151 | Metric used to compute distance 152 | metric_params 153 | Additional params passed to metric function 154 | n_jobs 155 | Used to control parallel evaluation 156 | key_added 157 | Key which controls where the results are saved if ``copy = False``. 158 | recompute_index 159 | If a neighborhood graph exists at the specified key, should the 160 | data be overwritten? 161 | copy : bool 162 | If ``copy = True``, return the nearest neighbor graph instead of 163 | updating adata. 164 | **kwargs 165 | Additional keyword arguments passed to sklearn.neighbors.NearestNeighbors 166 | """ 167 | if use_rep is None or use_rep == 'X': 168 | X = adata.X 169 | else: 170 | if use_rep in adata.obsm.keys(): 171 | X = adata.obsm[use_rep] 172 | else: 173 | raise ValueError( 174 | f"Did not find {use_rep} in ``.obsm.keys()``. " 175 | ) 176 | 177 | nn_kwargs = {} 178 | if kwargs: 179 | nn_kwargs = kwargs 180 | if metric_params is not None: 181 | if 'p' in metric_params.keys(): 182 | p = metric_params.pop('p') 183 | nn_kwargs['p'] = p 184 | 185 | nn_kwargs['metric_params'] = metric_params 186 | 187 | if key_added is None: 188 | index_uns_key = "index_params" 189 | index_key = "index" 190 | else: 191 | index_uns_key = key_added + "_index_params" 192 | index_key = key_added + "_index" 193 | 194 | index_exists = index_key in adata.obsm.keys() 195 | 196 | print("Computing nearest neighbors...\n") 197 | if index_exists and not recompute_index: 198 | warnings.warn( 199 | f"A neighborhood graph already exists at ``adata.obsm[{index_key}]``. 
\ 200 | Set ``recompute_index = True`` to overwrite the existing graph.") 201 | 202 | Index = adata.obsm[index_key] 203 | neighbors_index_dict = adata.uns[index_uns_key] 204 | 205 | if recompute_index or not index_exists: 206 | 207 | Index = _compute_neighbors( 208 | X, n_neighbors=n_neighbors, metric=metric, n_jobs=n_jobs, **nn_kwargs 209 | ) 210 | 211 | adata.uns[index_uns_key] = {} 212 | neighbors_index_dict = adata.uns[index_uns_key] 213 | 214 | neighbors_index_dict = { 215 | "index_key": index_key, 216 | "params": { 217 | "n_neighbors": n_neighbors, 218 | "metric": metric, 219 | } 220 | } 221 | 222 | if nn_kwargs: 223 | neighbors_index_dict['params']['nn_kwargs'] = nn_kwargs 224 | 225 | if use_rep is not None: 226 | neighbors_index_dict["params"]["use_rep"] = use_rep 227 | 228 | if copy: 229 | return Index 230 | 231 | # Update adata 232 | adata.uns[index_uns_key] = neighbors_index_dict 233 | adata.obsm[index_key] = Index 234 | 235 | def _compute_neighbors( 236 | X, 237 | *, 238 | n_neighbors: int = 30, 239 | include_self: bool = False, 240 | **kwargs 241 | ): 242 | 243 | N = X.shape[0] 244 | 245 | if include_self: 246 | index = 0 247 | else: 248 | index = 1 249 | n_neighbors = n_neighbors+1 250 | 251 | nbrs = NearestNeighbors(n_neighbors=n_neighbors, **kwargs) 252 | nbrs.fit(X) 253 | 254 | g = nbrs.kneighbors(X, return_distance=False) 255 | 256 | return g[:, index:] 257 | 258 | -------------------------------------------------------------------------------- /concordex/tools/__init__.py: -------------------------------------------------------------------------------- 1 | from ._concordex import calculate_concordex 2 | 3 | __all__ = [ 4 | 'calculate_concordex' 5 | ] -------------------------------------------------------------------------------- /concordex/tools/_concordex.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from anndata import AnnData 3 | 4 | from ..neighbors._neighborhood import ( 5 | 
compute_neighbors, 6 | consolidate 7 | ) 8 | 9 | 10 | def calculate_concordex( 11 | adata: AnnData, 12 | labels, 13 | *, 14 | n_neighbors: int = 30, 15 | use_rep: str | None = "X", 16 | metric: str = "euclidean", 17 | metric_params: dict | None = None, 18 | n_jobs: int | None = None, 19 | key_added: str | None = None, 20 | compute_similarity: bool = False, 21 | recompute_index: bool = False 22 | ): 23 | """ 24 | adata 25 | The AnnData object 26 | labels 27 | Observation labels used to compute the neighborhood 28 | consolidation matrix. Continuous or discrete labels are allowed, 29 | and typically, integer labels are assumed to be discrete. 30 | n_neighbors 31 | Number of neighbors used to compute the kNN graph. Defaults to 30. 32 | metric 33 | Metric used to compute distance 34 | metric_params 35 | Additional parameters passed to metric function 36 | n_jobs 37 | Used to control parallel evaluation 38 | key_added 39 | If not specified, the nearest-neighbor index is stored as 40 | :attr:`~anndata.AnnData.obsm`\\ `['index']`, the neighborhood consolidation matrix as 41 | :attr:`~anndata.AnnData.obsm`\\ `['X_nbc']`, and the parameters as 42 | :attr:`~anndata.AnnData.uns`\\ `['index_params']` and 43 | :attr:`~anndata.AnnData.uns`\\ `['nbc_params']`. 44 | If specified, ``[key_added]`` is prepended to the default keys. 45 | compute_similarity 46 | Whether to compute the label similarity matrix and store it in 47 | adata.uns['nbc_params']['similarity']. Only implemented for discrete labels. 48 | recompute_index 49 | If a neighborhood graph exists at the specified key, should the 50 | data be overwritten? 51 | """ 52 | 53 | # 1. Compute neighborhood graph 54 | compute_neighbors(adata, 55 | use_rep=use_rep, n_neighbors=n_neighbors, metric=metric, 56 | metric_params=metric_params, n_jobs=n_jobs, key_added=key_added, recompute_index=recompute_index) 57 | 58 | # 2. 
Then consolidate 59 | consolidate(adata, labels, key_added=key_added, compute_similarity=compute_similarity) 60 | 61 | 62 | -------------------------------------------------------------------------------- /concordex/utils/_labels.py: -------------------------------------------------------------------------------- 1 | import re 2 | from warnings import warn 3 | 4 | import numpy as np 5 | import pandas as pd 6 | 7 | from anndata import AnnData 8 | 9 | class Labels: 10 | def __init__(self, names): 11 | 12 | if names is None: 13 | raise ValueError("No labels to search for. Must provide labels.") 14 | 15 | self._lookup = names 16 | 17 | @property 18 | def labeltype(self) -> str: 19 | if not hasattr(self, "_labeltype"): 20 | raise AttributeError("`labeltype` has not been set. Call `extract(adata)` first") 21 | 22 | return self._labeltype 23 | 24 | @property 25 | def labelnames(self) -> str | list: 26 | if not hasattr(self, '_labelnames'): 27 | raise AttributeError("`labelnames` has not been set. Call `extract(adata)` first.") 28 | 29 | return self._labelnames 30 | 31 | @property 32 | def values(self): 33 | if not hasattr(self, '_values'): 34 | raise AttributeError("`values` has not been set. Call `extract(adata)` first.") 35 | return self._values 36 | 37 | @property 38 | def n_unique_labels(self): 39 | if not hasattr(self, '_labelshape'): 40 | raise AttributeError("Unable to determine number of labels. 
Call `extract(adata)` first.") 41 | return self._labelshape[1] 42 | 43 | @property 44 | def discretelabelscollapsed(self): 45 | if not hasattr(self, '_discretelabelscollapsed'): 46 | raise AttributeError("Ensure that labels are discrete and call `extract(adata)`.") 47 | return self._discretelabelscollapsed 48 | 49 | @property 50 | def discretelabelsunique(self): 51 | if not hasattr(self, '_discretelabelsunique'): 52 | raise AttributeError("Ensure that labels are discrete and call `extract(adata)`.") 53 | return self._discretelabelsunique 54 | 55 | @property 56 | def nbccolumns(self): 57 | if not hasattr(self, '_labelcolumns'): 58 | if not hasattr(self, '_values'): 59 | return [] 60 | else: 61 | _nattr = self._values.shape[1] 62 | self._labelcolumns = [f"X_{i}" for i in range(_nattr)] 63 | 64 | return self._labelcolumns 65 | 66 | def extract(self, adata: AnnData): 67 | """ 68 | Extract labels from adata.obs (or adata.obsm) and update 69 | """ 70 | if self._lookup is not None: 71 | 72 | obs_keys = adata.obs.keys() 73 | m = np.isin(obs_keys, self._lookup) 74 | 75 | if any(m): 76 | labels_sub = adata.obs[obs_keys[m]] 77 | types = [dt.name for dt in labels_sub.dtypes] 78 | 79 | self._labelnames = labels_sub.columns.tolist() 80 | self._labelcolumns = labels_sub.columns.tolist() 81 | 82 | self._validate(types) 83 | _values = labels_sub.values 84 | 85 | if self._labeltype == "discrete": 86 | self._discretelabelscollapsed = self.collapse(_values) 87 | _values_ohe = self.one_hot_encode(self._discretelabelscollapsed) 88 | 89 | self._discretelabelsunique = _values_ohe.columns.tolist() 90 | self._labelcolumns = _values_ohe.columns.tolist() 91 | 92 | _values = _values_ohe.values 93 | 94 | else: 95 | # Check if labels are in .obsm 96 | lookup_key = self._lookup 97 | if not isinstance(lookup_key, str): 98 | lookup_key = self._lookup[0] 99 | warn( 100 | f"Looking for labels in `adata.obsm`. 
Only the first key, {lookup_key}, will be used.", 101 | category=UserWarning 102 | ) 103 | 104 | if lookup_key in adata.obsm.keys(): 105 | self._labeltype = 'continuous' 106 | _values = adata.obsm[lookup_key] 107 | self._labelnames = lookup_key 108 | 109 | # Keep track of colnames for NBC 110 | _nattr = _values.shape[1] 111 | self._labelcolumns = [f"{lookup_key}_{i}" for i in range(_nattr)] 112 | 113 | else: 114 | raise KeyError( 115 | f"{lookup_key} not found in `adata`" 116 | ) 117 | self._labelshape = (_values.shape) 118 | self._values = _values 119 | 120 | return None 121 | 122 | 123 | def _validate(self, types) -> bool: 124 | """ 125 | Confirm that labels are either all discrete, or all continuous 126 | """ 127 | discrete_pattern = r"category|int|str" 128 | check_discrete = [bool(re.search(discrete_pattern, s)) for s in types] 129 | 130 | if all(check_discrete): 131 | self._labeltype='discrete' 132 | 133 | return True 134 | elif not any(check_discrete): 135 | self._labeltype='continuous' 136 | 137 | return True 138 | else: 139 | raise ValueError("Labels should be discrete or continuous, not both.") 140 | 141 | @staticmethod 142 | def one_hot_encode(values): 143 | return pd.get_dummies(values) 144 | 145 | @staticmethod 146 | def collapse(values, sep="_"): 147 | return np.array([sep.join(map(str, row)) for row in values]) -------------------------------------------------------------------------------- /examples/concordex_py_pbmc_demo.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Install concordex" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": { 14 | "colab": { 15 | "base_uri": "https://localhost:8080/" 16 | }, 17 | "id": "pPgFOM-reZJN", 18 | "outputId": "2549548d-4b73-4740-d1a2-99b5a6a65da7" 19 | }, 20 | "outputs": [], 21 | "source": [ 22 | "# !pip install anndata\n", 23 | "# !pip3 
install scanpy\n", 24 | "!pip3 install concordex" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 1, 30 | "metadata": { 31 | "id": "dpfhvyjafJo3" 32 | }, 33 | "outputs": [], 34 | "source": [ 35 | "# Import libraries\n", 36 | "import seaborn as sns\n", 37 | "import scanpy as sc\n", 38 | "\n", 39 | "from concordex.tools import calculate_concordex\n", 40 | "\n", 41 | "import session_info" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "## Load dataset\n", 49 | "\n", 50 | "For this demonstration of the nonspatial applications of concordex, we will be using the processed [PBMC dataset](https://www.10xgenomics.com/datasets/fresh-68-k-pbm-cs-donor-a-1-standard-1-1-0) from 10x Genomics. This dataset is available using the `scanpy.datasets` interface. " 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 2, 56 | "metadata": { 57 | "id": "KnzLTX-EhF9w" 58 | }, 59 | "outputs": [], 60 | "source": [ 61 | "ad = sc.datasets.pbmc68k_reduced()" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "## Compute `concordex`" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "`concordex` computes a Neighborhood Consolidation Matrix (NBC) that quantifies the proportion of a given cell's neighbors sharing a specific label. This matrix helps capture the local structure of cell populations, reflecting how often cells with similar transcriptomic profiles are assigned the same discrete label. For this analysis, we will use the first 50 PCs to compute the k-nearest neighbor graph. The nodes of this graph will be colored by the cluster assignments derived from the Louvain community detection algorithm. \n", 76 | "\n", 77 | "The `compute_similarity=True` keyword argument summarizes the NBC into a cluster-by-cluster matrix. 
In this matrix, each entry reflects the average proportion of neighbors within a given cluster that share the same label. This provides a high-level view of the local similarity between cells across different clusters, revealing how homogenous or heterogeneous the neighborhoods are within each cluster. The similarity matrix provides a more intuitive understanding of the relationships between clusters based on their shared neighborhood structure." 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 3, 83 | "metadata": {}, 84 | "outputs": [ 85 | { 86 | "name": "stdout", 87 | "output_type": "stream", 88 | "text": [ 89 | "Computing nearest neighbors...\n", 90 | "\n", 91 | "Computing neighborhood consolidation and similarity matrices...\n", 92 | "\n" 93 | ] 94 | } 95 | ], 96 | "source": [ 97 | "# Update `ad` in place\n", 98 | "calculate_concordex(\n", 99 | " ad,\n", 100 | " 'louvain', \n", 101 | " n_neighbors=30,\n", 102 | " use_rep=\"X_pca\",\n", 103 | " compute_similarity=True\n", 104 | ")" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "The NBC is added to `ad.obsm['X_nbc']` and the similarity information can be found in `ad.uns['nbc_params']['similarity']`" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 4, 117 | "metadata": {}, 118 | "outputs": [ 119 | { 120 | "data": { 121 | "text/plain": [ 122 | "AnnData object with n_obs × n_vars = 700 × 765\n", 123 | " obs: 'bulk_labels', 'n_genes', 'percent_mito', 'n_counts', 'S_score', 'G2M_score', 'phase', 'louvain'\n", 124 | " var: 'n_counts', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'\n", 125 | " uns: 'bulk_labels_colors', 'louvain', 'louvain_colors', 'neighbors', 'pca', 'rank_genes_groups', 'index_params', 'nbc_params'\n", 126 | " obsm: 'X_pca', 'X_umap', 'index', 'X_nbc'\n", 127 | " varm: 'PCs'\n", 128 | " obsp: 'distances', 'connectivities'" 129 | ] 130 | }, 131 | "execution_count": 4, 132 | 
"metadata": {}, 133 | "output_type": "execute_result" 134 | } 135 | ], 136 | "source": [ 137 | "ad" 138 | ] 139 | }, 140 | { 141 | "cell_type": "markdown", 142 | "metadata": {}, 143 | "source": [ 144 | "We can easily plot the similarity matrix. To evaluate cluster assignments, we expect cell neighborhoods to be relatively homogeneous, meaning that, in most cases, a cell and its neighbors will share the same label." 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 5, 150 | "metadata": {}, 151 | "outputs": [], 152 | "source": [ 153 | "sim = ad.uns['nbc_params']['similarity']" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": 6, 159 | "metadata": {}, 160 | "outputs": [ 161 | { 162 | "data": { 163 | "image/png": "", 164 | "text/plain": [ 165 | "
" 166 | ] 167 | }, 168 | "metadata": {}, 169 | "output_type": "display_data" 170 | } 171 | ], 172 | "source": [ 173 | "# Make sure axes are properly labeled\n", 174 | "axlabs = ad.uns['nbc_params']['labelorder']\n", 175 | "cg = cg = sns.clustermap(\n", 176 | " sim,\n", 177 | " row_cluster=False,\n", 178 | " col_cluster=False,\n", 179 | " xticklabels=axlabs, yticklabels=axlabs\n", 180 | ")" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 7, 186 | "metadata": {}, 187 | "outputs": [ 188 | { 189 | "data": { 190 | "text/html": [ 191 | "
\n", 192 | "Click to view session information\n", 193 | "
\n",
194 |        "-----\n",
195 |        "anndata             0.10.9\n",
196 |        "concordex           1.1.0\n",
197 |        "scanpy              1.10.1\n",
198 |        "seaborn             0.13.2\n",
199 |        "session_info        1.0.0\n",
200 |        "-----\n",
201 |        "
\n", 202 | "
\n", 203 | "Click to view modules imported as dependencies\n", 204 | "
\n",
205 |        "CoreFoundation              NA\n",
206 |        "Foundation                  NA\n",
207 |        "PIL                         10.4.0\n",
208 |        "PyObjCTools                 NA\n",
209 |        "anyio                       NA\n",
210 |        "appnope                     0.1.4\n",
211 |        "arrow                       1.3.0\n",
212 |        "asttokens                   NA\n",
213 |        "attr                        24.2.0\n",
214 |        "attrs                       24.2.0\n",
215 |        "babel                       2.14.0\n",
216 |        "brotli                      1.1.0\n",
217 |        "certifi                     2024.08.30\n",
218 |        "cffi                        1.17.1\n",
219 |        "charset_normalizer          3.3.2\n",
220 |        "colorama                    0.4.6\n",
221 |        "comm                        0.2.2\n",
222 |        "cycler                      0.12.1\n",
223 |        "cython_runtime              NA\n",
224 |        "dateutil                    2.9.0\n",
225 |        "debugpy                     1.8.5\n",
226 |        "decorator                   5.1.1\n",
227 |        "defusedxml                  0.7.1\n",
228 |        "executing                   2.1.0\n",
229 |        "fastjsonschema              NA\n",
230 |        "fqdn                        NA\n",
231 |        "h5py                        3.11.0\n",
232 |        "idna                        3.8\n",
233 |        "igraph                      0.11.6\n",
234 |        "ipykernel                   6.29.5\n",
235 |        "isoduration                 NA\n",
236 |        "jedi                        0.19.1\n",
237 |        "jinja2                      3.1.4\n",
238 |        "joblib                      1.4.2\n",
239 |        "json5                       0.9.25\n",
240 |        "jsonpointer                 3.0.0\n",
241 |        "jsonschema                  4.23.0\n",
242 |        "jsonschema_specifications   NA\n",
243 |        "jupyter_events              0.10.0\n",
244 |        "jupyter_server              2.14.2\n",
245 |        "jupyterlab_server           2.27.3\n",
246 |        "kiwisolver                  1.4.7\n",
247 |        "legacy_api_wrap             NA\n",
248 |        "leidenalg                   0.10.2\n",
249 |        "llvmlite                    0.43.0\n",
250 |        "markupsafe                  2.1.5\n",
251 |        "matplotlib                  3.9.2\n",
252 |        "matplotlib_inline           0.1.7\n",
253 |        "mpl_toolkits                NA\n",
254 |        "natsort                     8.4.0\n",
255 |        "nbformat                    5.10.4\n",
256 |        "numba                       0.60.0\n",
257 |        "numpy                       2.0.2\n",
258 |        "objc                        10.3.1\n",
259 |        "overrides                   NA\n",
260 |        "packaging                   24.1\n",
261 |        "pandas                      2.2.2\n",
262 |        "parso                       0.8.4\n",
263 |        "patsy                       0.5.6\n",
264 |        "pickleshare                 0.7.5\n",
265 |        "platformdirs                4.3.2\n",
266 |        "prometheus_client           NA\n",
267 |        "prompt_toolkit              3.0.47\n",
268 |        "psutil                      6.0.0\n",
269 |        "pure_eval                   0.2.3\n",
270 |        "pydev_ipython               NA\n",
271 |        "pydevconsole                NA\n",
272 |        "pydevd                      2.9.5\n",
273 |        "pydevd_file_utils           NA\n",
274 |        "pydevd_plugins              NA\n",
275 |        "pydevd_tracing              NA\n",
276 |        "pygments                    2.18.0\n",
277 |        "pyparsing                   3.1.4\n",
278 |        "pythonjsonlogger            NA\n",
279 |        "pytz                        2024.1\n",
280 |        "referencing                 NA\n",
281 |        "requests                    2.32.3\n",
282 |        "rfc3339_validator           0.1.4\n",
283 |        "rfc3986_validator           0.1.1\n",
284 |        "rpds                        NA\n",
285 |        "scipy                       1.14.1\n",
286 |        "send2trash                  NA\n",
287 |        "six                         1.16.0\n",
288 |        "sklearn                     1.5.1\n",
289 |        "sniffio                     1.3.1\n",
290 |        "socks                       1.7.1\n",
291 |        "stack_data                  0.6.2\n",
292 |        "statsmodels                 0.14.2\n",
293 |        "texttable                   1.7.0\n",
294 |        "threadpoolctl               3.5.0\n",
295 |        "tornado                     6.4.1\n",
296 |        "traitlets                   5.14.3\n",
297 |        "uri_template                NA\n",
298 |        "urllib3                     2.2.2\n",
299 |        "wcwidth                     0.2.13\n",
300 |        "webcolors                   24.8.0\n",
301 |        "websocket                   1.8.0\n",
302 |        "yaml                        6.0.2\n",
303 |        "zmq                         26.2.0\n",
304 |        "zstandard                   0.23.0\n",
305 |        "\n",
306 |        "\n",
307 |        "\n",
308 |        "-----\n",
309 |        "IPython             8.27.0\n",
310 |        "jupyter_client      8.6.2\n",
311 |        "jupyter_core        5.7.2\n",
312 |        "jupyterlab          4.2.5\n",
313 |        "-----\n",
314 |        "Python 3.12.5 | packaged by conda-forge | (main, Aug  8 2024, 18:31:54) [Clang 16.0.6 ]\n",
315 |        "macOS-15.1.1-x86_64-i386-64bit\n",
316 |        "-----\n",
317 |        "Session information updated at 2025-01-14 11:47\n",
318 |        "\n",
319 |        ""
320 |       ],
321 |       "text/plain": [
322 |        ""
323 |       ]
324 |      },
325 |      "execution_count": 7,
326 |      "metadata": {},
327 |      "output_type": "execute_result"
328 |     }
329 |    ],
330 |    "source": [
331 |     "session_info.show()"
332 |    ]
333 |   },
334 |   {
335 |    "cell_type": "code",
336 |    "execution_count": null,
337 |    "metadata": {},
338 |    "outputs": [],
339 |    "source": []
340 |   }
341 |  ],
342 |  "metadata": {
343 |   "colab": {
344 |    "authorship_tag": "ABX9TyPZmkptms3G6+WJ4RcIlU2N",
345 |    "provenance": []
346 |   },
347 |   "interpreter": {
348 |    "hash": "adaa19b3e1639a0b29506b5755c4bbe1fbe125a7ccca5eaffe8ceb5f98914033"
349 |   },
350 |   "kernelspec": {
351 |    "display_name": "Python 3 (ipykernel)",
352 |    "language": "python",
353 |    "name": "python3"
354 |   },
355 |   "language_info": {
356 |    "codemirror_mode": {
357 |     "name": "ipython",
358 |     "version": 3
359 |    },
360 |    "file_extension": ".py",
361 |    "mimetype": "text/x-python",
362 |    "name": "python",
363 |    "nbconvert_exporter": "python",
364 |    "pygments_lexer": "ipython3",
365 |    "version": "3.12.5"
366 |   }
367 |  },
368 |  "nbformat": 4,
369 |  "nbformat_minor": 4
370 | }
371 |
--------------------------------------------------------------------------------
/examples/data/starmap_processed.h5ad:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pachterlab/concordex/77ef3bf7382c461ae3c9a9362bf51a3283175347/examples/data/starmap_processed.h5ad
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [build-system]
2 | requires = ["setuptools >= 61.0"]
3 | build-backend = "setuptools.build_meta"
4 |
5 | [project]
6 | name = "concordex"
7 | version = "1.1.1"
8 | description = "Identification of spatial homogeneous regions with concordex"
9 | readme = "README.md"
10 | requires-python = ">=3.9"
11 | authors = [
12 |     {name = "Kayla Jackson", email = "kaylajac@caltech.edu"} ,
13 |     {name = "A. Sina Booeshaghi", email = "sinab@berkeley.edu"},
14 |     {name = "Angel Galvez-Merchan", email = "angelgalvez94@gmail.com"},
15 |     {name = "Alexandra Kim", email = "alexandrasuriya@gmail.com"}
16 | ]
17 | maintainers = [{name = "Kayla Jackson", email = "kaylajac@caltech.edu"}]
18 | license = {file = "LICENSE"}
19 | keywords = ["SingleCell", "Clustering", "Spatial", "Transcriptomics"]
20 | dependencies = [
21 |     "anndata>=0.8",
22 |     "numpy>=1.23",
23 |     "pandas>=1.5",
24 |     "scikit-learn>=0.24"
25 | ]
26 | classifiers = [
27 |     "Environment :: Console",
28 |     "Intended Audience :: Science/Research",
29 |     "License :: OSI Approved :: MIT License",
30 |     "Operating System :: OS Independent",
31 |     "Programming Language :: Python :: 3.6",
32 |     "Programming Language :: Python :: 3.7",
33 |     "Programming Language :: Python :: 3.8",
34 |     "Programming Language :: Python :: 3.9",
35 |     "Programming Language :: Python :: 3.10",
36 |     "Topic :: Scientific/Engineering :: Bio-Informatics",
37 |     "Topic :: Utilities"
38 | ]
39 |
40 | [project.urls]
41 | Repository = "https://github.com/pachterlab/concordex"
42 |
43 | [tool.setuptools]
44 | packages = { find = {} }
45 |
46 | [tool.bumpversion]
47 | current_version = "1.1.1"
48 | commit = true
49 | tag = false
50 | files = [
51 |     "setup.cfg",
52 |     "pyproject.toml",
53 |     "concordex/__init__.py",
54 |     "README.md"
55 | ]
56 |
57 | [tool.flake8]
58 | exclude = [".git", ".github", "__pycache__", "build", "dist"]
59 | statistics = true
60 | max-line-length = 88
61 | extend-ignore = ["E203", "E501"]
62 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pachterlab/concordex/77ef3bf7382c461ae3c9a9362bf51a3283175347/requirements.txt
--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
1 | [bumpversion]
2 | current_version = 1.1.1
3 | commit = True
4 | tag = False
5 |
6 | [metadata]
7 | name = concordex
8 | version = 1.1.1
9 | url = https://github.com/pachterlab/concordex
10 | description = Identification of spatial homogeneous regions with concordex
11 | long_description = file: README.md
12 | long_description_content_type = text/markdown
13 | author = Kayla Jackson, A. Sina Booeshaghi, Angel Galvez-Merchan, Alexandra Kim
14 | maintainer = Kayla Jackson
15 | maintainer_email = kaylajac@caltech.edu
16 | keywords = SingleCell, Clustering, Spatial, Transcriptomics
17 | license = MIT
18 | classifiers =
19 |     "Environment :: Console"
20 |     "Intended Audience :: Science/Research"
21 |     "License :: OSI Approved :: MIT License"
22 |     "Operating System :: OS Independent"
23 |     "Programming Language :: Python :: 3.6"
24 |     "Programming Language :: Python :: 3.7"
25 |     "Programming Language :: Python :: 3.8"
26 |     "Programming Language :: Python :: 3.9"
27 |     "Programming Language :: Python :: 3.10"
28 |     "Topic :: Scientific/Engineering :: Bio-Informatics"
29 |     "Topic :: Utilities"
30 |
31 | [options]
32 | zip_safe = False
33 | python_requires = >=3.9
34 | packages = find:
35 | install_requires =
36 |     anndata>=0.8
37 |     numpy>=1.23
38 |     pandas>=1.5
39 |     scikit-learn>=0.24
40 |
41 | [bumpversion:file:concordex/__init__.py]
42 |
43 | [bumpversion:file:README.md]
44 |
45 | [flake8]
46 | exclude = .git,.github,__pycache__,build,dist
47 | statistics = True
48 | max-line-length = 88
49 | extend-ignore = E203,E501
50 |
--------------------------------------------------------------------------------