├── .gitignore ├── LICENSE ├── README.md ├── cytopus ├── __init__.py ├── data │ ├── Cytopus_1.2.txt │ ├── Cytopus_1.22.txt │ ├── Cytopus_1.23.txt │ ├── Cytopus_1.31nc.txt │ └── adata_spectra.h5ad ├── knowledge_base │ ├── __init__.py │ └── kb_queries.py └── tl │ ├── __init__.py │ ├── create.py │ ├── hierarchy.py │ └── label.py ├── img ├── celltype_hierarchy_1.2.png └── cytopus_v1.1_stable_graph.png ├── notebooks ├── Cytopus_utils_tutorial.ipynb ├── Hierarchical_annotation_tutorial.ipynb ├── KnowledgeBase_construct.ipynb ├── KnowledgeBase_queries_colaboratory.ipynb └── Utils_tutorial.ipynb └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | #Python build files 2 | /dist/ 3 | __pycache__/ 4 | /*-env/ 5 | /env-*/ 6 | /build/ 7 | /*.egg-info/ 8 | *.egg-info/ 9 | *.egg 10 | 11 | #Individual files 12 | pyproject_deprecated.toml -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Thomas Walle 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Cytopus :octopus: [![DOI](https://zenodo.org/badge/389175717.svg)](https://zenodo.org/badge/latestdoi/389175717) 2 | 3 | 4 | ## Single cell omics biology annotations 5 | 6 | ![Image of Cytopus](https://github.com/wallet-maker/cytopus/blob/main/img/cytopus_v1.1_stable_graph.png) 7 | 8 | 9 | ## Overview: 10 | 11 | Package to query our single cell genomics KnowledgeBase. 12 | 13 | If you use Cytopus :octopus: or its gene sets, please cite the original Cytopus [publications](https://doi.org/10.1038/s41587-023-01940-3) & 14 | [![DOI](https://zenodo.org/badge/389175717.svg)](https://zenodo.org/badge/latestdoi/389175717). 15 | 16 | For details see the [license](https://github.com/wallet-maker/cytopus/blob/Cytopus_1.3/LICENSE) 17 | 18 | The KnowledgeBase is provided in graph format based on the networkx package. Central to the KnowledgeBase are a cell type hierarchy and **cellular processes** which correspond to the cell types in this hierarchy. Cell types are supported by gene sets indicative of their **cellular identities**. Moreover, the KnowledgeBase contains metadata about the gene sets, such as authorship and the gene set topic. 
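The graph layout described above can be sketched with plain Python edge lists. This is only an illustration and not the package's API: the node names (`T`, `toy_process`, `GENE_A`, `GENE_B`) and the helper `processes_for` are hypothetical, while the edge directions and the `class` labels (`SUBSET_OF`, `process_OF`, `gene_OF`) follow `cytopus/tl/create.py`. The package itself stores these edges in a networkx `DiGraph`.

```python
# Toy edge lists; all node names are hypothetical.
# Edge direction and 'class' labels follow cytopus/tl/create.py.
celltype_edges = [("T", "all-cells", {"class": "SUBSET_OF"})]  # (child, parent)
geneset_celltype_edges = [("toy_process", "T", {"class": "process_OF"})]
geneset_gene_edges = [
    ("toy_process", "GENE_A", {"class": "gene_OF"}),
    ("toy_process", "GENE_B", {"class": "gene_OF"}),
]


def processes_for(celltype):
    """Return {gene_set: [genes]} for process gene sets attached to `celltype`."""
    # gene sets whose process_OF edge points at the requested cell type
    gene_sets = [src for src, dst, attr in geneset_celltype_edges
                 if dst == celltype and attr["class"] == "process_OF"]
    # collect the genes reachable from each gene set through gene_OF edges
    return {gs: [gene for src, gene, attr in geneset_gene_edges
                 if src == gs and attr["class"] == "gene_OF"]
            for gs in gene_sets}


toy_result = processes_for("T")
print(toy_result)
```

The result has the same `{gene_set: [genes]}` shape that `G.processes` is documented to return in the Quickstart.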
19 | 20 | The KnowledgeBase can be queried to retrieve gene sets for specific cell types and organize them in a dictionary format for downstream use with the [Spectra](https://github.com/dpeerlab/spectra) package: 21 | 22 | 23 | ## Installation 24 | 25 | install from pypi: 26 | 27 | ``` 28 | pip install cytopus 29 | ``` 30 | 31 | install from source: 32 | 33 | ``` 34 | pip install git+https://github.com/wallet-maker/cytopus.git 35 | ``` 36 | 37 | Some plotting functions require pygraphviz or pyvis. Install either or both: 38 | 39 | pygraphviz using conda: 40 | ``` 41 | conda install --channel conda-forge pygraphviz 42 | ``` 43 | 44 | pyvis using pip 45 | ``` 46 | pip install pyvis 47 | ``` 48 | 49 | ## Tutorial 50 | 51 | ### Quickstart - Querying the Knowledge Base: 52 | 53 | Retrieve default KnowledgeBase (human only): 54 | 55 | ``` 56 | import cytopus as cp 57 | G = cp.KnowledgeBase() 58 | ``` 59 | Retrieve custom KnowledgeBase (documentation to build KnowledgeBase object [here](https://github.com/wallet-maker/cytopus/blob/Cytopus_1.3/notebooks/KnowledgeBase_construct.ipynb)): 60 | ``` 61 | file_path = '~/dir1/dir2/knowledgebase_file.txt' 62 | G = cp.KnowledgeBase(file_path) 63 | ``` 64 | Access data in KnowledgeBase: 65 | ``` 66 | #list of all cell types in KnowledgeBase 67 | G.celltypes 68 | #dictionary of all cellular processes in KnowledgeBase as a dictionary {'process_1':['gene_a','gene_e','gene_y',...],'process_2':['gene_b','gene_u',...],...} 69 | G.processes 70 | #dictionary of all cellular identities in KnowledgeBase as a dictionary {'identity_1':['gene_j','gene_k','gene_z',...],'identity_2':['gene_y','gene_p',...],...} 71 | G.identities 72 | #dictionary with gene set properties (for cellular processes or identities) 73 | G.graph.nodes['gene_set_name'] 74 | ``` 75 | 76 | Plot the cell type hierarchy stored in the KnowledgeBase as a directed graph with edges pointing into the direction of the parents: 77 | ``` 78 | G.plot_celltypes() 79 | ``` 80 | 81 | 82 
| ![Image of Cell type hierarchy](https://github.com/wallet-maker/cytopus/blob/main/img/celltype_hierarchy_1.2.png) 83 | 84 | 85 | 86 | Prepare a nested dictionary assigning cell types to their cellular processes and cellular processes to their corresponding genes. This dictionary can be used as an input for Spectra. 87 | 88 | First, select the cell types for which you want to retrieve gene sets. 89 | These cell types can be selected from the cell type hierarchy (see the .plot_celltypes() method above). 90 | ``` 91 | celltype_of_interest = ['M','T','B','epi'] 92 | ``` 93 | 94 | Second, select the cell types whose gene sets you want to merge and set as global gene sets for the Spectra package. These gene sets should be valid for all cell types in the data. 95 | ``` 96 | ##e.g. if you are working with different human cells 97 | global_celltypes = ['all-cells'] 98 | ##e.g. if you are working with human leukocytes 99 | global_celltypes = ['all-cells','leukocyte'] 100 | ##e.g. if you are working with B cells 101 | global_celltypes = ['all-cells','leukocyte','B'] 102 | ``` 103 | 104 | Third, retrieve a dictionary of the format {celltype_a:{process_a:[gene_a,gene_b,...],...},...}. 105 | Decide whether you want to merge gene sets for all children or all parents (unusual) of the selected cell types. 
106 | ``` 107 | G.get_celltype_processes(celltype_of_interest, global_celltypes=global_celltypes, get_children=True, get_parents=False) 108 | ``` 109 | 110 | Fourth, the dictionary is stored in the KnowledgeBase object (with the default inplace=True): 111 | ``` 112 | G.celltype_process_dict 113 | ``` 114 | 115 | ### Detailed tutorial for Querying the Knowledge Base: 116 | Learn how to explore the Knowledge Base and retrieve a dictionary which can be used for [Spectra](https://github.com/dpeerlab/spectra): 117 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/wallet-maker/cytopus/blob/main/notebooks/KnowledgeBase_queries_colaboratory.ipynb) 118 | 119 | ### Detailed tutorial for Generating a cytopus Knowledge Base object: 120 | Learn how to create a Knowledge Base object from gene set annotations and cell type hierarchies stored in .csv files: 121 | [here](https://github.com/wallet-maker/cytopus/blob/main/notebooks/KnowledgeBase_construct.ipynb) 122 | 123 | ### Utils tutorial - Labeling Factor Analysis Outputs (Spectra): 124 | Learn how to label marker genes from factor analysis, determine factor cell type specificity and export the Knowledge Base content as .gmt files for other applications: 125 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/wallet-maker/cytopus/blob/main/notebooks/Cytopus_utils_tutorial.ipynb) 126 | 127 | ### Hierarchy tutorial 128 | Hierarchically annotate and query cells using AnnData and Cytopus: 129 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/wallet-maker/cytopus/blob/main/notebooks/Hierarchical_annotation_tutorial.ipynb) 130 | 131 | ## Submit gene sets 132 | You can submit gene sets to be added to the KnowledgeBase here: 133 | 134 | https://docs.google.com/forms/d/e/1FAIpQLSfWU7oTZH8jI7T8vFK0Nqq2rfz6_83aJIVamH5cogZQMlciFQ/viewform?usp=sf_link 135 | 136 | All submissions will be
reviewed and, if needed, revised before they are added to the database. This ensures consistency of the annotations and avoids gene set duplication. Authorship will be acknowledged in the KnowledgeBase for all submitted gene sets which pass review and are added to the KnowledgeBase. You can also create entirely new KnowledgeBase objects with this package. 137 | 138 | ## Citation and Usage 139 | 140 | For gene sets from external sources you must also abide by the licenses of the original gene sets. To make this easier we have stored these in the Knowledge Base object: 141 | 142 | ``` 143 | import cytopus as cp 144 | G = cp.KnowledgeBase() 145 | gene_set_of_interest = 'all_macroautophagy_regulation_positive' 146 | print(G.graph.nodes[gene_set_of_interest]) 147 | ``` 148 | 149 | -------------------------------------------------------------------------------- /cytopus/__init__.py: -------------------------------------------------------------------------------- 1 | """A Knowledge Base for Single Cell Biology""" 2 | 3 | from .tl import * 4 | from .knowledge_base import KnowledgeBase, get_data -------------------------------------------------------------------------------- /cytopus/data/Cytopus_1.2.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wallet-maker/cytopus/638dd917cdd047e322213b6e4767bf0b9b78bf80/cytopus/data/Cytopus_1.2.txt -------------------------------------------------------------------------------- /cytopus/data/Cytopus_1.22.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wallet-maker/cytopus/638dd917cdd047e322213b6e4767bf0b9b78bf80/cytopus/data/Cytopus_1.22.txt -------------------------------------------------------------------------------- /cytopus/data/Cytopus_1.23.txt: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/wallet-maker/cytopus/638dd917cdd047e322213b6e4767bf0b9b78bf80/cytopus/data/Cytopus_1.23.txt -------------------------------------------------------------------------------- /cytopus/data/Cytopus_1.31nc.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wallet-maker/cytopus/638dd917cdd047e322213b6e4767bf0b9b78bf80/cytopus/data/Cytopus_1.31nc.txt -------------------------------------------------------------------------------- /cytopus/data/adata_spectra.h5ad: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wallet-maker/cytopus/638dd917cdd047e322213b6e4767bf0b9b78bf80/cytopus/data/adata_spectra.h5ad -------------------------------------------------------------------------------- /cytopus/knowledge_base/__init__.py: -------------------------------------------------------------------------------- 1 | """Tools to plot and query KnowledgeBase""" 2 | from .kb_queries import KnowledgeBase, get_data 3 | 4 | 5 | 6 | 7 | -------------------------------------------------------------------------------- /cytopus/knowledge_base/kb_queries.py: -------------------------------------------------------------------------------- 1 | from os.path import dirname 2 | import matplotlib.pyplot as plt 3 | import pickle 4 | import pkg_resources 5 | import networkx as nx 6 | import numpy as np 7 | 8 | 9 | def get_data(filename): 10 | """ 11 | Load data from cytopus/data. 12 | """ 13 | return pkg_resources.resource_filename('cytopus', 'data/' + filename) 14 | 15 | #nested dict with celltype hierarchy 16 | def extract_hierarchy(G, node='all-cells',invert=False): 17 | ''' 18 | extract cell type hierarchy from KnowledgeBase object as a nested dictionary 19 | G: cytopus.kb.KnowledgeBase, with a hierarchy of celltypes 20 | node: str, celltype to use as starting points in the hiearchy (e.g. 
'all-cells') 21 | invert: bool, if False the dict will contain all children below the node, if True the dict will contain all parents above the node 22 | ''' 23 | node_list_plot = G.celltypes 24 | 25 | def filter_node(n1): 26 | return n1 in node_list_plot 27 | 28 | view = nx.subgraph_view(G.graph, filter_node=filter_node) 29 | if invert: 30 | predecessors = view.successors(node) 31 | else: 32 | predecessors = view.predecessors(node) 33 | if not predecessors: 34 | return node 35 | else: 36 | return {s: extract_hierarchy(G, s, invert=invert) for s in predecessors} # pass invert on so the recursion keeps its direction 37 | 38 | 39 | class KnowledgeBase: 40 | def __init__(self, graph=None): 41 | ''' 42 | load KnowledgeBase from file 43 | retrieve all cell types in KnowledgeBase 44 | create dictionary for cellular processes in KnowledgeBase 45 | graph: str or networkx.DiGraph, path to pickled networkx.DiGraph object formatted for cytopus or networkx.DiGraph 46 | ''' 47 | 48 | # Initialise default graph data 49 | if graph is None: 50 | graph = get_data("Cytopus_1.31nc.txt") 51 | # load KnowledgeBase from pickled file 52 | if isinstance(graph, nx.classes.digraph.DiGraph): 53 | self.graph = graph 54 | elif isinstance(graph, str): 55 | with open(graph, 'rb') as f: # notice the r instead of w 56 | self.graph = pickle.load(f) 57 | else: 58 | raise ValueError('graph must be path (str) or networkx.classes.digraph.DiGraph object') 59 | #retrieve all cell types from data 60 | self.celltypes = self.filter_nodes(attribute_name = 'class',attributes= ['cell_type'],origin=None,target=None) 61 | 62 | #create gene set : gene dict for all cellular processes 63 | processes = self.get_processes(gene_sets = list(set([x[0] for x in self.filter_edges(attribute_name = 'class', attributes = ['process_OF'],target=self.celltypes)]))) 64 | processes = {k:[x for x in v if x not in ['nan',np.nan]] for k,v in processes.items()} 65 | self.processes = processes 66 | #self.processes = #self.filter_nodes(attribute_name = 'class',attributes= 
['processes'],origin=None,target=None) 67 | print(self) 68 | #create gene set : gene dict for all cellular identities 69 | identities = self.get_identities(self.filter_nodes(attribute_name = 'class',attributes=['cell_type'],origin=None,target=None)) 70 | identities = {k:[x for x in v if x not in ['nan',np.nan]] for k,v in identities.items()} 71 | self.identities = identities 72 | 73 | 74 | def __str__(self): 75 | # return the summary instead of printing it, so str(self) and print(self) both work 76 | return f"KnowledgeBase object containing {len(self.celltypes)} cell types and {len(self.processes)} cellular processes" 77 | 78 | def filter_nodes(self, attributes,attribute_name=None, 79 | origin=None,target=None): 80 | ''' 81 | filter nodes in networkx graph for their attributes 82 | G: networkx graph 83 | attribute_name: attribute key in node dictionary 84 | attributes: attribute list to select 85 | origin: list of node origin of node 86 | target: list of node end/target 87 | ''' 88 | node_list = [] 89 | if attribute_name is None: 90 | node_list = self.graph.nodes 91 | else: 92 | for x in self.graph.nodes: 93 | if attribute_name in self.graph.nodes[x].keys(): 94 | if self.graph.nodes[x][attribute_name] in attributes: 95 | node_list.append(x) 96 | if origin is not None: 97 | node_list = [x for x in node_list if x[0] in origin] 98 | if target is not None: 99 | node_list = [x for x in node_list if x[1] in target] 100 | return node_list 101 | 102 | 103 | 104 | 105 | def filter_edges(self, attributes,attribute_name=None,origin=None,target=None, ): 106 | ''' 107 | filter edges in networkx graph for their attributes 108 | G: networkx graph 109 | attribute_name: attribute key in edge dictionary 110 | attributes: attribute list to select 111 | origin: list of node origin of edge 112 | target: list of node end/target 113 | ''' 114 | edge_list = [] 115 | if attribute_name is None: 116 | edge_list = self.graph.edges 117 | else: 118 | for x in self.graph.edges: 119 | if attribute_name in self.graph.edges[x].keys(): 120 | if self.graph.edges[x][attribute_name] in 
attributes: 121 | edge_list.append(x) 122 | if origin!=None: 123 | edge_list = [x for x in edge_list if x[0] in origin] 124 | if target!=None: 125 | edge_list = [x for x in edge_list if x[1] in target] 126 | return edge_list 127 | 128 | def get_celltype_hierarchy(self, node='all-cells',invert=False): 129 | #retrieve hierarchy 130 | hierarchy_dict = extract_hierarchy(self, node=node,invert=invert) 131 | return hierarchy_dict 132 | 133 | def get_processes(self,gene_sets): 134 | ''' 135 | create dictionary gene sets for cellular processes : genes 136 | self: KnowledgeBase object (networkx) 137 | gene_sets: list of gene sets for cellular processes 138 | ''' 139 | #get genes per gene set 140 | genes = self.filter_nodes(attribute_name = 'class',attributes =['gene']) 141 | #dictionary geneset : genes 142 | gene_edges = self.filter_edges( attribute_name ='class', attributes = ['gene_OF'],origin=gene_sets,target=genes) 143 | gene_set_dict = {} 144 | for i in gene_edges: 145 | if i[0] in gene_set_dict.keys(): 146 | gene_set_dict[i[0]].append(i[1])# 147 | else: 148 | gene_set_dict[i[0]]= [i[1]]# 149 | return gene_set_dict 150 | 151 | def get_celltype_processes(self,celltypes,global_celltypes=[None],get_parents =True, get_children =True, parent_depth=1, child_depth= None, fill_missing=True,parent_depth_dict=None, child_depth_dict=None, inplace=True): 152 | ''' 153 | get gene sets for specific cell types 154 | self: KnowledgeBase object (networkx) 155 | celltypes: list of celltypes to retrieve 156 | global_celltypes: list of celltypes to set as 'global' for Spectra 157 | get_parents: also retrieve gene sets for the parents of the cell types in celltypes 158 | get_children: also retrieve gene sets for the children of the cell types in celltypes 159 | fill_missing: add an empty dictionary for cell types not found in KnowledgeBase 160 | parent_depth: steps from cell type to go up the hierarchy to retrieve gene sets linked to parents (e.g. 
2 would be up to grandparents) 161 | parent_depth_dict: you can also set the depth for specific celltype with a dictionary {celltype1:depth1,celltype2:depth2} 162 | child_depth: steps from cell type to go down the hierarchie to retrieve gene sets linked to children (e.g. 2 would be down to grandchildren) 163 | child_depth_dict: you can also set the depth for specific celltype with a dictionary {celltype1:depth1,celltype2:depth2} 164 | inplace: bool, if True save output under self.celltype_process_dict 165 | ''' 166 | import itertools 167 | import warnings 168 | from collections import Counter 169 | 170 | ## limit to celltype subgraph to retrieve relevant celltypes 171 | 172 | node_list_plot = self.celltypes 173 | 174 | def filter_node(n1): 175 | return n1 in node_list_plot 176 | 177 | view = nx.subgraph_view(self.graph, filter_node=filter_node) 178 | 179 | for x in list(set(celltypes+global_celltypes)): 180 | if x not in list(view.nodes): 181 | warnings.warn('Not all cell types are contained in the Immune Knowledge base') 182 | if get_parents: 183 | all_celltypes_parents = {} 184 | if parent_depth_dict == None: 185 | parent_depth_dict = {} 186 | 187 | for i in celltypes: 188 | if i in view.nodes:#is celltype in KnowledgeBase 189 | if i in parent_depth_dict.keys(): #check if query depth was manually defined 190 | if parent_depth_dict[i] == None: 191 | all_celltypes_parents[i]= [i] 192 | else: 193 | all_celltypes_parents[i]= [n for n in nx.traversal.bfs_tree(view, i,depth_limit=parent_depth_dict[i])] 194 | else: 195 | all_celltypes_parents[i]= [n for n in nx.traversal.bfs_tree(view, i,depth_limit=parent_depth)] 196 | elif fill_missing: 197 | all_celltypes_parents[i] = {} #if not add an empty dictionary 198 | print('adding empty dictionary for cell type:',i) 199 | else: 200 | all_celltypes_parents[i]= [i] 201 | print('cell type of interest',i,'is not in the knowledge base') 202 | if get_children: 203 | all_celltypes_children = {} 204 | if child_depth_dict == None: 205 
| child_depth_dict = {} 206 | 207 | for i in celltypes: 208 | if i in view.nodes: 209 | if i in child_depth_dict.keys(): #check if query depth was manually defined 210 | if child_depth_dict[i] ==None: 211 | all_celltypes_children[i]= [i] 212 | else: 213 | all_celltypes_children[i]= [n for n in nx.traversal.bfs_tree(view, i,reverse=True,depth_limit=child_depth_dict[i])] 214 | else: 215 | all_celltypes_children[i]= [n for n in nx.traversal.bfs_tree(view, i,reverse=True,depth_limit=child_depth)] 216 | else: 217 | all_celltypes_children[i]= [i] 218 | print('cell type of interest',i,'is not in the knowledge base') 219 | 220 | if get_parents ==True and get_children==True: 221 | all_celltypes = list(itertools.chain.from_iterable(list(all_celltypes_children.values())+list(all_celltypes_parents.values()))) 222 | elif get_parents==True and get_children==False: 223 | all_celltypes = list(itertools.chain.from_iterable(list(all_celltypes_parents.values()))) 224 | elif get_parents == False and get_children==True: 225 | all_celltypes = list(itertools.chain.from_iterable(list(all_celltypes_children.values()))) 226 | else: 227 | all_celltypes = [] 228 | all_celltypes = list(set(all_celltypes + global_celltypes + celltypes)) 229 | 230 | #get process genesets connected to these celltypes 231 | gene_set_edges =self.filter_edges(attribute_name = 'class', attributes = ['process_OF'],target=all_celltypes) 232 | 233 | #dictionary gene set : cell type 234 | gene_set_celltype_dict = dict(gene_set_edges) 235 | #dictionary cell type: [gene_set1, gene_set2,...] 
236 | celltype_gene_set_dict = {} 237 | 238 | for key,value in gene_set_celltype_dict.items(): 239 | if value in celltype_gene_set_dict.keys(): 240 | celltype_gene_set_dict[value].append(key) 241 | else: 242 | celltype_gene_set_dict[value] = [key] 243 | 244 | #get dict gene sets for cellular processes : genes 245 | gene_set_dict = self.get_processes(gene_sets = list(set([x[0] for x in gene_set_edges]))) 246 | 247 | #construct dictionary 248 | process_dict = {} 249 | 250 | for key,value in celltype_gene_set_dict.items(): 251 | process_dict[key] = {} 252 | for gene_set in value: 253 | process_dict[key][gene_set] = gene_set_dict[gene_set] 254 | if global_celltypes != [None]: 255 | global_gs = {} 256 | for i in global_celltypes: 257 | if i in process_dict.keys(): 258 | global_gs.update(process_dict[i]) 259 | del process_dict[i] 260 | else: 261 | print('did not find',i,'in cell type keys to set as global') 262 | process_dict['global'] = global_gs 263 | 264 | else: 265 | print('you must add a "global" key to run Spectra. E.g. 
set to one cell type key to be set as "global"') 266 | 267 | ## merge relevant children and parents into cell type specific keys 268 | 269 | process_dict_merged = {} 270 | 271 | 272 | if get_children: 273 | for key,value in all_celltypes_children.items(): 274 | merged_dict = {} 275 | for cell_type in value: 276 | if cell_type in process_dict.keys(): 277 | merged_dict.update(process_dict[cell_type]) 278 | if key in process_dict_merged.keys(): 279 | process_dict_merged[key].update(merged_dict) 280 | else: 281 | process_dict_merged[key]=merged_dict 282 | 283 | if get_parents: 284 | for key,value in all_celltypes_parents.items(): 285 | merged_dict = {} 286 | for cell_type in value: 287 | if cell_type in process_dict.keys(): 288 | merged_dict.update(process_dict[cell_type]) 289 | if key in process_dict_merged.keys(): 290 | process_dict_merged[key].update(merged_dict) 291 | else: 292 | process_dict_merged[key]=merged_dict 293 | 294 | if get_children==False and get_parents==False: 295 | process_dict_merged =process_dict 296 | 297 | if global_celltypes != [None]: 298 | process_dict_merged['global'] = process_dict['global'] 299 | 300 | ## check if cell types contain shared children or parents 301 | if get_children: 302 | shared_children = [] 303 | for key,value in Counter(list(itertools.chain.from_iterable(list(all_celltypes_children.values())))).items(): 304 | if value >1: 305 | shared_children.append(key) 306 | if shared_children != []: 307 | 308 | print('cell types of interest share the following children:',shared_children,'This may be desired.') 309 | if get_parents: 310 | shared_parents = [] 311 | for key,value in Counter(list(itertools.chain.from_iterable(list(all_celltypes_parents.values())))).items(): 312 | if value >1: 313 | shared_parents.append(key) 314 | if shared_parents != []: 315 | print('cell types of interest share the following parents:',shared_parents,'This may be desired.') 316 | if inplace: 317 | self.celltype_process_dict = process_dict_merged 318 | 
#self.processes = gene_set_dict 319 | else: 320 | return process_dict_merged 321 | 322 | def get_identities(self, celltypes_identities,include_subsets=False): 323 | ''' 324 | self: KnowledgeBase object (networkx) 325 | celltypes: list of cell types to retrieve identity gene sets for 326 | ''' 327 | if not isinstance(celltypes_identities, list): 328 | raise TypeError('celltypes_identities must of be of type: list') 329 | if include_subsets: 330 | def filter_node(n1): 331 | return n1 in self.celltypes 332 | 333 | 334 | celltype_view =nx.subgraph_view(self.graph, filter_node=filter_node) 335 | celltypes_new = [] 336 | for i in celltypes_identities: 337 | nodes_of_specific_type = [n for n in nx.traversal.bfs_tree(celltype_view, i,reverse=True)] 338 | celltypes_new += nodes_of_specific_type 339 | celltypes_identities = list(set(celltypes_new)) 340 | 341 | identity_edges = self.filter_edges( attribute_name ='class', attributes = ['identity_OF'],target=celltypes_identities) 342 | 343 | #construct dictionary geneset:gene 344 | gene_edges = self.filter_edges( attribute_name ='class', attributes = ['gene_OF']) 345 | gene_set_dict = {} 346 | for i in gene_edges: 347 | if i[0] in gene_set_dict.keys(): 348 | gene_set_dict[i[0]].append(i[1])# 349 | else: 350 | gene_set_dict[i[0]]= [i[1]]# 351 | 352 | #construct dictionary celltype: identity_geneset 353 | identity_dict = {} 354 | 355 | for edge in identity_edges: 356 | if edge[1] in list(self.celltypes): 357 | identity_gs = gene_set_dict[edge[0]] 358 | identity_dict[edge[1]] = identity_gs 359 | else: 360 | print(edge[1],'not contained in KnowledgeBase') 361 | return identity_dict 362 | 363 | def plot_celltypes(self, figure_size = [30,30], node_size = 1000, edge_width= 1, arrow_size=20, 364 | edge_color= 'k', node_color='#8decf5', label_size = 20): 365 | '''' 366 | plot all celltypes contained in the KnowledgeBase using matplotlib and graphviz 367 | self: KnowledgeBase object (networkx) 368 | figure_size: figure size 369 | 
node_size: node size in graph 370 | edge_width: edge width in graph 371 | arrow_size: arrow size of directed edges 372 | edge_color: edge color 373 | node_color: node color 374 | label_size: size of node labels 375 | ''' 376 | try: 377 | from networkx.drawing.nx_agraph import graphviz_layout 378 | except ModuleNotFoundError as err: 379 | # fail here with a clear message instead of a NameError further down 380 | raise ModuleNotFoundError('please install pygraphviz') from err 381 | node_list_plot = self.filter_nodes(attribute_name='class', attributes = ['cell_type']) 382 | 383 | def filter_node(n1): 384 | return n1 in node_list_plot 385 | 386 | plt.rcParams["figure.figsize"] = figure_size 387 | plt.rcParams["figure.autolayout"] = True 388 | 389 | view = nx.subgraph_view(self.graph, filter_node=filter_node) 390 | 391 | pos=graphviz_layout(view) 392 | 393 | nodes = nx.draw_networkx_nodes(view, pos=pos,node_color=node_color,nodelist=None,node_size=node_size,label=True) 394 | edges = nx.draw_networkx_edges(view, pos=pos, edgelist=None, width=edge_width, edge_color=edge_color, style='solid', alpha=None, arrowstyle=None, 395 | arrowsize=arrow_size, 396 | edge_cmap=None, edge_vmin=None, edge_vmax=None, ax=None, arrows=None, label=None, 397 | node_size=node_size, nodelist=None, node_shape='o', connectionstyle='arc3', 398 | min_source_margin=0, min_target_margin=0) 399 | labels = nx.draw_networkx_labels(view,pos=pos,font_size=label_size) 400 | print('all celltypes in knowledge base:',list(labels.keys())) 401 | 402 | def plot_graph_interactive(self, attributes=['cell_type','cellular_process'],colors= ['red','blue'], save_path = 'graph.html'): 403 | ''' 404 | plot excerpt from the KnowledgeBase using the pyvis package 405 | self: KnowledgeBase object (networkx) 406 | attributes: list of node classes to plot 407 | colors: list of colors in the order of the node classes to plot 408 | save_path: save path for .html file 409 | ''' 410 | 411 | try: 412 | from pyvis.network import Network 413 | except ModuleNotFoundError as err: 414 | # fail here with a clear message instead of a NameError further down 415 | raise ModuleNotFoundError('please install pyvis') from err 416 | 417 | 
while len(attributes)!=len(colors): 418 | print('attributes and colors have to be same length') 419 | break 420 | while len(set(attributes))!=len(attributes): 421 | print('attributes have to be unique') 422 | break 423 | 424 | net = Network(notebook=True) 425 | #cell types 426 | node_list_plot = self.filter_nodes(attribute_name='class', attributes = attributes) 427 | def filter_node(n1): 428 | return n1 in node_list_plot 429 | view = nx.subgraph_view(self.graph,filter_node=filter_node) 430 | net.from_nx(view) 431 | 432 | for i in net.nodes: 433 | index_range = list(range(len(attributes))) 434 | for v in index_range: 435 | if i['class'] == attributes[v]: 436 | i['color']= colors[v] 437 | 438 | #show 439 | net.show(save_path) 440 | 441 | -------------------------------------------------------------------------------- /cytopus/tl/__init__.py: -------------------------------------------------------------------------------- 1 | """Tools to use KnowledgeBase to label and interpret data""" 2 | from . import label 3 | from . import create 4 | from . 
import hierarchy as hier -------------------------------------------------------------------------------- /cytopus/tl/create.py: -------------------------------------------------------------------------------- 1 | import networkx as nx 2 | 3 | def construct_kb(celltype_edges, geneset_gene_edges,geneset_celltype_edges,annotation_dict,metadata_dict=None,save=False, save_path=None): 4 | ''' 5 | construct a cytopus.kb.KnowledgeBase object 6 | celltype_edges: list, list of tuples storing the edges of the cell type hierarchy as ('child', 'parent') 7 | geneset_gene_edges: list, list of tuples storing the edges connecting every gene set with each of its genes as ('gene_set','gene') 8 | geneset_celltype_edges: list, list of tuples storing the edges connecting every gene set with its cell type as ('gene_set','celltype') 9 | annotation_dict: dict, containing the gene set names as keys and their annotation names (cellular_process or cellular_identity) as values 10 | metadata_dict: dict, nested dict containing the gene set names as keys and a dict storing their attributes_categories as keys and corresponding attributes as values 11 | save: bool, if True saves the data to the path provided in save_path 12 | save_path: str, path to save the data to (.txt file) 13 | ''' 14 | 15 | #get genes, genesets, celltypes 16 | genes = list(set([x[1] for x in geneset_gene_edges])) 17 | genes = [(x,{'class':'gene'}) for x in genes] 18 | gene_sets = list(set([x[0] for x in geneset_gene_edges])) 19 | celltypes = list(set([x[0] for x in celltype_edges]).union(set([x[1] for x in celltype_edges]))) 20 | celltypes = [(x,{'class':'cell_type'})for x in celltypes] 21 | 22 | #some sanity checks 23 | celltypes_in_hierarchy = set([x[0] for x in celltypes]) 24 | celltypes_of_genesets = set([x[1] for x in geneset_celltype_edges]) 25 | set_dif = celltypes_of_genesets - celltypes_in_hierarchy 26 | if set_dif != set(): 27 | print('WARNING: missing cell types:',set_dif,'in the cell type hierarchy. 
Please append cell type hierarchy.') 28 | else: 29 | print('all cell types in gene set are contained in the cell type hierarchy') 30 | 31 | genesets_in_celltype_edges = set([x[0] for x in geneset_celltype_edges]) 32 | genesets_in_gene_edges = set([x[0] for x in geneset_gene_edges]) 33 | 34 | if genesets_in_celltype_edges != genesets_in_gene_edges: 35 | print('WARNING: Gene sets in geneset_celltype_edges and geneset_gene_edges are not identical') 36 | 37 | #set edge attributes (important for queries) 38 | geneset_gene_edges = [x + ({'class':'gene_OF'},) for x in geneset_gene_edges] 39 | celltype_edges = [x + ({'class':'SUBSET_OF'},) for x in celltype_edges] 40 | 41 | 42 | #sort processes and identities 43 | processes = [] 44 | identities = [] 45 | 46 | for i in gene_sets: 47 | if annotation_dict[i] == 'cellular_process': 48 | processes.append(i) 49 | elif annotation_dict[i] == 'cellular_identity': 50 | identities.append(i) 51 | else: 52 | raise(ValueError('all gene sets annotation names should be either cellular_process or cellular_identity')) 53 | 54 | geneset_gene_edges_processes = [x for x in geneset_gene_edges if x[0] in processes] 55 | geneset_gene_edges_identities = [x for x in geneset_gene_edges if x[0] in identities] 56 | geneset_celltype_edge_processes = [x + ({'class':'process_OF'},) for x in geneset_celltype_edges if x[0] in processes] 57 | geneset_celltype_edge_identities = [x + ({'class':'identity_OF'},) for x in geneset_celltype_edges if x[0] in identities] 58 | 59 | #construct graph 60 | G = nx.DiGraph() 61 | G.add_nodes_from(genes) 62 | G.add_nodes_from(gene_sets) 63 | G.add_nodes_from(identities) 64 | G.add_nodes_from(celltypes) 65 | G.add_edges_from(geneset_gene_edges_processes) 66 | G.add_edges_from(geneset_gene_edges_identities) 67 | G.add_edges_from(celltype_edges) 68 | G.add_edges_from(geneset_celltype_edge_processes) 69 | G.add_edges_from(geneset_celltype_edge_identities) 70 | 71 | #set node metadata 72 | if isinstance(metadata_dict,dict): 73 
| nx.set_node_attributes(G, metadata_dict) 74 | else: 75 | print('No metadata dictionary provided (optional), skipping metadata assignment.') 76 | if save: 77 | if not isinstance(save_path,str): 78 | print('WARNING: Please provide save_path if you want to save the data. Skipping saving step.') 79 | else: 80 | import pickle 81 | with open(save_path, 'wb') as f: 82 | pickle.dump(G, f) 83 | print('Pickled and saved to:',save_path) 84 | return KnowledgeBase(graph=G) 85 | 86 | -------------------------------------------------------------------------------- /cytopus/tl/hierarchy.py: -------------------------------------------------------------------------------- 1 | #import networkx as nx 2 | from networkx.drawing.nx_agraph import graphviz_layout 3 | def build_nested_dict(graph, node): 4 | ''' 5 | build nested dictionary from reverse view of cytopus cell type hierarchy 6 | graph: networkx.DiGraph.view, reverse view of Cytopus cell type hierarchy 7 | root: str, name of root node in the reversed view 8 | ''' 9 | nested_dict = {node: {}} 10 | for neighbor in graph.successors(node): 11 | nested_dict[node].update(build_nested_dict(graph, neighbor)) 12 | return nested_dict 13 | 14 | def get_hierarchy_dict(G): 15 | ''' 16 | reverse Cytopus cell type hierarchy and build nested hierarchy from it 17 | G: Cytopus.KnowledgeBase, containing cell type hierarchy 18 | 19 | ''' 20 | import networkx as nx 21 | #get view of cell type hierarchy 22 | node_list_plot = G.filter_nodes(attribute_name='class', attributes = ['cell_type']) 23 | def filter_node(n1): 24 | return n1 in node_list_plot 25 | 26 | view = nx.subgraph_view(G.graph, filter_node=filter_node) 27 | 28 | #reverse graph view (going from least granular to most granular cell type) 29 | reversed_view = view.reverse(copy=True) 30 | root_nodes = [n for n in reversed_view.nodes if reversed_view.in_degree(n) == 0] 31 | 32 | #build the nested dictionary 33 | hierarchy_dict = {} 34 | for root in root_nodes: 35 | 
hierarchy_dict.update(build_nested_dict(reversed_view, root)) 36 | 37 | return hierarchy_dict 38 | 39 | def create_hierarchical_graph(data, type_label): 40 | import networkx as nx 41 | G = nx.DiGraph() 42 | for parent, children in data.items(): 43 | G.add_node(parent) 44 | if isinstance(children, dict): 45 | child_node = create_hierarchical_graph(children,type_label) 46 | G.add_nodes_from(child_node.nodes(data=True)) 47 | G.add_edges_from([(u,v) for u,v in child_node.edges()]) 48 | for child in children: 49 | G.add_edge(child, parent) 50 | else: 51 | for child in children: 52 | G.add_node(child) 53 | G.add_edge(child, parent) 54 | nx.set_node_attributes(G, type_label,'type') 55 | return G 56 | 57 | def get_all_keys(d): 58 | keys = set() 59 | for k, v in d.items(): 60 | keys.add(k) 61 | if isinstance(v, dict): 62 | keys |= get_all_keys(v) 63 | return keys 64 | 65 | def get_nodes_of_type(graph, node_type): 66 | nodes = [node for node in graph.nodes() if graph.nodes[node]['type'] == node_type] 67 | nodes.sort(key=lambda x: x.split('.')) 68 | return nodes 69 | 70 | def get_indices(df, value): 71 | return df.index[df.astype(str).apply(lambda x: x == value).any(axis=1)].tolist() 72 | 73 | def get_node_labels(graph, node_type): 74 | import networkx as nx 75 | nodes = [node for node in nx.dfs_postorder_nodes(graph) if graph.nodes[node]['type'] == node_type] 76 | return nodes[::-1] 77 | 78 | 79 | class Hierarchy: 80 | import networkx as nx 81 | def __init__(self, hierarchy_dict): 82 | ''' 83 | load hierarchy class 84 | hierarchy_dict: dict, nested dict containing the cell type hierarchy 85 | ''' 86 | self.graph = create_hierarchical_graph(hierarchy_dict,type_label = 'cell_type') 87 | print(self.__str__()) 88 | 89 | def __str__(self): 90 | all_celltypes = get_nodes_of_type(self.graph, 'cell_type') 91 | return f"Hierarchy class containing {len(all_celltypes)} cell types:{all_celltypes}" 92 | 93 | 94 | def plot_celltypes(self, node_color='#8decf5', node_size = 1000,edge_width= 
1,arrow_size=20 ,edge_color= 'k',label_size = 10, figsize=[30,30]): 95 | ''' 96 | plot all cell types contained in hierarchy object 97 | ''' 98 | 99 | 100 | #plt.rcParams["figure.figsize"] = figure_size 101 | #plt.rcParams["figure.autolayout"] = True 102 | import networkx as nx 103 | import matplotlib.pyplot as plt 104 | node_list_plot = get_nodes_of_type(self.graph, 'cell_type') 105 | def filter_node(n1): 106 | return n1 in node_list_plot 107 | 108 | view = nx.subgraph_view(self.graph, filter_node=filter_node) 109 | 110 | pos=graphviz_layout(view) 111 | plt.rcParams["figure.figsize"] = figsize 112 | nodes = nx.draw_networkx_nodes(view, pos=pos,node_color=node_color,nodelist=None,node_size=node_size,label=True) 113 | edges = nx.draw_networkx_edges(view, pos=pos, edgelist=None, width=edge_width, edge_color=edge_color, style='solid', alpha=None, arrowstyle=None, 114 | arrowsize=arrow_size, 115 | edge_cmap=None, edge_vmin=None, edge_vmax=None, ax=None, arrows=None, label=None, 116 | node_size=node_size, nodelist=None, node_shape='o', connectionstyle='arc3', 117 | min_source_margin=0, min_target_margin=0) 118 | labels = nx.draw_networkx_labels(view,pos=pos,font_size=label_size) 119 | 120 | def add_cells(self, adata, obs_columns=None): 121 | ''' 122 | add cells to their most granular annotation in the hierarchy object 123 | adata: anndata.AnnData, containing the cell type annotations under adata.obs 124 | obs_columns: ls, list of columns in adata.obs where the cell type annotations are stored (recommended) 125 | ''' 126 | import networkx as nx 127 | if obs_columns==None: 128 | adata_sub = adata.obs 129 | else: 130 | adata_sub = adata.obs[obs_columns] 131 | 132 | celltype_nodes = get_node_labels(self.graph,'cell_type') 133 | #loop over celltypes to retrieve cells of each celltype 134 | used_barcodes = set() 135 | for i in celltype_nodes: 136 | barcodes = get_indices(adata_sub, i) 137 | barcodes = [x for x in barcodes if x not in used_barcodes] 138 | used_barcodes = 
used_barcodes.union(set(barcodes)) 139 | for x in barcodes: 140 | self.graph.add_node(x, type='cell') 141 | self.graph.add_edge(i,x) 142 | 143 | def query_ancestors(self, query_node, adata=None, obs_key='hierarchical_query'): 144 | ''' 145 | retrieves all cell barcodes belonging to the cell type and all of its subsets 146 | query_node: str, cell type name for which to retrieve barcodes 147 | node_type: str, node type of cell type node (here: 'cell_type') 148 | adata: anndata.AnnData, adata to store the cell type annotations under adata.obs[obs_key] 149 | obs_key: str, column label to store cell type annotations under adata.obs[obs_key] 150 | returns: dict, containing the barcodes belonging to each annotation in self.annotations, if adata is provided they will also be stored in adata.obs[obs_key] 151 | ''' 152 | import networkx as nx 153 | import anndata 154 | node_type='cell_type' 155 | if node_type == self.graph.nodes[query_node]['type']: 156 | nodes_of_specific_type = [node for node in nx.ancestors(self.graph, query_node) if self.graph.nodes[node]['type'] == node_type] 157 | nodes_of_specific_type.append(query_node) 158 | cell_nodes = {} 159 | for node in set(nodes_of_specific_type): 160 | cell_edges = [edge for edge in self.graph.edges(node) if self.graph.nodes[edge[1]]['type'] == 'cell'] 161 | cell_nodes[node] = [edge[1] for edge in cell_edges] 162 | cell_nodes_inv = {} 163 | for k,v in cell_nodes.items(): 164 | for i in v: 165 | cell_nodes_inv[i] = k 166 | if isinstance(adata,anndata._core.anndata.AnnData): 167 | adata.obs[obs_key]= adata.obs_names.map(cell_nodes_inv) 168 | self.annotations = cell_nodes 169 | else: 170 | print('query_node:',query_node,'should be of type',node_type,'stopping...') 171 | -------------------------------------------------------------------------------- /cytopus/tl/label.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | #import csv 3 | 4 | def 
overlap_coefficient(set_a,set_b): 5 | ''' 6 | calculate the overlap coefficient between two sets 7 | ''' 8 | min_len = min([len(set_a),len(set_b)]) 9 | intersect_len = len(set_a.intersection(set_b)) 10 | overlap = intersect_len/min_len 11 | return overlap 12 | 13 | def label_marker_genes(marker_genes, gs_label_dict, threshold = 0.4): 14 | ''' 15 | label an array of marker genes using a KnowledgeBase or a dictionary derived from the KnowledgeBase 16 | returns a dataframe of overlap coefficients for each gene set annotation and marker gene 17 | 18 | marker_genes: numpy.array or list of lists, factors x marker genes 19 | gs_label_dict: cytopus.KnowledgeBase or dict, with gene set names (str) as keys and gene sets (list) as values 20 | threshold: float, if the overlap coefficient is greater than threshold the factor will be labeled with the gene set name with 21 | the maximum overlap coefficient 22 | 23 | returns: pandas.DataFrame, with overlap coefficients of factors (rows) and gene sets (columns), indices are relabeled 24 | to the gene set with the maximum overlap coefficient 25 | ''' 26 | import numpy as np 27 | 28 | if isinstance(gs_label_dict,KnowledgeBase): 29 | #collapse annotation dict 30 | gs_dict = {} 31 | key_list = [] 32 | for key, value in gs_label_dict.celltype_process_dict.items(): 33 | for k,v in value.items(): 34 | if k not in key_list: 35 | gs_dict[k]=v 36 | key_list.append(k) 37 | elif isinstance(gs_label_dict, dict): 38 | for v in gs_label_dict.values(): 39 | if isinstance(v,dict): 40 | raise ValueError('gs_label_dict is a nested dictionary. 
gs_label_dict must be a flat/non-nested dictionary with gene set names as keys (str) and gene sets (lists of strings) as values') 41 | gs_dict = gs_label_dict 42 | else: 43 | raise ValueError('gs_label_dict must be a dictionary or a cytopus.kb.queries.KnowledgeBase object') 44 | 45 | overlap_df = pd.DataFrame() 46 | for i, v in pd.DataFrame(marker_genes).T.items(): 47 | overlap_temp = [] 48 | gs_names_temp = [] 49 | for gs_name, gs in gs_dict.items(): 50 | gene_set = set(gs) 51 | marker_set = set(v) 52 | #check for and remove nans 53 | if 'nan' in gene_set: 54 | gene_set.remove('nan') 55 | if 'nan' in marker_set: 56 | marker_set.remove('nan') 57 | if len(gene_set) > 0 and len(marker_set)>0: 58 | overlap_temp.append(overlap_coefficient(set(gene_set),set(marker_set))) 59 | else: 60 | overlap_temp.append(np.nan) 61 | gs_names_temp.append(gs_name) 62 | overlap_df_temp = pd.DataFrame(overlap_temp, columns=[i],index=gs_names_temp).T 63 | overlap_df = pd.concat([overlap_df,overlap_df_temp]) 64 | marker_gene_labels = [] #gene sets 65 | for marker_set in overlap_df.index: 66 | max_overlap = overlap_df.loc[marker_set].sort_values().index[-1] 67 | if overlap_df.loc[marker_set].sort_values().values[-1] >threshold: 68 | marker_gene_labels.append(max_overlap) 69 | else: 70 | marker_gene_labels.append(marker_set) 71 | overlap_df.index = marker_gene_labels 72 | 73 | return overlap_df 74 | 75 | 76 | def get_celltype(adata, celltype_key,factor_list=None,Spectra_cell_scores= 'SPECTRA_cell_scores'): 77 | ''' 78 | For a list of factors check in which cell types they are expressed 79 | adata: anndata.AnnData, containing cell type labels in adata.obs[celltype_key] 80 | celltype_key: str, key for adata.obs containing the cell type labels 81 | factor_list: list, list of keys for factor loadings in .obs, if none use factor loadings in adata.obsm['SPECTRA_factors'] 82 | return: dictionary mapping factor names and celltypes 83 | Spectra_cell_scores: str, key for Spectra cell scores in 
adata.obsm 84 | ''' 85 | 86 | if factor_list!= None: 87 | factors= adata.obs[factor_list] 88 | factors['celltype'] = list(adata.obs[celltype_key]) 89 | else: 90 | factors = pd.DataFrame(adata.obsm[Spectra_cell_scores]) 91 | factors['celltype'] = list(adata.obs[celltype_key]) 92 | 93 | #create factor:celltype dict 94 | grouped_df = factors.groupby('celltype').mean() 95 | #get factor names for global (expressed in all cells) and cell type spec factors 96 | global_factor_names = grouped_df.T[(grouped_df!=0).all()].index 97 | specific_factor_names= [x for x in grouped_df.columns if x not in global_factor_names] 98 | #add global factors to dict 99 | factor_names_global = {x:'global' for x in global_factor_names} 100 | 101 | #get celltype for celltype spec factors 102 | grouped_df_spec = grouped_df[specific_factor_names] 103 | 104 | for i in grouped_df_spec.columns: 105 | factor_names_global[i] = grouped_df_spec[i].sort_values(ascending=False).index[0] 106 | return factor_names_global 107 | 108 | 109 | def get_gmt(gs_dict,save=False,path=None): 110 | ''' 111 | transform a dictionary into a .gmt file 112 | gs_dict: dict, gene set dictionary with format {'gene set name':['Gene_a','Gene_b','Gene_c',...]} 113 | save: bool, if True saves .gmt file to path 114 | path: str, path to save .gmt file 115 | ''' 116 | #import numpy as np 117 | #import pandas as pd 118 | #retrieve all genes from dict 119 | genes = [] 120 | for k,v in gs_dict.items(): 121 | genes = genes+v 122 | genes = list(set(genes)) 123 | 124 | #pad the lists in gs_dict to equal lengths 125 | max_length = max(map(len, gs_dict.values())) 126 | 127 | for k,v in gs_dict.items(): 128 | if len(v)1.3", 13 | #"numpy>1.2", 14 | "networkx>2.7", 15 | #"matplotlib>3.4" 16 | ], 17 | include_package_data=True, 18 | package_data={'cytopus': ['data/*.txt','data/*.h5ad']}, 19 | ) 20 | --------------------------------------------------------------------------------
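The labeling logic in `cytopus/tl/label.py` rests on the overlap coefficient, |A ∩ B| / min(|A|, |B|), compared against a threshold (default 0.4). The following is a minimal standalone sketch of that idea, not the package's own implementation: the gene set names and marker genes are hypothetical, the empty-set guard is an addition, and the `'unlabeled'` fallback is an assumption made for illustration (`label_marker_genes` keeps the original factor index instead).

```python
def overlap_coefficient(set_a, set_b):
    """Overlap coefficient: |A & B| / min(|A|, |B|)."""
    min_len = min(len(set_a), len(set_b))
    if min_len == 0:
        # Guard added for this sketch: avoid division by zero on empty sets.
        return float("nan")
    return len(set_a & set_b) / min_len


# Hypothetical gene sets and marker genes, for illustration only.
gene_sets = {
    "T_cell_identity": ["CD3D", "CD3E", "CD2"],
    "B_cell_identity": ["CD79A", "MS4A1"],
}
markers = {"CD3D", "CD3E", "IL7R", "CCR7"}
threshold = 0.4

# Score every gene set against the marker genes of one factor.
scores = {
    name: overlap_coefficient(set(genes), markers)
    for name, genes in gene_sets.items()
}

# Keep the best-scoring gene set only if it clears the threshold.
best = max(scores, key=scores.get)
label = best if scores[best] > threshold else "unlabeled"
print(scores, label)
```

Because the denominator is the size of the smaller set, a short gene set fully contained in a long marker list scores 1.0, which is why a threshold is applied before relabeling.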