├── .gitattributes ├── .gitignore ├── LICENSE ├── README.md ├── Tutorial_scaleSC.ipynb ├── docs ├── 404.html ├── api-docs │ ├── harmonypy_gpu │ │ └── index.html │ ├── index.html │ ├── kernels │ │ └── index.html │ ├── pp │ │ └── index.html │ ├── trim_merge_marker │ │ └── index.html │ └── util │ │ └── index.html ├── css │ ├── base.css │ ├── bootstrap.min.css │ ├── bootstrap.min.css.map │ ├── brands.min.css │ ├── extra.css │ ├── fontawesome.min.css │ ├── solid.min.css │ └── v4-font-face.min.css ├── img │ ├── favicon.ico │ ├── grid.png │ ├── pipeline.png │ └── time_comp.png ├── index.html ├── js │ ├── base.js │ ├── bootstrap.bundle.min.js │ ├── bootstrap.bundle.min.js.map │ └── darkmode.js ├── search │ ├── lunr.js │ ├── main.js │ ├── search_index.json │ └── worker.js ├── sitemap.xml ├── sitemap.xml.gz └── webfonts │ ├── fa-brands-400.ttf │ ├── fa-brands-400.woff2 │ ├── fa-regular-400.ttf │ ├── fa-regular-400.woff2 │ ├── fa-solid-900.ttf │ ├── fa-solid-900.woff2 │ ├── fa-v4compatibility.ttf │ └── fa-v4compatibility.woff2 ├── img ├── pipeline.png ├── scalesc_overview.png ├── scalesc_pipeline.png └── time_comp.png ├── pyproject.toml └── scalesc ├── __init__.py ├── harmonypy_gpu.py ├── kernels.py ├── pp.py ├── trim_merge_marker.py └── util.py /.gitattributes: -------------------------------------------------------------------------------- 1 | *.py linguist-language=Python 2 | *.ipynb linguist-language=Python -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | */__pycache__/ 2 | build/ 3 | dist/ 4 | scalesc.egg-info/ 5 | html/ 6 | test*/ 7 | .ipynb_checkpoints/ 8 | Test.ipynb 9 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Haotian Zhang 4 | 5 | Permission is hereby granted, free of 
charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 |

3 | ScaleSC 4 |

5 | 6 |

7 | A GPU-accelerated tool for large-scale scRNA-seq pipelines. 8 |

9 | 10 | 19 | 20 |

21 | Highlights • 22 | Why ScaleSC • 23 | Installation • 24 | Tutorial • 25 | API Reference 26 |

27 | 28 | ## Highlights 29 | 30 | - Fast scRNA-seq pipeline including QC, normalization, batch-effect removal, and dimension reduction, with a ***similar syntax*** to `scanpy` and `rapids-singlecell`. 31 | - Scales to datasets with more than ***10M cells*** on a ***single*** GPU (A100 80G). 32 | - Chunks the data to avoid the ***`int32` limitation*** in `cupyx.scipy.sparse` used by `rapids-singlecell`, which otherwise blocks computation even for moderate-size datasets (~1.3M cells) without multi-GPU support. 33 | - Reconciles the output of each step with ***`scanpy`*** to reproduce the ***same*** results as on the CPU. 34 | - Improves ***`harmonypy`*** so that datasets with more than ***10M cells*** and more than ***1000 samples*** can be run on a single GPU. 35 | - Speeds up and optimizes the ***`NSForest`*** algorithm on GPU for ***better*** marker gene identification. 36 | - ***Merges*** clusters according to the gene expression of markers detected by `NSForest`. 37 | 38 | ## Why ScaleSC 39 | 40 |
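The `int32` limitation mentioned above is easy to make concrete with a back-of-the-envelope sketch (this is not ScaleSC's actual chunking code, and the mean number of genes detected per cell is an illustrative assumption):

```python
# Sketch: why int32 sparse indices overflow, and how chunking sidesteps it.
# The mean number of genes detected per cell (1,600) is an assumed figure.

INT32_MAX = 2**31 - 1  # cupyx.scipy.sparse stores CSR indices as int32

def max_cells_per_chunk(mean_genes_per_cell):
    """Largest cell count whose expected non-zero count still fits in int32."""
    return INT32_MAX // mean_genes_per_cell

limit = max_cells_per_chunk(1_600)
print(limit)  # roughly 1.34M cells before the index space runs out
```

Keeping each chunk below this bound lets every chunk use `int32` indices safely, which is why processing tens of millions of cells one chunk at a time remains feasible.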
41 | 42 | What can ScaleSC do? 43 | 44 | pipeline 45 | 46 |
47 | 48 | 49 |
50 | 51 | ScaleSC Overview 52 | 53 | overview 54 | 55 |
56 | 57 |

58 | 59 |
60 | Overview of different packages* 61 | 62 | | | `scanpy` | `scalesc` | `rapids-singlecell` | 63 | |:----------:|:----------:|:----------:|:----------:| 64 | | GPU Support | ❌ | ✅ | ✅ | 65 | | `int32` Issue in Sparse | ✅ | ✅ | ❌ | 66 | | Upper Limit of #cell | 5M | **~20M** | ~1M | 67 | | Upper Limit of #sample | <100 | **>1000** | <100 | 68 | 69 |
70 | 71 |

72 | 73 |
74 | 75 | Time comparsion between `scanpy`(CPU) and `scalesc`(GPU) on A100(80G) 76 | 77 | time-comp 78 | 79 |
80 | 81 | ## How To Install 82 | >#### Note: ScaleSC requires a **high-end GPU** (> 24G VRAM) and a matching **CUDA** version to support GPU-accelerated computing. 83 |
84 | 85 | Requirements: 86 | - [**RAPIDS**](https://rapids.ai/) from Nvidia 87 | - [**rapids-singlecell**](https://rapids-singlecell.readthedocs.io/en/latest/index.html), an alternative of *scanpy* that employs GPU for acceleration. 88 | - [**Conda**](https://docs.conda.io/projects/conda/en/latest/index.html), version >=22.11 is strongly encoruaged, because *conda-libmamba-solver* is set as default, which significant speeds up solving dependencies. 89 | - [**pip**](), a python package installer. 90 | 91 | Environment Setup: 92 | 1. Install [**RAPIDS**](https://rapids.ai/) through Conda, \ 93 | `conda create -n scalesc -c rapidsai -c conda-forge -c nvidia \ 94 | rapids=25.02 python=3.12 'cuda-version>=12.0,<=12.8` 95 | Users have flexibility to install it according to their systems by using this [online selector](https://docs.rapids.ai/install/?_gl=1*1em94gj*_ga*OTg5MDQyNDkyLjE3MjM0OTAyNjk.*_ga_RKXFW6CM42*MTczMDIxNzIzOS4yLjAuMTczMDIxNzIzOS42MC4wLjA.#selector). We highly recommand to install `**RAPIDS**>=24.12`, it solves a bug related to the leiden algorithm which results in too many clusters. 96 | 97 | 2. Activate conda env, \ 98 | `conda activate scalesc` 99 | 3. Install [**rapids-singlecell**](https://rapids-singlecell.readthedocs.io/en/latest/index.html) using pip, \ 100 | `pip install rapids-singlecell` 101 | 102 | 4. Install scaleSC, 103 | - pull scaleSC from github \ 104 | `git clone https://github.com/interactivereport/scaleSC.git` 105 | - enter the folder and install scaleSC \ 106 | `cd scaleSC` \ 107 | `pip install .` 108 | 5. check env: 109 | - `python -c "import scalesc; print(scalesc.__version__)"` == 0.1.0 110 | - `python -c "import cupy; print(cupy.__version__)"` >= 13.3.0 111 | - `python -c "import cuml; print(cuml.__version__)"` >= 24.10 112 | - `python -c "import cupy; print(cupy.cuda.is_available())"` = True 113 | - `python -c "import xgboost; print(xgboost.__version__)` >= 2.1.1, optionally for marker annotation 114 | 115 |
116 | 117 | ## Tutorial: 118 | - See [this tutorial](./Tutorial_scaleSC.ipynb) for details. 119 | 120 | 121 | ## Citation 122 | 123 | Please cite [ScaleSC](https://doi.org/10.1101/2025.01.28.635256), as well as [Scanpy](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1382-0), [Rapids-singlecell](https://github.com/scverse/rapids_singlecell), [NSForest](https://github.com/JCVenterInstitute/NSForest), and [AnnData](https://anndata.readthedocs.io/en/stable/#citation) according to their respective instructions. 124 | 125 | 126 | ## Updates: 127 | - 2/26/2025: 128 | - added a parameter `threshold` to function `adata_cluster_merge` to support cluster merging at various scales according to the user's specification; `threshold` is between 0 and 1 and set to 0 by default. 129 | - updated a few more examples of cluster merging in the tutorial. 130 | - future work: add support for loading from large `.h5ad` files. 131 | 132 | 133 | 134 | ## Contact 135 | - [@haotianzh](mailto:haotianzh@uconn.edu) 136 | 137 | 138 | 139 | 140 | ## API Reference 141 | 142 |
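Before the per-class details, the methods documented below can be tied together in a hypothetical end-to-end sketch. The data folder, sample column name, and parameter values here are illustrative assumptions; the method names and their order follow this reference, and actually running it requires the GPU environment described in the installation section:

```python
# Hypothetical end-to-end sketch of the ScaleSC pipeline, based on the
# methods documented below. Parameter values are illustrative defaults.

def run_pipeline(sc, sample_col="sample_id"):
    """Run the standard QC -> HVG -> normalize -> PCA -> Harmony -> clustering flow."""
    sc.calculate_qc_metrics()
    sc.filter_genes_and_cells(min_counts_per_gene=3, min_counts_per_cell=200)
    sc.highly_variable_genes(n_top_genes=4000)  # seurat_v3 expects raw counts
    sc.normalize_log1p(target_sum=1e4)
    sc.pca(n_components=50)
    sc.harmony(sample_col)                      # batch correction on GPU
    sc.neighbors(n_neighbors=20, n_pcs=50, use_rep="X_pca_harmony")
    sc.leiden(resolution=0.5)
    sc.umap()
    sc.save()

# To run for real (GPU required):
#   import scalesc
#   sc = scalesc.ScaleSC(data_dir="./data")   # folder name is illustrative
#   run_pipeline(sc, sample_col="sample_id")
```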
143 | 144 | ## class `ScaleSC` 145 | ScaleSC integrated pipeline in a scanpy-like style. 146 | 147 | It will automatcially load dataset in chunks, see `scalesc.util.AnnDataBatchReader` for details, and all methods in this class manipulate this chunked data. 148 | 149 | 150 | 151 | **Args:** 152 | 153 | 154 | 155 | - `data_dir` (`str`): Data folder of the dataset. 156 | - `max_cell_batch` (`int`): Maximum number of cells in a single batch. 157 | - `Default`: 100000. 158 | - `preload_on_cpu` (`bool`): If load the entire chunked data on CPU. Default: `True` 159 | - `preload_on_gpu` (`bool`): If load the entire chunked data on GPU, `preload_on_cpu` will be overwritten to `True` when this sets to `True`. Default is `True`. 160 | - `save_raw_counts` (`bool`): If save `adata_X` to disk after QC filtering. 161 | - `Default`: False. 162 | - `save_norm_counts` (`bool`): If save `adata_X` data to disk after normalization. 163 | - `Default`: False. 164 | - `save_after_each_step` (`bool`): If save `adata` (without .X) to disk after each step. 165 | - `Default`: False. 166 | - `output_dir` (`str`): Output folder. Default: './results'. 167 | - `gpus` (`list`): List of GPU ids, `[0]` is set if this is None. Default: None. 168 | 169 | 170 | 171 | ### method `__init__` 172 | 173 | ```python 174 | __init__( 175 | data_dir, 176 | max_cell_batch=100000.0, 177 | preload_on_cpu=True, 178 | preload_on_gpu=True, 179 | save_raw_counts=False, 180 | save_norm_counts=False, 181 | save_after_each_step=False, 182 | output_dir='results', 183 | gpus=None 184 | ) 185 | ``` 186 | 187 | 188 | 189 | 190 | 191 | 192 | --- 193 | 194 | #### property adata 195 | 196 | `AnnData`: An AnnData object that used to store all intermediate results without the count matrix. 197 | 198 | Note: This is always on CPU. 199 | 200 | --- 201 | 202 | #### property adata_X 203 | 204 | `AnnData`: An `AnnData` object that used to store all intermediate results including the count matrix. 
Internally, all chunks should be merged on CPU to avoid high GPU consumption, make sure to invoke `to_CPU()` before calling this object. 205 | 206 | 207 | 208 | --- 209 | 210 | 211 | 212 | ### method `calculate_qc_metrics` 213 | 214 | ```python 215 | calculate_qc_metrics() 216 | ``` 217 | 218 | Calculate quality control metrics. 219 | 220 | --- 221 | 222 | 223 | 224 | ### method `clear` 225 | 226 | ```python 227 | clear() 228 | ``` 229 | 230 | Clean the memory 231 | 232 | --- 233 | 234 | 235 | 236 | ### method `filter_cells` 237 | 238 | ```python 239 | filter_cells(min_count=0, max_count=None, qc_var='n_genes_by_counts', qc=False) 240 | ``` 241 | 242 | Filter genes based on number of a QC metric. 243 | 244 | 245 | 246 | **Args:** 247 | 248 | - `min_count` (`int`): Minimum number of counts required for a cell to pass filtering. 249 | - `max_count` (`int`): Maximum number of counts required for a cell to pass filtering. 250 | - `qc_var` (`str`='n_genes_by_counts'): Feature in QC metrics that used to filter cells. 251 | - `qc` (`bool`=`False`): Call `calculate_qc_metrics` before filtering. 252 | 253 | --- 254 | 255 | 256 | 257 | ### method `filter_genes` 258 | 259 | ```python 260 | filter_genes(min_count=0, max_count=None, qc_var='n_cells_by_counts', qc=False) 261 | ``` 262 | 263 | Filter genes based on number of a QC metric. 264 | 265 | 266 | 267 | **Args:** 268 | 269 | - `min_count` (`int`): Minimum number of counts required for a gene to pass filtering. 270 | - `max_count` (`int`): Maximum number of counts required for a gene to pass filtering. 271 | - `qc_var` (`str`='n_cells_by_counts'): Feature in QC metrics that used to filter genes. 272 | - `qc` (`bool`=`False`): Call `calculate_qc_metrics` before filtering. 
273 | 274 | --- 275 | 276 | 277 | 278 | ### method `filter_genes_and_cells` 279 | 280 | ```python 281 | filter_genes_and_cells( 282 | min_counts_per_gene=0, 283 | min_counts_per_cell=0, 284 | max_counts_per_gene=None, 285 | max_counts_per_cell=None, 286 | qc_var_gene='n_cells_by_counts', 287 | qc_var_cell='n_genes_by_counts', 288 | qc=False 289 | ) 290 | ``` 291 | 292 | Filter genes based on number of a QC metric. 293 | 294 | 295 | 296 | **Note:** 297 | 298 | > This is an efficient way to perform a regular filtering on genes and cells without repeatedly iterating over chunks. 299 | > 300 | 301 | **Args:** 302 | 303 | - `min_counts_per_gene` (`int`): Minimum number of counts required for a gene to pass filtering. 304 | - `max_counts_per_gene` (`int`): Maximum number of counts required for a gene to pass filtering. 305 | - `qc_var_gene` (`str`='n_cells_by_counts'): Feature in QC metrics that used to filter genes. 306 | - `min_counts_per_cell` (`int`): Minimum number of counts required for a cell to pass filtering. 307 | - `max_counts_per_cell` (`int`): Maximum number of counts required for a cell to pass filtering. 308 | - `qc_var_cell` (`str`='n_genes_by_counts'): Feature in QC metrics that used to filter cells. 309 | - `qc` (`bool`=`False`): Call `calculate_qc_metrics` before filtering. 310 | 311 | --- 312 | 313 | 314 | 315 | ### method `harmony` 316 | 317 | ```python 318 | harmony(sample_col_name, n_init=10, max_iter_harmony=20) 319 | ``` 320 | 321 | Use Harmony to integrate different experiments. 322 | 323 | 324 | 325 | **Note:** 326 | 327 | > This modified harmony function can easily scale up to 15M cells with 50 pcs on GPU (A100 80G). Result after harmony is stored into `adata.obsm['X_pca_harmony']`. 328 | > 329 | 330 | **Args:** 331 | 332 | - `sample_col_name` (`str`): Column of sample ID. 333 | - `n_init` (`int`=`10`): Number of times the k-means algorithm is run with different centroid seeds. 
334 | - `max_iter_harmony` (`int`=`20`): Maximum iteration number of harmony. 335 | 336 | --- 337 | 338 | 339 | 340 | ### method `highly_variable_genes` 341 | 342 | ```python 343 | highly_variable_genes(n_top_genes=4000, method='seurat_v3') 344 | ``` 345 | 346 | Annotate highly variable genes. 347 | 348 | 349 | 350 | **Note:** 351 | 352 | > Only `seurat_v3` is implemented. Raw count matrix is expected as input for `seurat_v3`. HVGs are set to `True` in `adata.var['highly_variable']`. 353 | > 354 | 355 | **Args:** 356 | 357 | - `n_top_genes` (`int`=`4000`): Number of highly-variable genes to keep. 358 | - `method` (`str`=`'seurat_v3'`): Choose the flavor for identifying highly variable genes. 359 | 360 | --- 361 | 362 | 363 | 364 | ### method `leiden` 365 | 366 | ```python 367 | leiden(resolution=0.5, random_state=42) 368 | ``` 369 | 370 | Performs Leiden clustering using `rapids-singlecell`. 371 | 372 | 373 | 374 | **Args:** 375 | 376 | - `resolution` (`float`=`0.5`): A parameter value controlling the coarseness of the clustering. (called gamma in the modularity formula). Higher values lead to more clusters. 377 | - `random_state` (`int`=`42`): Random seed. 378 | 379 | --- 380 | 381 | 382 | 383 | ### method `neighbors` 384 | 385 | ```python 386 | neighbors(n_neighbors=20, n_pcs=50, use_rep='X_pac_harmony', algorithm='cagra') 387 | ``` 388 | 389 | Compute a neighborhood graph of observations using `rapids-singlecell`. 390 | 391 | 392 | 393 | **Args:** 394 | 395 | - `n_neighbors` (`int`=`20`): The size of local neighborhood (in terms of number of neighboring data points) used for manifold approximation. 396 | - `n_pcs` (`int`=`50`): Use this many PCs. 397 | - `use_rep` (`str`=`'X_pca_harmony'`): Use the indicated representation. 398 | - `algorithm` (`str`=`'cagra'`): The query algorithm to use. 
399 | 400 | --- 401 | 402 | 403 | 404 | ### method `normalize_log1p` 405 | 406 | ```python 407 | normalize_log1p(target_sum=10000.0) 408 | ``` 409 | 410 | Normalize counts per cell then log1p. 411 | 412 | 413 | 414 | **Note:** 415 | 416 | > If `save_raw_counts` or `save_norm_counts` is set, write `adata_X` to disk here automatically. 417 | > 418 | 419 | **Args:** 420 | 421 | - `target_sum` (`int`=`1e4`): If None, after normalization, each observation (cell) has a total count equal to the median of total counts for observations (cells) before normalization. 422 | 423 | --- 424 | 425 | 426 | 427 | ### method `normalize_log1p_pca` 428 | 429 | ```python 430 | normalize_log1p_pca( 431 | target_sum=10000.0, 432 | n_components=50, 433 | hvg_var='highly_variable' 434 | ) 435 | ``` 436 | 437 | An alternative for calling `normalize_log1p` and `pca` together. 438 | 439 | 440 | 441 | **Note:** 442 | 443 | > Used when `preload_on_cpu` is `False`. 444 | 445 | --- 446 | 447 | 448 | 449 | ### method `pca` 450 | 451 | ```python 452 | pca(n_components=50, hvg_var='highly_variable') 453 | ``` 454 | 455 | Principal component analysis. 456 | 457 | Computes PCA coordinates, loadings and variance decomposition. Uses the implementation of scikit-learn. 458 | 459 | 460 | 461 | **Note:** 462 | 463 | > Flip the directions according to the largest values in loadings. Results will match up with scanpy perfectly. Calculated PCA matrix is stored in `adata.obsm['X_pca']`. 464 | > 465 | 466 | **Args:** 467 | 468 | - `n_components` (`int`=`50`): Number of principal components to compute. 469 | - `hvg_var` (`str`=`'highly_variable'`): Use highly variable genes only. 470 | 471 | --- 472 | 473 | 474 | 475 | ### method `save` 476 | 477 | ```python 478 | save(data_name=None) 479 | ``` 480 | 481 | Save `adata` to disk. 482 | 483 | 484 | 485 | **Note:** 486 | 487 | > Save to '`output_dir`/`data_name`.h5ad'. 488 | > 489 | 490 | **Args:** 491 | 492 | - `data_name` (`str`): If `None`, set as `data_dir`. 
493 | 494 | --- 495 | 496 | 497 | 498 | ### method `savex` 499 | 500 | ```python 501 | savex(name, data_name=None) 502 | ``` 503 | 504 | Save `adata` to disk in chunks. 505 | 506 | 507 | 508 | **Note:** 509 | 510 | > Each chunk will be saved individually in a subfolder under `output_dir`. Save to '`output_dir`/`name`/`data_name`_`i`.h5ad'. 511 | > 512 | 513 | **Args:** 514 | 515 | - `name` (`str`): Subfolder name. 516 | - `data_name` (`str`): If `None`, set as `data_dir`. 517 | 518 | --- 519 | 520 | 521 | 522 | ### method `to_CPU` 523 | 524 | ```python 525 | to_CPU() 526 | ``` 527 | 528 | Move all chunks to CPU. 529 | 530 | --- 531 | 532 | 533 | 534 | ### method `to_GPU` 535 | 536 | ```python 537 | to_GPU() 538 | ``` 539 | 540 | Move all chunks to GPU. 541 | 542 | --- 543 | 544 | 545 | 546 | ### method `umap` 547 | 548 | ```python 549 | umap(random_state=42) 550 | ``` 551 | 552 | Embed the neighborhood graph using `rapids-singlecell`. 553 | 554 | 555 | 556 | **Args:** 557 | 558 | - `random_state` (`int`=`42`): Random seed. 559 | 560 | 561 | 562 | --- 563 | 564 | 565 | 566 | ## class `AnnDataBatchReader` 567 | Chunked dataloader for extremely large single-cell dataset. Return a data chunk each time for further processing. 
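The chunking idea behind this reader can be sketched in a few lines. This is a conceptual illustration only — the real class also handles GPU transfer, cell/gene filters, and `.h5ad` parsing:

```python
# Minimal CPU-only sketch of the chunked-reading idea behind
# AnnDataBatchReader: split n cells into batches of at most
# max_cell_batch rows and yield one range at a time.

def batchify(n_cells, max_cell_batch=100_000):
    """Yield (start, end) row ranges covering all cells in fixed-size chunks."""
    for start in range(0, n_cells, max_cell_batch):
        yield start, min(start + max_cell_batch, n_cells)

# Each (start, end) range would be used to slice one chunk of the count
# matrix, process it on GPU, then move it back to CPU before the next chunk.
ranges = list(batchify(250_000))
print(ranges)  # [(0, 100000), (100000, 200000), (200000, 250000)]
```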
568 | 569 | 570 | 571 | ### method `__init__` 572 | 573 | ```python 574 | __init__( 575 | data_dir, 576 | preload_on_cpu=True, 577 | preload_on_gpu=False, 578 | gpus=None, 579 | max_cell_batch=100000, 580 | max_gpu_memory_usage=48.0, 581 | return_anndata=True 582 | ) 583 | ``` 584 | 585 | 586 | 587 | 588 | 589 | 590 | --- 591 | 592 | #### property shape 593 | 594 | 595 | 596 | 597 | 598 | 599 | 600 | --- 601 | 602 | 603 | 604 | ### method `batch_to_CPU` 605 | 606 | ```python 607 | batch_to_CPU() 608 | ``` 609 | 610 | 611 | 612 | 613 | 614 | --- 615 | 616 | 617 | 618 | ### method `batch_to_GPU` 619 | 620 | ```python 621 | batch_to_GPU() 622 | ``` 623 | 624 | 625 | 626 | 627 | 628 | --- 629 | 630 | 631 | 632 | ### method `batchify` 633 | 634 | ```python 635 | batchify(axis='cell') 636 | ``` 637 | 638 | Return a data generator if `preload_on_cpu` is set as `True`. 639 | 640 | --- 641 | 642 | 643 | 644 | ### method `clear` 645 | 646 | ```python 647 | clear() 648 | ``` 649 | 650 | 651 | 652 | 653 | 654 | --- 655 | 656 | 657 | 658 | ### method `get_merged_adata_with_X` 659 | 660 | ```python 661 | get_merged_adata_with_X() 662 | ``` 663 | 664 | 665 | 666 | 667 | 668 | --- 669 | 670 | 671 | 672 | ### method `gpu_wrapper` 673 | 674 | ```python 675 | gpu_wrapper(generator) 676 | ``` 677 | 678 | 679 | 680 | 681 | 682 | --- 683 | 684 | 685 | 686 | ### method `read` 687 | 688 | ```python 689 | read(fname) 690 | ``` 691 | 692 | 693 | 694 | 695 | 696 | --- 697 | 698 | 699 | 700 | ### method `set_cells_filter` 701 | 702 | ```python 703 | set_cells_filter(filter, update=True) 704 | ``` 705 | 706 | Update cells filter and applied on data chunks if `update` set to `True`, otherwise, update filter only. 707 | 708 | --- 709 | 710 | 711 | 712 | ### method `set_genes_filter` 713 | 714 | ```python 715 | set_genes_filter(filter, update=True) 716 | ``` 717 | 718 | Update genes filter and applied on data chunks if `update` set to True, otherwise, update filter only. 
719 | 720 | 721 | 722 | **Note:** 723 | 724 | > Genes filter can be set sequentially, a new filter should be always compatible with the previous filtered data. 725 | 726 | --- 727 | 728 | 729 | 730 | ### method `update_by_cells_filter` 731 | 732 | ```python 733 | update_by_cells_filter(filter) 734 | ``` 735 | 736 | 737 | 738 | 739 | 740 | --- 741 | 742 | 743 | 744 | ### method `update_by_genes_filter` 745 | 746 | ```python 747 | update_by_genes_filter(filter) 748 | ``` 749 | 750 | 751 | 752 | 753 | 754 | 755 | 756 | 757 | --- 758 | 759 | _This file was automatically generated via [lazydocs](https://github.com/ml-tooling/lazydocs)._ 760 | 761 |
762 | 763 | -------------------------------------------------------------------------------- /docs/404.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | ScaleSC 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 84 | 85 |
86 |
87 | 88 |
89 |
90 |

404

91 |

Page not found

92 |
93 |
94 | 95 | 96 |
97 |
98 | 99 | 103 | 104 | 108 | 109 | 110 | 111 | 171 | 172 | 173 | 174 | -------------------------------------------------------------------------------- /docs/api-docs/harmonypy_gpu/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | Harmonypy gpu - ScaleSC 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 85 | 86 |
87 |
88 |
138 |
139 | 140 | 141 | 142 |

143 |

module harmonypy_gpu

144 |
145 |

146 |

function get_usage

147 |
get_usage(s)
148 | 
149 |
150 |

151 |

function to_csr_cuda

152 |
to_csr_cuda(x, dtype)
153 | 
154 |

Move to GPU as a csr_matrix.

155 |
156 |

157 |

function to_csc_cuda

158 |
to_csc_cuda(x, dtype)
159 | 
160 |

Move to GPU as a csc_matrix, speed up column slice.

161 |
162 |

163 |

function get_dummies

164 |
get_dummies(x)
165 | 
166 |

Return a sparse dummy matrix.

167 |
168 |

169 |

function run_harmony

170 |
run_harmony(
171 |     data_mat: 'ndarray',
172 |     meta_data: 'DataFrame',
173 |     vars_use,
174 |     init_seeds=None,
175 |     theta=None,
176 |     lamb=None,
177 |     sigma=0.1,
178 |     nclust=None,
179 |     tau=0,
180 |     block_size=0.05,
181 |     max_iter_harmony=10,
182 |     max_iter_kmeans=20,
183 |     epsilon_cluster=1e-05,
184 |     epsilon_harmony=0.0001,
185 |     plot_convergence=False,
186 |     verbose=True,
187 |     reference_values=None,
188 |     cluster_prior=None,
189 |     n_init=1,
190 |     random_state=0,
191 |     dtype=<class 'numpy.float32'>
192 | )
193 | 
194 |

Run Harmony.

195 |
196 |

197 |

function safe_entropy

198 |
safe_entropy(x: 'array')
199 | 
200 |
201 |

202 |

function moe_correct_ridge

203 |
moe_correct_ridge(Z_orig, Z_cos, Z_corr, R, W, K, Phi_Rk, Phi_moe, lamb)
204 | 
205 |
206 |

207 |

class Harmony

208 |

209 |

method __init__

210 |
__init__(
211 |     Z,
212 |     init_seeds,
213 |     n_init,
214 |     Phi,
215 |     Phi_moe,
216 |     Pr_b,
217 |     sigma,
218 |     theta,
219 |     max_iter_harmony,
220 |     max_iter_kmeans,
221 |     epsilon_kmeans,
222 |     epsilon_harmony,
223 |     K,
224 |     block_size,
225 |     lamb,
226 |     verbose,
227 |     random_state,
228 |     dtype
229 | )
230 | 
231 |
232 |

233 |

method allocate_buffers

234 |
allocate_buffers()
235 | 
236 |
237 |

238 |

method check_convergence

239 |
check_convergence(i_type)
240 | 
241 |
242 |

243 |

method cluster

244 |
cluster()
245 | 
246 |
247 |

248 |

method compute_objective

249 |
compute_objective()
250 | 
251 |
252 |

253 |

method harmonize

254 |
harmonize(iter_harmony=10, verbose=True)
255 | 
256 |
257 |

258 |

method init_cluster

259 |
init_cluster()
260 | 
261 |
262 |

263 |

method kmeans_multirestart

264 |
kmeans_multirestart()
265 | 
266 |
267 |

268 |

method result

269 |
result()
270 | 
271 |
272 |

273 |

method update_R

274 |
update_R()
275 | 
276 |
277 |

This file was automatically generated via lazydocs.

278 |
279 |
280 | 281 | 285 | 286 | 290 | 291 | 292 | 293 | 353 | 354 | 355 | 356 | -------------------------------------------------------------------------------- /docs/api-docs/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | Overview - ScaleSC 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 95 | 96 |
97 |
98 |
128 |
129 | 130 | 131 | 132 |

API Reference

133 |

Modules

134 | 141 |

Classes

142 | 149 |

Functions

150 | 192 |
193 |

This file was automatically generated via lazydocs.

194 |
195 |
196 | 197 | 201 | 202 | 206 | 207 | 208 | 209 | 269 | 270 | 271 | 272 | -------------------------------------------------------------------------------- /docs/api-docs/kernels/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | scalesc.kernels - ScaleSC 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 95 | 96 |
97 |
98 |
132 |
133 | 134 | 135 | 136 |

137 |

module kernels

138 |

Global Variables

139 |
    140 |
  • get_mean_var_major_kernel
  • 141 |
  • get_mean_var_minor_kernel
  • 142 |
  • find_indices_kernel
  • 143 |
144 |
145 |

146 |

function get_mean_var_major

147 |
get_mean_var_major(dtype)
148 | 
149 |
150 |

151 |

function get_mean_var_minor

152 |
get_mean_var_minor(dtype)
153 | 
154 |
155 |

156 |

function get_find_indices

157 |
get_find_indices()
158 | 
159 |
160 |

This file was automatically generated via lazydocs.

161 |
162 |
163 | 164 | 168 | 169 | 173 | 174 | 175 | 176 | 236 | 237 | 238 | 239 | -------------------------------------------------------------------------------- /docs/api-docs/trim_merge_marker/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | scalesc.trim_merge_marker - ScaleSC 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 95 | 96 |
97 |
98 |
196 |
197 | 198 | 199 | 200 |

201 |

module trim_merge_marker

202 |

Global Variables

203 |
    204 |
  • TYPE_CHECKING
  • 205 |
  • get_mean_var_major_kernel
  • 206 |
  • get_mean_var_minor_kernel
  • 207 |
  • find_indices_kernel
  • 208 |
209 |
210 |

211 |

function timer

212 |
timer(func)
213 | 
214 |
215 |

216 |

function wrapper

217 |
wrapper(*args, **kwargs)
218 | 
219 |
220 |

221 |

function wrapper

222 |
wrapper(*args, **kwargs)
223 | 
224 |
225 |

226 |

function wrapper

227 |
wrapper(*args, **kwargs)
228 | 
229 |
230 |

231 |

function wrapper

232 |
wrapper(*args, **kwargs)
233 | 
234 |
235 |

236 |

function wrapper

237 |
wrapper(*args, **kwargs)
238 | 
239 |
240 |

241 |

function wrapper

242 |
wrapper(*args, **kwargs)
243 | 
244 |
245 |

246 |

function wrapper

247 |
wrapper(*args, **kwargs)
248 | 
249 |
250 |

251 |

function X_to_GPU

252 |
X_to_GPU(X)
253 | 
254 |

Transfers matrices and arrays to the GPU.

255 |

Args:

256 |
    257 |
  • X: Matrix or array to transfer to the GPU.
  • 258 |
259 |
260 |

261 |

function marker_filter_sort

262 |
marker_filter_sort(markers, cluster, df_sp, df_frac)
263 | 
264 |
265 |

266 |

function find_markers

267 |
find_markers(adata, subctype_col)
268 | 
269 |
270 |

271 |

function stds

272 |
stds(x, axis=None)
273 | 
274 |

Variance of a sparse matrix a: var = mean(a²) - mean(a)²

275 |

Standard deviation of a sparse matrix a: std = sqrt(var(a))

276 |
277 |

278 |

function find_cluster_pairs_to_merge

279 |
find_cluster_pairs_to_merge(adata, x, colname, cluster, markers)
280 | 
281 |
282 |

283 |

function adata_cluster_merge

284 |
adata_cluster_merge(adata, subctype_col)
285 | 
286 |

Merge clusters according to the gene expression of markers detected by NSForest.

287 |
288 |

289 |

function specificity_score

290 |
specificity_score(adata=None, ctype_col: str = None, glist: list = None)
291 | 
292 |
293 |

294 |

function fraction_cells

295 |
fraction_cells(adata=None, ctype_col: str = None, glist: list = None)
296 | 
297 |

Given adata.X (n cells * m genes), ctype_col (a column name in adata.obs that stores the cell type annotation), and a glist (for example, [gene1, gene2, ..., genek]). The definition of fraction of expression is: # cells with expression > 0 / # total cells. Assuming c different cell types in total, subset the adata for each cell type and calculate the fraction of expression of each gene. Returns the fraction dataframe with k rows and c columns.

298 |
299 |

300 |

function myNSForest

301 |
myNSForest(
302 |     adata,
303 |     cluster_header,
304 |     cluster_list=None,
305 |     medians_header=None,
306 |     n_trees=100,
307 |     n_jobs=-1,
308 |     beta=0.5,
309 |     n_top_genes=15,
310 |     n_binary_genes=10,
311 |     n_genes_eval=6,
312 |     output_folder='.',
313 |     save_results=False
314 | )
315 | 
316 |
317 |

318 |

class UF

319 |

320 |

method __init__

321 |
__init__(n)
322 | 
323 |
324 |

325 |

method current_kids_dict

326 |
current_kids_dict()
327 | 
328 |
329 |

330 |

method final

331 |
final()
332 | 
333 |
334 |

335 |

method find

336 |
find(x)
337 | 
338 |
339 |

340 |

method union

341 |
union(x, y)
342 | 
343 |
344 |

345 |

class data2UF

346 |

347 |

method __init__

348 |
__init__(celltypes: list, merge_pairs: list[tuple])
349 | 
350 |
351 |

352 |

method union_pairs

353 |
union_pairs() → int
354 | 
355 |
356 |

This file was automatically generated via lazydocs.

357 |
358 |
359 | 360 | 364 | 365 | 369 | 370 | 371 | 372 | 432 | 433 | 434 | 435 | -------------------------------------------------------------------------------- /docs/css/base.css: -------------------------------------------------------------------------------- 1 | html { 2 | /* The nav header is 3.5rem high, plus 20px for the margin-top of the 3 | main container. */ 4 | scroll-padding-top: calc(3.5rem + 20px); 5 | } 6 | 7 | /* Replacement for `body { background-attachment: fixed; }`, which has 8 | performance issues when scrolling on large displays. See #1394. */ 9 | body::before { 10 | content: ' '; 11 | position: fixed; 12 | width: 100%; 13 | height: 100%; 14 | top: 0; 15 | left: 0; 16 | background-color: var(--bs-body-bg); 17 | background: url(../img/grid.png) repeat-x; 18 | will-change: transform; 19 | z-index: -1; 20 | } 21 | 22 | body > .container { 23 | margin-top: 20px; 24 | min-height: 400px; 25 | } 26 | 27 | .navbar.fixed-top { 28 | position: -webkit-sticky; 29 | position: sticky; 30 | } 31 | 32 | .source-links { 33 | float: right; 34 | } 35 | 36 | .col-md-9 img { 37 | max-width: 100%; 38 | display: inline-block; 39 | padding: 4px; 40 | line-height: 1.428571429; 41 | background-color: var(--bs-secondary-bg-subtle); 42 | border: 1px solid var(--bs-secondary-border-subtle); 43 | border-radius: 4px; 44 | margin: 20px auto 30px auto; 45 | } 46 | 47 | h1 { 48 | color: inherit; 49 | font-weight: 400; 50 | font-size: 42px; 51 | } 52 | 53 | h2, h3, h4, h5, h6 { 54 | color: inherit; 55 | font-weight: 300; 56 | } 57 | 58 | hr { 59 | border-top: 1px solid #aaa; 60 | opacity: 1; 61 | } 62 | 63 | pre, .rst-content tt { 64 | max-width: 100%; 65 | background-color: var(--bs-body-bg); 66 | border: solid 1px var(--bs-border-color); 67 | color: var(--bs-body-color); 68 | overflow-x: auto; 69 | } 70 | 71 | code.code-large, .rst-content tt.code-large { 72 | font-size: 90%; 73 | } 74 | 75 | code { 76 | padding: 2px 5px; 77 | background-color: rgba(var(--bs-body-bg-rgb), 0.75); 78 | 
border: solid 1px var(--bs-border-color); 79 | color: var(--bs-body-color); 80 | white-space: pre-wrap; 81 | word-wrap: break-word; 82 | } 83 | 84 | pre code { 85 | display: block; 86 | border: none; 87 | white-space: pre; 88 | word-wrap: normal; 89 | font-family: SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; 90 | font-size: 12px; 91 | } 92 | 93 | kbd { 94 | padding: 2px 4px; 95 | font-size: 90%; 96 | color: var(--bs-secondary-text-emphasis); 97 | background-color: var(--bs-secondary-bg-subtle); 98 | border-radius: 3px; 99 | -webkit-box-shadow: inset 0 -1px 0 rgba(0,0,0,.25); 100 | box-shadow: inset 0 -1px 0 rgba(0,0,0,.25); 101 | } 102 | 103 | a code { 104 | color: inherit; 105 | } 106 | 107 | a:hover code, a:focus code { 108 | color: inherit; 109 | } 110 | 111 | footer { 112 | margin-top: 30px; 113 | margin-bottom: 10px; 114 | text-align: center; 115 | font-weight: 200; 116 | } 117 | 118 | .modal-dialog { 119 | margin-top: 60px; 120 | } 121 | 122 | /* 123 | * Side navigation 124 | * 125 | * Scrollspy and affixed enhanced navigation to highlight sections and secondary 126 | * sections of docs content. 127 | */ 128 | 129 | .bs-sidebar.affix { 130 | position: -webkit-sticky; 131 | position: sticky; 132 | /* The nav header is 3.5rem high, plus 20px for the margin-top of the 133 | main container. 
*/ 134 | top: calc(3.5rem + 20px); 135 | } 136 | 137 | .bs-sidebar.card { 138 | padding: 0; 139 | max-height: 90%; 140 | overflow-y: auto; 141 | } 142 | 143 | /* Toggle (vertically flip) sidebar collapse icon */ 144 | .bs-sidebar .navbar-toggler span { 145 | -moz-transform: scale(1, -1); 146 | -webkit-transform: scale(1, -1); 147 | -o-transform: scale(1, -1); 148 | -ms-transform: scale(1, -1); 149 | transform: scale(1, -1); 150 | } 151 | 152 | .bs-sidebar .navbar-toggler.collapsed span { 153 | -moz-transform: scale(1, 1); 154 | -webkit-transform: scale(1, 1); 155 | -o-transform: scale(1, 1); 156 | -ms-transform: scale(1, 1); 157 | transform: scale(1, 1); 158 | } 159 | 160 | /* First level of nav */ 161 | .bs-sidebar > .navbar-collapse > .nav { 162 | padding-top: 10px; 163 | padding-bottom: 10px; 164 | border-radius: 5px; 165 | width: 100%; 166 | } 167 | 168 | /* All levels of nav */ 169 | .bs-sidebar .nav > li > a { 170 | display: block; 171 | padding: 5px 20px; 172 | z-index: 1; 173 | } 174 | .bs-sidebar .nav > li > a:hover, 175 | .bs-sidebar .nav > li > a:focus { 176 | text-decoration: none; 177 | border-right: 1px solid; 178 | } 179 | .bs-sidebar .nav > li > a.active, 180 | .bs-sidebar .nav > li > a.active:hover, 181 | .bs-sidebar .nav > li > a.active:focus { 182 | font-weight: bold; 183 | background-color: transparent; 184 | border-right: 1px solid; 185 | } 186 | 187 | .bs-sidebar .nav .nav .nav { 188 | margin-left: 1em; 189 | } 190 | 191 | .bs-sidebar .nav > li > a { 192 | font-weight: bold; 193 | } 194 | 195 | .bs-sidebar .nav .nav > li > a { 196 | font-weight: normal; 197 | } 198 | 199 | .headerlink { 200 | font-family: FontAwesome; 201 | font-size: 14px; 202 | display: none; 203 | padding-left: .5em; 204 | text-decoration: none; 205 | vertical-align: middle; 206 | } 207 | 208 | h1:hover .headerlink, h2:hover .headerlink, h3:hover .headerlink, h4:hover .headerlink, h5:hover .headerlink, h6:hover .headerlink { 209 | display:inline-block; 210 | } 211 | 212 | 
blockquote { 213 | padding-left: 10px; 214 | border-left: 4px solid #e6e6e6; 215 | } 216 | 217 | .admonition, details { 218 | padding: 15px; 219 | margin-bottom: 20px; 220 | border: 1px solid transparent; 221 | border-radius: 4px; 222 | text-align: left; 223 | } 224 | 225 | .admonition.note, details.note { 226 | color: var(--bs-primary-text-emphasis); 227 | background-color: var(--bs-primary-bg-subtle); 228 | border-color: var(--bs-primary-border-subtle); 229 | } 230 | 231 | .admonition.note h1, .admonition.note h2, .admonition.note h3, 232 | .admonition.note h4, .admonition.note h5, .admonition.note h6, 233 | details.note h1, details.note h2, details.note h3, 234 | details.note h4, details.note h5, details.note h6 { 235 | color: var(--bs-primary-text-emphasis); 236 | } 237 | 238 | .admonition.info, details.info { 239 | color: var(--bs-info-text-emphasis); 240 | background-color: var(--bs-info-bg-subtle); 241 | border-color: var(--bs-info-border-subtle); 242 | } 243 | 244 | .admonition.info h1, .admonition.info h2, .admonition.info h3, 245 | .admonition.info h4, .admonition.info h5, .admonition.info h6, 246 | details.info h1, details.info h2, details.info h3, 247 | details.info h4, details.info h5, details.info h6 { 248 | color: var(--bs-info-text-emphasis); 249 | } 250 | 251 | .admonition.warning, details.warning { 252 | color: var(--bs-warning-text-emphasis); 253 | background-color: var(--bs-warning-bg-subtle); 254 | border-color: var(--bs-warning-border-subtle); 255 | } 256 | 257 | .admonition.warning h1, .admonition.warning h2, .admonition.warning h3, 258 | .admonition.warning h4, .admonition.warning h5, .admonition.warning h6, 259 | details.warning h1, details.warning h2, details.warning h3, 260 | details.warning h4, details.warning h5, details.warning h6 { 261 | color: var(--bs-warning-text-emphasis); 262 | } 263 | 264 | .admonition.danger, details.danger { 265 | color: var(--bs-danger-text-emphasis); 266 | background-color: var(--bs-danger-bg-subtle); 267 | 
border-color: var(--bs-danger-border-subtle); 268 | } 269 | 270 | .admonition.danger h1, .admonition.danger h2, .admonition.danger h3, 271 | .admonition.danger h4, .admonition.danger h5, .admonition.danger h6, 272 | details.danger h1, details.danger h2, details.danger h3, 273 | details.danger h4, details.danger h5, details.danger h6 { 274 | color: var(--bs-danger-text-emphasis); 275 | } 276 | 277 | .admonition, details { 278 | color: var(--bs-light-text-emphasis); 279 | background-color: var(--bs-light-bg-subtle); 280 | border-color: var(--bs-light-border-subtle); 281 | } 282 | 283 | .admonition h1, .admonition h2, .admonition h3, 284 | .admonition h4, .admonition h5, .admonition h6, 285 | details h1, details h2, details h3, 286 | details h4, details h5, details h6 { 287 | color: var(--bs-light-text-emphasis); 288 | } 289 | 290 | .admonition-title, summary { 291 | font-weight: bold; 292 | text-align: left; 293 | } 294 | 295 | .admonition>p:last-child, details>p:last-child { 296 | margin-bottom: 0; 297 | } 298 | 299 | @media (max-width: 991.98px) { 300 | .navbar-collapse.show { 301 | overflow-y: auto; 302 | max-height: calc(100vh - 3.5rem); 303 | } 304 | } 305 | 306 | .dropdown-item.open { 307 | color: var(--bs-dropdown-link-active-color); 308 | background-color: var(--bs-dropdown-link-active-bg); 309 | } 310 | 311 | .dropdown-submenu > .dropdown-menu { 312 | margin: 0 0 0 1.5rem; 313 | padding: 0; 314 | border-width: 0; 315 | } 316 | 317 | .dropdown-submenu > a::after { 318 | display: block; 319 | content: " "; 320 | float: right; 321 | width: 0; 322 | height: 0; 323 | border-color: transparent; 324 | border-style: solid; 325 | border-width: 5px 0 5px 5px; 326 | border-left-color: var(--bs-dropdown-link-active-color); 327 | margin-top: 5px; 328 | margin-right: -10px; 329 | } 330 | 331 | .dropdown-submenu:hover > a::after { 332 | border-left-color: var(--bs-dropdown-link-active-color); 333 | } 334 | 335 | @media (min-width: 992px) { 336 | .dropdown-menu { 337 | 
overflow-y: auto; 338 | max-height: calc(100vh - 3.5rem); 339 | } 340 | 341 | .dropdown-submenu { 342 | position: relative; 343 | } 344 | 345 | .dropdown-submenu > .dropdown-menu { 346 | position: fixed !important; 347 | margin-top: -9px; 348 | margin-left: -2px; 349 | border-width: 1px; 350 | padding: 0.5rem 0; 351 | } 352 | 353 | .dropdown-submenu.pull-left { 354 | float: none; 355 | } 356 | 357 | .dropdown-submenu.pull-left > .dropdown-menu { 358 | left: -100%; 359 | margin-left: 10px; 360 | } 361 | } 362 | 363 | @media print { 364 | /* Remove sidebar when print */ 365 | .col-md-3 { display: none; } 366 | } 367 | -------------------------------------------------------------------------------- /docs/css/brands.min.css: -------------------------------------------------------------------------------- 1 | /*! 2 | * Font Awesome Free 6.5.1 by @fontawesome - https://fontawesome.com 3 | * License - https://fontawesome.com/license/free (Icons: CC BY 4.0, Fonts: SIL OFL 1.1, Code: MIT License) 4 | * Copyright 2023 Fonticons, Inc. 
5 | */ 6 | :host,:root{--fa-style-family-brands:"Font Awesome 6 Brands";--fa-font-brands:normal 400 1em/1 "Font Awesome 6 Brands"}@font-face{font-family:"Font Awesome 6 Brands";font-style:normal;font-weight:400;font-display:block;src:url(../webfonts/fa-brands-400.woff2) format("woff2"),url(../webfonts/fa-brands-400.ttf) format("truetype")}.fa-brands,.fab{font-weight:400}.fa-monero:before{content:"\f3d0"}.fa-hooli:before{content:"\f427"}.fa-yelp:before{content:"\f1e9"}.fa-cc-visa:before{content:"\f1f0"}.fa-lastfm:before{content:"\f202"}.fa-shopware:before{content:"\f5b5"}.fa-creative-commons-nc:before{content:"\f4e8"}.fa-aws:before{content:"\f375"}.fa-redhat:before{content:"\f7bc"}.fa-yoast:before{content:"\f2b1"}.fa-cloudflare:before{content:"\e07d"}.fa-ups:before{content:"\f7e0"}.fa-pixiv:before{content:"\e640"}.fa-wpexplorer:before{content:"\f2de"}.fa-dyalog:before{content:"\f399"}.fa-bity:before{content:"\f37a"}.fa-stackpath:before{content:"\f842"}.fa-buysellads:before{content:"\f20d"}.fa-first-order:before{content:"\f2b0"}.fa-modx:before{content:"\f285"}.fa-guilded:before{content:"\e07e"}.fa-vnv:before{content:"\f40b"}.fa-js-square:before,.fa-square-js:before{content:"\f3b9"}.fa-microsoft:before{content:"\f3ca"}.fa-qq:before{content:"\f1d6"}.fa-orcid:before{content:"\f8d2"}.fa-java:before{content:"\f4e4"}.fa-invision:before{content:"\f7b0"}.fa-creative-commons-pd-alt:before{content:"\f4ed"}.fa-centercode:before{content:"\f380"}.fa-glide-g:before{content:"\f2a6"}.fa-drupal:before{content:"\f1a9"}.fa-hire-a-helper:before{content:"\f3b0"}.fa-creative-commons-by:before{content:"\f4e7"}.fa-unity:before{content:"\e049"}.fa-whmcs:before{content:"\f40d"}.fa-rocketchat:before{content:"\f3e8"}.fa-vk:before{content:"\f189"}.fa-untappd:before{content:"\f405"}.fa-mailchimp:before{content:"\f59e"}.fa-css3-alt:before{content:"\f38b"}.fa-reddit-square:before,.fa-square-reddit:before{content:"\f1a2"}.fa-vimeo-v:before{content:"\f27d"}.fa-contao:before{content:"\f26d"}.fa-square-
font-awesome:before{content:"\e5ad"}.fa-deskpro:before{content:"\f38f"}.fa-brave:before{content:"\e63c"}.fa-sistrix:before{content:"\f3ee"}.fa-instagram-square:before,.fa-square-instagram:before{content:"\e055"}.fa-battle-net:before{content:"\f835"}.fa-the-red-yeti:before{content:"\f69d"}.fa-hacker-news-square:before,.fa-square-hacker-news:before{content:"\f3af"}.fa-edge:before{content:"\f282"}.fa-threads:before{content:"\e618"}.fa-napster:before{content:"\f3d2"}.fa-snapchat-square:before,.fa-square-snapchat:before{content:"\f2ad"}.fa-google-plus-g:before{content:"\f0d5"}.fa-artstation:before{content:"\f77a"}.fa-markdown:before{content:"\f60f"}.fa-sourcetree:before{content:"\f7d3"}.fa-google-plus:before{content:"\f2b3"}.fa-diaspora:before{content:"\f791"}.fa-foursquare:before{content:"\f180"}.fa-stack-overflow:before{content:"\f16c"}.fa-github-alt:before{content:"\f113"}.fa-phoenix-squadron:before{content:"\f511"}.fa-pagelines:before{content:"\f18c"}.fa-algolia:before{content:"\f36c"}.fa-red-river:before{content:"\f3e3"}.fa-creative-commons-sa:before{content:"\f4ef"}.fa-safari:before{content:"\f267"}.fa-google:before{content:"\f1a0"}.fa-font-awesome-alt:before,.fa-square-font-awesome-stroke:before{content:"\f35c"}.fa-atlassian:before{content:"\f77b"}.fa-linkedin-in:before{content:"\f0e1"}.fa-digital-ocean:before{content:"\f391"}.fa-nimblr:before{content:"\f5a8"}.fa-chromecast:before{content:"\f838"}.fa-evernote:before{content:"\f839"}.fa-hacker-news:before{content:"\f1d4"}.fa-creative-commons-sampling:before{content:"\f4f0"}.fa-adversal:before{content:"\f36a"}.fa-creative-commons:before{content:"\f25e"}.fa-watchman-monitoring:before{content:"\e087"}.fa-fonticons:before{content:"\f280"}.fa-weixin:before{content:"\f1d7"}.fa-shirtsinbulk:before{content:"\f214"}.fa-codepen:before{content:"\f1cb"}.fa-git-alt:before{content:"\f841"}.fa-lyft:before{content:"\f3c3"}.fa-rev:before{content:"\f5b2"}.fa-windows:before{content:"\f17a"}.fa-wizards-of-the-coast:before{content:"\f7
30"}.fa-square-viadeo:before,.fa-viadeo-square:before{content:"\f2aa"}.fa-meetup:before{content:"\f2e0"}.fa-centos:before{content:"\f789"}.fa-adn:before{content:"\f170"}.fa-cloudsmith:before{content:"\f384"}.fa-opensuse:before{content:"\e62b"}.fa-pied-piper-alt:before{content:"\f1a8"}.fa-dribbble-square:before,.fa-square-dribbble:before{content:"\f397"}.fa-codiepie:before{content:"\f284"}.fa-node:before{content:"\f419"}.fa-mix:before{content:"\f3cb"}.fa-steam:before{content:"\f1b6"}.fa-cc-apple-pay:before{content:"\f416"}.fa-scribd:before{content:"\f28a"}.fa-debian:before{content:"\e60b"}.fa-openid:before{content:"\f19b"}.fa-instalod:before{content:"\e081"}.fa-expeditedssl:before{content:"\f23e"}.fa-sellcast:before{content:"\f2da"}.fa-square-twitter:before,.fa-twitter-square:before{content:"\f081"}.fa-r-project:before{content:"\f4f7"}.fa-delicious:before{content:"\f1a5"}.fa-freebsd:before{content:"\f3a4"}.fa-vuejs:before{content:"\f41f"}.fa-accusoft:before{content:"\f369"}.fa-ioxhost:before{content:"\f208"}.fa-fonticons-fi:before{content:"\f3a2"}.fa-app-store:before{content:"\f36f"}.fa-cc-mastercard:before{content:"\f1f1"}.fa-itunes-note:before{content:"\f3b5"}.fa-golang:before{content:"\e40f"}.fa-kickstarter:before{content:"\f3bb"}.fa-grav:before{content:"\f2d6"}.fa-weibo:before{content:"\f18a"}.fa-uncharted:before{content:"\e084"}.fa-firstdraft:before{content:"\f3a1"}.fa-square-youtube:before,.fa-youtube-square:before{content:"\f431"}.fa-wikipedia-w:before{content:"\f266"}.fa-rendact:before,.fa-wpressr:before{content:"\f3e4"}.fa-angellist:before{content:"\f209"}.fa-galactic-republic:before{content:"\f50c"}.fa-nfc-directional:before{content:"\e530"}.fa-skype:before{content:"\f17e"}.fa-joget:before{content:"\f3b7"}.fa-fedora:before{content:"\f798"}.fa-stripe-s:before{content:"\f42a"}.fa-meta:before{content:"\e49b"}.fa-laravel:before{content:"\f3bd"}.fa-hotjar:before{content:"\f3b1"}.fa-bluetooth-b:before{content:"\f294"}.fa-square-letterboxd:before{content:"\e62e"}.
fa-sticker-mule:before{content:"\f3f7"}.fa-creative-commons-zero:before{content:"\f4f3"}.fa-hips:before{content:"\f452"}.fa-behance:before{content:"\f1b4"}.fa-reddit:before{content:"\f1a1"}.fa-discord:before{content:"\f392"}.fa-chrome:before{content:"\f268"}.fa-app-store-ios:before{content:"\f370"}.fa-cc-discover:before{content:"\f1f2"}.fa-wpbeginner:before{content:"\f297"}.fa-confluence:before{content:"\f78d"}.fa-shoelace:before{content:"\e60c"}.fa-mdb:before{content:"\f8ca"}.fa-dochub:before{content:"\f394"}.fa-accessible-icon:before{content:"\f368"}.fa-ebay:before{content:"\f4f4"}.fa-amazon:before{content:"\f270"}.fa-unsplash:before{content:"\e07c"}.fa-yarn:before{content:"\f7e3"}.fa-square-steam:before,.fa-steam-square:before{content:"\f1b7"}.fa-500px:before{content:"\f26e"}.fa-square-vimeo:before,.fa-vimeo-square:before{content:"\f194"}.fa-asymmetrik:before{content:"\f372"}.fa-font-awesome-flag:before,.fa-font-awesome-logo-full:before,.fa-font-awesome:before{content:"\f2b4"}.fa-gratipay:before{content:"\f184"}.fa-apple:before{content:"\f179"}.fa-hive:before{content:"\e07f"}.fa-gitkraken:before{content:"\f3a6"}.fa-keybase:before{content:"\f4f5"}.fa-apple-pay:before{content:"\f415"}.fa-padlet:before{content:"\e4a0"}.fa-amazon-pay:before{content:"\f42c"}.fa-github-square:before,.fa-square-github:before{content:"\f092"}.fa-stumbleupon:before{content:"\f1a4"}.fa-fedex:before{content:"\f797"}.fa-phoenix-framework:before{content:"\f3dc"}.fa-shopify:before{content:"\e057"}.fa-neos:before{content:"\f612"}.fa-square-threads:before{content:"\e619"}.fa-hackerrank:before{content:"\f5f7"}.fa-researchgate:before{content:"\f4f8"}.fa-swift:before{content:"\f8e1"}.fa-angular:before{content:"\f420"}.fa-speakap:before{content:"\f3f3"}.fa-angrycreative:before{content:"\f36e"}.fa-y-combinator:before{content:"\f23b"}.fa-empire:before{content:"\f1d1"}.fa-envira:before{content:"\f299"}.fa-google-scholar:before{content:"\e63b"}.fa-gitlab-square:before,.fa-square-gitlab:before{content:"\
e5ae"}.fa-studiovinari:before{content:"\f3f8"}.fa-pied-piper:before{content:"\f2ae"}.fa-wordpress:before{content:"\f19a"}.fa-product-hunt:before{content:"\f288"}.fa-firefox:before{content:"\f269"}.fa-linode:before{content:"\f2b8"}.fa-goodreads:before{content:"\f3a8"}.fa-odnoklassniki-square:before,.fa-square-odnoklassniki:before{content:"\f264"}.fa-jsfiddle:before{content:"\f1cc"}.fa-sith:before{content:"\f512"}.fa-themeisle:before{content:"\f2b2"}.fa-page4:before{content:"\f3d7"}.fa-hashnode:before{content:"\e499"}.fa-react:before{content:"\f41b"}.fa-cc-paypal:before{content:"\f1f4"}.fa-squarespace:before{content:"\f5be"}.fa-cc-stripe:before{content:"\f1f5"}.fa-creative-commons-share:before{content:"\f4f2"}.fa-bitcoin:before{content:"\f379"}.fa-keycdn:before{content:"\f3ba"}.fa-opera:before{content:"\f26a"}.fa-itch-io:before{content:"\f83a"}.fa-umbraco:before{content:"\f8e8"}.fa-galactic-senate:before{content:"\f50d"}.fa-ubuntu:before{content:"\f7df"}.fa-draft2digital:before{content:"\f396"}.fa-stripe:before{content:"\f429"}.fa-houzz:before{content:"\f27c"}.fa-gg:before{content:"\f260"}.fa-dhl:before{content:"\f790"}.fa-pinterest-square:before,.fa-square-pinterest:before{content:"\f0d3"}.fa-xing:before{content:"\f168"}.fa-blackberry:before{content:"\f37b"}.fa-creative-commons-pd:before{content:"\f4ec"}.fa-playstation:before{content:"\f3df"}.fa-quinscape:before{content:"\f459"}.fa-less:before{content:"\f41d"}.fa-blogger-b:before{content:"\f37d"}.fa-opencart:before{content:"\f23d"}.fa-vine:before{content:"\f1ca"}.fa-signal-messenger:before{content:"\e663"}.fa-paypal:before{content:"\f1ed"}.fa-gitlab:before{content:"\f296"}.fa-typo3:before{content:"\f42b"}.fa-reddit-alien:before{content:"\f281"}.fa-yahoo:before{content:"\f19e"}.fa-dailymotion:before{content:"\e052"}.fa-affiliatetheme:before{content:"\f36b"}.fa-pied-piper-pp:before{content:"\f1a7"}.fa-bootstrap:before{content:"\f836"}.fa-odnoklassniki:before{content:"\f263"}.fa-nfc-symbol:before{content:"\e531"}.fa-min
tbit:before{content:"\e62f"}.fa-ethereum:before{content:"\f42e"}.fa-speaker-deck:before{content:"\f83c"}.fa-creative-commons-nc-eu:before{content:"\f4e9"}.fa-patreon:before{content:"\f3d9"}.fa-avianex:before{content:"\f374"}.fa-ello:before{content:"\f5f1"}.fa-gofore:before{content:"\f3a7"}.fa-bimobject:before{content:"\f378"}.fa-brave-reverse:before{content:"\e63d"}.fa-facebook-f:before{content:"\f39e"}.fa-google-plus-square:before,.fa-square-google-plus:before{content:"\f0d4"}.fa-mandalorian:before{content:"\f50f"}.fa-first-order-alt:before{content:"\f50a"}.fa-osi:before{content:"\f41a"}.fa-google-wallet:before{content:"\f1ee"}.fa-d-and-d-beyond:before{content:"\f6ca"}.fa-periscope:before{content:"\f3da"}.fa-fulcrum:before{content:"\f50b"}.fa-cloudscale:before{content:"\f383"}.fa-forumbee:before{content:"\f211"}.fa-mizuni:before{content:"\f3cc"}.fa-schlix:before{content:"\f3ea"}.fa-square-xing:before,.fa-xing-square:before{content:"\f169"}.fa-bandcamp:before{content:"\f2d5"}.fa-wpforms:before{content:"\f298"}.fa-cloudversify:before{content:"\f385"}.fa-usps:before{content:"\f7e1"}.fa-megaport:before{content:"\f5a3"}.fa-magento:before{content:"\f3c4"}.fa-spotify:before{content:"\f1bc"}.fa-optin-monster:before{content:"\f23c"}.fa-fly:before{content:"\f417"}.fa-aviato:before{content:"\f421"}.fa-itunes:before{content:"\f3b4"}.fa-cuttlefish:before{content:"\f38c"}.fa-blogger:before{content:"\f37c"}.fa-flickr:before{content:"\f16e"}.fa-viber:before{content:"\f409"}.fa-soundcloud:before{content:"\f1be"}.fa-digg:before{content:"\f1a6"}.fa-tencent-weibo:before{content:"\f1d5"}.fa-letterboxd:before{content:"\e62d"}.fa-symfony:before{content:"\f83d"}.fa-maxcdn:before{content:"\f136"}.fa-etsy:before{content:"\f2d7"}.fa-facebook-messenger:before{content:"\f39f"}.fa-audible:before{content:"\f373"}.fa-think-peaks:before{content:"\f731"}.fa-bilibili:before{content:"\e3d9"}.fa-erlang:before{content:"\f39d"}.fa-x-twitter:before{content:"\e61b"}.fa-cotton-bureau:before{content:"\f89e"
}.fa-dashcube:before{content:"\f210"}.fa-42-group:before,.fa-innosoft:before{content:"\e080"}.fa-stack-exchange:before{content:"\f18d"}.fa-elementor:before{content:"\f430"}.fa-pied-piper-square:before,.fa-square-pied-piper:before{content:"\e01e"}.fa-creative-commons-nd:before{content:"\f4eb"}.fa-palfed:before{content:"\f3d8"}.fa-superpowers:before{content:"\f2dd"}.fa-resolving:before{content:"\f3e7"}.fa-xbox:before{content:"\f412"}.fa-searchengin:before{content:"\f3eb"}.fa-tiktok:before{content:"\e07b"}.fa-facebook-square:before,.fa-square-facebook:before{content:"\f082"}.fa-renren:before{content:"\f18b"}.fa-linux:before{content:"\f17c"}.fa-glide:before{content:"\f2a5"}.fa-linkedin:before{content:"\f08c"}.fa-hubspot:before{content:"\f3b2"}.fa-deploydog:before{content:"\f38e"}.fa-twitch:before{content:"\f1e8"}.fa-ravelry:before{content:"\f2d9"}.fa-mixer:before{content:"\e056"}.fa-lastfm-square:before,.fa-square-lastfm:before{content:"\f203"}.fa-vimeo:before{content:"\f40a"}.fa-mendeley:before{content:"\f7b3"}.fa-uniregistry:before{content:"\f404"}.fa-figma:before{content:"\f799"}.fa-creative-commons-remix:before{content:"\f4ee"}.fa-cc-amazon-pay:before{content:"\f42d"}.fa-dropbox:before{content:"\f16b"}.fa-instagram:before{content:"\f16d"}.fa-cmplid:before{content:"\e360"}.fa-upwork:before{content:"\e641"}.fa-facebook:before{content:"\f09a"}.fa-gripfire:before{content:"\f3ac"}.fa-jedi-order:before{content:"\f50e"}.fa-uikit:before{content:"\f403"}.fa-fort-awesome-alt:before{content:"\f3a3"}.fa-phabricator:before{content:"\f3db"}.fa-ussunnah:before{content:"\f407"}.fa-earlybirds:before{content:"\f39a"}.fa-trade-federation:before{content:"\f513"}.fa-autoprefixer:before{content:"\f41c"}.fa-whatsapp:before{content:"\f232"}.fa-slideshare:before{content:"\f1e7"}.fa-google-play:before{content:"\f3ab"}.fa-viadeo:before{content:"\f2a9"}.fa-line:before{content:"\f3c0"}.fa-google-drive:before{content:"\f3aa"}.fa-servicestack:before{content:"\f3ec"}.fa-simplybuilt:before{content:
"\f215"}.fa-bitbucket:before{content:"\f171"}.fa-imdb:before{content:"\f2d8"}.fa-deezer:before{content:"\e077"}.fa-raspberry-pi:before{content:"\f7bb"}.fa-jira:before{content:"\f7b1"}.fa-docker:before{content:"\f395"}.fa-screenpal:before{content:"\e570"}.fa-bluetooth:before{content:"\f293"}.fa-gitter:before{content:"\f426"}.fa-d-and-d:before{content:"\f38d"}.fa-microblog:before{content:"\e01a"}.fa-cc-diners-club:before{content:"\f24c"}.fa-gg-circle:before{content:"\f261"}.fa-pied-piper-hat:before{content:"\f4e5"}.fa-kickstarter-k:before{content:"\f3bc"}.fa-yandex:before{content:"\f413"}.fa-readme:before{content:"\f4d5"}.fa-html5:before{content:"\f13b"}.fa-sellsy:before{content:"\f213"}.fa-sass:before{content:"\f41e"}.fa-wirsindhandwerk:before,.fa-wsh:before{content:"\e2d0"}.fa-buromobelexperte:before{content:"\f37f"}.fa-salesforce:before{content:"\f83b"}.fa-octopus-deploy:before{content:"\e082"}.fa-medapps:before{content:"\f3c6"}.fa-ns8:before{content:"\f3d5"}.fa-pinterest-p:before{content:"\f231"}.fa-apper:before{content:"\f371"}.fa-fort-awesome:before{content:"\f286"}.fa-waze:before{content:"\f83f"}.fa-cc-jcb:before{content:"\f24b"}.fa-snapchat-ghost:before,.fa-snapchat:before{content:"\f2ab"}.fa-fantasy-flight-games:before{content:"\f6dc"}.fa-rust:before{content:"\e07a"}.fa-wix:before{content:"\f5cf"}.fa-behance-square:before,.fa-square-behance:before{content:"\f1b5"}.fa-supple:before{content:"\f3f9"}.fa-webflow:before{content:"\e65c"}.fa-rebel:before{content:"\f1d0"}.fa-css3:before{content:"\f13c"}.fa-staylinked:before{content:"\f3f5"}.fa-kaggle:before{content:"\f5fa"}.fa-space-awesome:before{content:"\e5ac"}.fa-deviantart:before{content:"\f1bd"}.fa-cpanel:before{content:"\f388"}.fa-goodreads-g:before{content:"\f3a9"}.fa-git-square:before,.fa-square-git:before{content:"\f1d2"}.fa-square-tumblr:before,.fa-tumblr-square:before{content:"\f174"}.fa-trello:before{content:"\f181"}.fa-creative-commons-nc-jp:before{content:"\f4ea"}.fa-get-pocket:before{content:"\f265"}.
fa-perbyte:before{content:"\e083"}.fa-grunt:before{content:"\f3ad"}.fa-weebly:before{content:"\f5cc"}.fa-connectdevelop:before{content:"\f20e"}.fa-leanpub:before{content:"\f212"}.fa-black-tie:before{content:"\f27e"}.fa-themeco:before{content:"\f5c6"}.fa-python:before{content:"\f3e2"}.fa-android:before{content:"\f17b"}.fa-bots:before{content:"\e340"}.fa-free-code-camp:before{content:"\f2c5"}.fa-hornbill:before{content:"\f592"}.fa-js:before{content:"\f3b8"}.fa-ideal:before{content:"\e013"}.fa-git:before{content:"\f1d3"}.fa-dev:before{content:"\f6cc"}.fa-sketch:before{content:"\f7c6"}.fa-yandex-international:before{content:"\f414"}.fa-cc-amex:before{content:"\f1f3"}.fa-uber:before{content:"\f402"}.fa-github:before{content:"\f09b"}.fa-php:before{content:"\f457"}.fa-alipay:before{content:"\f642"}.fa-youtube:before{content:"\f167"}.fa-skyatlas:before{content:"\f216"}.fa-firefox-browser:before{content:"\e007"}.fa-replyd:before{content:"\f3e6"}.fa-suse:before{content:"\f7d6"}.fa-jenkins:before{content:"\f3b6"}.fa-twitter:before{content:"\f099"}.fa-rockrms:before{content:"\f3e9"}.fa-pinterest:before{content:"\f0d2"}.fa-buffer:before{content:"\f837"}.fa-npm:before{content:"\f3d4"}.fa-yammer:before{content:"\f840"}.fa-btc:before{content:"\f15a"}.fa-dribbble:before{content:"\f17d"}.fa-stumbleupon-circle:before{content:"\f1a3"}.fa-internet-explorer:before{content:"\f26b"}.fa-stubber:before{content:"\e5c7"}.fa-telegram-plane:before,.fa-telegram:before{content:"\f2c6"}.fa-old-republic:before{content:"\f510"}.fa-odysee:before{content:"\e5c6"}.fa-square-whatsapp:before,.fa-whatsapp-square:before{content:"\f40c"}.fa-node-js:before{content:"\f3d3"}.fa-edge-legacy:before{content:"\e078"}.fa-slack-hash:before,.fa-slack:before{content:"\f198"}.fa-medrt:before{content:"\f3c8"}.fa-usb:before{content:"\f287"}.fa-tumblr:before{content:"\f173"}.fa-vaadin:before{content:"\f408"}.fa-quora:before{content:"\f2c4"}.fa-square-x-twitter:before{content:"\e61a"}.fa-reacteurope:before{content:"\f75d"}.
fa-medium-m:before,.fa-medium:before{content:"\f23a"}.fa-amilia:before{content:"\f36d"}.fa-mixcloud:before{content:"\f289"}.fa-flipboard:before{content:"\f44d"}.fa-viacoin:before{content:"\f237"}.fa-critical-role:before{content:"\f6c9"}.fa-sitrox:before{content:"\e44a"}.fa-discourse:before{content:"\f393"}.fa-joomla:before{content:"\f1aa"}.fa-mastodon:before{content:"\f4f6"}.fa-airbnb:before{content:"\f834"}.fa-wolf-pack-battalion:before{content:"\f514"}.fa-buy-n-large:before{content:"\f8a6"}.fa-gulp:before{content:"\f3ae"}.fa-creative-commons-sampling-plus:before{content:"\f4f1"}.fa-strava:before{content:"\f428"}.fa-ember:before{content:"\f423"}.fa-canadian-maple-leaf:before{content:"\f785"}.fa-teamspeak:before{content:"\f4f9"}.fa-pushed:before{content:"\f3e1"}.fa-wordpress-simple:before{content:"\f411"}.fa-nutritionix:before{content:"\f3d6"}.fa-wodu:before{content:"\e088"}.fa-google-pay:before{content:"\e079"}.fa-intercom:before{content:"\f7af"}.fa-zhihu:before{content:"\f63f"}.fa-korvue:before{content:"\f42f"}.fa-pix:before{content:"\e43a"}.fa-steam-symbol:before{content:"\f3f6"} -------------------------------------------------------------------------------- /docs/css/extra.css: -------------------------------------------------------------------------------- 1 | table { 2 | border-collapse: collapse; 3 | width: 100%; 4 | text-align: center; 5 | background-color: white; 6 | box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1); 7 | } 8 | th, td { 9 | padding: 12px 20px; 10 | border: 1px solid #ddd; 11 | } 12 | th { 13 | background-color: #4CAF50; 14 | color: white; 15 | } -------------------------------------------------------------------------------- /docs/css/solid.min.css: -------------------------------------------------------------------------------- 1 | /*! 
2 | * Font Awesome Free 6.5.1 by @fontawesome - https://fontawesome.com 3 | * License - https://fontawesome.com/license/free (Icons: CC BY 4.0, Fonts: SIL OFL 1.1, Code: MIT License) 4 | * Copyright 2023 Fonticons, Inc. 5 | */ 6 | :host,:root{--fa-style-family-classic:"Font Awesome 6 Free";--fa-font-solid:normal 900 1em/1 "Font Awesome 6 Free"}@font-face{font-family:"Font Awesome 6 Free";font-style:normal;font-weight:900;font-display:block;src:url(../webfonts/fa-solid-900.woff2) format("woff2"),url(../webfonts/fa-solid-900.ttf) format("truetype")}.fa-solid,.fas{font-weight:900} -------------------------------------------------------------------------------- /docs/css/v4-font-face.min.css: -------------------------------------------------------------------------------- 1 | /*! 2 | * Font Awesome Free 6.5.1 by @fontawesome - https://fontawesome.com 3 | * License - https://fontawesome.com/license/free (Icons: CC BY 4.0, Fonts: SIL OFL 1.1, Code: MIT License) 4 | * Copyright 2023 Fonticons, Inc. 
5 | */ 6 | @font-face{font-family:"FontAwesome";font-display:block;src:url(../webfonts/fa-solid-900.woff2) format("woff2"),url(../webfonts/fa-solid-900.ttf) format("truetype")}@font-face{font-family:"FontAwesome";font-display:block;src:url(../webfonts/fa-brands-400.woff2) format("woff2"),url(../webfonts/fa-brands-400.ttf) format("truetype")}@font-face{font-family:"FontAwesome";font-display:block;src:url(../webfonts/fa-regular-400.woff2) format("woff2"),url(../webfonts/fa-regular-400.ttf) format("truetype");unicode-range:u+f003,u+f006,u+f014,u+f016-f017,u+f01a-f01b,u+f01d,u+f022,u+f03e,u+f044,u+f046,u+f05c-f05d,u+f06e,u+f070,u+f087-f088,u+f08a,u+f094,u+f096-f097,u+f09d,u+f0a0,u+f0a2,u+f0a4-f0a7,u+f0c5,u+f0c7,u+f0e5-f0e6,u+f0eb,u+f0f6-f0f8,u+f10c,u+f114-f115,u+f118-f11a,u+f11c-f11d,u+f133,u+f147,u+f14e,u+f150-f152,u+f185-f186,u+f18e,u+f190-f192,u+f196,u+f1c1-f1c9,u+f1d9,u+f1db,u+f1e3,u+f1ea,u+f1f7,u+f1f9,u+f20a,u+f247-f248,u+f24a,u+f24d,u+f255-f25b,u+f25d,u+f271-f274,u+f278,u+f27b,u+f28c,u+f28e,u+f29c,u+f2b5,u+f2b7,u+f2ba,u+f2bc,u+f2be,u+f2c0-f2c1,u+f2c3,u+f2d0,u+f2d2,u+f2d4,u+f2dc}@font-face{font-family:"FontAwesome";font-display:block;src:url(../webfonts/fa-v4compatibility.woff2) format("woff2"),url(../webfonts/fa-v4compatibility.ttf) format("truetype");unicode-range:u+f041,u+f047,u+f065-f066,u+f07d-f07e,u+f080,u+f08b,u+f08e,u+f090,u+f09a,u+f0ac,u+f0ae,u+f0b2,u+f0d0,u+f0d6,u+f0e4,u+f0ec,u+f10a-f10b,u+f123,u+f13e,u+f148-f149,u+f14c,u+f156,u+f15e,u+f160-f161,u+f163,u+f175-f178,u+f195,u+f1f8,u+f219,u+f27a} -------------------------------------------------------------------------------- /docs/img/favicon.ico: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/docs/img/favicon.ico -------------------------------------------------------------------------------- /docs/img/grid.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/docs/img/grid.png -------------------------------------------------------------------------------- /docs/img/pipeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/docs/img/pipeline.png -------------------------------------------------------------------------------- /docs/img/time_comp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/docs/img/time_comp.png -------------------------------------------------------------------------------- /docs/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | ScaleSC 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 95 | 96 |
97 |
98 |
132 |
133 | 134 | 135 |

Welcome to ScaleSC

136 |

137 | ScaleSC 138 |

139 | 140 |

141 | A GPU-accelerated tool for large-scale scRNA-seq pipelines. 142 |

143 | 144 |

145 | Highlights • 146 | Why ScaleSC • 147 | Installation • 148 | API Reference 149 |

150 | 151 |

Highlights

152 |
    153 |
  • Fast scRNA-seq pipeline covering QC, Normalization, Batch-effect Removal, and Dimension Reduction, with a syntax similar to scanpy and rapids-singlecell.
  • 154 |
  • Scales to datasets with more than 10M cells. Data are processed in chunks to avoid the int32 indexing limitation in cupyx.scipy.sparse (used by rapids-singlecell), which blocks computation even for moderate-size datasets (~1M cells).
  • 155 |
  • Reconciles the output of each step with scanpy, reproducing the same results as the CPU implementation.
  • 156 |
  • Improves on harmonypy so that datasets with more than 10M cells and more than 1,000 samples can be run on a single GPU (A100 80G).
  • 157 |
158 |
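The int32 limitation behind the chunking highlight above comes down to simple arithmetic: sparse matrices stored with 32-bit indices can address at most 2**31 - 1 nonzeros, so a chunked pipeline must cap each chunk's nonzero count. A minimal sketch of that budget — the gene count and density below are illustrative assumptions, not ScaleSC's actual parameters:

```python
# Illustrative sketch of the int32 nonzero budget that motivates chunking.
# The gene count and density are assumptions for illustration only.

INT32_MAX = 2**31 - 1  # max nonzeros addressable by 32-bit sparse indices

def max_cells_per_chunk(n_genes: int, density: float) -> int:
    """Largest number of cells whose expected nonzeros stay under int32."""
    nnz_per_cell = n_genes * density
    return int(INT32_MAX // nnz_per_cell)

def n_chunks(n_cells: int, n_genes: int, density: float) -> int:
    """Number of chunks needed to process n_cells without index overflow."""
    cap = max_cells_per_chunk(n_genes, density)
    return -(-n_cells // cap)  # ceiling division

# A hypothetical 13M-cell, 35k-gene matrix at ~5% density:
cells, genes, density = 13_000_000, 35_000, 0.05
print(n_chunks(cells, genes, density))  # → 11 chunks for this illustrative setting
```

At this assumed density a single 13M-cell matrix holds far more than 2**31 nonzeros, which is why even a "moderate" ~1M-cell dataset can already trip the limit at higher densities.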

Why ScaleSC

159 |
160 | ScaleSC Pipeline 161 | time-comp 162 | ScaleSC includes regular prerpocessing steps: QC, Filtering, HVG, PCA, Batch Correction, Clustering, Annotation. 163 |
164 | 165 |


166 |
167 | 168 | Overview of 3 different packages* 169 | 170 | 171 | 172 | 173 | 174 | 175 | 176 | 177 | 178 | 179 | 180 | 181 | 182 | 183 | 184 | 185 | 186 | 187 | 188 | 189 | 190 | 191 | 192 | 193 | 194 | 195 | 196 | 197 | 198 | 199 | 200 | 201 | 202 | 203 | 204 | 205 |
scanpyscalescrapids-singlecell
GPU Support
int32 Issue
Upper Limit of # Cells♾️~20M~1M
Upper Limit of # Samples♾️>1000<100
 206 | * Tested on datasets with ~35k genes. ScaleSC only supports running Harmony on a single GPU; this memory limitation restricts scaling to even larger datasets. However, there is no limit on the number of cells if you prefer not to run Harmony (QC, HVG, and PCA only). 207 |
208 | 209 |


210 |
211 | Time comparison between Scanpy (CPU) and ScaleSC (GPU) on A100 (80G) 212 | time-comp 213 | 214 | ScaleSC significantly reduces running time from hours to minutes. For the extremely large 13M-cell dataset, ScaleSC finishes all steps in just 1 hour! 215 |
216 | 217 |


218 |

How To Install

219 |
220 |

Note: ScaleSC requires a high-end GPU (>24 GB VRAM) and a matching CUDA version to support GPU-accelerated computing.

221 |
222 |

Requirements:

223 |
    224 |
  • RAPIDS from Nvidia
  • 225 |
  • rapids-singlecell, an alternative to scanpy that uses the GPU for acceleration.
  • 226 |
  • Conda, version >=22.11 is strongly encouraged, because conda-libmamba-solver is the default there, which significantly speeds up dependency solving.
  • 227 |
  • pip, the Python package installer.
  • 228 |
229 |

Environment Setup:

230 |
    231 |
  1. 232 |

    Install RAPIDS through Conda,

    233 |
      234 |
    • conda create -n scalesc -c rapidsai -c conda-forge -c nvidia rapids=24.10 python=3.10 'cuda-version>=11.4,<=11.8'
    • 235 |
    236 |

    Users can tailor the installation to their system by using this online selector.

    237 |
  2. 238 |
  3. 239 |

    Activate conda env,

    240 |
      241 |
    • conda activate scalesc
    • 242 |
    243 |
  4. 244 |
  5. 245 |

    Install rapids-singlecell using pip,

    246 |
      247 |
    • pip install rapids-singlecell
    • 248 |
    249 |
  6. 250 |
  7. 251 |

    Install scaleSC,

    252 |
      253 |
    • pull scaleSC from GitHub
        254 |
      • git clone https://github.com/interactivereport/scaleSC.git
      • 255 |
      256 |
    • 257 |
    • enter the folder and install scaleSC
        258 |
      • cd scaleSC
      • 259 |
      • pip install .
      • 260 |
      261 |
    • 262 |
    263 |
  8. 264 |
  9. check env:
      265 |
    • python -c "import scalesc; print(scalesc.__version__)" == 0.1.0
    • 266 |
    • python -c "import cupy; print(cupy.__version__)" >= 13.3.0
    • 267 |
    • python -c "import cuml; print(cuml.__version__)" >= 24.10
    • 268 |
    • python -c "import cupy; print(cupy.cuda.is_available())" = True
    • 269 |
    • python -c "import xgboost; print(xgboost.__version__)" >= 2.1.1, optional for marker annotation
    • 270 |
    271 |
  10. 272 |
273 |
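The checks above reduce to comparing dotted version strings against minimums. If you want to script the environment check, here is a stdlib-only sketch of that comparison (no GPU packages needed to run it); the minimum versions mirror the checklist, and `parse_version` is a hypothetical helper, not part of ScaleSC:

```python
# Stdlib-only version comparison for scripting the environment checks above.
# The minimums mirror the checklist; parse_version is an illustrative helper.

def parse_version(v: str) -> tuple:
    """Turn '13.3.0' into (13, 3, 0); stops at any non-numeric suffix."""
    parts = []
    for piece in v.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version is at least the required minimum."""
    return parse_version(installed) >= parse_version(minimum)

MINIMUMS = {"cupy": "13.3.0", "cuml": "24.10", "xgboost": "2.1.1"}

print(meets_minimum("13.4.1", MINIMUMS["cupy"]))  # True
print(meets_minimum("24.08", MINIMUMS["cuml"]))   # False
```

For production use, `packaging.version.Version` handles pre-release and post-release tags more robustly than this tuple comparison.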

Tutorial

274 |
277 |
278 |
279 | 280 | 284 | 285 | 289 | 290 | 291 | 292 | 352 | 353 | 354 | 355 | 356 | 360 | -------------------------------------------------------------------------------- /docs/js/base.js: -------------------------------------------------------------------------------- 1 | function getSearchTerm() { 2 | var sPageURL = window.location.search.substring(1); 3 | var sURLVariables = sPageURL.split('&'); 4 | for (var i = 0; i < sURLVariables.length; i++) { 5 | var sParameterName = sURLVariables[i].split('='); 6 | if (sParameterName[0] == 'q') { 7 | return sParameterName[1]; 8 | } 9 | } 10 | } 11 | 12 | function applyTopPadding() { 13 | // Update various absolute positions to match where the main container 14 | // starts. This is necessary for handling multi-line nav headers, since 15 | // that pushes the main container down. 16 | var container = document.querySelector('body > .container'); 17 | var offset = container.offsetTop; 18 | 19 | document.documentElement.style.scrollPaddingTop = offset + 'px'; 20 | document.querySelectorAll('.bs-sidebar.affix').forEach(function(sidebar) { 21 | sidebar.style.top = offset + 'px'; 22 | }); 23 | } 24 | 25 | document.addEventListener("DOMContentLoaded", function () { 26 | var search_term = getSearchTerm(); 27 | var search_modal = new bootstrap.Modal(document.getElementById('mkdocs_search_modal')); 28 | var keyboard_modal = new bootstrap.Modal(document.getElementById('mkdocs_keyboard_modal')); 29 | 30 | if (search_term) { 31 | search_modal.show(); 32 | } 33 | 34 | // make sure search input gets autofocus every time modal opens. 
35 | document.getElementById('mkdocs_search_modal').addEventListener('shown.bs.modal', function() { 36 | document.getElementById('mkdocs-search-query').focus(); 37 | }); 38 | 39 | // Close search modal when result is selected 40 | // The links get added later so listen to parent 41 | document.getElementById('mkdocs-search-results').addEventListener('click', function(e) { 42 | if (e.target.tagName === 'A') { 43 | search_modal.hide(); 44 | } 45 | }); 46 | 47 | // Populate keyboard modal with proper Keys 48 | document.querySelector('.help.shortcut kbd').innerHTML = keyCodes[shortcuts.help]; 49 | document.querySelector('.prev.shortcut kbd').innerHTML = keyCodes[shortcuts.previous]; 50 | document.querySelector('.next.shortcut kbd').innerHTML = keyCodes[shortcuts.next]; 51 | document.querySelector('.search.shortcut kbd').innerHTML = keyCodes[shortcuts.search]; 52 | 53 | // Keyboard navigation 54 | document.addEventListener("keydown", function(e) { 55 | if (e.target.tagName === 'INPUT' || e.target.tagName === 'TEXTAREA') return true; 56 | var key = e.which || e.keyCode || window.event && window.event.keyCode; 57 | var page; 58 | switch (key) { 59 | case shortcuts.next: 60 | page = document.querySelector('.navbar a[rel="next"]'); 61 | break; 62 | case shortcuts.previous: 63 | page = document.querySelector('.navbar a[rel="prev"]'); 64 | break; 65 | case shortcuts.search: 66 | e.preventDefault(); 67 | keyboard_modal.hide(); 68 | search_modal.show(); 69 | document.getElementById('mkdocs-search-query').focus(); 70 | break; 71 | case shortcuts.help: 72 | search_modal.hide(); 73 | keyboard_modal.show(); 74 | break; 75 | default: break; 76 | } 77 | if (page && page.hasAttribute('href')) { 78 | keyboard_modal.hide(); 79 | window.location.href = page.getAttribute('href'); 80 | } 81 | }); 82 | 83 | document.querySelectorAll('table').forEach(function(table) { 84 | table.classList.add('table', 'table-striped', 'table-hover'); 85 | }); 86 | 87 | function showInnerDropdown(item) { 88 | 
var popup = item.nextElementSibling; 89 | popup.classList.add('show'); 90 | item.classList.add('open'); 91 | 92 | // First, close any sibling dropdowns. 93 | var container = item.parentElement.parentElement; 94 | container.querySelectorAll(':scope > .dropdown-submenu > a').forEach(function(el) { 95 | if (el !== item) { 96 | hideInnerDropdown(el); 97 | } 98 | }); 99 | 100 | var popupMargin = 10; 101 | var maxBottom = window.innerHeight - popupMargin; 102 | var bounds = item.getBoundingClientRect(); 103 | 104 | popup.style.left = bounds.right + 'px'; 105 | if (bounds.top + popup.clientHeight > maxBottom && 106 | bounds.top > window.innerHeight / 2) { 107 | popup.style.top = (bounds.bottom - popup.clientHeight) + 'px'; 108 | popup.style.maxHeight = (bounds.bottom - popupMargin) + 'px'; 109 | } else { 110 | popup.style.top = bounds.top + 'px'; 111 | popup.style.maxHeight = (maxBottom - bounds.top) + 'px'; 112 | } 113 | } 114 | 115 | function hideInnerDropdown(item) { 116 | var popup = item.nextElementSibling; 117 | popup.classList.remove('show'); 118 | item.classList.remove('open'); 119 | 120 | popup.scrollTop = 0; 121 | var menu = popup.querySelector('.dropdown-menu'); 122 | if (menu) { 123 | menu.scrollTop = 0; 124 | } 125 | var dropdown = popup.querySelector('.dropdown-submenu > a'); 126 | if (dropdown) { 127 | dropdown.classList.remove('open'); 128 | } 129 | } 130 | 131 | document.querySelectorAll('.dropdown-submenu > a').forEach(function(item) { 132 | item.addEventListener('click', function(e) { 133 | if (item.nextElementSibling.classList.contains('show')) { 134 | hideInnerDropdown(item); 135 | } else { 136 | showInnerDropdown(item); 137 | } 138 | 139 | e.stopPropagation(); 140 | e.preventDefault(); 141 | }); 142 | }); 143 | 144 | document.querySelectorAll('.dropdown-menu').forEach(function(menu) { 145 | menu.parentElement.addEventListener('hide.bs.dropdown', function() { 146 | menu.scrollTop = 0; 147 | var dropdown = menu.querySelector('.dropdown-submenu > a'); 
148 | if (dropdown) { 149 | dropdown.classList.remove('open'); 150 | } 151 | menu.querySelectorAll('.dropdown-menu .dropdown-menu').forEach(function(submenu) { 152 | submenu.classList.remove('show'); 153 | }); 154 | }); 155 | }); 156 | 157 | applyTopPadding(); 158 | }); 159 | 160 | window.addEventListener('resize', applyTopPadding); 161 | 162 | var scrollSpy = new bootstrap.ScrollSpy(document.body, { 163 | target: '.bs-sidebar' 164 | }); 165 | 166 | /* Prevent disabled links from causing a page reload */ 167 | document.querySelectorAll("li.disabled a").forEach(function(item) { 168 | item.addEventListener("click", function(event) { 169 | event.preventDefault(); 170 | }); 171 | }); 172 | 173 | // See https://www.cambiaresearch.com/articles/15/javascript-char-codes-key-codes 174 | // We only list common keys below. Obscure keys are omitted and their use is discouraged. 175 | var keyCodes = { 176 | 8: 'backspace', 177 | 9: 'tab', 178 | 13: 'enter', 179 | 16: 'shift', 180 | 17: 'ctrl', 181 | 18: 'alt', 182 | 19: 'pause/break', 183 | 20: 'caps lock', 184 | 27: 'escape', 185 | 32: 'spacebar', 186 | 33: 'page up', 187 | 34: 'page down', 188 | 35: 'end', 189 | 36: 'home', 190 | 37: '←', 191 | 38: '↑', 192 | 39: '→', 193 | 40: '↓', 194 | 45: 'insert', 195 | 46: 'delete', 196 | 48: '0', 197 | 49: '1', 198 | 50: '2', 199 | 51: '3', 200 | 52: '4', 201 | 53: '5', 202 | 54: '6', 203 | 55: '7', 204 | 56: '8', 205 | 57: '9', 206 | 65: 'a', 207 | 66: 'b', 208 | 67: 'c', 209 | 68: 'd', 210 | 69: 'e', 211 | 70: 'f', 212 | 71: 'g', 213 | 72: 'h', 214 | 73: 'i', 215 | 74: 'j', 216 | 75: 'k', 217 | 76: 'l', 218 | 77: 'm', 219 | 78: 'n', 220 | 79: 'o', 221 | 80: 'p', 222 | 81: 'q', 223 | 82: 'r', 224 | 83: 's', 225 | 84: 't', 226 | 85: 'u', 227 | 86: 'v', 228 | 87: 'w', 229 | 88: 'x', 230 | 89: 'y', 231 | 90: 'z', 232 | 91: 'Left Windows Key / Left ⌘', 233 | 92: 'Right Windows Key', 234 | 93: 'Windows Menu / Right ⌘', 235 | 96: 'numpad 0', 236 | 97: 'numpad 1', 237 | 98: 'numpad 2', 238 | 
99: 'numpad 3', 239 | 100: 'numpad 4', 240 | 101: 'numpad 5', 241 | 102: 'numpad 6', 242 | 103: 'numpad 7', 243 | 104: 'numpad 8', 244 | 105: 'numpad 9', 245 | 106: 'multiply', 246 | 107: 'add', 247 | 109: 'subtract', 248 | 110: 'decimal point', 249 | 111: 'divide', 250 | 112: 'f1', 251 | 113: 'f2', 252 | 114: 'f3', 253 | 115: 'f4', 254 | 116: 'f5', 255 | 117: 'f6', 256 | 118: 'f7', 257 | 119: 'f8', 258 | 120: 'f9', 259 | 121: 'f10', 260 | 122: 'f11', 261 | 123: 'f12', 262 | 124: 'f13', 263 | 125: 'f14', 264 | 126: 'f15', 265 | 127: 'f16', 266 | 128: 'f17', 267 | 129: 'f18', 268 | 130: 'f19', 269 | 131: 'f20', 270 | 132: 'f21', 271 | 133: 'f22', 272 | 134: 'f23', 273 | 135: 'f24', 274 | 144: 'num lock', 275 | 145: 'scroll lock', 276 | 186: ';', 277 | 187: '=', 278 | 188: ',', 279 | 189: '‐', 280 | 190: '.', 281 | 191: '?', 282 | 192: '`', 283 | 219: '[', 284 | 220: '\', 285 | 221: ']', 286 | 222: ''', 287 | }; 288 | -------------------------------------------------------------------------------- /docs/js/darkmode.js: -------------------------------------------------------------------------------- 1 | function setColorMode(mode) { 2 | // Switch between light/dark theme. `mode` is a string value of either 'dark' or 'light'. 3 | var hljs_light = document.getElementById('hljs-light'), 4 | hljs_dark = document.getElementById('hljs-dark'); 5 | document.documentElement.setAttribute('data-bs-theme', mode); 6 | if (mode == 'dark') { 7 | hljs_light.disabled = true; 8 | hljs_dark.disabled = false; 9 | } else { 10 | hljs_dark.disabled = true; 11 | hljs_light.disabled = false; 12 | } 13 | } 14 | 15 | function updateModeToggle(mode) { 16 | // Update icon and toggle checkmarks of color mode selector. 
17 | var menu = document.getElementById('theme-menu'); 18 | document.querySelectorAll('[data-bs-theme-value]') 19 | .forEach(function(toggle) { 20 | if (mode == toggle.getAttribute('data-bs-theme-value')) { 21 | toggle.setAttribute('aria-pressed', 'true'); 22 | toggle.lastElementChild.classList.remove('d-none'); 23 | menu.firstElementChild.setAttribute('class', toggle.firstElementChild.getAttribute('class')); 24 | } else { 25 | toggle.setAttribute('aria-pressed', 'false'); 26 | toggle.lastElementChild.classList.add('d-none'); 27 | } 28 | }); 29 | } 30 | 31 | function onSystemColorSchemeChange(event) { 32 | // Update site color mode to match system color mode. 33 | setColorMode(event.matches ? 'dark' : 'light'); 34 | } 35 | 36 | var mql = window.matchMedia('(prefers-color-scheme: dark)'), 37 | defaultMode = document.documentElement.getAttribute('data-bs-theme'), 38 | storedMode = localStorage.getItem('mkdocs-colormode'); 39 | if (storedMode && storedMode != 'auto') { 40 | setColorMode(storedMode); 41 | updateModeToggle(storedMode); 42 | } else if (storedMode == 'auto' || defaultMode == 'auto') { 43 | setColorMode(mql.matches ? 'dark' : 'light'); 44 | updateModeToggle('auto'); 45 | mql.addEventListener('change', onSystemColorSchemeChange); 46 | } else { 47 | setColorMode(defaultMode); 48 | updateModeToggle(defaultMode); 49 | } 50 | 51 | document.querySelectorAll('[data-bs-theme-value]') 52 | .forEach(function(toggle) { 53 | toggle.addEventListener('click', function (e) { 54 | var mode = e.currentTarget.getAttribute('data-bs-theme-value'); 55 | localStorage.setItem('mkdocs-colormode', mode); 56 | if (mode == 'auto') { 57 | setColorMode(mql.matches ? 
'dark' : 'light'); 58 | mql.addEventListener('change', onSystemColorSchemeChange); 59 | } else { 60 | setColorMode(mode); 61 | mql.removeEventListener('change', onSystemColorSchemeChange); 62 | } 63 | updateModeToggle(mode); 64 | }); 65 | }); 66 | -------------------------------------------------------------------------------- /docs/search/main.js: -------------------------------------------------------------------------------- 1 | function getSearchTermFromLocation() { 2 | var sPageURL = window.location.search.substring(1); 3 | var sURLVariables = sPageURL.split('&'); 4 | for (var i = 0; i < sURLVariables.length; i++) { 5 | var sParameterName = sURLVariables[i].split('='); 6 | if (sParameterName[0] == 'q') { 7 | return decodeURIComponent(sParameterName[1].replace(/\+/g, '%20')); 8 | } 9 | } 10 | } 11 | 12 | function joinUrl (base, path) { 13 | if (path.substring(0, 1) === "/") { 14 | // path starts with `/`. Thus it is absolute. 15 | return path; 16 | } 17 | if (base.substring(base.length-1) === "/") { 18 | // base ends with `/` 19 | return base + path; 20 | } 21 | return base + "/" + path; 22 | } 23 | 24 | function escapeHtml (value) { 25 | return value.replace(/&/g, '&') 26 | .replace(/"/g, '"') 27 | .replace(//g, '>'); 29 | } 30 | 31 | function formatResult (location, title, summary) { 32 | return '

'+ escapeHtml(title) + '

' + escapeHtml(summary) +'

'; 33 | } 34 | 35 | function displayResults (results) { 36 | var search_results = document.getElementById("mkdocs-search-results"); 37 | while (search_results.firstChild) { 38 | search_results.removeChild(search_results.firstChild); 39 | } 40 | if (results.length > 0){ 41 | for (var i=0; i < results.length; i++){ 42 | var result = results[i]; 43 | var html = formatResult(result.location, result.title, result.summary); 44 | search_results.insertAdjacentHTML('beforeend', html); 45 | } 46 | } else { 47 | var noResultsText = search_results.getAttribute('data-no-results-text'); 48 | if (!noResultsText) { 49 | noResultsText = "No results found"; 50 | } 51 | search_results.insertAdjacentHTML('beforeend', '

' + noResultsText + '

'); 52 | } 53 | } 54 | 55 | function doSearch () { 56 | var query = document.getElementById('mkdocs-search-query').value; 57 | if (query.length > min_search_length) { 58 | if (!window.Worker) { 59 | displayResults(search(query)); 60 | } else { 61 | searchWorker.postMessage({query: query}); 62 | } 63 | } else { 64 | // Clear results for short queries 65 | displayResults([]); 66 | } 67 | } 68 | 69 | function initSearch () { 70 | var search_input = document.getElementById('mkdocs-search-query'); 71 | if (search_input) { 72 | search_input.addEventListener("keyup", doSearch); 73 | } 74 | var term = getSearchTermFromLocation(); 75 | if (term) { 76 | search_input.value = term; 77 | doSearch(); 78 | } 79 | } 80 | 81 | function onWorkerMessage (e) { 82 | if (e.data.allowSearch) { 83 | initSearch(); 84 | } else if (e.data.results) { 85 | var results = e.data.results; 86 | displayResults(results); 87 | } else if (e.data.config) { 88 | min_search_length = e.data.config.min_search_length-1; 89 | } 90 | } 91 | 92 | if (!window.Worker) { 93 | console.log('Web Worker API not supported'); 94 | // load index in main thread 95 | $.getScript(joinUrl(base_url, "search/worker.js")).done(function () { 96 | console.log('Loaded worker'); 97 | init(); 98 | window.postMessage = function (msg) { 99 | onWorkerMessage({data: msg}); 100 | }; 101 | }).fail(function (jqxhr, settings, exception) { 102 | console.error('Could not load worker.js'); 103 | }); 104 | } else { 105 | // Wrap search in a web worker 106 | var searchWorker = new Worker(joinUrl(base_url, "search/worker.js")); 107 | searchWorker.postMessage({init: true}); 108 | searchWorker.onmessage = onWorkerMessage; 109 | } 110 | -------------------------------------------------------------------------------- /docs/search/worker.js: -------------------------------------------------------------------------------- 1 | var base_path = 'function' === typeof importScripts ? '.' 
: '/search/'; 2 | var allowSearch = false; 3 | var index; 4 | var documents = {}; 5 | var lang = ['en']; 6 | var data; 7 | 8 | function getScript(script, callback) { 9 | console.log('Loading script: ' + script); 10 | $.getScript(base_path + script).done(function () { 11 | callback(); 12 | }).fail(function (jqxhr, settings, exception) { 13 | console.log('Error: ' + exception); 14 | }); 15 | } 16 | 17 | function getScriptsInOrder(scripts, callback) { 18 | if (scripts.length === 0) { 19 | callback(); 20 | return; 21 | } 22 | getScript(scripts[0], function() { 23 | getScriptsInOrder(scripts.slice(1), callback); 24 | }); 25 | } 26 | 27 | function loadScripts(urls, callback) { 28 | if( 'function' === typeof importScripts ) { 29 | importScripts.apply(null, urls); 30 | callback(); 31 | } else { 32 | getScriptsInOrder(urls, callback); 33 | } 34 | } 35 | 36 | function onJSONLoaded () { 37 | data = JSON.parse(this.responseText); 38 | var scriptsToLoad = ['lunr.js']; 39 | if (data.config && data.config.lang && data.config.lang.length) { 40 | lang = data.config.lang; 41 | } 42 | if (lang.length > 1 || lang[0] !== "en") { 43 | scriptsToLoad.push('lunr.stemmer.support.js'); 44 | if (lang.length > 1) { 45 | scriptsToLoad.push('lunr.multi.js'); 46 | } 47 | if (lang.includes("ja") || lang.includes("jp")) { 48 | scriptsToLoad.push('tinyseg.js'); 49 | } 50 | for (var i=0; i < lang.length; i++) { 51 | if (lang[i] != 'en') { 52 | scriptsToLoad.push(['lunr', lang[i], 'js'].join('.')); 53 | } 54 | } 55 | } 56 | loadScripts(scriptsToLoad, onScriptsLoaded); 57 | } 58 | 59 | function onScriptsLoaded () { 60 | console.log('All search scripts loaded, building Lunr index...'); 61 | if (data.config && data.config.separator && data.config.separator.length) { 62 | lunr.tokenizer.separator = new RegExp(data.config.separator); 63 | } 64 | 65 | if (data.index) { 66 | index = lunr.Index.load(data.index); 67 | data.docs.forEach(function (doc) { 68 | documents[doc.location] = doc; 69 | }); 70 | 
console.log('Lunr pre-built index loaded, search ready'); 71 | } else { 72 | index = lunr(function () { 73 | if (lang.length === 1 && lang[0] !== "en" && lunr[lang[0]]) { 74 | this.use(lunr[lang[0]]); 75 | } else if (lang.length > 1) { 76 | this.use(lunr.multiLanguage.apply(null, lang)); // spread operator not supported in all browsers: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_operator#Browser_compatibility 77 | } 78 | this.field('title'); 79 | this.field('text'); 80 | this.ref('location'); 81 | 82 | for (var i=0; i < data.docs.length; i++) { 83 | var doc = data.docs[i]; 84 | this.add(doc); 85 | documents[doc.location] = doc; 86 | } 87 | }); 88 | console.log('Lunr index built, search ready'); 89 | } 90 | allowSearch = true; 91 | postMessage({config: data.config}); 92 | postMessage({allowSearch: allowSearch}); 93 | } 94 | 95 | function init () { 96 | var oReq = new XMLHttpRequest(); 97 | oReq.addEventListener("load", onJSONLoaded); 98 | var index_path = base_path + '/search_index.json'; 99 | if( 'function' === typeof importScripts ){ 100 | index_path = 'search_index.json'; 101 | } 102 | oReq.open("GET", index_path); 103 | oReq.send(); 104 | } 105 | 106 | function search (query) { 107 | if (!allowSearch) { 108 | console.error('Assets for search still loading'); 109 | return; 110 | } 111 | 112 | var resultDocuments = []; 113 | var results = index.search(query); 114 | for (var i=0; i < results.length; i++){ 115 | var result = results[i]; 116 | doc = documents[result.ref]; 117 | doc.summary = doc.text.substring(0, 200); 118 | resultDocuments.push(doc); 119 | } 120 | return resultDocuments; 121 | } 122 | 123 | if( 'function' === typeof importScripts ) { 124 | onmessage = function (e) { 125 | if (e.data.init) { 126 | init(); 127 | } else if (e.data.query) { 128 | postMessage({ results: search(e.data.query) }); 129 | } else { 130 | console.error("Worker - Unrecognized message: " + e); 131 | } 132 | }; 133 | } 134 | 
-------------------------------------------------------------------------------- /docs/sitemap.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | https://github.com/interactivereport/scaleSC/tree/main/ 5 | 2024-11-07 6 | 7 | 8 | https://github.com/interactivereport/scaleSC/tree/main/api-docs/ 9 | 2024-11-07 10 | 11 | 12 | https://github.com/interactivereport/scaleSC/tree/main/api-docs/harmonypy_gpu/ 13 | 2024-11-07 14 | 15 | 16 | https://github.com/interactivereport/scaleSC/tree/main/api-docs/kernels/ 17 | 2024-11-07 18 | 19 | 20 | https://github.com/interactivereport/scaleSC/tree/main/api-docs/pp/ 21 | 2024-11-07 22 | 23 | 24 | https://github.com/interactivereport/scaleSC/tree/main/api-docs/trim_merge_marker/ 25 | 2024-11-07 26 | 27 | 28 | https://github.com/interactivereport/scaleSC/tree/main/api-docs/util/ 29 | 2024-11-07 30 | 31 | -------------------------------------------------------------------------------- /docs/sitemap.xml.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/docs/sitemap.xml.gz -------------------------------------------------------------------------------- /docs/webfonts/fa-brands-400.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/docs/webfonts/fa-brands-400.ttf -------------------------------------------------------------------------------- /docs/webfonts/fa-brands-400.woff2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/docs/webfonts/fa-brands-400.woff2 -------------------------------------------------------------------------------- /docs/webfonts/fa-regular-400.ttf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/docs/webfonts/fa-regular-400.ttf -------------------------------------------------------------------------------- /docs/webfonts/fa-regular-400.woff2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/docs/webfonts/fa-regular-400.woff2 -------------------------------------------------------------------------------- /docs/webfonts/fa-solid-900.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/docs/webfonts/fa-solid-900.ttf -------------------------------------------------------------------------------- /docs/webfonts/fa-solid-900.woff2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/docs/webfonts/fa-solid-900.woff2 -------------------------------------------------------------------------------- /docs/webfonts/fa-v4compatibility.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/docs/webfonts/fa-v4compatibility.ttf -------------------------------------------------------------------------------- /docs/webfonts/fa-v4compatibility.woff2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/docs/webfonts/fa-v4compatibility.woff2 -------------------------------------------------------------------------------- /img/pipeline.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/img/pipeline.png -------------------------------------------------------------------------------- /img/scalesc_overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/img/scalesc_overview.png -------------------------------------------------------------------------------- /img/scalesc_pipeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/img/scalesc_pipeline.png -------------------------------------------------------------------------------- /img/time_comp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/interactivereport/ScaleSC/876add3ba974401c5d760b89e5be3f9a712269e7/img/time_comp.png -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [build-system] 2 | requires = ["setuptools>=75.0", "wheel"] 3 | build-backend = "setuptools.build_meta" 4 | 5 | [project] 6 | name = "scalesc" 7 | version = "0.1.0" 8 | description = "A GPU-accelerated tool for single-cell RNAseq analysis of extremely large dataset." 
9 | keywords = [ "single cell", "GPU" ] 10 | readme = "README.md" 11 | authors = [ 12 | { name = "Haotian Zhang", email = "haotianzh@uconn.edu" }, 13 | { name = "Wenxing Hu", email = "wenxing.hu@biogen.com" }, 14 | ] 15 | license = { file = "LICENSE" } 16 | requires-python = ">=3.10" 17 | dependencies = [ 18 | "rapids-singlecell" 19 | ] 20 | 21 | [project.urls] 22 | "Homepage" = "https://github.com/interactivereport/scaleSC/tree/main" 23 | 24 | [tool.setuptools] 25 | packages = ["scalesc"] 26 | -------------------------------------------------------------------------------- /scalesc/__init__.py: -------------------------------------------------------------------------------- 1 | __all__ = ['ScaleSC', 2 | 'clusters_merge', 3 | 'find_markers', 4 | 'AnnDataBatchReader', 5 | 'check_nonnegative_integers', 6 | 'correct_leiden', 7 | 'write_to_disk', 8 | 'gc', 9 | '__version__'] 10 | __version__ = '0.1.0' 11 | 12 | import logging 13 | from scalesc.pp import ScaleSC 14 | from scalesc.util import AnnDataBatchReader, check_nonnegative_integers, correct_leiden, write_to_disk, gc 15 | from scalesc.trim_merge_marker import clusters_merge, find_markers 16 | 17 | logger = logging.getLogger("scaleSC") 18 | logger.setLevel(logging.DEBUG) 19 | ch = logging.StreamHandler() 20 | ch.setLevel(logging.DEBUG) 21 | formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s") 22 | ch.setFormatter(formatter) 23 | logger.addHandler(ch) 24 | -------------------------------------------------------------------------------- /scalesc/harmonypy_gpu.py: -------------------------------------------------------------------------------- 1 | # haotian 2 | # harmonypy - A data alignment algorithm. 
3 | # Copyright (C) 2018 Ilya Korsunsky 4 | # 2019 Kamil Slowikowski 5 | # 2022 Severin Dicks 6 | # 7 | # This program is free software: you can redistribute it and/or modify 8 | # it under the terms of the GNU General Public License as published by 9 | # the Free Software Foundation, either version 3 of the License, or 10 | # (at your option) any later version. 11 | # 12 | # This program is distributed in the hope that it will be useful, 13 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 14 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 15 | # GNU General Public License for more details. 16 | # 17 | # You should have received a copy of the GNU General Public License 18 | # along with this program. If not, see . 19 | from __future__ import annotations 20 | # from memory_profiler import profile 21 | # from test import get_usage 22 | import logging 23 | import tracemalloc 24 | import cupy as cp 25 | import numpy as np 26 | import pandas as pd 27 | from cuml import KMeans as KMeans_gpu 28 | from sklearn.cluster import KMeans as KMeans_cpu 29 | from sklearn.cluster import MiniBatchKMeans 30 | from sklearn.cluster import kmeans_plusplus 31 | from sklearn.utils import check_random_state 32 | from cupyx.scipy.sparse import csr_matrix as csr_matrix_cuda, csc_matrix as csc_matrix_cuda 33 | from scipy.sparse import csr_matrix, csc_matrix, coo_matrix 34 | from scipy.sparse import issparse, vstack 35 | 36 | # create logger 37 | logger = logging.getLogger("scaleSC") 38 | # logger.setLevel(logging.DEBUG) 39 | # ch = logging.StreamHandler() 40 | # ch.setLevel(logging.DEBUG) 41 | # formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s") 42 | # ch.setFormatter(formatter) 43 | # logger.addHandler(ch) 44 | 45 | # from IPython.core.debugger import set_trace 46 | 47 | def get_usage(s): 48 | pass 49 | 50 | def to_csr_cuda(x, dtype): 51 | """ Move to GPU as a csr_matrix. 
""" 52 | # if issparse(x): 53 | # return csr_matrix_cuda(x, dtype=dtype) 54 | return csr_matrix_cuda(csr_matrix(x, dtype=dtype)) 55 | 56 | def to_csc_cuda(x, dtype): 57 | """ Move to GPU as a csc_matrix, speed up column slice. """ 58 | # if issparse(x): 59 | # return csc_matrix_cuda(x, dtype=dtype) 60 | return csc_matrix_cuda(csc_matrix(x, dtype=dtype)) 61 | 62 | def get_dummies(x): 63 | """ Return a sparse dummy matrix. """ 64 | x = x.to_numpy().flatten() 65 | data = np.zeros(x.shape[0], dtype=np.float32) 66 | i = np.zeros(x.shape[0], dtype=int) 67 | j = np.zeros(x.shape[0], dtype=int) 68 | categories = pd.Categorical(x).categories.tolist() 69 | categories = {cat: p for p, cat in enumerate(categories)} 70 | for p in range(len(x)): 71 | q = categories[x[p]] 72 | data[p] = 1 73 | i[p] = q 74 | j[p] = p 75 | return coo_matrix((data, (i,j)), shape=(len(categories), len(x))) 76 | 77 | 78 | # @profile 79 | def run_harmony( 80 | data_mat: np.ndarray, 81 | meta_data: pd.DataFrame, 82 | vars_use, 83 | *, 84 | init_seeds=None, 85 | theta=None, 86 | lamb=None, 87 | sigma=0.1, 88 | nclust=None, 89 | tau=0, 90 | block_size=0.05, 91 | max_iter_harmony=10, 92 | max_iter_kmeans=20, 93 | epsilon_cluster=1e-5, 94 | epsilon_harmony=1e-4, 95 | plot_convergence=False, 96 | verbose=True, 97 | reference_values=None, 98 | cluster_prior=None, 99 | n_init=1, 100 | random_state=0, 101 | dtype=cp.float32, 102 | ): 103 | """Run Harmony.""" 104 | # init_seeds = None 105 | # theta = None 106 | # lamb = None 107 | # sigma = 0.1 108 | # nclust = None 109 | # tau = 0 110 | # block_size = 0.05 111 | # epsilon_cluster = 1e-5 112 | # epsilon_harmony = 1e-4 113 | # plot_convergence = False 114 | # verbose = True 115 | # reference_values = None 116 | # cluster_prior = None 117 | # random_state = 0 118 | get_usage('start') 119 | N = meta_data.shape[0] 120 | if data_mat.shape[1] != N: 121 | # force to shape (n_pc, n_data) 122 | data_mat = data_mat.T 123 | 124 | assert (data_mat.shape[1] == N), "data_mat 
and meta_data do not have the same number of cells" 125 | 126 | if nclust is None: 127 | nclust = np.min([np.round(N / 30.0), 100]).astype(int) 128 | 129 | if isinstance(sigma, float) and nclust > 1: 130 | sigma = np.repeat(sigma, nclust) 131 | 132 | if isinstance(vars_use, str): 133 | vars_use = [vars_use] 134 | 135 | # phi2 = pd.get_dummies(meta_data[vars_use]).to_numpy().T 136 | phi = get_dummies(meta_data[vars_use]) 137 | phi_n = meta_data[vars_use].describe().loc["unique"].to_numpy().astype(int) 138 | # print('phi_n', phi_n) 139 | if theta is None: 140 | theta = np.repeat([1] * len(phi_n), phi_n) 141 | elif isinstance(theta, float) or isinstance(theta, int): 142 | theta = np.repeat([theta] * len(phi_n), phi_n) 143 | elif len(theta) == len(phi_n): 144 | theta = np.repeat([theta], phi_n) 145 | 146 | assert len(theta) == np.sum(phi_n), "each batch variable must have a theta" 147 | 148 | if lamb is None: 149 | lamb = np.repeat([1] * len(phi_n), phi_n) 150 | elif isinstance(lamb, float) or isinstance(lamb, int): 151 | lamb = np.repeat([lamb] * len(phi_n), phi_n) 152 | elif len(lamb) == len(phi_n): 153 | lamb = np.repeat([lamb], phi_n) 154 | 155 | assert len(lamb) == np.sum(phi_n), "each batch variable must have a lambda" 156 | 157 | # Number of items in each category. 158 | 159 | N_b = np.asarray(phi.sum(axis=1)).reshape(-1) 160 | # N_b2 = phi2.sum(axis=1) 161 | # Proportion of items in each category. 
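As an aside, the one-hot batch matrix produced by `get_dummies` above (one row per batch category, one column per cell) can be reproduced on CPU with a toy example; this is a sketch using only numpy/pandas/scipy, and the `sample` column name is made up for illustration:

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

# Toy batch assignment for 4 cells; mirrors the layout built by get_dummies().
batches = pd.DataFrame({"sample": ["a", "b", "a", "c"]})
x = batches.to_numpy().flatten()
categories = {cat: p for p, cat in enumerate(pd.Categorical(x).categories)}
rows = np.array([categories[v] for v in x])   # batch index of each cell
cols = np.arange(len(x))                      # one column per cell
phi = coo_matrix((np.ones(len(x), dtype=np.float32), (rows, cols)),
                 shape=(len(categories), len(x)))
# every column sums to 1: each cell belongs to exactly one batch
assert phi.toarray().sum(axis=0).tolist() == [1.0, 1.0, 1.0, 1.0]
```

Summing this matrix over its columns, as done with `phi.sum(axis=1)` above, then yields the per-batch cell counts `N_b`.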
162 | Pr_b = N_b / N 163 | if tau > 0: 164 | theta = theta * (1 - np.exp(-((N_b / (nclust * tau)) ** 2))) 165 | 166 | lamb_mat = np.diag(np.insert(lamb, 0, 0)) 167 | 168 | # phi_moe2 = np.vstack((np.repeat(1, N), phi2)) 169 | phi_moe = vstack((np.ones((1, N)), phi)) 170 | get_usage('before random') 171 | cp.random.seed(random_state) 172 | np.random.seed(random_state) 173 | get_usage('data_prepare') 174 | ho = Harmony( 175 | data_mat, 176 | init_seeds, 177 | n_init, 178 | phi, 179 | phi_moe, 180 | Pr_b, 181 | sigma, 182 | theta, 183 | max_iter_harmony, 184 | max_iter_kmeans, 185 | epsilon_cluster, 186 | epsilon_harmony, 187 | nclust, 188 | block_size, 189 | lamb_mat, 190 | verbose, 191 | random_state, 192 | dtype=dtype, 193 | ) 194 | 195 | # breakpoint() 196 | return ho 197 | 198 | 199 | class Harmony: 200 | # @profile 201 | def __init__( 202 | self, 203 | Z, 204 | init_seeds, 205 | n_init, 206 | Phi, 207 | Phi_moe, 208 | Pr_b, 209 | sigma, 210 | theta, 211 | max_iter_harmony, 212 | max_iter_kmeans, 213 | epsilon_kmeans, 214 | epsilon_harmony, 215 | K, 216 | block_size, 217 | lamb, 218 | verbose, 219 | random_state, 220 | dtype, 221 | ): 222 | 223 | self.Z_corr = cp.array(Z, dtype=dtype) 224 | self.Z_orig = cp.array(Z, dtype=dtype) 225 | 226 | self.Z_cos = self.Z_orig / self.Z_orig.max(axis=0) 227 | self.Z_cos = self.Z_cos / cp.linalg.norm(self.Z_cos, ord=2, axis=0) 228 | 229 | self.init_seeds = init_seeds 230 | self.n_init = n_init 231 | # self.Phi = cp.array(Phi, dtype=dtype) 232 | self.Phi = to_csc_cuda(Phi, dtype=dtype) 233 | # self.Phi_moe = cp.array(Phi_moe, dtype=dtype) 234 | self.Phi_moe = to_csr_cuda(Phi_moe, dtype=dtype) 235 | self.N = self.Z_corr.shape[1] 236 | self.Pr_b = cp.array(Pr_b, dtype=dtype) 237 | self.B = self.Phi.shape[0] # number of batch variables 238 | self.d = self.Z_corr.shape[0] 239 | self.window_size = 3 240 | self.epsilon_kmeans = epsilon_kmeans 241 | self.epsilon_harmony = epsilon_harmony 242 | 243 | self.lamb = cp.array(lamb, 
dtype=dtype) 244 | self.sigma = cp.array(sigma, dtype=dtype) 245 | self.sigma_prior = cp.array(sigma, dtype=dtype) 246 | self.block_size = block_size 247 | self.K = K # number of clusters 248 | self.max_iter_harmony = max_iter_harmony 249 | self.max_iter_kmeans = max_iter_kmeans 250 | self.verbose = verbose 251 | self.theta = cp.array(theta, dtype=dtype) 252 | # self.random_state = random_state 253 | self.random_state = check_random_state(random_state) # get a RNG instance 254 | 255 | self.objective_harmony = [] 256 | self.objective_kmeans = [] 257 | self.objective_kmeans_dist = [] 258 | self.objective_kmeans_entropy = [] 259 | self.objective_kmeans_cross = [] 260 | self.kmeans_rounds = [] 261 | self.dtype = dtype 262 | get_usage('copy to gpu') 263 | self.allocate_buffers() 264 | get_usage('allocate buffer') 265 | self.init_cluster() 266 | get_usage('init cluster') 267 | self.harmonize(self.max_iter_harmony, self.verbose) 268 | get_usage('iter') 269 | 270 | def result(self): 271 | return self.Z_corr.T.get() 272 | 273 | def allocate_buffers(self): 274 | # self._scale_dist = cp.zeros((self.K, self.N), dtype=self.dtype) 275 | self.dist_mat = cp.zeros((self.K, self.N), dtype=self.dtype) 276 | self.O = cp.zeros((self.K, self.B), dtype=self.dtype) 277 | self.E = cp.zeros((self.K, self.B), dtype=self.dtype) 278 | self.W = cp.zeros((self.B + 1, self.d), dtype=self.dtype) 279 | # self.Phi_Rk = cp.zeros((self.B + 1, self.N), dtype=self.dtype) 280 | self.Phi_Rk = to_csr_cuda(np.zeros((self.B + 1, self.N), dtype=self.dtype), dtype=self.dtype) # to csr 281 | 282 | 283 | def kmeans_multirestart(self): 284 | Z_cos_cpu = self.Z_cos.T.get() 285 | best_centers = None 286 | best_score = None 287 | for i in range(self.n_init): 288 | center, indices = kmeans_plusplus(Z_cos_cpu, n_clusters=self.K, random_state=self.random_state) 289 | kmeans_obj = KMeans_gpu(n_clusters=self.K, init=center, n_init=1, max_iter=25).fit(Z_cos_cpu) 290 | score = kmeans_obj.inertia_ 291 | if best_score is 
None: 292 | best_score, best_centers = score, kmeans_obj.cluster_centers_ # also keep the first run's centers; otherwise best_centers stays None when n_init == 1 293 | else: 294 | if score < best_score: 295 | best_score = score 296 | best_centers = kmeans_obj.cluster_centers_ 297 | return best_centers 298 | 299 | # @profile 300 | def init_cluster(self): 301 | best_cluster_centers = None 302 | # if none then perform two-step correction as default 303 | if self.init_seeds is None or self.init_seeds == '2-step': 304 | best_cluster_centers = self.kmeans_multirestart() 305 | elif self.init_seeds == '1-step': 306 | Z_cos_cpu = self.Z_cos.T.get() 307 | kmeans_obj = KMeans_cpu(n_clusters=self.K, init='k-means++', n_init=self.n_init, max_iter=25, random_state=self.random_state).fit(Z_cos_cpu) 308 | best_cluster_centers = kmeans_obj.cluster_centers_ 309 | elif self.init_seeds == 'rapids': 310 | kmeans_obj = KMeans_gpu(n_clusters=self.K, random_state=0, init="k-means||", max_iter=25, n_init=self.n_init).fit(self.Z_cos.T) 311 | best_cluster_centers = kmeans_obj.cluster_centers_ 312 | 313 | # self.Y = kmeans_obj.cluster_centers_.T 314 | self.Y = cp.array(best_cluster_centers.T) 315 | # (1) Normalize 316 | self.Y = self.Y / cp.linalg.norm(self.Y, ord=2, axis=0) 317 | # (2) Assign cluster probabilities 318 | self.dist_mat = 2 * (1 - cp.dot(self.Y.T, self.Z_cos)) 319 | self.R = -self.dist_mat 320 | self.R = self.R / self.sigma[:, None] 321 | self.R -= cp.max(self.R, axis=0) 322 | self.R = cp.exp(self.R) 323 | self.R = self.R / cp.sum(self.R, axis=0) 324 | # (3) Batch diversity statistics 325 | self.E = cp.outer(cp.sum(self.R, axis=1), self.Pr_b) 326 | self.O = self.R @ self.Phi.T 327 | # self.O = cp.inner(self.R, self.Phi) 328 | self.compute_objective() 329 | # Save results 330 | self.objective_harmony.append(self.objective_kmeans[-1]) 331 | 332 | def compute_objective(self): 333 | get_usage('---s') 334 | kmeans_error = cp.sum(cp.multiply(self.R, self.dist_mat)) 335 | # Entropy 336 | _entropy = cp.sum(safe_entropy(self.R) * self.sigma[:, cp.newaxis]) 337 | # Cross Entropy 338 | x = self.R * self.sigma[:,
cp.newaxis] 339 | y = cp.tile(self.theta[:, cp.newaxis], self.K).T 340 | z = cp.log((self.O + 1) / (self.E + 1)) 341 | # w = cp.dot(y * z, self.Phi) 342 | w = (y*z) @ self.Phi 343 | _cross_entropy = cp.sum(x * w) 344 | get_usage('---!') 345 | # Save results 346 | self.objective_kmeans.append(kmeans_error + _entropy + _cross_entropy) 347 | self.objective_kmeans_dist.append(kmeans_error) 348 | self.objective_kmeans_entropy.append(_entropy) 349 | self.objective_kmeans_cross.append(_cross_entropy) 350 | 351 | # @profile 352 | def harmonize(self, iter_harmony=10, verbose=True): 353 | converged = False 354 | for i in range(1, iter_harmony + 1): 355 | if verbose: 356 | logger.debug(f"Harmony: Iteration {i} of {iter_harmony}") 357 | # STEP 1: Clustering 358 | self.cluster() 359 | # STEP 2: Regress out covariates 360 | # self.moe_correct_ridge() 361 | self.Z_cos, self.Z_corr, self.W, self.Phi_Rk = moe_correct_ridge( 362 | self.Z_orig, 363 | self.Z_cos, 364 | self.Z_corr, 365 | self.R, 366 | self.W, 367 | self.K, 368 | self.Phi_Rk, 369 | self.Phi_moe, 370 | self.lamb, 371 | ) 372 | # STEP 3: Check for convergence 373 | converged = self.check_convergence(1) 374 | if converged: 375 | if verbose: 376 | logger.info( 377 | "Harmony: Converged after {} iteration{}".format(i, "s" if i > 1 else "") 378 | ) 379 | break 380 | if verbose and not converged: 381 | logger.info("Harmony: Stopped before convergence") 382 | return 0 383 | 384 | def cluster(self): 385 | get_usage('--cluster') 386 | # Z_cos has changed 387 | # R is assumed to not have changed 388 | # Update Y to match new integrated data 389 | self.dist_mat = 2 * (1 - cp.dot(self.Y.T, self.Z_cos)) 390 | for i in range(self.max_iter_kmeans): 391 | # print("kmeans {}".format(i)) 392 | # STEP 1: Update Y 393 | self.Y = cp.dot(self.Z_cos, self.R.T) 394 | self.Y = self.Y / cp.linalg.norm(self.Y, ord=2, axis=0) 395 | # STEP 2: Update dist_mat 396 | self.dist_mat = 2 * (1 - cp.dot(self.Y.T, self.Z_cos)) 397 | # STEP 3: Update R 398 | 
self.update_R() 399 | # STEP 4: Check for convergence 400 | self.compute_objective() 401 | if i > self.window_size: 402 | converged = self.check_convergence(0) 403 | if converged: 404 | break 405 | self.kmeans_rounds.append(i) 406 | self.objective_harmony.append(self.objective_kmeans[-1]) 407 | get_usage('--end cluster') 408 | return 0 409 | 410 | def update_R(self): 411 | get_usage('update R') 412 | # self._scale_dist = -self.dist_mat 413 | # self._scale_dist = self._scale_dist / self.sigma[:, None] 414 | # self._scale_dist -= cp.max(self._scale_dist, axis=0) 415 | # self._scale_dist = cp.exp(self._scale_dist) 416 | _scale_dist = -self.dist_mat 417 | _scale_dist = _scale_dist / self.sigma[:, None] 418 | _scale_dist -= cp.max(_scale_dist, axis=0) 419 | _scale_dist = cp.exp(_scale_dist) 420 | 421 | # Update cells in blocks 422 | # update_order = cp.arange(self.N) 423 | # cp.random.shuffle(update_order) 424 | update_order = np.arange(self.N) 425 | np.random.shuffle(update_order) 426 | update_order = cp.array(update_order) 427 | # print(update_order) 428 | n_blocks = cp.ceil(1 / self.block_size).astype(int) 429 | blocks = cp.array_split(update_order, int(n_blocks)) 430 | for b in blocks: 431 | # print('b', b) 432 | # print('R', self.R.shape) 433 | # print('Phi', self.Phi.shape) 434 | # STEP 1: Remove cells 435 | self.E -= cp.outer(cp.sum(self.R[:, b], axis=1), self.Pr_b) 436 | # self.O -= cp.dot(self.R[:, b], self.Phi[:, b].T) 437 | self.O -= self.R[:, b] @ self.Phi[:, b].T 438 | # STEP 2: Recompute R for removed cells 439 | # self.R[:, b] = self._scale_dist[:, b] 440 | self.R[:, b] = _scale_dist[:, b] 441 | self.R[:, b] = cp.multiply( 442 | self.R[:, b], 443 | cp.dot( 444 | cp.power((self.E + 1) / (self.O + 1), self.theta), self.Phi[:, b].toarray() 445 | ), 446 | ) 447 | self.R[:, b] = self.R[:, b] / cp.linalg.norm(self.R[:, b], ord=1, axis=0) 448 | # STEP 3: Put cells back 449 | self.E += cp.outer(cp.sum(self.R[:, b], axis=1), self.Pr_b) 450 | # self.O += 
cp.dot(self.R[:, b], self.Phi[:, b].T) 451 | self.O += self.R[:, b] @ self.Phi[:, b].T 452 | get_usage('end update R') 453 | return 0 454 | 455 | def check_convergence(self, i_type): 456 | obj_old = 0.0 457 | obj_new = 0.0 458 | # Clustering, compute new window mean 459 | if i_type == 0: 460 | okl = len(self.objective_kmeans) 461 | for i in range(self.window_size): 462 | obj_old += self.objective_kmeans[okl - 2 - i] 463 | obj_new += self.objective_kmeans[okl - 1 - i] 464 | if abs(obj_old - obj_new) / abs(obj_old) < self.epsilon_kmeans: 465 | return True 466 | return False 467 | # Harmony 468 | if i_type == 1: 469 | obj_old = self.objective_harmony[-2] 470 | obj_new = self.objective_harmony[-1] 471 | if (obj_old - obj_new) / abs(obj_old) < self.epsilon_harmony: 472 | return True 473 | return False 474 | return True 475 | 476 | 477 | def safe_entropy(x: cp.array): 478 | y = cp.multiply(x, cp.log(x)) 479 | y[~cp.isfinite(y)] = 0.0 480 | return y 481 | 482 | 483 | # def moe_correct_ridge(Z_orig, Z_cos, Z_corr, R, W, K, Phi_Rk, Phi_moe, lamb): 484 | # Z_corr = Z_orig.copy() 485 | # for i in range(K): 486 | # Phi_Rk = cp.multiply(Phi_moe, R[i, :]) 487 | # x = cp.dot(Phi_Rk, Phi_moe.T) + lamb 488 | # W = cp.dot(cp.dot(cp.linalg.inv(x), Phi_Rk), Z_orig.T) 489 | # W[0, :] = 0 # do not remove the intercept 490 | # Z_corr -= cp.dot(W.T, Phi_Rk) 491 | # Z_cos = Z_corr / cp.linalg.norm(Z_corr, ord=2, axis=0) 492 | # return Z_cos, Z_corr, W, Phi_Rk 493 | 494 | def moe_correct_ridge(Z_orig, Z_cos, Z_corr, R, W, K, Phi_Rk, Phi_moe, lamb): 495 | get_usage('moe start') 496 | Z_corr = Z_orig.copy() 497 | for i in range(K): 498 | # Phi_Rk = cp.multiply(Phi_moe, R[i, :]) 499 | Phi_Rk = Phi_moe.multiply(R[i, :]) 500 | x = (Phi_Rk @ Phi_moe.T) + lamb 501 | # x = cp.dot(Phi_Rk, Phi_moe.T) + lamb 502 | # W = (cp.linalg.inv(x) @ Phi_Rk).dot(Z_orig.T) 503 | # W = (csr_matrix_cuda(cp.linalg.inv(x)) @ Phi_Rk).dot(Z_orig.T) 504 | # W = (cp.linalg.inv(x) @ Phi_Rk) @ Z_orig.T 505 | 506 | W = 
cp.linalg.inv(x) @ (Phi_Rk @ Z_orig.T) 507 | # W = cp.dot(cp.dot(cp.linalg.inv(x), Phi_Rk), Z_orig.T) 508 | W[0, :] = 0 # do not remove the intercept 509 | 510 | Z_corr -= W.T @ Phi_Rk 511 | # Z_corr -= cp.dot(W.T, Phi_Rk) 512 | Z_cos = Z_corr / cp.linalg.norm(Z_corr, ord=2, axis=0) 513 | get_usage('moe end ') 514 | return Z_cos, Z_corr, W, Phi_Rk -------------------------------------------------------------------------------- /scalesc/kernels.py: -------------------------------------------------------------------------------- 1 | from __future__ import annotations 2 | 3 | import cupy as cp 4 | from cuml.common.kernel_utils import cuda_kernel_factory 5 | 6 | get_mean_var_major_kernel = r""" 7 | (const int *indptr,const int *index,const {0} *data, 8 | double* means,double* vars, 9 | int major, int minor) { 10 | int major_idx = blockIdx.x; 11 | if(major_idx >= major){ 12 | return; 13 | } 14 | int start_idx = indptr[major_idx]; 15 | int stop_idx = indptr[major_idx+1]; 16 | 17 | __shared__ double mean_place[64]; 18 | __shared__ double var_place[64]; 19 | 20 | mean_place[threadIdx.x] = 0.0; 21 | var_place[threadIdx.x] = 0.0; 22 | __syncthreads(); 23 | 24 | for(int minor_idx = start_idx+threadIdx.x; minor_idx < stop_idx; minor_idx+= blockDim.x){ 25 | double value = (double)data[minor_idx]; 26 | mean_place[threadIdx.x] += value; 27 | var_place[threadIdx.x] += value*value; 28 | } 29 | __syncthreads(); 30 | 31 | for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) { 32 | if (threadIdx.x < s) { 33 | mean_place[threadIdx.x] += mean_place[threadIdx.x + s]; 34 | var_place[threadIdx.x] += var_place[threadIdx.x + s]; 35 | } 36 | __syncthreads(); // Synchronize at each step of the reduction 37 | } 38 | if (threadIdx.x == 0) { 39 | means[major_idx] = mean_place[threadIdx.x]; 40 | vars[major_idx] = var_place[threadIdx.x]; 41 | } 42 | 43 | } 44 | """ 45 | 46 | """ 47 | Modified: return sum(x) and sq_sum(x) 48 | """ 49 | get_mean_var_minor_kernel = r""" 50 | (const int *index,const 
{0} *data, 51 | double* sums, double* sq_sums, 52 | int major, int nnz) { 53 | int idx = blockDim.x * blockIdx.x + threadIdx.x; 54 | if(idx >= nnz){ 55 | return; 56 | } 57 | double value = (double) data[idx]; 58 | int minor_pos = index[idx]; 59 | atomicAdd(&sums[minor_pos], value); 60 | atomicAdd(&sq_sums[minor_pos], value*value); 61 | } 62 | """ 63 | 64 | find_indices_kernel = r""" 65 | extern "C" __global__ void find_indices(const long int* A, const long int* indptr, int* result, long int N, int M) { 66 | int idx = blockIdx.x * blockDim.x + threadIdx.x; 67 | 68 | if (idx < N) { 69 | long int value = A[idx]; 70 | long int left = 0; 71 | long int right = M - 1; 72 | 73 | while (left < right) { 74 | int mid = left + (right - left) / 2; 75 | if (indptr[mid] <= value) { 76 | left = mid + 1; 77 | } else { 78 | right = mid; 79 | } 80 | } 81 | if (left > 0 && indptr[left - 1] <= value && left < M) { 82 | result[idx] = left - 1; 83 | } else { 84 | result[idx] = -1; 85 | } 86 | } 87 | } 88 | """ 89 | 90 | check_in_cols_kernel = cp.ElementwiseKernel( 91 | 'int32 index, int32 cols_size, raw int32 cols', 92 | 'bool is_in', 93 | ''' 94 | is_in = false; 95 | for (int i = 0; i < cols_size; i++) { 96 | if (index == cols[i]) { 97 | is_in = true; 98 | break; 99 | } 100 | } 101 | ''', 102 | 'check_in_cols' 103 | ) 104 | 105 | 106 | csr_row_index_kernel = cp.ElementwiseKernel( 107 | 'int32 out_rows, raw I rows, ' 108 | 'raw int64 Ap, raw int32 Aj, raw T Ax, raw int64 Bp', 109 | 'int32 Bj, T Bx', 110 | ''' 111 | const I row = rows[out_rows]; 112 | 113 | // Look up starting offset 114 | const I starting_output_offset = Bp[out_rows]; 115 | const I output_offset = i - starting_output_offset; 116 | const I starting_input_offset = Ap[row]; 117 | 118 | Bj = Aj[starting_input_offset + output_offset]; 119 | Bx = Ax[starting_input_offset + output_offset]; 120 | ''', 'cupyx_scipy_sparse_csr_row_index_ker') 121 | 122 | 123 | 124 | sq_sum = cp.ReductionKernel( 125 | "T x", # input params 126 | 
"float64 y", # output params 127 | "x * x", # map 128 | "a + b", # reduce 129 | "y = a", # post-reduction map 130 | "0", # identity value 131 | "sqsum64", # kernel name 132 | ) 133 | 134 | mean_sum = cp.ReductionKernel( 135 | "T x", # input params 136 | "float64 y", # output params 137 | "x", # map 138 | "a + b", # reduce 139 | "y = a", # post-reduction map 140 | "0", # identity value 141 | "sum64", # kernel name 142 | ) 143 | 144 | 145 | seurat_v3_elementwise_kernel = cp.ElementwiseKernel( 146 | "T data, S idx, raw D clip_val", 147 | "raw D sq_sum, raw D sum", 148 | """ 149 | D element = min((double)data, clip_val[idx]); 150 | atomicAdd(&sq_sum[idx], element * element); 151 | atomicAdd(&sum[idx], element); 152 | """, 153 | "seurat_v3_elementwise_kernel", 154 | no_return=True, 155 | ) 156 | 157 | 158 | sum_sign_elementwise_kernel = cp.ElementwiseKernel( 159 | "T data, S idx", 160 | "raw D sum", 161 | """ 162 | if (data > 0){ 163 | atomicAdd(&sum[idx], 1); 164 | }else{ 165 | atomicAdd(&sum[idx], 0); 166 | } 167 | """, 168 | "sum_sign_elementwise_kernel", 169 | no_return=True, 170 | ) 171 | 172 | def get_mean_var_major(dtype): 173 | return cuda_kernel_factory( 174 | get_mean_var_major_kernel, (dtype,), "get_mean_var_major_kernel" 175 | ) 176 | 177 | 178 | def get_mean_var_minor(dtype): 179 | return cuda_kernel_factory( 180 | get_mean_var_minor_kernel, (dtype,), "get_mean_var_minor_kernel" 181 | ) 182 | 183 | 184 | def get_find_indices(): 185 | return cp.RawKernel(find_indices_kernel, 'find_indices') 186 | 187 | -------------------------------------------------------------------------------- /scalesc/pp.py: -------------------------------------------------------------------------------- 1 | import os 2 | import logging 3 | import warnings 4 | import cupy as cp 5 | import numpy as np 6 | import scanpy as sc 7 | import rapids_singlecell as rsc 8 | from skmisc.loess import loess 9 | from time import time 10 | from scalesc import util 11 | from scalesc.kernels import * 12 
| 13 | 14 | class ScaleSC(): 15 | """ScaleSC integrated pipeline in a scanpy-like style. 16 | 17 | It automatically loads the dataset in chunks, see `scalesc.util.AnnDataBatchReader` 18 | for details, and all methods in this class manipulate this chunked data. 19 | 20 | Args: 21 | 22 | data_dir (`str`): Data folder of the dataset. 23 | max_cell_batch (`int`): Maximum number of cells in a single batch. 24 | Default: 100000. 25 | preload_on_cpu (`bool`): Whether to load the entire chunked data on CPU. Default: `True`. 26 | preload_on_gpu (`bool`): Whether to load the entire chunked data on GPU; `preload_on_cpu` 27 | is overwritten to `True` when this is set to `True`. Default: `True`. 28 | save_raw_counts (`bool`): Whether to save `adata_X` to disk after QC filtering. 29 | Default: False. 30 | save_norm_counts (`bool`): Whether to save `adata_X` to disk after normalization. 31 | Default: False. 32 | save_after_each_step (`bool`): Whether to save `adata` (without .X) to disk after each step. 33 | Default: False. 34 | output_dir (`str`): Output folder. Default: './results'. 35 | gpus (`list`): List of GPU IDs; `[0]` is used if this is None. Default: None.
36 | """ 37 | def __init__(self, data_dir, 38 | max_cell_batch=1e5, 39 | preload_on_cpu=True, 40 | preload_on_gpu=True, 41 | save_raw_counts=False, 42 | save_norm_counts=False, 43 | save_after_each_step=False, 44 | output_dir='results', 45 | gpus=None): 46 | self.data_dir = data_dir 47 | self.max_cell_batch = max_cell_batch 48 | self.preload_on_cpu = preload_on_cpu 49 | self.preload_on_gpu = preload_on_gpu 50 | if preload_on_gpu: 51 | self.preload_on_cpu = True 52 | self.save_raw_counts = save_raw_counts 53 | self.save_norm_counts = save_norm_counts 54 | self.save_after_each_step = save_after_each_step 55 | self.output_dir = output_dir 56 | self.gpus = gpus 57 | self.data_name = data_dir.split('/')[-1] 58 | self.norm = False 59 | self._init() 60 | 61 | @property 62 | def adata(self): 63 | """`AnnData`: An AnnData object that used to store all intermediate results 64 | without the count matrix. 65 | 66 | Note: This is always on CPU. 67 | """ 68 | assert self.preload_on_cpu and not self.reader.have_looped_once, "adata hasn't been created, call 'batchify()' once to initialize it." 69 | return self.reader.adata 70 | 71 | @property 72 | def adata_X(self): 73 | """`AnnData`: An `AnnData` object that used to store all intermediate results 74 | including the count matrix. Internally, all chunks should be merged on CPU to avoid 75 | high GPU consumption, make sure to invoke `to_CPU()` before calling this object. 76 | """ 77 | return self.reader.get_merged_adata_with_X() 78 | 79 | def to_GPU(self): 80 | """Move all chunks to GPU.""" 81 | self.reader.batch_to_GPU() 82 | 83 | def to_CPU(self): 84 | """Move all chunks to CPU.""" 85 | self.reader.batch_to_CPU() 86 | 87 | def clear(self): 88 | """Clean the memory""" 89 | self.reader.clear() 90 | 91 | def _init(self): 92 | assert os.path.exists(self.data_dir), "Data dir is not existed. Please double check and make sure samples have already been split." 
93 | # TODO: walk dir and get size 94 | self.reader = util.AnnDataBatchReader(data_dir=self.data_dir, 95 | preload_on_cpu=self.preload_on_cpu, 96 | preload_on_gpu=self.preload_on_gpu, 97 | gpus=self.gpus, 98 | max_cell_batch=self.max_cell_batch) 99 | 100 | def calculate_qc_metrics(self): 101 | """Calculate quality control metrics.""" 102 | assert self.preload_on_cpu, "Not supported when preload_on_cpu is False, terminated." 103 | for d in self.reader.batchify(axis='cell'): 104 | rsc.pp.calculate_qc_metrics(d) 105 | 106 | def filter_genes(self, min_count=0, max_count=None, qc_var='n_cells_by_counts', qc=False): 107 | """Filter genes based on a QC metric. 108 | 109 | Args: 110 | min_count (`int`): Minimum number of counts required for a gene to pass filtering. 111 | max_count (`int`): Maximum number of counts required for a gene to pass filtering. 112 | qc_var (`str`='n_cells_by_counts'): Feature in QC metrics that is used to filter genes. 113 | qc (`bool`=`False`): Call `calculate_qc_metrics` before filtering. 114 | """ 115 | if qc: 116 | self.calculate_qc_metrics() 117 | genes_counts = [] 118 | num_cells = 0 119 | for d in self.reader.batchify(axis='cell'): 120 | num_cells += d.shape[0] 121 | genes_counts.append(d.var[qc_var]) 122 | genes_total_counts = np.sum(genes_counts, axis=0) 123 | if max_count is None: 124 | max_count = num_cells 125 | genes_filter = (genes_total_counts >= min_count) & (genes_total_counts <= max_count) 126 | self.reader.set_genes_filter(genes_filter, update=True if self.preload_on_cpu else False) 127 | 128 | def filter_cells(self, min_count=0, max_count=None, qc_var='n_genes_by_counts', qc=False): 129 | """Filter cells based on a QC metric. 130 | 131 | Args: 132 | min_count (`int`): Minimum number of counts required for a cell to pass filtering. 133 | max_count (`int`): Maximum number of counts required for a cell to pass filtering. 134 | qc_var (`str`='n_genes_by_counts'): Feature in QC metrics that is used to filter cells.
135 | qc (`bool`=`False`): Call `calculate_qc_metrics` before filtering. 136 | """ 137 | if qc: 138 | self.calculate_qc_metrics() 139 | cells_filter = [] 140 | cell_names = [] 141 | for d in self.reader.batchify(axis='cell'): 142 | cells_index = util.filter_cells(d, qc_var=qc_var, min_count=min_count, max_count=max_count) 143 | cells_filter.append(cells_index) 144 | cell_names += d.obs.index[cells_index].tolist() 145 | self.reader.set_cells_filter(cells_filter, update=True if self.preload_on_cpu else False) 146 | self.cell_names = cell_names 147 | 148 | def filter_genes_and_cells(self, min_counts_per_gene=0, min_counts_per_cell=0, 149 | max_counts_per_gene=None, max_counts_per_cell=None, 150 | qc_var_gene='n_cells_by_counts', qc_var_cell='n_genes_by_counts', 151 | qc=False): 152 | """Filter genes and cells based on QC metrics. 153 | 154 | Note: 155 | This is an efficient way to perform regular filtering on genes and cells without 156 | repeatedly iterating over chunks. 157 | 158 | Args: 159 | min_counts_per_gene (`int`): Minimum number of counts required for a gene to pass filtering. 160 | max_counts_per_gene (`int`): Maximum number of counts required for a gene to pass filtering. 161 | qc_var_gene (`str`='n_cells_by_counts'): Feature in QC metrics that is used to filter genes. 162 | min_counts_per_cell (`int`): Minimum number of counts required for a cell to pass filtering. 163 | max_counts_per_cell (`int`): Maximum number of counts required for a cell to pass filtering. 164 | qc_var_cell (`str`='n_genes_by_counts'): Feature in QC metrics that is used to filter cells.
166 | """ 167 | if qc: 168 | self.calculate_qc_metrics() 169 | num_cells = 0 170 | cells_filter = [] 171 | genes_counts = [] 172 | for d in self.reader.batchify(axis='cell'): 173 | rsc.pp.calculate_qc_metrics(d) 174 | cells_index = util.filter_cells(d, qc_var=qc_var_cell, min_count=min_counts_per_cell, max_count=max_counts_per_cell) 175 | num_cells += d.shape[0] 176 | cells_filter.append(cells_index) 177 | genes_counts.append(d.var[qc_var_gene]) 178 | if max_counts_per_gene is None: 179 | max_counts_per_gene = num_cells 180 | genes_total_counts = np.sum(genes_counts, axis=0) 181 | genes_filter = (genes_total_counts >= min_counts_per_gene) & (genes_total_counts <= max_counts_per_gene) 182 | self.reader.set_cells_filter(cells_filter, update=True if self.preload_on_cpu else False) 183 | self.reader.set_genes_filter(genes_filter, update=True if self.preload_on_cpu else False) 184 | 185 | 186 | def highly_variable_genes(self, n_top_genes=4000, method='seurat_v3'): 187 | """Annotate highly variable genes. 188 | 189 | Note: 190 | Only `seurat_v3` is implemented. Count data is expected for `seurat_v3`. 191 | HVGs are set to `True` in `adata.var['highly_variable']`. 192 | 193 | Args: 194 | n_top_genes (`int`=`4000`): Number of highly-variable genes to keep. 195 | method (`str`=`'seurat_v3'`): Choose the flavor for identifying highly variable genes. 
196 | """ 197 | valid_methods = ['seurat_v3'] 198 | assert method in valid_methods, NotImplementedError("only seurat_v3 has been implemented yet.") 199 | N, M = self.reader.shape 200 | _sum_x = cp.zeros([M], dtype=cp.float64) 201 | _sum_x_sq = cp.zeros([M], dtype=cp.float64) 202 | for d in self.reader.batchify(axis='cell'): 203 | X_batch = d.X 204 | x_sum, x_sq_sum = util.get_mean_var(X_batch, axis=0) 205 | _sum_x += x_sum 206 | _sum_x_sq += x_sq_sum 207 | mean = _sum_x / N 208 | var = (_sum_x_sq / N - mean**2) * N / (N-1) 209 | estimate_var = cp.zeros(M, dtype=cp.float64) 210 | x = cp.log10(mean[var > 0]) 211 | y = cp.log10(var[var > 0]) 212 | model = loess(x.get(), y.get(), span=0.3, degree=2) # fix span and degree here 213 | model.fit() 214 | estimate_var[var > 0] = model.outputs.fitted_values 215 | std = cp.sqrt(10**estimate_var) 216 | clip_val = std * cp.sqrt(N) + mean 217 | squared_batch_counts_sum = cp.zeros(clip_val.shape, dtype=cp.float64) 218 | batch_counts_sum = cp.zeros(clip_val.shape, dtype=cp.float64) 219 | for d in self.reader.batchify(axis='cell'): 220 | batch_counts = d.X 221 | x_sq = cp.zeros_like(squared_batch_counts_sum, dtype=cp.float64) 222 | x = cp.zeros_like(batch_counts_sum, dtype=cp.float64) 223 | seurat_v3_elementwise_kernel(batch_counts.data, batch_counts.indices, clip_val, x_sq, x) 224 | squared_batch_counts_sum += x_sq 225 | batch_counts_sum += x 226 | """ 227 | ** is not correct here 228 | z = (x-m) / s 229 | var(z) = E[z^2] - E[z]^2 230 | E[z^2] = E[x^2 - 2xm + m^2] / s^2 231 | E[z] = E[x-m] / s 232 | x is the truncated value x by \sqrt N. m is the mean before trunction, s is the estimated std 233 | E[z]^2 is supposed to be close to 0. 
234 | """ 235 | e_z_sq = (1 / ((N - 1) * cp.square(std))) *\ 236 | (N*cp.square(mean) + squared_batch_counts_sum - 2*batch_counts_sum*mean) 237 | e_sq_z = (1 / cp.square(std) / (N-1)**2) *\ 238 | cp.square((squared_batch_counts_sum - N*mean)) 239 | norm_gene_var = e_z_sq 240 | ranked_norm_gene_vars = cp.argsort(cp.argsort(-norm_gene_var)) 241 | self.genes_hvg_filter = (ranked_norm_gene_vars < n_top_genes).get() 242 | self.adata.var['highly_variable'] = self.genes_hvg_filter 243 | # reader.set_genes_filter(genes_hvg_filter, update=False) # do not update data, since normalization needs to be performed on all genes after filtering. 244 | 245 | def normalize_log1p(self, target_sum=1e4): 246 | """Normalize counts per cell then log1p. 247 | 248 | Note: 249 | If `save_raw_counts` or `save_norm_counts` is set, write `adata_X` to disk here automatically. 250 | 251 | Args: 252 | target_sum (`int`=`1e4`): If None, after normalization, each observation (cell) has a total count 253 | equal to the median of total counts for observations (cells) before normalization. 254 | """ 255 | assert self.preload_on_cpu, "count matrix manipulation is disabled when preload_on_cpu is False, call 'normalize_log1p_pca' to perform PCA. " 256 | for i, d in enumerate(self.reader.batchify(axis='cell')): # the first loop is used to calculate mean and X.TX 257 | if self.save_raw_counts: 258 | util.write_to_disk(d, output_dir=f'{self.output_dir}/raw_counts', data_name=self.data_name, batch_name=f'batch_{i}') 259 | rsc.pp.normalize_total(d, target_sum=target_sum) 260 | rsc.pp.log1p(d) 261 | if self.save_norm_counts: 262 | util.write_to_disk(d, output_dir=f'{self.output_dir}/norm_counts', data_name=self.data_name, batch_name=f'batch_{i}') 263 | self.norm = True 264 | 265 | def pca(self, n_components=50, hvg_var='highly_variable'): 266 | """Principal component analysis. 267 | 268 | Computes PCA coordinates, loadings and variance decomposition. Uses the implementation of scikit-learn. 
269 | 270 | Note: 271 | Directions are flipped according to the largest absolute values in the loadings, so results match 272 | scanpy exactly. The calculated PCA matrix is stored in `adata.obsm['X_pca']`. 273 | 274 | Args: 275 | n_components (`int`=`50`): Number of principal components to compute. 276 | hvg_var (`str`=`'highly_variable'`): Use highly variable genes only. 277 | """ 278 | if not self.norm: 279 | warnings.warn("data may not have been normalized.") 280 | N, M = self.reader.shape 281 | genes_hvg_filter = self.adata.var[hvg_var].values 282 | n_top_genes = int(genes_hvg_filter.sum()) 283 | cov = cp.zeros((n_top_genes, n_top_genes), dtype=cp.float64) 284 | s = cp.zeros((1, n_top_genes), dtype=cp.float64) 285 | for d in self.reader.batchify(axis='cell'): 286 | d = d[:, genes_hvg_filter].copy() # subset to HVGs; normalization was performed on all genes 287 | X = d.X.toarray() 288 | cov += cp.dot(X.T, X) 289 | s += X.sum(axis=0, dtype=cp.float64) 290 | m = s / N 291 | cov_norm = cov - cp.dot(m.T, s) - cp.dot(s.T, m) + cp.dot(m.T, m)*N 292 | eigvecs = cp.linalg.eigh(cov_norm)[1][:, :-n_components-1:-1] # eigenvalues are ascending; eigvecs[:, i] corresponds to the i-th eigenvector 293 | eigvecs = util.svd_flip(eigvecs) 294 | X_pca = cp.zeros([N, n_components], dtype=cp.float64) 295 | start_index = 0 296 | for d in self.reader.batchify(axis='cell'): # the second loop is used to obtain the PCA projection 297 | d = d[:, genes_hvg_filter].copy() 298 | X = d.X.toarray() 299 | X_pca_batch = (X-m) @ eigvecs 300 | end_index = min(start_index+X_pca_batch.shape[0], N) 301 | X_pca[start_index:end_index] = X_pca_batch 302 | start_index = end_index 303 | X_pca_cpu = X_pca.get() 304 | # self.reader.set_genes_filter(genes_hvg_filter) # can set or not 305 | self.adata.obsm['X_pca'] = X_pca_cpu 306 | 307 | def normalize_log1p_pca(self, target_sum=1e4, n_components=50, hvg_var='highly_variable'): 308 | """An alternative that performs `normalize_log1p` and `pca` in a single pass.
309 | 310 | Note: 311 | Used when `preload_on_cpu` is `False`. 312 | """ 313 | if not self.norm: 314 | warnings.warn("data may not have been normalized.") 315 | N, M = self.reader.shape 316 | genes_hvg_filter = self.adata.var[hvg_var].values 317 | n_top_genes = int(genes_hvg_filter.sum()) 318 | cov = cp.zeros((n_top_genes, n_top_genes), dtype=cp.float64) 319 | s = cp.zeros((1, n_top_genes), dtype=cp.float64) 320 | for i, d in enumerate(self.reader.batchify(axis='cell')): 321 | if self.save_raw_counts: 322 | util.write_to_disk(d, output_dir=f'{self.output_dir}/raw_counts', data_name=self.data_name, batch_name=f'batch_{i}') 323 | rsc.pp.normalize_total(d, target_sum=target_sum) 324 | rsc.pp.log1p(d) 325 | if self.save_norm_counts: 326 | util.write_to_disk(d, output_dir=f'{self.output_dir}/norm_counts', data_name=self.data_name, batch_name=f'batch_{i}') 327 | d = d[:, genes_hvg_filter].copy() # subset to HVGs; normalization was performed on all genes 328 | X = d.X.toarray() 329 | cov += cp.dot(X.T, X) 330 | s += X.sum(axis=0, dtype=cp.float64) 331 | m = s / N 332 | cov_norm = cov - cp.dot(m.T, s) - cp.dot(s.T, m) + cp.dot(m.T, m)*N 333 | eigvecs = cp.linalg.eigh(cov_norm)[1][:, :-n_components-1:-1] # eigenvalues are ascending; eigvecs[:, i] corresponds to the i-th eigenvector 334 | eigvecs = util.svd_flip(eigvecs) 335 | X_pca = cp.zeros([N, n_components], dtype=cp.float64) 336 | start_index = 0 337 | for d in self.reader.batchify(axis='cell'): # the second loop is used to obtain the PCA projection 338 | if not self.preload_on_cpu: 339 | rsc.pp.normalize_total(d, target_sum=target_sum) 340 | rsc.pp.log1p(d) 341 | d = d[:, genes_hvg_filter].copy() 342 | X = d.X.toarray() 343 | X_pca_batch = (X-m) @ eigvecs 344 | end_index = min(start_index+X_pca_batch.shape[0], N) 345 | X_pca[start_index:end_index] = X_pca_batch 346 | start_index = end_index 347 | X_pca_cpu = X_pca.get() 348 | self.reader.set_genes_filter(genes_hvg_filter) 349 | self.adata.obsm['X_pca'] = X_pca_cpu 350 | 351 | def
harmony(self, sample_col_name, n_init=10, max_iter_harmony=20): 352 | """Use Harmony to integrate different experiments. 353 | 354 | Note: 355 | This modified harmony function can easily scale up to 15M cells with 50 PCs on a GPU (A100 80G). 356 | Result after harmony is stored in `adata.obsm['X_pca_harmony']`. 357 | 358 | Args: 359 | sample_col_name (`str`): Column of sample ID. 360 | n_init (`int`=`10`): Number of times the k-means algorithm is run with different centroid seeds. 361 | max_iter_harmony (`int`=`20`): Maximum iteration number of harmony. 362 | """ 363 | util.harmony(self.adata, key=sample_col_name, init_seeds='2-step', n_init=n_init, max_iter_harmony=max_iter_harmony) 364 | if self.save_after_each_step: 365 | self.save(data_name=f'{self.data_name}_after_harmony') 366 | 367 | def neighbors(self, n_neighbors=20, n_pcs=50, use_rep='X_pca_harmony', algorithm='cagra'): 368 | """Compute a neighborhood graph of observations using `rapids-singlecell`. 369 | 370 | Args: 371 | n_neighbors (`int`=`20`): The size of local neighborhood (in terms of number of neighboring data points) 372 | used for manifold approximation. 373 | n_pcs (`int`=`50`): Use this many PCs. 374 | use_rep (`str`=`'X_pca_harmony'`): Use the indicated representation. 375 | algorithm (`str`=`'cagra'`): The query algorithm to use. 376 | """ 377 | rsc.pp.neighbors(self.adata, n_neighbors=n_neighbors, n_pcs=n_pcs, use_rep=use_rep, algorithm=algorithm) 378 | if self.save_after_each_step: 379 | self.save(data_name=f'{self.data_name}_after_neighbor') 380 | 381 | def leiden(self, resolution=0.5, random_state=42): 382 | """Performs Leiden clustering using `rapids-singlecell`. 383 | 384 | Args: 385 | resolution (`float`=`0.5`): A parameter value controlling the coarseness of the clustering 386 | (called gamma in the modularity formula). Higher values lead to more clusters. 387 | random_state (`int`=`42`): Random seed.
388 | """ 389 | rsc.tl.leiden(self.adata, resolution=resolution, random_state=random_state) 390 | util.correct_leiden(self.adata) 391 | if self.save_after_each_step: 392 | self.save(data_name=f'{self.data_name}_after_leiden') 393 | 394 | def umap(self, random_state=42): 395 | """Embed the neighborhood graph using `rapids-singlecell`. 396 | 397 | Args: 398 | random_state (`int`=`42`): Random seed. 399 | """ 400 | rsc.tl.umap(self.adata, random_state=random_state) 401 | if self.save_after_each_step: 402 | self.save(data_name=f'{self.data_name}_after_umap') 403 | 404 | def save(self, data_name=None): 405 | """Save `adata` to disk. 406 | 407 | Note: 408 | Save to '`output_dir`/`data_name`.h5ad'. 409 | 410 | Args: 411 | data_name (`str`): If `None`, `self.data_name` is used. 412 | """ 413 | if data_name is None: 414 | data_name = self.data_name 415 | util.write_to_disk(adata=self.adata, output_dir=self.output_dir, data_name=data_name) 416 | 417 | def savex(self, name, data_name=None): 418 | """Save `adata` to disk in chunks. 419 | 420 | Note: 421 | Each chunk will be saved individually in a subfolder under `output_dir`. 422 | Save to '`output_dir`/`name`/`data_name`_`i`.h5ad'. 423 | 424 | Args: 425 | name (`str`): Subfolder name. 426 | data_name (`str`): If `None`, `self.data_name` is used.
427 | """ 428 | if data_name is None: 429 | data_name = self.data_name 430 | for i, d in enumerate(self.reader.batchify(axis='cell')): 431 | util.write_to_disk(d, output_dir=f'{self.output_dir}/{name}', batch_name=f'batch_{i}', data_name=data_name) 432 | 433 | 434 | # if __name__ == 'scalesc.pp': 435 | # scalesc = ScaleSC(data_dir='/edgehpc/dept/compbio/projects/scaleSC/haotian/batch/data_dir/70k_human_lung') 436 | # scalesc.calculate_qc_metrics() 437 | # scalesc.filter_genes(min_count=3) 438 | # scalesc.filter_cells(min_count=200, max_count=6000) 439 | # scalesc.highly_variable_genes(n_top_genes=4000) 440 | # scalesc.normalize_log1p() 441 | # scalesc.pca(n_components=50) 442 | # scalesc.to_CPU() 443 | # print(scalesc.adata_X) 444 | 445 | 446 | --------------------------------------------------------------------------------
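As a sanity check on the two-pass PCA used in `pca()` and `normalize_log1p_pca()`: the centered Gram matrix can be assembled from batched accumulations of `X.T @ X` and the column sums, then eigendecomposed, which is exactly the identity `cov_norm = cov - m.T @ s - s.T @ m + N * m.T @ m`. A minimal NumPy sketch on synthetic data (NumPy stands in for CuPy here; shapes, batch size, and data are illustrative, not part of scalesc):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, k = 100, 8, 3
X = rng.normal(size=(N, M))

# Batched accumulation of the Gram matrix and column sums, as in pca()
cov = np.zeros((M, M))
s = np.zeros((1, M))
for start in range(0, N, 32):
    Xb = X[start:start + 32]
    cov += Xb.T @ Xb
    s += Xb.sum(axis=0, keepdims=True)
m = s / N

# Centered Gram matrix: identical to (X - m).T @ (X - m)
cov_norm = cov - m.T @ s - s.T @ m + (m.T @ m) * N
assert np.allclose(cov_norm, (X - m).T @ (X - m))

# Top-k eigenvectors (eigh returns ascending eigenvalues, hence the reversed slice)
eigvecs = np.linalg.eigh(cov_norm)[1][:, :-k - 1:-1]
X_pca = (X - m) @ eigvecs

# Matches SVD-based PCA up to a sign flip per component
U, S, Vt = np.linalg.svd(X - m, full_matrices=False)
assert np.allclose(np.abs(X_pca), np.abs(U[:, :k] * S[:k]), atol=1e-8)
```

Because only `X.T @ X` and the column sums are accumulated, no cell chunk ever needs to be resident in memory alongside the others, which is what lets the real implementation stream 10M+ cells through a single GPU.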
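Similarly, the closed form accumulated in `highly_variable_genes` (E[z^2] over clipped, standardized counts, built from the per-batch sums `batch_counts_sum` and `squared_batch_counts_sum`) can be checked against a direct computation. A NumPy sketch for one synthetic gene; note the sample std stands in for the loess-estimated std, which is an illustrative assumption only:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
x = rng.poisson(3.0, size=N).astype(float)

m = x.mean()                   # mean before truncation
s = x.std(ddof=1)              # stand-in for the loess-estimated std
clip_val = s * np.sqrt(N) + m  # truncation threshold, as in seurat_v3
x_hat = np.minimum(x, clip_val)

# Closed form assembled from the two batched accumulators:
# sum of clipped values and sum of their squares
sum_ = np.sum(x_hat)
sum_sq = np.sum(x_hat ** 2)
e_z_sq = (N * m**2 + sum_sq - 2 * sum_ * m) / ((N - 1) * s**2)

# Direct computation of the same statistic on standardized clipped values
z = (x_hat - m) / s
assert np.allclose(e_z_sq, np.sum(z ** 2) / (N - 1))
```

The identity holds per gene, so the real implementation only has to carry two vectors of length M across batches instead of the full count matrix.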