├── LICENSE
└── README.md

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2023 OmicsML

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## Foundation-Model-Evaluation-For-Single-cell
1. [2025 Genome Biology] **Zero-shot evaluation reveals limitations of single-cell foundation models** [[paper]](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03574-x)
1. [2024 bioRxiv] **Evaluating the role of pre-training dataset size and diversity on single-cell foundation model performance** [[paper]](https://www.biorxiv.org/content/10.1101/2024.12.13.628448v1)
1. [2024 Nature Machine Intelligence] **Deeper evaluation of a single-cell foundation model** [[paper]](https://www.nature.com/articles/s42256-024-00949-w)
1. [2024 Nature Methods] **Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis** [[paper]](https://www.nature.com/articles/s41592-024-02235-4)
1. [2024 bioRxiv] **Metric Mirages in Cell Embeddings** [[paper]](https://www.biorxiv.org/content/10.1101/2024.04.02.587824v1)
1. [2023 bioRxiv] **A Deep Dive into Single-Cell RNA Sequencing Foundation Models** [[paper]](https://www.biorxiv.org/content/10.1101/2023.10.19.563100v1.abstract)
1. [2023 bioRxiv scEval] **Evaluating the Utilities of Large Language Models in Single-cell Data Analysis** [[paper]](https://www.biorxiv.org/content/10.1101/2023.09.08.555192v2)
1. [2023 bioRxiv] **Foundation Models Meet Imbalanced Single-Cell Data When Learning Cell Type Annotations** [[paper]](https://www.biorxiv.org/content/10.1101/2023.10.24.563625v1)
1. [2023 arXiv] **Evaluation of large language models for discovery of gene set function** [[paper]](https://arxiv.org/abs/2309.04019)
1. [2024 ICLR benchmark DNA FD] **BEND: Benchmarking DNA Language Models on Biologically Meaningful Tasks** [[paper]](https://openreview.net/pdf?id=uKB4cFNQFg)


## Foundation-Model-For-Single-cell
1. [2025 bioRxiv] **Scaling Large Language Models for Next-Generation Single-Cell Analysis** [[paper](https://www.biorxiv.org/content/10.1101/2025.04.14.648850v1)]
1. [2025 bioRxiv] **Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data** [[paper](https://www.biorxiv.org/content/10.1101/2025.04.10.647852v1)]
1. [2025 bioRxiv, Tabular Modeling on single-cell data] **Toward a privacy-preserving predictive foundation model of single-cell transcriptomics with federated learning and tabular modeling** [[paper](https://www.biorxiv.org/content/10.1101/2025.01.06.631427v1)]
1. [2024 bioRxiv] **CASSIA: a multi-agent large language model for reference free, interpretable, and automated cell annotation of single-cell RNA-sequencing data** [[paper](https://www.biorxiv.org/content/10.1101/2024.12.04.626476v2)]
1. [2024 bioRxiv] **Scaling Dense Representations for Single Cell with Transcriptome-Scale Context** [[paper](https://www.biorxiv.org/content/10.1101/2024.11.28.625303v1.full)]
1. [2024 bioRxiv] **SubCell: Vision foundation models for microscopy capture single-cell biology** [[paper](https://www.biorxiv.org/content/10.1101/2024.12.06.627299v1)]
1. [2024 Nature Machine Intelligence] **Delineating the effective use of self-supervised learning in single-cell genomics** [[paper](https://www.nature.com/articles/s42256-024-00934-3)]
1. [2024 arXiv] **scReader: Prompting Large Language Models to Interpret scRNA-seq Data** [[paper](https://arxiv.org/abs/2412.18156)]
1. [2024 bioRxiv] **A framework for gene representation on spatial transcriptomics** [[paper](https://www.biorxiv.org/content/10.1101/2024.09.27.615337v5)]
1. [2024 bioRxiv] **Multimodal learning of transcriptomes and text enables interactive single-cell RNA-seq data exploration with natural-language chats** [[paper](https://www.biorxiv.org/content/10.1101/2024.10.15.618501v1)]
1. [2024 bioRxiv] **CancerFoundation: A single-cell RNA sequencing foundation model to decipher drug resistance in cancer** [[paper](https://www.biorxiv.org/content/10.1101/2024.11.01.621087v1.full.pdf)]
1. [2024 bioRxiv] **scChat: A Large Language Model-Powered Co-Pilot for Contextualized Single-Cell RNA Sequencing Analysis** [[paper](https://www.biorxiv.org/content/10.1101/2024.10.01.616063v1.full.pdf+html)]
1. [2024 arXiv] **Cell-ontology guided transcriptome foundation model** [[paper](https://arxiv.org/pdf/2408.12373)]
1. [2024 arXiv] **How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities** [[paper](https://arxiv.org/abs/2409.11654)]
1. [2024 Nature Methods] **Transformers in single-cell omics: a review and new perspectives** [[paper](https://www.nature.com/articles/s41592-024-02353-z)]
1. [2024 bioRxiv] **Cell-Graph Compass: Modeling Single Cells with Graph Structure Foundation Model** [[paper](https://www.biorxiv.org/content/10.1101/2024.06.04.597354v1.full.pdf)]
1. [2024 bioRxiv] **scPRINT: pre-training on 50 million cells allows robust gene network predictions** [[paper](https://www.biorxiv.org/content/10.1101/2024.07.29.605556v1)]
1. [2024 bioRxiv] **Precious3GPT: Multimodal Multi-Species Multi-Omics Multi-Tissue Transformer for Aging Research and Drug Discovery** [[paper]](https://www.biorxiv.org/content/10.1101/2024.07.25.605062v1)
1. [2024 bioRxiv] **scMulan: a multitask generative pre-trained language model for single-cell analysis** [[paper]](https://www.biorxiv.org/content/10.1101/2024.01.25.577152v1)
1. [2024 bioRxiv] **CELLama: Foundation Model for Single Cell and Spatial Transcriptomics by Cell Embedding Leveraging Language Model Abilities** [[paper]](https://www.biorxiv.org/content/10.1101/2024.05.08.593094v1#:~:text=To%20address%20these%20challenges%2C%20we,data%20embedding%20for%20various%20analysis.)
1. [2024 arXiv] **LangCell: Language-Cell Pre-training for Cell Identity Understanding** [[paper]](https://arxiv.org/pdf/2405.06708)
1. [2024 bioRxiv] **Nicheformer: a foundation model for single-cell and spatial omics** [[paper]](https://www.biorxiv.org/content/10.1101/2024.04.15.589472v1)
1. [2024 bioRxiv] **Large-scale characterization of cell niches in spatial atlases using bio-inspired graph learning** [[paper]](https://www.biorxiv.org/content/10.1101/2024.02.21.581428v1)
1. [2024 bioRxiv] **scmFormer Integrates Large-Scale Single-Cell Proteomics and Transcriptomics Data by Multi-Task Transformer** [[paper]](https://pubmed.ncbi.nlm.nih.gov/38483032/)
1. [2024 bioRxiv] **Sequence modeling and design from molecular to genome scale with Evo** [[paper]](https://www.biorxiv.org/content/10.1101/2024.02.27.582234v1)
1. [2024] **Single-cell metadata as language** [[paper]](https://www.nxn.se/valent/2024/2/4/single-cell-metadata-as-language)
1. [2023 NeurIPS] **MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data** [[paper]](https://openreview.net/forum?id=4UCktT9XZx)
1. [2023 bioRxiv] **scNODE: Generative Model for Temporal Single Cell Transcriptomic Data Prediction** [[paper]](https://www.biorxiv.org/content/10.1101/2023.11.22.568346v1.full.pdf)
1. [2023 bioRxiv] **Universal Cell Embeddings: A Foundation Model for Cell Biology** [[paper]](https://www.biorxiv.org/content/10.1101/2023.11.28.568918v1.full.pdf)
1. [2023 NeurIPS AI for Science Workshop] **scCLIP: Multi-modal Single-cell Contrastive Learning Integration Pre-training** [[paper]](https://openreview.net/pdf?id=KMtM5ZHxct)
1. [2023 NeurIPS AI for Science Workshop] **Single-cell Masked Autoencoder: An Accurate and Interpretable Automated Immunophenotyper** [[paper]](https://openreview.net/pdf?id=2mq6uezuGj)
1. [2023 bioRxiv] **scELMo: Embeddings from Language Models are Good Learners for Single-cell Data Analysis** [[paper]](https://www.biorxiv.org/content/10.1101/2023.12.07.569910v1.full.pdf)
1. [2023 arXiv] **Large-Scale Cell Representation Learning via Divide-and-Conquer Contrastive Learning** [[paper]](https://arxiv.org/pdf/2306.04371.pdf)
1. [2023 arXiv multimodal] **MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data** [[paper]](https://arxiv.org/abs/2310.02275)
1. [2023 Nature Machine Intelligence] **Reusability report: Learning the transcriptional grammar in single-cell RNA-sequencing data using transformers** [[paper]](https://www.nature.com/articles/s42256-023-00757-8)
1. [2023 bioRxiv] **Scalable querying of human cell atlases via a foundational model reveals commonalities across fibrosis-associated macrophages** [[paper]](https://www.biorxiv.org/content/10.1101/2023.07.18.549537v1)
1. [2023 arXiv] **To Transformers and Beyond: Large Language Models for the Genome** [[paper]](https://arxiv.org/abs/2311.07621)
1. [2023 bioRxiv] **A pre-trained large generative model for translating single-cell transcriptome to proteome** [[paper]](https://www.biorxiv.org/content/10.1101/2023.07.04.547619v2.full.pdf)
1. [2023 bioRxiv] **GenePT: A Simple But Hard-to-Beat Foundation Model for Genes and Cells Built From ChatGPT** [[paper]](https://www.biorxiv.org/content/10.1101/2023.10.16.562533v1.full.pdf)
1. [2023 bioRxiv] **CellPLM: Pre-training of Cell Language Model Beyond Single Cells** [[paper]](https://www.biorxiv.org/content/10.1101/2023.10.03.560734v1)
1. [2023 Nature Biotechnology multi-modal] **Integration of multi-modal single-cell data** [[paper]](https://www.nature.com/articles/s41587-023-01826-4)
1. [2023 bioRxiv multi-modal] **Single-cell gene expression prediction from DNA sequence at large contexts** [[paper]](https://www.biorxiv.org/content/10.1101/2023.07.26.550634v1.full)
1. [2023 bioRxiv multi-modal] **Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation** [[paper]](https://www.biorxiv.org/content/10.1101/2023.08.30.555582v1)
1. [2023 bioRxiv] **CellPolaris: Decoding Cell Fate through Generalization Transfer Learning of Gene Regulatory Networks** [[paper]](https://www.biorxiv.org/content/10.1101/2023.09.25.559244v1#:~:text=Applications%20of%20CellPolaris%20demonstrate%20remarkable,outcomes%20in%20cell%20reprogramming%20and)
1. [2023 bioRxiv] **GeneCompass: Deciphering Universal Gene Regulatory Mechanisms with Knowledge-Informed Cross-Species Foundation Model** [[paper]](https://www.biorxiv.org/content/10.1101/2023.09.26.559542v1)
1. [2023 arXiv] **scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain** [[paper]](https://arxiv.org/abs/2310.02713)
1. [2023 XXXX] **A Deeper Dive into Single-Cell RNA Sequencing Foundation Models**
1. [2025 Nature] **GET: a foundation model of transcription across human cell types** [[paper]](https://www.biorxiv.org/content/10.1101/2023.09.24.559168v1)
1. [2024 ICML] **Cell2Sentence: Teaching Large Language Models the Language of Biology** [[paper]](https://www.biorxiv.org/content/10.1101/2023.09.11.557287v1)
1. [2023 bioRxiv][**scTranslator**] **A pre-trained large language model for translating single-cell transcriptome to proteome** [[paper]](https://www.biorxiv.org/content/10.1101/2023.07.04.547619v1)
1. [2023 bioRxiv][**scPoli**] **Population-level integration of single-cell datasets enables multi-scale analysis across samples** [[paper]](https://www.biorxiv.org/content/10.1101/2022.11.28.517803v1)
1. [2023 bioRxiv] **Towards Universal Cell Embeddings: Integrating Single-cell RNA-seq Datasets across Species with SATURN** [[paper]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9915700/)
1. [2023 bioRxiv][**scFoundation**] **Large Scale Foundation Model on Single-cell Transcriptomics** [[paper]](https://www.biorxiv.org/content/10.1101/2023.05.29.542705v2)
1. [2023 Nature][**GeneFormer**] **Transfer learning enables predictions in network biology** [[paper]](https://www.nature.com/articles/s41586-023-06139-9)
1. [2023 iScience][**tGPT**] **Generative pretraining from large-scale transcriptomes for single-cell deciphering** [[paper]](https://www.sciencedirect.com/science/article/pii/S2589004223006132)
1. [2023 bioRxiv][**scGPT**] **scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI** [[paper v1]](https://www.biorxiv.org/content/10.1101/2023.04.30.538439v1), [[paper v2]](https://www.biorxiv.org/content/10.1101/2023.04.30.538439v2)
1. [2023 bioRxiv][**xTrimoGene**] **xTrimoGene: An Efficient and Scalable Representation Learner for Single-Cell RNA-Seq Data** [[paper]](https://www.biorxiv.org/content/10.1101/2023.03.24.534055v1)
1. [2022 arXiv][**Exceiver**] **A single-cell gene expression language model** [[paper]](https://arxiv.org/abs/2210.14330)
1. [2022 Nature Machine Intelligence][**scBERT**] **scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data** [[paper]](https://www.nature.com/articles/s42256-022-00534-z)
1. [2022 bioRxiv][**scFormer**] **scFormer: a universal representation learning approach for single-cell data using transformers** [[paper]](https://openreview.net/pdf?id=7hdmA0qtr5)
1. [2022 Bioinformatics][**scPretrain**] **scPretrain: multi-task self-supervised learning for cell-type classification** [[paper]](https://academic.oup.com/bioinformatics/article/38/6/1607/6499287)


## Foundation-Model-Genetic-Perturbation
1. [2024 bioRxiv] **Benchmarking a foundational cell model for post-perturbation RNAseq prediction** [[paper]](https://www.biorxiv.org/content/biorxiv/early/2024/10/01/2024.09.30.615843.full.pdf)
1. [2024 arXiv] **Benchmarking Transcriptomics Foundation Models for Perturbation Analysis: one PCA still rules them all** [[paper]](https://arxiv.org/abs/2410.13956)
1. [2024 bioRxiv] **PertEval-scFM: Benchmarking Single-Cell Foundation Models for Perturbation Effect Prediction** [[paper]](https://www.biorxiv.org/content/10.1101/2024.10.02.616248v1)
1. [2024 bioRxiv] **scGenePT: Is language all you need for modeling single-cell perturbations?** [[paper](https://www.biorxiv.org/content/10.1101/2024.10.23.619972v1)]
1. [2024 Cell, FM4perturbation data: a review] **Toward a foundation model of causal cell and tissue biology with a Perturbation Cell and Tissue Atlas** [[paper](https://www.cell.com/cell/abstract/S0092-8674(24)00829-8?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0092867424008298%3Fshowall%3Dtrue)]
1. [2023 Nature Methods] **Causal identification of single-cell experimental perturbation effects with CINEMA-OT** [[paper]](https://www.nature.com/articles/s41592-023-02040-5)


## Foundation-Model-For-Pathology
1. [2024 arXiv] **BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once** [[paper]](https://arxiv.org/pdf/2405.12971)
1. [2024 Nature] **A whole-slide foundation model for digital pathology from real-world data** [[paper]](https://www.nature.com/articles/s41586-024-07441-w)
1. [2024 Nature Medicine FM4Pathology] **Towards a general-purpose foundation model for computational pathology** [[paper]](https://www.nature.com/articles/s41591-024-02857-3)
1. [2024 Nature Medicine FM4Pathology] **A visual-language foundation model for computational pathology** [[paper]](https://www.nature.com/articles/s41591-024-02856-4)
1. [2023 Nature Medicine] **A visual–language foundation model for pathology image analysis using medical Twitter** [[paper]](https://www.nature.com/articles/s41591-023-02504-3)

--------------------------------------------------------------------------------