├── LICENSE
└── README.md

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2023 Aviv

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Awesome State-Space Resources for ML

**Contributions are welcome! Please read the [contribution guidelines](#contributions) before contributing.**

## Table of Contents

- [Tutorials](#tutorials)
- [Surveys](#surveys-structured-state-space-models)
- [Books](#books-classical-state-space-models)
- [Foundation](#foundation)
- [Distillation](#distillation)
- [Architectures](#architectures)
- [Language](#language)
- [Audio](#audio)
- [Vision](#vision)
- [Time-Series](#time-series)
- [Medical](#medical)
- [Parameterization and Initialization](#ssm-parameterization-and-initialization)
- [Systems Optimizations](#systems-optimizations)
- [Tabular](#tabular)
- [Reinforcement Learning](#reinforcement-learning)
- [Miscellaneous](#miscellaneous)

## Tutorials

#### Blogposts
1. [S4 Series](https://hazyresearch.stanford.edu/blog/2022-01-14-s4-1)
2. [The Annotated S4](https://srush.github.io/annotated-s4/)
3. [The Annotated S4D](https://srush.github.io/annotated-s4/s4d.html)
4. [Mamba: The Hard Way (The Annotated Mamba)](https://srush.github.io/annotated-mamba/hard.html) [[code]](https://github.com/srush/annotated-mamba)
5. [Mamba: The Easy Way](https://jackcook.com/2024/02/23/mamba.html)
6. [A Visual Guide to Mamba and State Space Models](https://open.substack.com/pub/maartengrootendorst/p/a-visual-guide-to-mamba-and-state)
7. [State Space Models: A Modern Approach](https://probml.github.io/ssm-book/root.html)
8. [Mamba No. 5 (A Little Bit Of...)](https://jameschen.io/jekyll/update/2024/02/12/mamba.html)
9. [Mamba: SSM, Theory, and Implementation in Keras and TensorFlow](https://medium.com/towards-data-science/mamba-ssm-theory-and-implementation-in-keras-and-tensorflow-32d6d4b32546)

#### Videos
1. [Efficiently Modeling Long Sequences with Structured State Spaces](https://www.youtube.com/watch?v=luCBXCErkCs)
2. [Do we need Attention? A Mamba Primer](https://www.youtube.com/watch?v=dVH1dRoMPBc)
3. [Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math](https://www.youtube.com/watch?v=8Q_tqwpTpVU)
4. [MAMBA from Scratch](https://www.youtube.com/watch?v=N6Piou4oYx8)
5. [Yannic Kilcher's video on the Mamba paper](https://www.youtube.com/watch?v=9dSkvxS2EB0)
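#### A Minimal Example

To give a concrete feel for what the tutorials above build toward, here is a minimal NumPy sketch (illustrative only, not taken from any listed resource) of the discrete linear state-space recurrence `x_k = A x_{k-1} + B u_k`, `y_k = C x_k` that S4-style layers compute per channel:

```python
# Illustrative sketch of a discrete linear SSM, run as a recurrence.
import numpy as np

def ssm_scan(A, B, C, u):
    """Recurrent view: x_k = A x_{k-1} + B u_k, y_k = C x_k."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B * u_k   # state update (scalar input enters through B)
        ys.append(C @ x)      # linear readout to a scalar output
    return np.array(ys)

rng = np.random.default_rng(0)
N, L = 4, 10                  # state size, sequence length
A = 0.9 * np.eye(N)           # toy stable dynamics (spectral radius < 1)
B = rng.standard_normal(N)
C = rng.standard_normal(N)
y = ssm_scan(A, B, C, rng.standard_normal(L))
print(y.shape)                # (10,)
```

Because the map is linear and time-invariant, the same input-output function can also be computed as a convolution with kernel `(CB, CAB, CA^2B, ...)`; that equivalence is what makes S4-style training fast. Selective models such as Mamba make the dynamics input-dependent, which is where the parallel-scan techniques covered in the videos above come in.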
## Surveys (Structured State Space Models)
1. [Modeling Sequences with Structured State Spaces](https://www.proquest.com/docview/2880853867?pq-origsite=gscholar&fromopenview=true&sourcetype=Dissertations%20&%20Theses)
2. [State Space Model for New-Generation Network Alternative to Transformers](https://arxiv.org/abs/2404.09516)
3. [Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges](https://arxiv.org/abs/2404.16112)
4. [A Survey on Visual Mamba](https://arxiv.org/abs/2404.15956)

## Books (Classical State Space Models)
1. [Linear State-Space Control Systems](https://onlinelibrary.wiley.com/doi/book/10.1002/9780470117873)
2. [Principles of System Identification: Theory and Practice](https://www.taylorfrancis.com/books/mono/10.1201/9781315222509/principles-system-identification-arun-tangirala)

## Foundation
1. [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752) [[code]](https://github.com/state-spaces/mamba)
2. [Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality](https://arxiv.org/abs/2405.21060)
3. [Structured state-space models are deep Wiener models](https://arxiv.org/abs/2312.06211)
4. [State-space Models with Layer-wise Nonlinearity are Universal Approximators with Exponential Decaying Memory](https://arxiv.org/abs/2309.13414)
5. [Repeat After Me: Transformers are Better than State Space Models at Copying](https://arxiv.org/abs/2402.01032)
6. [Theoretical Foundations of Deep Selective State-Space Models](https://arxiv.org/abs/2402.19047)
7. [The Hidden Attention of Mamba Models](https://arxiv.org/abs/2403.01590)
8. [The Expressive Capacity of State Space Models: A Formal Language Perspective](https://arxiv.org/abs/2405.17394)
9. [Simplifying and Understanding State Space Models with Diagonal Linear RNNs](https://arxiv.org/abs/2212.00768)
10. [Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism](https://arxiv.org/abs/2504.18574)
11. [Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks](https://arxiv.org/abs/2402.04248)
12. [An Empirical Study of Mamba-based Language Models](https://arxiv.org/abs/2406.07887)

## Distillation
1. [Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models](https://arxiv.org/abs/2408.10189) [[code]](https://github.com/goombalab/phi-mamba)
2. [The Mamba in the Llama: Distilling and Accelerating Hybrid Models](https://arxiv.org/abs/2408.15237)
3. [Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners](https://arxiv.org/abs/2502.20339)

## Architectures

### Hybrid Architectures
1. [Jamba: A Hybrid Transformer-Mamba Language Model](https://arxiv.org/abs/2403.19887)
2. [Jamba-1.5: Hybrid Transformer-Mamba Models at Scale](https://arxiv.org/abs/2408.12570)
3. [The Zamba2 Suite: Technical Report](https://arxiv.org/abs/2411.15242)
4. [Hymba: A Hybrid-head Architecture for Small Language Models](https://arxiv.org/abs/2411.13676)
### Other
1. [S5: Simplified State Space Layers for Sequence Modeling](https://arxiv.org/abs/2208.04933) (ICLR 2023) [[code]](https://github.com/lindermanlab/S5)
2. [Long Range Language Modeling via Gated State Spaces](https://arxiv.org/abs/2206.13947) (ICLR 2023)
3. [Pretraining Without Attention](https://arxiv.org/abs/2212.10544) [[code]](https://github.com/jxiw/BiGS)
4. [MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts](https://arxiv.org/abs/2401.04081) [[code]](https://github.com/llm-random/llm-random)
5. [LOCOST: State-Space Models for Long Document Abstractive Summarization](https://arxiv.org/abs/2401.17919) [[code]](https://github.com/flbbb/locost-summarization)
6. [BlackMamba: Mixture of Experts for State-Space Models](https://arxiv.org/abs/2402.01771) [[code]](https://github.com/Zyphra/BlackMamba)
7. [DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models](https://arxiv.org/abs/2403.00818) [[code]](https://github.com/WailordHe/DenseSSM)
8. [ZigMa: Zigzag Mamba Diffusion Model](https://arxiv.org/abs/2403.13802) (ECCV 2024) [[code]](https://github.com/CompVis/zigma) [[website]](https://taohu.me/zigma/)
9. [Block-State Transformers](https://arxiv.org/abs/2306.09539)
10. [Efficient Long Sequence Modeling via State Space Augmented Transformer](https://arxiv.org/abs/2212.08136)
11. [S7: Selective and Simplified State Space Layers for Sequence Modeling](https://arxiv.org/abs/2410.03464)

## Language
1. [Hungry Hungry Hippos: Towards Language Modeling with State Space Models](https://arxiv.org/abs/2212.14052) (ICLR 2023) [[code]](https://github.com/HazyResearch/H3)
2. [Long Range Language Modeling via Gated State Spaces](https://arxiv.org/abs/2206.13947) (ICLR 2023) [[code]](https://github.com/lucidrains/gated-state-spaces-pytorch)
3. [MambaByte: Token-free Selective State Space Model](https://arxiv.org/abs/2401.13660) [[code]](https://github.com/lucidrains/MEGABYTE-pytorch)
4. [Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing](https://arxiv.org/abs/2502.14458)

## Audio
1. [It's Raw! Audio Generation with State-Space Models](https://arxiv.org/abs/2202.09729) (ICML 2022) [[code]](https://github.com/state-spaces/s4)
2. [Augmenting conformers with structured state space models for online speech recognition](https://arxiv.org/abs/2309.08551)
3. [Diagonal State Space Augmented Transformers for Speech Recognition](https://arxiv.org/abs/2302.14120)
4. [Structured State Space Decoder for Speech Recognition and Synthesis](https://arxiv.org/abs/2210.17098)
5. [Spiking Structured State Space Model for Monaural Speech Enhancement](https://arxiv.org/abs/2309.03641)
6. [A Neural State-Space Model Approach to Efficient Speech Separation](https://arxiv.org/abs/2305.16932)
7. [Multi-Head State Space Model for Speech Recognition](https://arxiv.org/abs/2305.12498)
8. [Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation](https://arxiv.org/abs/2403.18257) [[code]](https://github.com/xi-j/Mamba-TasNet)
9. [SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model](https://arxiv.org/abs/2405.11831) [[code]](https://github.com/SiavashShams/ssamba)
10. [Audio Mamba: Bidirectional State Space Model for Audio Representation Learning](https://arxiv.org/abs/2406.03344) [[code]](https://github.com/kaistmm/Audio-Mamba-AuM)
11. [Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis](https://arxiv.org/abs/2407.09732) [[code]](https://github.com/xi-j/Mamba-ASR)
## Vision
1. [S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces](https://arxiv.org/abs/2210.06583) (NeurIPS 2022)
2. [Long Movie Clip Classification with State-Space Video Models](https://arxiv.org/abs/2204.01692) (ECCV 2022) [[code]](https://github.com/md-mohaiminul/ViS4mer)
3. [Efficient Movie Scene Detection using State-Space Transformers](https://arxiv.org/abs/2212.14427) (CVPR 2023)
4. [Selective Structured State-Spaces for Long-Form Video Understanding](https://arxiv.org/abs/2303.14526) (CVPR 2023)
5. [2-D SSM: A General Spatial Layer for Visual Transformers](https://arxiv.org/abs/2306.06635) [[code]](https://github.com/ethanbar11/ssm_2d)
6. [Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model](https://arxiv.org/abs/2401.09417) [[code]](https://github.com/hustvl/Vim)
7. [VMamba: Visual State Space Model](https://arxiv.org/abs/2401.10166) [[code]](https://github.com/MzeroMiko/VMamba)
8. [U-shaped Vision Mamba for Single Image Dehazing](https://arxiv.org/abs/2402.04139) [[code]](https://github.com/zzr-idam/UVM-Net)
9. [Res-VMamba: Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning](https://arxiv.org/abs/2402.15761) [[code]](https://github.com/ChiShengChen/ResVMamba)
10. [Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation](https://arxiv.org/abs/2402.10887) [[code]](https://github.com/ziyangwang007/Mamba-UNet)
11. [LocalMamba: Visual State Space Model with Windowed Selective Scan](https://arxiv.org/abs/2403.09338) [[code]](https://github.com/hunto/LocalMamba)
12. [Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM](https://arxiv.org/abs/2403.07487) [[code]](https://steve-zeyu-zhang.github.io/MotionMamba/)
13. [MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models](https://arxiv.org/abs/2403.09471) [[code]](https://github.com/kkakkkka/MambaTalk)
14. [SUM: Saliency Unification through Mamba for Visual Attention Modeling](https://arxiv.org/abs/2406.17815) [[code]](https://github.com/Arhosseini77/SUM)
15. [State Space Models for Event Cameras](https://arxiv.org/abs/2402.15584) (CVPR 2024 Spotlight) [[code]](https://github.com/uzh-rpg/ssms_event_cameras)
16. [DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding](https://arxiv.org/abs/2503.16426) [[code]](https://github.com/KyanChen/DynamicVis)
## Time-Series
1. [Deep State Space Models for Time Series Forecasting](https://papers.nips.cc/paper_files/paper/2018/hash/5cf68969fb67aa6082363a6d4e6468e2-Abstract.html) (NeurIPS 2018)
2. [FiLM: Frequency improved Legendre Memory Model for Long-term Time Series Forecasting](https://arxiv.org/abs/2205.08897) (NeurIPS 2022)
3. [Effectively Modeling Time Series with Simple Discrete State Spaces](https://arxiv.org/abs/2303.09489) (ICLR 2023)
4. [Deep Latent State Space Models for Time-Series Generation](https://arxiv.org/abs/2212.12749) (ICML 2023)
5. [Generative AI for End-to-End Limit Order Book Modelling](https://arxiv.org/abs/2309.00638) (ICAIF 2023)
6. [On the Performance of Legendre State-Space Models in Short-Term Time Series Forecasting](https://ieeexplore.ieee.org/document/10289082) (CCECE 2023)
7. [Neural Continuous-Discrete State Space Models for Irregularly-Sampled Time Series](https://arxiv.org/abs/2301.11308)
8. [Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models](https://arxiv.org/abs/2208.09399)
9. [Is Mamba Effective for Time Series Forecasting?](https://arxiv.org/abs/2403.11144)

## Medical
1. [Structured State Space Models for Multiple Instance Learning in Digital Pathology](https://arxiv.org/abs/2306.15789)
2. [Modeling Multivariate Biosignals with Graph Neural Networks and Structured State Space](https://arxiv.org/abs/2211.11176)
3. [Diffusion-based conditional ECG generation with structured state space models](https://arxiv.org/abs/2301.08227)
4. [Improving the Diagnosis of Psychiatric Disorders with Self-Supervised Graph State Space Models](https://arxiv.org/abs/2206.03331)
5. [fMRI-S4: learning short- and long-range dynamic fMRI dependencies using 1D Convolutions and State Space Models](https://arxiv.org/abs/2208.04166)
6. [Vivim: a Video Vision Mamba for Medical Video Object Segmentation](https://arxiv.org/abs/2401.14168) [[code]](https://github.com/scott-yjyang/Vivim)
7. [MambaMorph: a Mamba-based Backbone with Contrastive Feature Learning for Deformable MR-CT Registration](https://arxiv.org/abs/2401.13934) [[code]](https://github.com/Guo-Stone/MambaMorph)
8. [SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation](https://arxiv.org/abs/2401.13560) [[code]](https://github.com/ge-xing/SegMamba)
9. [U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation](https://arxiv.org/abs/2401.04722) [[code]](https://github.com/bowang-lab/U-Mamba)
10. [nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model](https://arxiv.org/abs/2402.03526)
11. [VM-UNet: Vision Mamba UNet for Medical Image Segmentation](https://arxiv.org/abs/2402.02491)
12. [MambaMIR: An Arbitrary-Masked Mamba for Joint Medical Image Reconstruction and Uncertainty Estimation](https://arxiv.org/abs/2402.18451)
13. [ViM-UNet: Vision Mamba for Biomedical Segmentation](https://arxiv.org/abs/2404.07705) (MIDL 2024)
14. [I2I-Mamba: Multi-modal medical image synthesis via selective state space modeling](https://arxiv.org/abs/2405.14022) [[code]](https://github.com/icon-lab/I2I-Mamba)
15. [BioMamba: A Pre-trained Biomedical Language Representation Model Leveraging Mamba](https://arxiv.org/abs/2408.02600)
16. [MambaRoll: Physics-Driven Autoregressive State Space Models for Medical Image Reconstruction](https://arxiv.org/abs/2412.09331) [[code]](https://github.com/icon-lab/MambaRoll)
## SSM Parameterization and Initialization
1. [Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers](https://arxiv.org/abs/2110.13985) (NeurIPS 2021)
2. [Efficiently Modeling Long Sequences with Structured State Spaces](https://arxiv.org/abs/2111.00396) (ICLR 2022)
3. [On the Parameterization and Initialization of Diagonal State Space Models](https://arxiv.org/abs/2206.11893) (NeurIPS 2022)
4. [Diagonal State Spaces are as Effective as Structured State Spaces](https://arxiv.org/abs/2203.14343) (NeurIPS 2022) [[code]](https://github.com/ag1988/dss)
5. [How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections](https://arxiv.org/abs/2206.12037) (ICLR 2023)
6. [Robustifying State-space Models for Long Sequences via Approximate Diagonalization](https://arxiv.org/abs/2310.01698)
7. [StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization](https://arxiv.org/abs/2311.14495)
8. [Spectral State Space Models](https://arxiv.org/abs/2312.06837)
9. [From Generalization Analysis to Optimization Designs for State Space Models](https://arxiv.org/abs/2405.02670) (ICML 2024)

## Systems Optimizations
1. [Marconi: Prefix Caching for the Era of Hybrid LLMs](https://arxiv.org/abs/2411.19379) (MLSys 2025)
2. [Jenga: Effective Memory Management for Serving LLM with Heterogeneity](https://arxiv.org/abs/2503.18292)

## Tabular
1. [MambaTab: A Plug-and-Play Model for Learning Tabular Data](https://arxiv.org/abs/2401.08867)
2. [Mambular: A Sequential Model for Tabular Deep Learning](https://arxiv.org/abs/2408.06291)

## Reinforcement Learning
1. [Decision S4: Efficient Sequence-Based RL via State Spaces Layers](https://arxiv.org/abs/2306.05167) (ICLR 2023)
2. [Structured State Space Models for In-Context Reinforcement Learning](https://arxiv.org/abs/2303.03982) (NeurIPS 2023)
3. [Mastering Memory Tasks with World Models](https://openreview.net/forum?id=1vDArHJ68h) (ICLR 2024 oral)

## Miscellaneous
1. [Variational learning for switching state-space models](https://www.cs.toronto.edu/~hinton/absps/switch.pdf) (Neural Computation 2000)
2. [Liquid structural state-space models](https://arxiv.org/abs/2209.12951) (ICLR 2023)
3. [Resurrecting Recurrent Neural Networks for Long Sequences](https://arxiv.org/abs/2303.06349) (ICML 2023)
4. [Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets](https://arxiv.org/abs/2210.14064) (ICLR 2023)
5. [Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors](https://arxiv.org/abs/2310.02980)
6. [Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks](https://papers.nips.cc/paper_files/paper/2019/hash/952285b9b7e7a1be5aa7849f32ffff05-Abstract.html) (NeurIPS 2019)

## Contributions

🎉 Thank you for considering contributing to our Awesome State Space Models for Machine Learning repository! 🚀

### Contribute in 3 Steps:

1. **Fork the Repo:**
   Fork this repo to your GitHub account.

2. **Edit Content:**
   Contribute by adding new resources or improving existing content in the `README.md` file.

3. **Create a Pull Request:**
   Open a pull request (PR) from your branch to the main repository.

### Guidelines

- Follow the existing structure and formatting.
- Ensure added resources are relevant to State Space Models in Machine Learning.
- Verify that links work correctly (see the sketch below).
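For that last guideline, a quick way to check every link before opening a PR is a small script along these lines. This is a hypothetical sketch, not a tool shipped with this repo; it assumes Python 3 with the third-party `requests` package installed (`pip install requests`):

```python
# check_links.py -- hypothetical helper for contributors, not part of this repo.
import re
import requests

# Markdown targets in this list all look like "(https://...)"; matching any
# parenthesized http(s) URL also covers the [[code]]-style entries.
URL = re.compile(r"\((https?://[^)\s]+)\)")

def check(path="README.md"):
    with open(path, encoding="utf-8") as f:
        urls = sorted(set(URL.findall(f.read())))
    broken = []
    for u in urls:
        try:
            r = requests.head(u, allow_redirects=True, timeout=10)
            if r.status_code >= 400:  # some hosts reject HEAD; retry with GET
                r = requests.get(u, allow_redirects=True, timeout=10)
            if r.status_code >= 400:
                broken.append((u, r.status_code))
        except requests.RequestException as e:
            broken.append((u, type(e).__name__))
    for u, why in broken:
        print(f"BROKEN ({why}): {u}")
    print(f"checked {len(urls)} links, {len(broken)} broken")

if __name__ == "__main__":
    check()
```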
### Reporting Issues

If you encounter issues or have suggestions, open an issue on the GitHub repository.

Your contributions make this repository awesome! Thank you! 🙌

## License

This project is licensed under the [MIT License](LICENSE).
--------------------------------------------------------------------------------