├── .gitignore ├── Representation Learning ├── DSs.md ├── CLs.md └── SSLs.md ├── Language Modeling ├── Logic.md ├── Archs.md ├── Prompt.md ├── Scaling.md └── Short cut learning.md ├── Reinforcement Learning └── SRLs.md ├── Data-Centric Learning ├── DPs.md └── DDs.md ├── Inference Scaling Law └── all.md ├── Generative Modeling ├── Archs.md ├── VAEs.md ├── GANs.md ├── Video Generation.md ├── Continuous&Discrete.md └── DMs&FMs.md ├── README.md ├── Neural Networks └── Archs.md └── Optimization └── OPTs.md /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store -------------------------------------------------------------------------------- /Representation Learning/DSs.md: -------------------------------------------------------------------------------- 1 | #### Efficacy 2 | 3 | - [Deep Sets](https://arxiv.org/abs/1703.06114) -------------------------------------------------------------------------------- /Language Modeling/Logic.md: -------------------------------------------------------------------------------- 1 | 2 | - [How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis](https://arxiv.org/abs/2411.04105) -------------------------------------------------------------------------------- /Representation Learning/CLs.md: -------------------------------------------------------------------------------- 1 | #### Efficacy 2 | 3 | - [Overcoming catastrophic forgetting in neural networks](https://arxiv.org/abs/1612.00796) -------------------------------------------------------------------------------- /Language Modeling/Archs.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | #### Efficacy 4 | 5 | - [Scaling up Masked Diffusion Models on Text](https://arxiv.org/abs/2410.18514) 6 | - Code: https://github.com/ML-GSAI/SMDM -------------------------------------------------------------------------------- /Language Modeling/Prompt.md: 
-------------------------------------------------------------------------------- 1 | 2 | 3 | #### Prompt Compression 4 | - [Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles](https://arxiv.org/abs/2410.14042) 5 | -------------------------------------------------------------------------------- /Reinforcement Learning/SRLs.md: -------------------------------------------------------------------------------- 1 | #### Efficacy 2 | 3 | - [Streaming Deep Reinforcement Learning Finally Works](https://arxiv.org/abs/2410.14606v2) 4 | - Code: https://github.com/mohmdelsayed/streaming-drl 5 | -------------------------------------------------------------------------------- /Data-Centric Learning/DPs.md: -------------------------------------------------------------------------------- 1 | 2 | #### Theoretical Analysis 3 | 4 | - [Towards a statistical theory of data selection under weak supervision](https://arxiv.org/abs/2309.14563) 5 | - Code: https://github.com/granica-ai/DataSelectionICLR2024/tree/main -------------------------------------------------------------------------------- /Inference Scaling Law/all.md: -------------------------------------------------------------------------------- 1 | #### image 2 | 3 | - [Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps](https://arxiv.org/abs/2501.09732) 4 | 5 | 6 | 7 | #### text 8 | 9 | - [s1: Simple test-time scaling](https://arxiv.org/abs/2501.19393) -------------------------------------------------------------------------------- /Generative Modeling/Archs.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | #### Transformer 4 | 5 | - [Scalable Diffusion Models with Transformers](https://arxiv.org/abs/2212.09748) 6 | 7 | - [SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers](https://arxiv.org/abs/2401.08740) 8 | 9 | 
-------------------------------------------------------------------------------- /Language Modeling/Scaling.md: -------------------------------------------------------------------------------- 1 | 2 | #### Efficacy 3 | - [Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models](https://arxiv.org/abs/2312.06585) 4 | 5 | #### Training Efficiency 6 | 7 | - [Data Efficient Neural Scaling Law via Model Reusing](https://openreview.net/forum?id=iXYnIz4RRx) -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | #### Awesome Repositories 4 | 5 | There are several implementations of classical deep learning methods that I highly recommend: 6 | 7 | - [Deep Learning Paper Implementations](https://github.com/labmlai/annotated_deep_learning_paper_implementations) 8 | 9 | - [Papers-in-100-Lines-of-Code](https://github.com/MaximeVandegar/Papers-in-100-Lines-of-Code) -------------------------------------------------------------------------------- /Neural Networks/Archs.md: -------------------------------------------------------------------------------- 1 | #### Deep Neural Networks 2 | - [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556) 3 | - [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) 4 | - [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) 5 | - [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) 6 | - [Attention Is All You Need](https://arxiv.org/abs/1706.03762) 7 | 8 | #### Large Models 9 | - [Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving](https://arxiv.org/abs/2407.00079v3#) -------------------------------------------------------------------------------- /Data-Centric Learning/DDs.md: 
-------------------------------------------------------------------------------- 1 | 2 | 3 | #### Dataset Distillation 4 | - [Dataset Distillation by Matching Training Trajectories](https://arxiv.org/abs/2203.11932) 5 | - [Accelerating Dataset Distillation via Model Augmentation](https://arxiv.org/abs/2212.06152) 6 | - [Dataset Condensation with Distribution Matching](https://arxiv.org/abs/2110.04181) 7 | - [Dataset Distillation via the Wasserstein Metric](https://arxiv.org/abs/2311.18531) 8 | - [Dataset Meta-Learning from Kernel Ridge-Regression](https://arxiv.org/abs/2011.00050) 9 | - [Dataset Distillation using Neural Feature Regression](https://arxiv.org/abs/2206.00719) 10 | - [Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective](https://arxiv.org/abs/2306.13092) 11 | - [On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm](https://arxiv.org/abs/2312.03526) 12 | 13 | -------------------------------------------------------------------------------- /Optimization/OPTs.md: -------------------------------------------------------------------------------- 1 | #### Efficacy 2 | 3 | - [Cautious Optimizers: Improving Training with One Line of Code](https://arxiv.org/abs/2411.16085) 4 | - Code: https://github.com/kyleliang919/C-Optim 5 | - [Learning-Rate-Free Learning by D-Adaptation](https://arxiv.org/abs/2301.07733) 6 | - Code: https://github.com/facebookresearch/dadaptation 7 | - [Prodigy: An Expeditiously Adaptive Parameter-Free Learner](https://arxiv.org/abs/2306.06101) 8 | - Code: https://github.com/konstmish/prodigy/blob/119ca3ade34584ebaa8afe2da8d623e35fec6b59/prodigyopt/prodigy.py 9 | - [The Road Less Scheduled](https://arxiv.org/abs/2405.15682) 10 | - Code: https://github.com/facebookresearch/schedule_free 11 | - [DoG is SGD’s Best Friend: A Parameter-Free Dynamic Step Size Schedule](https://proceedings.mlr.press/v202/ivgi23a) 12 | - [Memory Layers at 
Scale](https://arxiv.org/abs/2412.09764) 13 | - Code: https://github.com/facebookresearch/memory -------------------------------------------------------------------------------- /Generative Modeling/VAEs.md: -------------------------------------------------------------------------------- 1 | 2 | #### Efficacy 3 | - [Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114) 4 | 5 | - [beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework](https://openreview.net/forum?id=Sy2fzU9gl) 6 | 7 | - [SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization](https://proceedings.mlr.press/v162/takida22a/takida22a.pdf) 8 | 9 | - [Wasserstein Auto-Encoders](https://arxiv.org/abs/1711.01558) 10 | 11 | - [Vector Quantized Wasserstein Auto-Encoder](https://arxiv.org/abs/2302.05917) 12 | 13 | #### Efficient Sampling 14 | 15 | - [Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models](https://arxiv.org/abs/2410.10733) 16 | - Code: https://github.com/mit-han-lab/efficientvit/tree/master/applications/dc_ae 17 | 18 | - [FLUX](https://github.com/black-forest-labs/flux) 19 | 20 | - [Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack](https://arxiv.org/abs/2309.15807) -------------------------------------------------------------------------------- /Generative Modeling/GANs.md: -------------------------------------------------------------------------------- 1 | #### Awesome Repositories 2 | 3 | - [StudioGAN is a Pytorch library providing implementations of representative Generative Adversarial Networks (GANs)](https://github.com/POSTECH-CVLab/PyTorch-StudioGAN) 4 | 5 | #### Efficacy 6 | 7 | - [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/1511.06434) 8 | 9 | - [Wasserstein GAN](https://arxiv.org/abs/1701.07875) 10 | - Code: https://github.com/anibali/wgan-cifar10 11 | 12 | - [Improved Training of 
Wasserstein GANs](https://arxiv.org/abs/1704.00028) 13 | - Code: https://github.com/Zeleni9/pytorch-wgan 14 | 15 | - [Large Scale Adversarial Representation Learning](https://arxiv.org/abs/1907.02544) 16 | - Code: https://github.com/LEGO999/BigBiGAN-TensorFlow2.0 17 | - Mini Code: https://github.com/rkorzeniowski/bigbigan-pytorch 18 | 19 | - [Training Generative Adversarial Networks with Limited Data](https://arxiv.org/abs/2006.06676v2) 20 | - Code: https://github.com/NVlabs/stylegan2-ada-pytorch 21 | - Mini Code: https://github.com/eps696/stylegan2ada 22 | 23 | - [StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets](https://arxiv.org/abs/2202.00273) 24 | - Code: https://github.com/autonomousvision/stylegan-xl -------------------------------------------------------------------------------- /Representation Learning/SSLs.md: -------------------------------------------------------------------------------- 1 | 2 | #### Awesome Repositories 3 | - [LightlySSL](https://github.com/lightly-ai/lightly) 4 | - [solo-learn](https://github.com/vturrisi/solo-learn) 5 | 6 | #### Efficacy 7 | 8 | - [Representation Learning with Contrastive Predictive Coding](https://arxiv.org/abs/1807.03748) 9 | 10 | - [A Simple Framework for Contrastive Learning of Visual Representations](https://arxiv.org/abs/2002.05709) 11 | 12 | - [Barlow Twins: Self-Supervised Learning via Redundancy Reduction](https://arxiv.org/abs/2103.03230) 13 | 14 | - [Bootstrap your own latent: A new approach to self-supervised Learning](https://arxiv.org/abs/2006.07733) 15 | 16 | - [Momentum Contrast for Unsupervised Visual Representation Learning](https://arxiv.org/abs/1911.05722) 17 | 18 | - [Emerging Properties in Self-Supervised Vision Transformers](https://arxiv.org/abs/2104.14294) 19 | 20 | - [Contrastive Multiview Coding](https://arxiv.org/abs/1906.05849) 21 | 22 | - [Exploring Simple Siamese Representation Learning](https://arxiv.org/abs/2011.10566) 23 | 24 | - [Unsupervised Learning of Visual Features by 
Contrasting Cluster Assignments](https://arxiv.org/abs/2006.09882) 25 | 26 | - [VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning](https://arxiv.org/abs/2105.04906) 27 | 28 | - [DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment](https://www.arxiv.org/abs/2412.16334) -------------------------------------------------------------------------------- /Language Modeling/Short cut learning.md: -------------------------------------------------------------------------------- 1 | # Analytical tools & Interpretability 2 | - [On the Biology of a Large Language Model](https://www.jiqizhixin.com/articles/2025-03-28-10) 3 | - [Efficiently Serving LLM Reasoning Programs with Certaindex](https://arxiv.org/abs/2412.20993) 4 | - [Finding Transformer Circuits with Edge Pruning](https://arxiv.org/abs/2406.16778) 5 | 6 | 7 | # LLMs Spurious Correlation & Solutions 8 | - [How Likely Do LLMs with CoT Mimic Human Reasoning?](https://arxiv.org/pdf/2402.16048) 9 | - [DIRECT PREFERENCE OPTIMIZATION USING SPARSE FEATURE-LEVEL CONSTRAINTS](https://arxiv.org/pdf/2411.07618?) 
10 | - [LOCKING DOWN THE FINETUNED LLMS SAFETY](https://arxiv.org/pdf/2410.10343) 11 | - [Out-of-Distribution Generalization in Natural Language Processing: Past, Present, and Future](https://aclanthology.org/2023.emnlp-main.276.pdf) 12 | - [Semformer: Transformer Language Models with Semantic Planning](https://arxiv.org/pdf/2409.11143) 13 | - [GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-Distribution Generalization Perspective](https://arxiv.org/pdf/2211.08073) 14 | - [SUPERVISED KNOWLEDGE MAKES LARGE LANGUAGE MODELS BETTER IN-CONTEXT LEARNERS](https://arxiv.org/pdf/2312.15918) 15 | 16 | 17 | # Reward Hacking 18 | - [A Detailed Explanation of Reward Hacking](https://zhuanlan.zhihu.com/p/21488802591) 19 | - [Understanding Reward Hacking in RLHF](https://zhuanlan.zhihu.com/p/6082362466) 20 | - [Reward Hacking in Reinforcement Learning](https://lilianweng.github.io/posts/2024-11-28-reward-hacking/#:~:text=Reward%20hacking%20occurs%20when%20a%20reinforcement%20learning%20%28RL%29,without%20genuinely%20learning%20or%20completing%20the%20intended%20task.) 
21 | - [Defining and Characterizing Reward Hacking](https://arxiv.org/pdf/2209.13085) 22 | - [Reward Shaping to Mitigate Reward Hacking in RLHF](https://arxiv.org/pdf/2502.18770) 23 | - [Adversarial Training of Reward Models](https://arxiv.org/pdf/2504.06141) 24 | 25 | -------------------------------------------------------------------------------- /Generative Modeling/Video Generation.md: -------------------------------------------------------------------------------- 1 | # Video Generation 2 | 3 | - [Video Generation](#video-generation) 4 | - [Survey](#survey) 5 | - [Awesome repo](#awesome-repo) 6 | - [Video Generation Evaluation](#video-generation-evaluation) 7 | - [VBench Leaderboard](#vbench-leaderboard) 8 | 9 | ## Survey 10 | [A Survey on Video Diffusion Models](https://arxiv.org/abs/2310.10647) 11 | 12 | [A Survey on Long Video Generation: Challenges, Methods, and Prospects]( https://arxiv.org/abs/2403.16407) 13 | 14 | 15 | [A Survey of AI-Generated Video Evaluation](https://arxiv.org/abs/2410.19884) 16 | 17 | [A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights](https://arxiv.org/abs/2407.08428) 18 | 19 | [From Sora What We Can See: A Survey of Text-to-Video Generation](https://arxiv.org/abs/2405.10674) 20 | 21 | ## Awesome repo 22 | [Video Generation](https://github.com/wangkai930418/awesome-diffusion-categorized#video-generation) 23 | 24 | [Video-Editing](https://github.com/wangkai930418/awesome-diffusion-categorized?tab=readme-ov-file#video-editing) 25 | 26 | [Video Generation Survey](https://github.com/yzhang2016/video-generation-survey/blob/main/video-generation.md#video-generation-survey) 27 | 28 | 29 | ## Video Generation Evaluation 30 | 31 | [A Survey of AI-Generated Video Evaluation](https://arxiv.org/abs/2410.19884) 32 | 33 | [VBench: Comprehensive Benchmark Suite for Video Generative Models](https://arxiv.org/abs/2311.17982) 34 | - code: https://github.com/vchitect/vbench 35 | 36 | [VBench++: Comprehensive and 
Versatile Benchmark Suite for Video Generative Models](https://arxiv.org/abs/2411.13503) 37 | - code: https://github.com/vchitect/vbench 38 | 39 | 40 | [AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM](https://arxiv.org/abs/2411.17221) 41 | - code: https://github.com/wangjiarui153/AIGV-Assessor 42 | 43 | [Evaluation of Text-to-Video Generation Models: A Dynamics Perspective](https://arxiv.org/abs/2407.01094) 44 | - code: https://github.com/mingxiangl/devil 45 | 46 | [Comprehensive Subjective and Objective Evaluation Method for Text-generated Video](https://arxiv.org/abs/2501.08545) 47 | 48 | [MSG score: A Comprehensive Evaluation for Multi-Scene Video Generation](https://arxiv.org/abs/2411.19121) 49 | 50 | ## VBench Leaderboard 51 | 52 | Below are some of the papers listed on the VBench Leaderboard: 53 | 54 | [Goku: Flow Based Video Generative Foundation Models](https://arxiv.org/abs/2502.04896) 55 | - code: https://github.com/Saiyan-World/goku 56 | 57 | [AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning](https://arxiv.org/abs/2307.04725) 58 | - code: https://github.com/guoyww/AnimateDiff 59 | 60 | [VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models](https://arxiv.org/abs/2401.09047) 61 | - code: https://github.com/AILab-CVC/VideoCrafter 62 | 63 | [OpenSora](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v3) 64 | - code: https://github.com/hpcaitech/Open-Sora 65 | 66 | [Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation](https://arxiv.org/abs/2309.15818) 67 | - code: https://github.com/showlab/Show-1 68 | 69 | [CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer](https://arxiv.org/abs/2408.06072) 70 | - code: https://github.com/THUDM/CogVideo 71 | 72 | [MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions](https://arxiv.org/abs/2407.06358v1) 73 | - 
code: https://github.com/mira-space/Mira 74 | 75 | [From Slow Bidirectional to Fast Autoregressive Video Diffusion Models](https://arxiv.org/abs/2412.07772) 76 | 77 | 78 | [HunyuanVideo: A Systematic Framework For Large Video Generative Models](https://arxiv.org/abs/2412.03603) 79 | - code: https://github.com/Tencent/HunyuanVideo 80 | 81 | [Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models](https://arxiv.org/abs/2501.08453) 82 | - code: https://github.com/Vchitect/Vchitect-2.0 83 | 84 | 85 | [RepVideo: Rethinking Cross-Layer Representation for Video Generation](https://arxiv.org/abs/2501.08994) 86 | - code: https://github.com/Vchitect/RepVideo 87 | 88 | 89 | [LTX-Video: Realtime Video Latent Diffusion](https://arxiv.org/abs/2501.00103) 90 | - code: https://github.com/Lightricks/LTX-Video 91 | 92 | [AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data](https://arxiv.org/abs/2402.00769) 93 | - code: https://github.com/G-U-N/AnimateLCM 94 | 95 | [LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models](https://arxiv.org/abs/2309.15103) 96 | - code: https://github.com/Vchitect/LaVie 97 | 98 | [Latte: Latent Diffusion Transformer for Video Generation](https://arxiv.org/abs/2401.03048) 99 | - code: https://github.com/Vchitect/Latte 100 | 101 | [I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models](https://arxiv.org/abs/2311.04145) 102 | - code: https://github.com/ali-vilab/VGen?tab=readme-ov-file#5-run-the-tf-t2v-cvpr-2024-model -------------------------------------------------------------------------------- /Generative Modeling/Continuous&Discrete.md: -------------------------------------------------------------------------------- 1 | # Continuous & Discrete Diffusion Models on Vision & Language Generation 2 | 3 | - [Continuous \& Discrete Diffusion Models on Vision \& Language 
Generation](#continuous--discrete-diffusion-models-on-vision--language-generation) 4 | - [Discrete](#discrete) 5 | - [Vision](#vision) 6 | - [Language](#language) 7 | - [Guidance](#guidance) 8 | - [Continuous](#continuous) 9 | - [Language](#language-1) 10 | 11 | ## Discrete 12 | 13 | ### Vision 14 | 15 | - [NeurIPS 2021] [Structured Denoising Diffusion Models in Discrete State-Spaces](https://arxiv.org/pdf/2107.03006) 16 | > D3PM (Discrete Denoising Diffusion Probabilistic Models) is a framework that extends diffusion models to discrete domains. The authors evaluate D3PMs on both text and image generation tasks. 17 | 18 | - [CVPR 2022] [MaskGIT: Masked Generative Image Transformer](https://arxiv.org/pdf/2202.04200) 19 | > MaskGIT first compresses the input image into a grid of discrete tokens using a convolutional encoder and vector quantization. During training, random tokens are masked, and the model learns to predict them based on the visible tokens. MaskGIT enables various applications including class-conditional image generation, image manipulation, and image extrapolation. 20 | 21 | - [CVPR 2022] [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/pdf/2111.14822) 22 | > An encoder maps input images to a spatial collection of image tokens, and a decoder reconstructs the images from these discrete tokens. Masked diffusion is then performed on the image tokens. 23 | 24 | - [arXiv 2024] [[MASK] is All You Need](https://arxiv.org/pdf/2412.06787) 25 | > It builds upon Discrete Flow Matching theory to connect two seemingly different approaches, Masked Generative Models and discrete diffusion. By treating the masking process in Masked Generative Models as a form of discrete diffusion, the paper creates a bridge that allows techniques from both paradigms to be combined and compared systematically. 
26 | 27 | - [NeurIPS 2024] [Simplified and Generalized Masked Diffusion for Discrete Data](https://arxiv.org/pdf/2406.04329) 28 | > A continuous-time framework that clarifies the underlying mechanics of masked diffusion and provides a remarkably simple training objective. The approach not only streamlines the implementation of these models but also improves their performance across text and image generation tasks. By allowing for state-dependent masking schedules, it creates a more flexible diffusion process that outperforms previous discrete diffusion approaches. 29 | 30 | - [arXiv 2025] [Di[M]O: Distilling Masked Diffusion Models into One-step Generator](https://arxiv.org/pdf/2503.15457) 31 | > Addresses the slow inference of masked diffusion models (MDMs) through distillation: the generation ability of the teacher model (a multi-step MDM) is transferred to a student model (a single-step generator). 32 | 33 | ### Language 34 | 35 | - [ICML 2024] [Discrete diffusion modeling by estimating the ratios of the data distribution](https://arxiv.org/pdf/2310.16834) 36 | 37 | - [NeurIPS 2024] [Simple and Effective Masked Diffusion Language Models](https://arxiv.org/pdf/2406.07524) 38 | 39 | - [NeurIPS 2021] [Structured Denoising Diffusion Models in Discrete State-Spaces](https://arxiv.org/pdf/2107.03006) 40 | 41 | - [ICLR 2025] [Beyond Autoregression: Fast LLMs via Self-Distillation Through Time](https://arxiv.org/pdf/2410.21035) 42 | 43 | - [ICLR 2025] [Scaling Diffusion Language Models via Adaptation from Autoregressive Models](https://arxiv.org/pdf/2410.17891) 44 | 45 | - [ICLR 2025] [Scaling up Masked Diffusion Models on Text](https://arxiv.org/pdf/2410.18514) 46 | 47 | - [arXiv 2025] [Large language diffusion models](https://arxiv.org/pdf/2502.09992) 48 | 49 | - [ICLR 2025 Oral] [Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models](https://arxiv.org/pdf/2503.09573) 50 | 51 | - [COLM 2024] [A Reparameterized Discrete Diffusion Model for Text 
Generation](https://arxiv.org/pdf/2302.05737) 52 | 53 | - [ICLR 2025] [Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data](https://arxiv.org/pdf/2406.03736) 54 | 55 | - [NeurIPS 2024] [Simplified and Generalized Masked Diffusion for Discrete Data](https://arxiv.org/pdf/2406.04329) 56 | 57 | - [NAACL 2025] [Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion](https://arxiv.org/pdf/2408.05636) 58 | 59 | - [ICLR 2023] [Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning](https://arxiv.org/pdf/2208.04202) 60 | 61 | - [arXiv 2025] [Theoretical Benefit and Limitation of Diffusion Language Model](https://arxiv.org/pdf/2502.09622) 62 | 63 | - [arXiv 2025] [Unifying Autoregressive and Diffusion-Based Sequence Generation](https://arxiv.org/pdf/2504.06416) 64 | 65 | ### Guidance 66 | 67 | - [arXiv 2024] [Derivative-free guidance in continuous and discrete diffusion models with soft value-based decoding](https://arxiv.org/pdf/2408.08252) 68 | > Images, Molecules, DNA & RNA 69 | 70 | - [ICLR 2025] [Unlocking guidance for discrete state-space diffusion and flow models](https://arxiv.org/pdf/2406.01572) 71 | > DNA 72 | 73 | - [ICLR 2025] [Steering masked discrete diffusion models via discrete denoising posterior prediction](https://arxiv.org/pdf/2410.08134) 74 | > Images, Text, Molecules 75 | 76 | - [ICLR 2025] [Simple Guidance Mechanisms for Discrete Diffusion Models](https://arxiv.org/pdf/2412.10193) 77 | > Text 78 | 79 | - [arXiv 2025] [A general framework for inference-time scaling and steering of diffusion models](https://arxiv.org/abs/2501.06848) 80 | > Images, Text 81 | 82 | 83 | 84 | ## Continuous 85 | 86 | ### Language 87 | 88 | - [ICLR 2023] [DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models](https://arxiv.org/pdf/2210.08933) 89 | 90 | - [EMNLP 2023] [DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion 
Models](https://arxiv.org/pdf/2310.05793) 91 | 92 | - [NeurIPS 2023] [PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model](https://arxiv.org/pdf/2306.02531) 93 | 94 | - [NeurIPS 2023] [AR-DIFFUSION: Auto-Regressive Diffusion Model for Text Generation](https://arxiv.org/pdf/2305.09515) 95 | 96 | - [ICML 2024] [Discrete diffusion modeling by estimating the ratios of the data distribution](https://arxiv.org/pdf/2310.16834) 97 | 98 | - [ICLR 2025 Workshop] [The Diffusion Duality](https://openreview.net/pdf?id=CB0Ub2yXjC) 99 | **** 100 | - [arXiv 2025] [Continuous Diffusion Model for Language Modeling](https://arxiv.org/pdf/2502.11564) 101 | -------------------------------------------------------------------------------- /Generative Modeling/DMs&FMs.md: -------------------------------------------------------------------------------- 1 | #### Awesome Repositories 2 | - [Diffusion Models and Representation Learning: A Survey](https://github.com/dongzhuoyao/Diffusion-Representation-Learning-Survey-Taxonomy) 3 | - [Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices](https://arxiv.org/abs/2410.11795) 4 | 5 | #### Efficacy 6 | 7 | - [Glow: Generative Flow with Invertible 1x1 Convolutions](https://arxiv.org/abs/1807.03039) 8 | 9 | - [Flow Matching for Generative Modeling](https://arxiv.org/abs/2210.02747v2) 10 | - Mini Code: https://github.com/gle-bellier/flow-matching/tree/main 11 | - Mini Code: https://gist.github.com/francois-rozet/fd6a820e052157f8ac6e2aa39e16c1aa 12 | 13 | - [Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow](https://arxiv.org/abs/2209.03003) 14 | - https://github.com/MaximeVandegar/Papers-in-100-Lines-of-Code/tree/main/Flow_Straight_and_Fast_Learning_to_Generate_and_Transfer_Data_with_Rectified_Flow 15 | 16 | - [Score-Based Generative Modeling through Stochastic Differential Equations](https://arxiv.org/abs/2011.13456) 17 | - Code: 
https://github.com/yang-song/score_sde 18 | 19 | - [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) 20 | - Code: https://github.com/hojonathanho/diffusion 21 | - Mini Code: https://github.com/tqch/ddpm-torch 22 | - Mini Code: https://github.com/w86763777/pytorch-ddpm 23 | 24 | - [Generative Modeling by Estimating Gradients of the Data Distribution](https://arxiv.org/abs/1907.05600) 25 | - Code: https://github.com/ermongroup/ncsn 26 | 27 | - [Elucidating the Design Space of Diffusion-Based Generative Models](https://arxiv.org/abs/2206.00364) 28 | - Code: https://github.com/nvlabs/edm 29 | - Mini Code: https://github.com/yuanzhi-zhu/mini_edm/tree/main 30 | 31 | - [Analyzing and Improving the Training Dynamics of Diffusion Models](https://arxiv.org/abs/2312.02696) 32 | - Code: https://github.com/nvlabs/edm2 33 | - Mini Code: https://github.com/mmathew23/improved_edm 34 | - Mini Code: https://github.com/FutureXiang/edm2 35 | - Mini Code: https://github.com/YichengDWu/tinyedm 36 | 37 | 38 | #### Sampling Efficiency 39 | 40 | - [Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502) 41 | - Code: https://github.com/ermongroup/ddim 42 | - Mini Code: https://github.com/Alokia/diffusion-DDIM-pytorch 43 | 44 | - [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://arxiv.org/abs/2206.00927) 45 | 46 | - [Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models](https://arxiv.org/abs/2201.06503) 47 | 48 | - [Consistency Models](https://arxiv.org/abs/2303.01469) 49 | - Code: https://github.com/openai/consistency_models 50 | - Mini Code: https://github.com/openai/consistency_models_cifar10 51 | - Mini Code: https://github.com/Kinyugo/consistency_models 52 | - Mini Code: https://github.com/junhsss/consistency-models 53 | 54 | - [Improved Techniques for Training Consistency Models](https://arxiv.org/abs/2310.14189) 55 | - Code: 
https://github.com/Kinyugo/consistency_models 56 | 57 | - [Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models](https://arxiv.org/abs/2410.11081) 58 | 59 | - [Consistency Models Made Easy](https://arxiv.org/abs/2406.14548) 60 | - Code: https://github.com/locuslab/ect 61 | 62 | - [Stable Consistency Tuning: Understanding and Improving Consistency Models](https://arxiv.org/abs/2410.18958) 63 | - Code: https://github.com/G-U-N/Stable-Consistency-Tuning 64 | 65 | - [Scaling Rectified Flow Transformers for High-Resolution Image Synthesis](https://arxiv.org/abs/2403.03206) 66 | 67 | - [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752) 68 | - Code: https://github.com/CompVis/latent-diffusion 69 | 70 | - [Phased Consistency Model](https://arxiv.org/abs/2405.18407) 71 | - Code: https://github.com/G-U-N/Phased-Consistency-Model 72 | 73 | - [Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step](https://arxiv.org/abs/2410.14919v4) 74 | - Code: https://github.com/mingyuanzhou/SiD/tree/sida 75 | 76 | #### Training Efficiency 77 | 78 | 79 | - [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752) 80 | 81 | 82 | 83 | - [Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think](https://arxiv.org/abs/2410.06940) 84 | - Code: https://github.com/sihyun-yu/REPA 85 | 86 | 87 | - [Return of Unconditional Generation: A Self-supervised Representation Generation Method](https://arxiv.org/abs/2312.03701) 88 | - Code: https://github.com/LTH14/rcg 89 | 90 | - [Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment](https://arxiv.org/abs/2406.12303) 91 | - Code: https://github.com/yhli123/immiscible-diffusion 92 | 93 | 94 | - [FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification](https://arxiv.org/abs/2410.10356) 95 | 96 | 97 | #### Interesting Applications 
98 | 99 | - [Diffusion Model as Representation Learner](https://arxiv.org/abs/2308.10916) 100 | - Code: https://github.com/Adamdad/Repfusion 101 | - Interesting Point: Our study begins by examining the feature space of DPMs, revealing that DPMs are inherently denoising autoencoders that balance the representation learning with regularizing model capacity. 102 | 103 | - [Denoising Diffusion Autoencoders are Unified Self-supervised Learners](https://arxiv.org/abs/2303.09769) 104 | - Code: https://github.com/FutureXiang/ddae 105 | - Interesting Point: This paper shows that the networks in diffusion models, namely denoising diffusion autoencoders (DDAE), are unified self-supervised learners: by pre-training on unconditional image generation, DDAE has already learned strongly linear-separable representations within its intermediate layers without auxiliary encoders, thus making diffusion pre-training emerge as a general approach for generative-and-discriminative dual learning. 106 | 107 | - [SODA: Bottleneck Diffusion Models for Representation Learning](https://arxiv.org/abs/2311.17901) 108 | - Interesting Point: What I cannot create, I do not understand. 109 | 110 | - [Self-Improving Diffusion Models with Synthetic Data](https://arxiv.org/abs/2408.16333v1) 111 | - Interesting Point: Self-IMproving diffusion models with Synthetic data (SIMS) is a new training concept for diffusion models that uses self-synthesized data to provide negative guidance during the generation process to steer a model's generative process away from the non-ideal synthetic data manifold and towards the real data distribution. 
112 | 113 | - [In-Context LoRA for Diffusion Transformers](https://arxiv.org/abs/2410.23775v3) 114 | - Code: https://github.com/ali-vilab/In-Context-LoRA 115 | 116 | 117 | - [Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model](https://arxiv.org/abs/2212.00490v2) 118 | - Code: https://github.com/wyhuai/ddnm 119 | 120 | - [Scaling Rectified Flow Transformers for High-Resolution Image Synthesis](https://arxiv.org/abs/2403.03206) 121 | 122 | 123 | - [OmniGen: Unified Image Generation](https://arxiv.org/abs/2409.11340v1) 124 | 125 | - [Poisson Flow Generative Models](https://arxiv.org/abs/2209.11178) 126 | - Code: https://github.com/Newbeeer/Poisson_flow 127 | 128 | - [Adversarial Diffusion Distillation](https://arxiv.org/abs/2311.17042) 129 | 130 | - [PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher](https://arxiv.org/abs/2405.14822v1) 131 | 132 | - [Denoising Diffusion Restoration Models](https://arxiv.org/abs/2201.11793) 133 | 134 | - [ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting](https://arxiv.org/abs/2307.12348) 135 | 136 | - [PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis](https://arxiv.org/abs/2310.00426) 137 | - Code: https://github.com/PixArt-alpha/PixArt-alpha 138 | 139 | - [Progressive Compositionality In Text-to-Image Generative Models](https://arxiv.org/abs/2410.16719) 140 | 141 | - [Pyramidal Flow Matching for Efficient Video Generative Modeling](https://arxiv.org/abs/2410.05954v1) 142 | - Code: https://github.com/jy0205/Pyramid-Flow 143 | 144 | - [Improved Distribution Matching Distillation for Fast Image Synthesis](https://arxiv.org/abs/2405.14867) 145 | - Code: https://github.com/tianweiy/DMD2 146 | 147 | - [OminiControl: Minimal and Universal Control for Diffusion Transformer](https://arxiv.org/abs/2411.15098v3) 148 | 149 | - [WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion
Model](https://arxiv.org/abs/2411.17459v2) 150 | 151 | - [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) 152 | - Code: https://github.com/cientgu/VQ-Diffusion 153 | 154 | - [Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget](https://arxiv.org/abs/2407.15811v1) 155 | - Code: https://github.com/SonyResearch/micro_diffusion 156 | 157 | - [Masked Diffusion Transformer is a Strong Image Synthesizer](https://github.com/sail-sg/MDT/tree/main) 158 | - Code: https://github.com/sail-sg/MDT/tree/main 159 | 160 | - [TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training](https://arxiv.org/abs/2501.04765) 161 | - Code: https://github.com/CompVis/tread/tree/master 162 | 163 | - [TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space](https://arxiv.org/abs/2501.12224) 164 | 165 | - [Memory-Driven Text-to-Image Generation](https://arxiv.org/abs/2208.07022) 166 | 167 | 168 | - [DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation](https://arxiv.org/abs/2412.03255) 169 | 170 | 171 | 172 | 173 | #### Interesting Explorations 174 | 175 | 176 | - [Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise](https://arxiv.org/abs/2208.09392) 177 | - Code: https://github.com/arpitbansal297/Cold-Diffusion-Models/tree/main 178 | - Mini Code: https://github.com/arpitbansal297/Cold-Diffusion-Models/blob/main/demixing-diffusion-pytorch/demixing_diffusion_pytorch/demixing_diffusion_pytorch.py 179 | 180 | - [Soft Diffusion: Score Matching for General Corruptions](https://arxiv.org/abs/2209.05442) 181 | 182 | - [Dual Diffusion Implicit Bridges for Image-to-Image Translation](https://arxiv.org/abs/2203.08382) 183 | - Code: https://github.com/suxuann/ddib/ 184 | - Mini Code: https://github.com/suxuann/ddib/blob/main/guided_diffusion/gaussian_diffusion.py 185 | 186 | 187 | - [Generator Matching: Generative modeling with arbitrary Markov 
processes](https://arxiv.org/abs/2410.20587) 188 | 189 | 190 | - [$\epsilon$-VAE: Denoising as Visual Decoding](https://arxiv.org/abs/2410.04081) 191 | - Interesting Point: In this work, we offer a new perspective by proposing denoising as decoding, shifting from single-step reconstruction to iterative refinement. 192 | 193 | - [Diffusion Models are Evolutionary Algorithms](https://arxiv.org/abs/2410.02543v2) 194 | 195 | - [Semi-Parametric Neural Image Synthesis](https://arxiv.org/abs/2204.11824) 196 | 197 | #### Theoretical Analysis 198 | 199 | - [Generalization in diffusion models arises from geometry-adaptive harmonic representations](https://arxiv.org/abs/2310.02557) 200 | - Code: https://github.com/LabForComputationalVision/memorization_generalization_in_diffusion_models 201 | - Interesting Point: Finally, we show that when trained on regular image classes for which the optimal basis is known to be geometry-adaptive and harmonic, the denoising performance of the networks is near-optimal. 202 | 203 | - [Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation](https://arxiv.org/abs/2303.00848) --------------------------------------------------------------------------------