├── .gitignore
├── Representation Learning
    ├── DSs.md
    ├── CLs.md
    └── SSLs.md
├── Language Modeling
    ├── Logic.md
    ├── Archs.md
    ├── Prompt.md
    ├── Scaling.md
    └── Short cut learning.md
├── Reinforcement Learning
    └── SRLs.md
├── Data-Centric Learning
    ├── DPs.md
    └── DDs.md
├── Inference Scaling Law
    └── all.md
├── Generative Modeling
    ├── Archs.md
    ├── VAEs.md
    ├── GANs.md
    ├── Video Generation.md
    ├── Continuous&Discrete.md
    └── DMs&FMs.md
├── README.md
├── Neural Networks
    └── Archs.md
└── Optimization
    └── OPTs.md


/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store


--------------------------------------------------------------------------------
/Representation Learning/DSs.md:
--------------------------------------------------------------------------------
1 | #### Efficacy
2 | 
3 | - [Deep Sets](https://arxiv.org/abs/1703.06114)


--------------------------------------------------------------------------------
/Language Modeling/Logic.md:
--------------------------------------------------------------------------------
1 | 
2 | - [How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis](https://arxiv.org/abs/2411.04105)


--------------------------------------------------------------------------------
/Representation Learning/CLs.md:
--------------------------------------------------------------------------------
1 | #### Efficacy
2 | 
3 | - [Overcoming catastrophic forgetting in neural networks](https://arxiv.org/abs/1612.00796)


--------------------------------------------------------------------------------
/Language Modeling/Archs.md:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | #### Efficacy
4 | 
5 | - [Scaling up Masked Diffusion Models on Text](https://arxiv.org/abs/2410.18514)
6 |     - Code: https://github.com/ML-GSAI/SMDM


--------------------------------------------------------------------------------
/Language Modeling/Prompt.md:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | #### Prompt Compression
4 | - [Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles](https://arxiv.org/abs/2410.14042)
5 | 


--------------------------------------------------------------------------------
/Reinforcement Learning/SRLs.md:
--------------------------------------------------------------------------------
1 | #### Efficacy
2 | 
3 | - [Streaming Deep Reinforcement Learning Finally Works](https://arxiv.org/abs/2410.14606v2)
4 |   - Code: https://github.com/mohmdelsayed/streaming-drl
5 | 


--------------------------------------------------------------------------------
/Data-Centric Learning/DPs.md:
--------------------------------------------------------------------------------
1 | 
2 | #### Theoretical Analysis
3 | 
4 | - [Towards a statistical theory of data selection under weak supervision](https://arxiv.org/abs/2309.14563)
5 |     - Code: https://github.com/granica-ai/DataSelectionICLR2024/tree/main


--------------------------------------------------------------------------------
/Inference Scaling Law/all.md:
--------------------------------------------------------------------------------
1 | #### Image
2 | 
3 | - [Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps](https://arxiv.org/abs/2501.09732)
4 | 
5 | 
6 | 
7 | #### Text
8 | 
9 | - [s1: Simple test-time scaling](https://arxiv.org/abs/2501.19393)


--------------------------------------------------------------------------------
/Generative Modeling/Archs.md:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | #### Transformer
4 | 
5 | - [Scalable Diffusion Models with Transformers](https://arxiv.org/abs/2212.09748)
6 | 
7 | - [SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers](https://arxiv.org/abs/2401.08740)
8 | 
9 | 


--------------------------------------------------------------------------------
/Language Modeling/Scaling.md:
--------------------------------------------------------------------------------
1 | 
2 | #### Efficacy
3 | - [Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models](https://arxiv.org/abs/2312.06585)
4 | 
5 | #### Training Efficiency
6 | 
7 | - [Data Efficient Neural Scaling Law via Model Reusing](https://openreview.net/forum?id=iXYnIz4RRx)


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | #### Awesome Repositories
4 | 
5 | There are several implementations of classical deep learning methods that I highly recommend:
6 | 
7 | - [Deep Learning Paper Implementations](https://github.com/labmlai/annotated_deep_learning_paper_implementations)
8 | 
9 | - [Papers-in-100-Lines-of-Code](https://github.com/MaximeVandegar/Papers-in-100-Lines-of-Code)


--------------------------------------------------------------------------------
/Neural Networks/Archs.md:
--------------------------------------------------------------------------------
1 | #### Deep Neural Networks
2 | - [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556)
3 | - [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
4 | - [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946)
5 | - [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)
6 | - [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
7 | 
8 | #### Large Models
9 | - [Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving](https://arxiv.org/abs/2407.00079v3)


--------------------------------------------------------------------------------
/Data-Centric Learning/DDs.md:
--------------------------------------------------------------------------------
 1 | 
 2 | 
 3 | #### Dataset Distillation
 4 | - [Dataset Distillation by Matching Training Trajectories](https://arxiv.org/abs/2203.11932)
 5 | - [Accelerating Dataset Distillation via Model Augmentation](https://arxiv.org/abs/2212.06152)
 6 | - [Dataset Condensation with Distribution Matching](https://arxiv.org/abs/2110.04181)
 7 | - [Dataset Distillation via the Wasserstein Metric](https://arxiv.org/abs/2311.18531)
 8 | - [Dataset Meta-Learning from Kernel Ridge-Regression](https://arxiv.org/abs/2011.00050)
 9 | - [Dataset Distillation using Neural Feature Regression](https://arxiv.org/abs/2206.00719)
10 | - [Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective](https://arxiv.org/abs/2306.13092)
11 | - [On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm](https://arxiv.org/abs/2312.03526)
12 | 
13 | 


--------------------------------------------------------------------------------
/Optimization/OPTs.md:
--------------------------------------------------------------------------------
 1 | #### Efficacy
 2 | 
 3 | - [Cautious Optimizers: Improving Training with One Line of Code](https://arxiv.org/abs/2411.16085)
 4 |   - Code: https://github.com/kyleliang919/C-Optim
 5 | - [Learning-Rate-Free Learning by D-Adaptation](https://arxiv.org/abs/2301.07733)
 6 |   - Code: https://github.com/facebookresearch/dadaptation
 7 | - [Prodigy: An Expeditiously Adaptive Parameter-Free Learner](https://arxiv.org/abs/2306.06101)
 8 |   - Code: https://github.com/konstmish/prodigy/blob/119ca3ade34584ebaa8afe2da8d623e35fec6b59/prodigyopt/prodigy.py
 9 | - [The Road Less Scheduled](https://arxiv.org/abs/2405.15682)
10 |   - Code: https://github.com/facebookresearch/schedule_free
11 | - [DoG is SGD’s Best Friend: A Parameter-Free Dynamic Step Size Schedule](https://proceedings.mlr.press/v202/ivgi23a)
12 | - [Memory Layers at Scale](https://arxiv.org/abs/2412.09764)
13 |   - Code: https://github.com/facebookresearch/memory


--------------------------------------------------------------------------------
/Generative Modeling/VAEs.md:
--------------------------------------------------------------------------------
 1 | 
 2 | #### Efficacy
 3 | - [Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114)
 4 | 
 5 | - [beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework](https://openreview.net/forum?id=Sy2fzU9gl)
 6 | 
 7 | - [SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization](https://proceedings.mlr.press/v162/takida22a/takida22a.pdf)
 8 | 
 9 | - [Wasserstein Auto-Encoders](https://arxiv.org/abs/1711.01558)
10 | 
11 | - [Vector Quantized Wasserstein Auto-Encoder](https://arxiv.org/abs/2302.05917)
12 | 
13 | #### Efficient Sampling
14 | 
15 | - [Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models](https://arxiv.org/abs/2410.10733)
16 |   - Code: https://github.com/mit-han-lab/efficientvit/tree/master/applications/dc_ae
17 | 
18 | - [FLUX](https://github.com/black-forest-labs/flux)
19 | 
20 | - [Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack](https://arxiv.org/abs/2309.15807)


--------------------------------------------------------------------------------
/Generative Modeling/GANs.md:
--------------------------------------------------------------------------------
 1 | #### Awesome Repositories
 2 | 
 3 | - [StudioGAN is a PyTorch library providing implementations of representative Generative Adversarial Networks (GANs)](https://github.com/POSTECH-CVLab/PyTorch-StudioGAN)
 4 | 
 5 | #### Efficacy
 6 | 
 7 | - [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/1511.06434)
 8 | 
 9 | - [Wasserstein GAN](https://arxiv.org/abs/1701.07875)
10 |   - Code: https://github.com/anibali/wgan-cifar10
11 | 
12 | - [Improved Training of Wasserstein GANs](https://arxiv.org/abs/1704.00028)
13 |   - Code: https://github.com/Zeleni9/pytorch-wgan
14 | 
15 | - [Large Scale Adversarial Representation Learning](https://arxiv.org/abs/1907.02544)
16 |   - Code: https://github.com/LEGO999/BigBiGAN-TensorFlow2.0
17 |   - Mini Code: https://github.com/rkorzeniowski/bigbigan-pytorch
18 | 
19 | - [Training Generative Adversarial Networks with Limited Data](https://arxiv.org/abs/2006.06676v2)
20 |   - Code: https://github.com/NVlabs/stylegan2-ada-pytorch
21 |   - Mini Code: https://github.com/eps696/stylegan2ada
22 | 
23 | - [StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets](https://arxiv.org/abs/2202.00273)
24 |   - Code: https://github.com/autonomousvision/stylegan-xl


--------------------------------------------------------------------------------
/Representation Learning/SSLs.md:
--------------------------------------------------------------------------------
 1 | 
 2 | #### Awesome Repositories
 3 | - [LightlySSL](https://github.com/lightly-ai/lightly)
 4 | - [solo-learn](https://github.com/vturrisi/solo-learn)
 5 | 
 6 | #### Efficacy
 7 | 
 8 | - [Representation Learning with Contrastive Predictive Coding](https://arxiv.org/abs/1807.03748)
 9 | 
10 | - [A Simple Framework for Contrastive Learning of Visual Representations](https://arxiv.org/abs/2002.05709)
11 | 
12 | - [Barlow Twins: Self-Supervised Learning via Redundancy Reduction](https://arxiv.org/abs/2103.03230)
13 | 
14 | - [Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning](https://arxiv.org/abs/2006.07733)
15 | 
16 | - [Momentum Contrast for Unsupervised Visual Representation Learning](https://arxiv.org/abs/1911.05722)
17 | 
18 | - [Emerging Properties in Self-Supervised Vision Transformers](https://arxiv.org/abs/2104.14294)
19 | 
20 | - [Contrastive Multiview Coding](https://arxiv.org/abs/1906.05849)
21 | 
22 | - [Exploring Simple Siamese Representation Learning](https://arxiv.org/abs/2011.10566)
23 | 
24 | - [Unsupervised Learning of Visual Features by Contrasting Cluster Assignments](https://arxiv.org/abs/2006.09882)
25 | 
26 | - [VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning](https://arxiv.org/abs/2105.04906)
27 | 
28 | - [DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment](https://www.arxiv.org/abs/2412.16334)


--------------------------------------------------------------------------------
/Language Modeling/Short cut learning.md:
--------------------------------------------------------------------------------
 1 | # Analytical tools & Interpretability
 2 | - [On the Biology of a Large Language Model](https://www.jiqizhixin.com/articles/2025-03-28-10)
 3 | - [Efficiently Serving LLM Reasoning Programs with Certaindex](https://arxiv.org/abs/2412.20993)
 4 | - [Finding Transformer Circuits with Edge Pruning](https://arxiv.org/abs/2406.16778)
 5 | 
 6 | 
 7 | # LLM Spurious Correlations & Solutions
 8 | - [How Likely Do LLMs with CoT Mimic Human Reasoning?](https://arxiv.org/pdf/2402.16048)
 9 | - [Direct Preference Optimization Using Sparse Feature-Level Constraints](https://arxiv.org/pdf/2411.07618)
10 | - [Locking Down the Finetuned LLMs Safety](https://arxiv.org/pdf/2410.10343)
11 | - [Out-of-Distribution Generalization in Natural Language Processing: Past, Present, and Future](https://aclanthology.org/2023.emnlp-main.276.pdf)
12 | - [Semformer: Transformer Language Models with Semantic Planning](https://arxiv.org/pdf/2409.11143)
13 | - [GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-Distribution Generalization Perspective](https://arxiv.org/pdf/2211.08073)
14 | - [Supervised Knowledge Makes Large Language Models Better In-context Learners](https://arxiv.org/pdf/2312.15918)
15 | 
16 | 
17 | # Reward Hacking 
18 | - [A Detailed Explanation of Reward Hacking](https://zhuanlan.zhihu.com/p/21488802591)
19 | - [Understanding Reward Hacking in RLHF](https://zhuanlan.zhihu.com/p/6082362466)
20 | - [Reward Hacking in Reinforcement Learning](https://lilianweng.github.io/posts/2024-11-28-reward-hacking/)
21 | - [Defining and Characterizing Reward Hacking](https://arxiv.org/pdf/2209.13085)
22 | - [Reward Shaping to Mitigate Reward Hacking in RLHF](https://arxiv.org/pdf/2502.18770)
23 | - [Adversarial Training of Reward Models](https://arxiv.org/pdf/2504.06141)
24 | 
25 | 


--------------------------------------------------------------------------------
/Generative Modeling/Video Generation.md:
--------------------------------------------------------------------------------
  1 | # Video Generation
  2 | 
  3 | - [Video Generation](#video-generation)
  4 |   - [Survey](#survey)
  5 |   - [Awesome repo](#awesome-repo)
  6 |   - [Video Generation Evaluation](#video-generation-evaluation)
  7 |   - [VBench Leaderboard](#vbench-leaderboard)
  8 | 
  9 | ## Survey
 10 | [A Survey on Video Diffusion Models](https://arxiv.org/abs/2310.10647)
 11 | 
 12 | [A Survey on Long Video Generation: Challenges, Methods, and Prospects](https://arxiv.org/abs/2403.16407)
 13 | 
 14 | 
 15 | [A Survey of AI-Generated Video Evaluation](https://arxiv.org/abs/2410.19884)
 16 | 
 17 | [A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights](https://arxiv.org/abs/2407.08428)
 18 | 
 19 | [From Sora What We Can See: A Survey of Text-to-Video Generation](https://arxiv.org/abs/2405.10674)
 20 | 
 21 | ## Awesome repo
 22 | [Video Generation](https://github.com/wangkai930418/awesome-diffusion-categorized#video-generation)
 23 | 
 24 | [Video-Editing](https://github.com/wangkai930418/awesome-diffusion-categorized?tab=readme-ov-file#video-editing)
 25 | 
 26 | [Video Generation Survey](https://github.com/yzhang2016/video-generation-survey/blob/main/video-generation.md#video-generation-survey)
 27 | 
 28 | 
 29 | ## Video Generation Evaluation
 30 | 
 31 | [A Survey of AI-Generated Video Evaluation](https://arxiv.org/abs/2410.19884)
 32 | 
 33 | [VBench: Comprehensive Benchmark Suite for Video Generative Models](https://arxiv.org/abs/2311.17982)
 34 | - code: https://github.com/vchitect/vbench
 35 | 
 36 | [VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models](https://arxiv.org/abs/2411.13503)
 37 | - code: https://github.com/vchitect/vbench
 38 | 
 39 | 
 40 | [AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM](https://arxiv.org/abs/2411.17221)
 41 | - code: https://github.com/wangjiarui153/AIGV-Assessor
 42 | 
 43 | [Evaluation of Text-to-Video Generation Models: A Dynamics Perspective](https://arxiv.org/abs/2407.01094)
 44 | - code: https://github.com/mingxiangl/devil
 45 | 
 46 | [Comprehensive Subjective and Objective Evaluation Method for Text-generated Video](https://arxiv.org/abs/2501.08545)
 47 | 
 48 | [MSG score: A Comprehensive Evaluation for Multi-Scene Video Generation](https://arxiv.org/abs/2411.19121)
 49 | 
 50 | ## VBench Leaderboard
 51 | 
 52 | Below are some papers listed on the VBench Leaderboard.
 53 |  
 54 | [Goku: Flow Based Video Generative Foundation Models](https://arxiv.org/abs/2502.04896)
 55 | - code: https://github.com/Saiyan-World/goku
 56 | 
 57 | [AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning](https://arxiv.org/abs/2307.04725)
 58 | - code: https://github.com/guoyww/AnimateDiff
 59 | 
 60 | [VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models](https://arxiv.org/abs/2401.09047)
 61 | - code: https://github.com/AILab-CVC/VideoCrafter
 62 | 
 63 | [OpenSora](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v3)
 64 | - code: https://github.com/hpcaitech/Open-Sora
 65 | 
 66 | [Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation](https://arxiv.org/abs/2309.15818)
 67 | - code: https://github.com/showlab/Show-1
 68 | 
 69 | [CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer](https://arxiv.org/abs/2408.06072)
 70 | - code: https://github.com/THUDM/CogVideo
 71 | 
 72 | [MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions](https://arxiv.org/abs/2407.06358v1)
 73 | - code: https://github.com/mira-space/Mira
 74 | 
 75 | [From Slow Bidirectional to Fast Autoregressive Video Diffusion Models](https://arxiv.org/abs/2412.07772)
 76 | 
 77 | 
 78 | [HunyuanVideo: A Systematic Framework For Large Video Generative Models](https://arxiv.org/abs/2412.03603)
 79 | - code: https://github.com/Tencent/HunyuanVideo
 80 | 
 81 | [Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models](https://arxiv.org/abs/2501.08453)
 82 | - code: https://github.com/Vchitect/Vchitect-2.0
 83 | 
 84 | 
 85 | [RepVideo: Rethinking Cross-Layer Representation for Video Generation](https://arxiv.org/abs/2501.08994)
 86 | - code: https://github.com/Vchitect/RepVideo
 87 | 
 88 | 
 89 | [LTX-Video: Realtime Video Latent Diffusion](https://arxiv.org/abs/2501.00103)
 90 | - code: https://github.com/Lightricks/LTX-Video
 91 | 
 92 | [AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data](https://arxiv.org/abs/2402.00769)
 93 | - code: https://github.com/G-U-N/AnimateLCM
 94 | 
 95 | [LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models](https://arxiv.org/abs/2309.15103)
 96 | - code: https://github.com/Vchitect/LaVie 
 97 | 
 98 | [Latte: Latent Diffusion Transformer for Video Generation](https://arxiv.org/abs/2401.03048)
 99 | - code: https://github.com/Vchitect/Latte
100 | 
101 | [I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models](https://arxiv.org/abs/2311.04145)
102 | - code: https://github.com/ali-vilab/VGen?tab=readme-ov-file#5-run-the-tf-t2v-cvpr-2024-model


--------------------------------------------------------------------------------
/Generative Modeling/Continuous&Discrete.md:
--------------------------------------------------------------------------------
  1 | # Continuous & Discrete Diffusion Models on Vision & Language Generation  
  2 | 
  3 | - [Continuous \& Discrete Diffusion Models on Vision \& Language Generation](#continuous--discrete-diffusion-models-on-vision--language-generation)
  4 |   - [Discrete](#discrete)
  5 |     - [Vision](#vision)
  6 |     - [Language](#language)
  7 |     - [Guidance](#guidance)
  8 |   - [Continuous](#continuous)
  9 |     - [Language](#language-1)
 10 | 
 11 | ## Discrete
 12 | 
 13 | ### Vision
 14 | 
 15 | - [NeurIPS 2021] [Structured Denoising Diffusion Models in Discrete State-Spaces](https://arxiv.org/pdf/2107.03006) 
 16 |   > D3PM (Discrete Denoising Diffusion Probabilistic Models) is a framework that extends diffusion models to discrete domains. The authors evaluate D3PMs on both text and image generation tasks.
 17 | 
 18 | - [CVPR 2022] [MaskGIT: Masked Generative Image Transformer](https://arxiv.org/pdf/2202.04200)
 19 |   > MaskGIT first compresses the input image into a grid of discrete tokens using a convolutional encoder and vector quantization. During training, random tokens are masked, and the model learns to predict them based on the visible tokens. MaskGIT enables various applications including class-conditional image generation, image manipulation, and image extrapolation.
 20 | 
 21 | - [CVPR 2022] [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/pdf/2111.14822)
 22 |   > An encoder maps input images to a spatial collection of image tokens, and a decoder reconstructs the images from these discrete tokens. Masked diffusion is then performed on the image tokens.
 23 | 
 24 | - [arXiv 2024] [[MASK] is All You Need](https://arxiv.org/pdf/2412.06787)
 25 |   > Builds on Discrete Flow Matching theory to connect Masked Generative Models and discrete diffusion, two seemingly different approaches. By treating the masking process in Masked Generative Models as a form of discrete diffusion, the paper creates a bridge that allows techniques from both paradigms to be combined and compared systematically.
 26 | 
 27 | - [NeurIPS 2024] [Simplified and Generalized Masked Diffusion for Discrete Data](https://arxiv.org/pdf/2406.04329)
 28 |   > A continuous-time framework that clarifies the underlying mechanics of masked diffusion and provides a remarkably simple training objective. The approach not only streamlines the implementation of these models but also improves their performance across text and image generation tasks. By allowing for state-dependent masking schedules, it creates a more flexible diffusion process that outperforms previous discrete diffusion approaches.
 29 | 
 30 | - [arXiv 2025] [Di[M]O: Distilling Masked Diffusion Models into One-step Generator](https://arxiv.org/pdf/2503.15457)
 31 |   > Addresses the slow inference of masked diffusion models (MDMs) through distillation: the generation ability of a teacher model (multi-step MDM) is transferred to a student model (one-step generator).
 32 | 
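
The masked-token training described in the MaskGIT entry above can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: `MASK_ID`, `mask_tokens`, `masked_cross_entropy`, and `uniform_model` are hypothetical names, the mask ratio is fixed as a simplifying assumption, and a uniform predictor stands in for the transformer.

```python
import math
import random

MASK_ID = -1  # hypothetical id reserved for the [MASK] token


def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of tokens with MASK_ID.

    Returns the corrupted sequence and the sorted masked positions.
    """
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    for p in positions:
        corrupted[p] = MASK_ID
    return corrupted, positions


def masked_cross_entropy(probs, targets, positions):
    """Average negative log-likelihood over the masked positions only."""
    return -sum(math.log(probs[p][targets[p]]) for p in positions) / len(positions)


def uniform_model(corrupted, vocab_size):
    """Stand-in for the transformer: a uniform distribution at every position."""
    return [[1.0 / vocab_size] * vocab_size for _ in corrupted]


rng = random.Random(0)
tokens = [3, 1, 4, 1, 5, 2, 6, 5]  # toy grid of discrete image tokens
vocab_size = 8

corrupted, positions = mask_tokens(tokens, mask_ratio=0.5, rng=rng)
probs = uniform_model(corrupted, vocab_size)
loss = masked_cross_entropy(probs, tokens, positions)
```

The loss is computed only at masked positions, so the visible tokens act purely as context for the prediction.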
 33 | ### Language
 34 | 
 35 | - [ICML 2024] [Discrete diffusion modeling by estimating the ratios of the data distribution](https://arxiv.org/pdf/2310.16834)
 36 | 
 37 | - [NeurIPS 2024] [Simple and Effective Masked Diffusion Language Models](https://arxiv.org/pdf/2406.07524)
 38 | 
 39 | - [NeurIPS 2021] [Structured Denoising Diffusion Models in Discrete State-Spaces](https://arxiv.org/pdf/2107.03006) 
 40 | 
 41 | - [ICLR 2025] [Beyond Autoregression: Fast LLMs via Self-Distillation Through Time](https://arxiv.org/pdf/2410.21035)
 42 | 
 43 | - [ICLR 2025] [Scaling Diffusion Language Models via Adaptation from Autoregressive Models](https://arxiv.org/pdf/2410.17891) 
 44 | 
 45 | - [ICLR 2025] [Scaling up Masked Diffusion Models on Text](https://arxiv.org/pdf/2410.18514)
 46 | 
 47 | - [arXiv 2025] [Large Language Diffusion Models](https://arxiv.org/pdf/2502.09992)
 48 | 
 49 | - [ICLR 2025 Oral] [Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models](https://arxiv.org/pdf/2503.09573)
 50 | 
 51 | - [COLM 2024] [A Reparameterized Discrete Diffusion Model for Text Generation](https://arxiv.org/pdf/2302.05737)
 52 | 
 53 | - [ICLR 2025] [Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data](https://arxiv.org/pdf/2406.03736)
 54 | 
 55 | - [NeurIPS 2024] [Simplified and Generalized Masked Diffusion for Discrete Data](https://arxiv.org/pdf/2406.04329)
 56 | 
 57 | - [NAACL 2025] [Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion](https://arxiv.org/pdf/2408.05636)
 58 | 
 59 | - [ICLR 2023] [Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning](https://arxiv.org/pdf/2208.04202)
 60 | 
 61 | - [arXiv 2025] [Theoretical Benefit and Limitation of Diffusion Language Model](https://arxiv.org/pdf/2502.09622)
 62 | 
 63 | - [arXiv 2025] [Unifying Autoregressive and Diffusion-Based Sequence Generation](https://arxiv.org/pdf/2504.06416)
 64 | 
 65 | ### Guidance
 66 | 
 67 | - [arXiv 2024] [Derivative-free guidance in continuous and discrete diffusion models with soft value-based decoding](https://arxiv.org/pdf/2408.08252)
 68 |   > Images, Molecules, DNA & RNA
 69 | 
 70 | - [ICLR 2025] [Unlocking guidance for discrete state-space diffusion and flow models](https://arxiv.org/pdf/2406.01572)
 71 |   > DNA
 72 | 
 73 | - [ICLR 2025] [Steering masked discrete diffusion models via discrete denoising posterior prediction](https://arxiv.org/pdf/2410.08134)
 74 |   > Images, Text, Molecules
 75 | 
 76 | - [ICLR 2025] [Simple Guidance Mechanisms for Discrete Diffusion Models](https://arxiv.org/pdf/2412.10193)
 77 |   > Text
 78 | 
 79 | - [arXiv 2025] [A general framework for inference-time scaling and steering of diffusion models](https://arxiv.org/abs/2501.06848)
 80 |   > Images, Text
 81 | 
 82 | 
 83 | 
 84 | ## Continuous
 85 | 
 86 | ### Language
 87 | 
 88 | - [ICLR 2023] [DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models](https://arxiv.org/pdf/2210.08933)
 89 | 
 90 | - [EMNLP 2023] [DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models](https://arxiv.org/pdf/2310.05793)
 91 | 
 92 | - [NeurIPS 2023] [PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model](https://arxiv.org/pdf/2306.02531)
 93 | 
 94 | - [NeurIPS 2023] [AR-DIFFUSION: Auto-Regressive Diffusion Model for Text Generation](https://arxiv.org/pdf/2305.09515)
 95 | 
 96 | - [ICML 2024] [Discrete diffusion modeling by estimating the ratios of the data distribution](https://arxiv.org/pdf/2310.16834)
 97 | 
 98 | - [ICLR 2025 Workshop] [The Diffusion Duality](https://openreview.net/pdf?id=CB0Ub2yXjC)
 99 | 
100 | - [arXiv 2025] [Continuous Diffusion Model for Language Modeling](https://arxiv.org/pdf/2502.11564)
101 | 


--------------------------------------------------------------------------------
/Generative Modeling/DMs&FMs.md:
--------------------------------------------------------------------------------
  1 | #### Awesome Repositories
  2 | - [Diffusion Models and Representation Learning: A Survey](https://github.com/dongzhuoyao/Diffusion-Representation-Learning-Survey-Taxonomy)
  3 | - [Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices](https://arxiv.org/abs/2410.11795)
  4 | 
  5 | #### Efficacy
  6 | 
  7 | - [Glow: Generative Flow with Invertible 1x1 Convolutions](https://arxiv.org/abs/1807.03039)
  8 | 
  9 | - [Flow Matching for Generative Modeling](https://arxiv.org/abs/2210.02747v2)
 10 |   - Mini Code: https://github.com/gle-bellier/flow-matching/tree/main
 11 |   - Mini Code: https://gist.github.com/francois-rozet/fd6a820e052157f8ac6e2aa39e16c1aa
 12 | 
 13 | - [Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow](https://arxiv.org/abs/2209.03003)
 14 |   - Mini Code: https://github.com/MaximeVandegar/Papers-in-100-Lines-of-Code/tree/main/Flow_Straight_and_Fast_Learning_to_Generate_and_Transfer_Data_with_Rectified_Flow
 15 | 
 16 | - [Score-Based Generative Modeling through Stochastic Differential Equations](https://arxiv.org/abs/2011.13456)
 17 |   - Code: https://github.com/yang-song/score_sde
 18 | 
 19 | - [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239)
 20 |   - Code: https://github.com/hojonathanho/diffusion
 21 |   - Mini Code: https://github.com/tqch/ddpm-torch
 22 |   - Mini Code: https://github.com/w86763777/pytorch-ddpm
 23 | 
 24 | - [Generative Modeling by Estimating Gradients of the Data Distribution](https://arxiv.org/abs/1907.05600)
 25 |   - Code: https://github.com/ermongroup/ncsn
 26 | 
 27 | - [Elucidating the Design Space of Diffusion-Based Generative Models](https://arxiv.org/abs/2206.00364)
 28 |   - Code: https://github.com/nvlabs/edm
 29 |   - Mini Code: https://github.com/yuanzhi-zhu/mini_edm/tree/main
 30 | 
 31 | - [Analyzing and Improving the Training Dynamics of Diffusion Models](https://arxiv.org/abs/2312.02696)
 32 |   - Code: https://github.com/nvlabs/edm2
 33 |   - Mini Code: https://github.com/mmathew23/improved_edm
 34 |   - Mini Code: https://github.com/FutureXiang/edm2
 35 |   - Mini Code: https://github.com/YichengDWu/tinyedm
 36 | 
 37 | 
 38 | #### Sampling Efficiency
 39 | 
 40 | - [Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502)
 41 |   - Code: https://github.com/ermongroup/ddim
 42 |   - Mini Code: https://github.com/Alokia/diffusion-DDIM-pytorch
 43 | 
 44 | - [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://arxiv.org/abs/2206.00927)
 45 | 
 46 | - [Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models](https://arxiv.org/abs/2201.06503)
 47 | 
 48 | - [Consistency Models](https://arxiv.org/abs/2303.01469)
 49 |   - Code: https://github.com/openai/consistency_models
 50 |   - Mini Code: https://github.com/openai/consistency_models_cifar10
 51 |   - Mini Code: https://github.com/Kinyugo/consistency_models
 52 |   - Mini Code: https://github.com/junhsss/consistency-models
 53 | 
 54 | - [Improved Techniques for Training Consistency Models](https://arxiv.org/abs/2310.14189)
 55 |   - Code: https://github.com/Kinyugo/consistency_models
 56 | 
 57 | - [Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models](https://arxiv.org/abs/2410.11081)
 58 | 
 59 | - [Consistency Models Made Easy](https://arxiv.org/abs/2406.14548)
 60 |   - Code: https://github.com/locuslab/ect
 61 | 
 62 | - [Stable Consistency Tuning: Understanding and Improving Consistency Models](https://arxiv.org/abs/2410.18958)
 63 |   - Code: https://github.com/G-U-N/Stable-Consistency-Tuning
 64 | 
 65 | - [Scaling Rectified Flow Transformers for High-Resolution Image Synthesis](https://arxiv.org/abs/2403.03206)
 66 | 
 67 | - [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)
 68 |   - Code: https://github.com/CompVis/latent-diffusion
 69 | 
 70 | - [Phased Consistency Model](https://arxiv.org/abs/2405.18407)
 71 |   - Code: https://github.com/G-U-N/Phased-Consistency-Model
 72 | 
 73 | - [Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step](https://arxiv.org/abs/2410.14919v4)
 74 |   - Code: https://github.com/mingyuanzhou/SiD/tree/sida
 75 | 
 76 | #### Training Efficiency
 77 | 
 78 | 
 79 | - [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)
 80 | 
 81 | 
 82 | 
 83 | - [Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think](https://arxiv.org/abs/2410.06940)
 84 |   - Code: https://github.com/sihyun-yu/REPA
 85 | 
 86 | 
 87 | - [Return of Unconditional Generation: A Self-supervised Representation Generation Method](https://arxiv.org/abs/2312.03701)
 88 |   - Code: https://github.com/LTH14/rcg
 89 | 
 90 | - [Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment](https://arxiv.org/abs/2406.12303)
 91 |   - Code: https://github.com/yhli123/immiscible-diffusion
 92 | 
 93 | 
 94 | - [FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification](https://arxiv.org/abs/2410.10356)
 95 | 
 96 | 
 97 | #### Interesting Applications
 98 | 
 99 | - [Diffusion Model as Representation Learner](https://arxiv.org/abs/2308.10916)
100 |   - Code: https://github.com/Adamdad/Repfusion
101 |   - Interesting Point: Our study begins by examining the feature space of DPMs, revealing that DPMs are inherently denoising autoencoders that balance the representation learning with regularizing model capacity.
102 | 
103 | - [Denoising Diffusion Autoencoders are Unified Self-supervised Learners](https://arxiv.org/abs/2303.09769)
104 |   - Code: https://github.com/FutureXiang/ddae
105 |   - Interesting Point: This paper shows that the networks in diffusion models, namely denoising diffusion autoencoders (DDAE), are unified self-supervised learners: by pre-training on unconditional image generation, DDAE has already learned strongly linear-separable representations within its intermediate layers without auxiliary encoders, thus making diffusion pre-training emerge as a general approach for generative-and-discriminative dual learning.
106 | 
107 | - [SODA: Bottleneck Diffusion Models for Representation Learning](https://arxiv.org/abs/2311.17901)
108 |   - Interesting Point: What I cannot create, I do not understand.
109 | 
110 | - [Self-Improving Diffusion Models with Synthetic Data](https://arxiv.org/abs/2408.16333v1)
111 |   - Interesting Point: Self-IMproving diffusion models with Synthetic data (SIMS) is a new training concept for diffusion models that uses self-synthesized data to provide negative guidance during the generation process to steer a model's generative process away from the non-ideal synthetic data manifold and towards the real data distribution.
112 | 
113 | - [In-Context LoRA for Diffusion Transformers](https://arxiv.org/abs/2410.23775v3)
114 |   - Code: https://github.com/ali-vilab/In-Context-LoRA
115 | 
116 | 
117 | - [Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model](https://arxiv.org/abs/2212.00490v2)
118 |   - Code: https://github.com/wyhuai/ddnm
119 | 
120 | - [Scaling Rectified Flow Transformers for High-Resolution Image Synthesis](https://arxiv.org/abs/2403.03206)
121 | 
122 | 
123 | - [OmniGen: Unified Image Generation](https://arxiv.org/abs/2409.11340v1)
124 | 
125 | - [Poisson Flow Generative Models](https://arxiv.org/abs/2209.11178)
126 |   - Code: https://github.com/Newbeeer/Poisson_flow
127 | 
128 | - [Adversarial Diffusion Distillation](https://arxiv.org/abs/2311.17042)
129 | 
130 | - [PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher](https://arxiv.org/abs/2405.14822v1)
131 | 
132 | - [Denoising Diffusion Restoration Models](https://arxiv.org/abs/2201.11793)
133 | 
134 | - [ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting](https://arxiv.org/abs/2307.12348)
135 | 
136 | - [PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis](https://arxiv.org/abs/2310.00426)
137 |   - Code: https://github.com/PixArt-alpha/PixArt-alpha
138 | 
139 | - [Progressive Compositionality In Text-to-Image Generative Models](https://arxiv.org/abs/2410.16719)
140 | 
141 | - [Pyramidal Flow Matching for Efficient Video Generative Modeling](https://arxiv.org/abs/2410.05954v1)
142 |   - Code: https://github.com/jy0205/Pyramid-Flow
143 | 
144 | - [Improved Distribution Matching Distillation for Fast Image Synthesis](https://arxiv.org/abs/2405.14867)
145 |   - Code: https://github.com/tianweiy/DMD2
146 | 
147 | - [OminiControl: Minimal and Universal Control for Diffusion Transformer](https://arxiv.org/abs/2411.15098v3)
148 | 
149 | - [WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model](https://arxiv.org/abs/2411.17459v2)
150 | 
151 | - [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822)
152 |   - Code: https://github.com/cientgu/VQ-Diffusion
153 | 
154 | - [Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget](https://arxiv.org/abs/2407.15811v1)
155 |   - Code: https://github.com/SonyResearch/micro_diffusion
156 | 
157 | - [Masked Diffusion Transformer is a Strong Image Synthesizer](https://arxiv.org/abs/2303.14389)
158 |   - Code: https://github.com/sail-sg/MDT/tree/main
159 | 
160 | - [TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training](https://arxiv.org/abs/2501.04765)
161 |   - Code: https://github.com/CompVis/tread/tree/master
162 | 
163 | - [TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space](https://arxiv.org/abs/2501.12224)
164 | 
165 | - [Memory-Driven Text-to-Image Generation](https://arxiv.org/abs/2208.07022)
166 | 
167 | 
168 | - [DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation](https://arxiv.org/abs/2412.03255)
169 | 
170 | 
171 | 
172 | 
173 | #### Interesting Explorations
174 | 
175 | 
176 | - [Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise](https://arxiv.org/abs/2208.09392)
177 |   - Code: https://github.com/arpitbansal297/Cold-Diffusion-Models/tree/main
178 |   - Mini Code: https://github.com/arpitbansal297/Cold-Diffusion-Models/blob/main/demixing-diffusion-pytorch/demixing_diffusion_pytorch/demixing_diffusion_pytorch.py
179 | 
180 | - [Soft Diffusion: Score Matching for General Corruptions](https://arxiv.org/abs/2209.05442)
181 | 
182 | - [Dual Diffusion Implicit Bridges for Image-to-Image Translation](https://arxiv.org/abs/2203.08382)
183 |   - Code: https://github.com/suxuann/ddib/
184 |   - Mini Code: https://github.com/suxuann/ddib/blob/main/guided_diffusion/gaussian_diffusion.py
185 | 
186 | 
187 | - [Generator Matching: Generative modeling with arbitrary Markov processes](https://arxiv.org/abs/2410.20587)
188 | 
189 | 
190 | - [$\epsilon$-VAE: Denoising as Visual Decoding](https://arxiv.org/abs/2410.04081)
191 |   - Interesting Point: In this work, we offer a new perspective by proposing denoising as decoding, shifting from single-step reconstruction to iterative refinement.
192 | 
193 | - [Diffusion Models are Evolutionary Algorithms](https://arxiv.org/abs/2410.02543v2)
194 | 
195 | - [Semi-Parametric Neural Image Synthesis](https://arxiv.org/abs/2204.11824)
196 | 
197 | #### Theoretical Analysis
198 | 
199 | - [Generalization in diffusion models arises from geometry-adaptive harmonic representations](https://arxiv.org/abs/2310.02557)
200 |   - Code: https://github.com/LabForComputationalVision/memorization_generalization_in_diffusion_models
201 |   - Interesting Point: Finally, we show that when trained on regular image classes for which the optimal basis is known to be geometry-adaptive and harmonic, the denoising performance of the networks is near-optimal.
202 | 
203 | - [Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation](https://arxiv.org/abs/2303.00848)


--------------------------------------------------------------------------------