├── png_figs
│   ├── fig1.png
│   ├── fig2.png
│   ├── fig3.png
│   ├── fig4.png
│   ├── fig5.png
│   ├── table2.png
│   ├── table3.png
│   ├── table4.png
│   ├── table5.png
│   ├── table6.png
│   ├── table7.png
│   ├── formula1.png
│   ├── formula2.png
│   ├── formula3.png
│   ├── formula4.png
│   ├── formula5.png
│   ├── fig1_with_caption.png
│   ├── fig2_with_caption.png
│   ├── fig3_with_caption.png
│   ├── fig4_with_caption.png
│   ├── table2_with_caption.png
│   ├── table3_with_caption.png
│   ├── table4_with_caption.png
│   ├── table5_with_caption.png
│   ├── table6_with_caption.png
│   └── table7_with_caption.png
├── resource
│   └── Attention_Survey.pdf
└── README.md
/png_figs/fig1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/fig1.png
--------------------------------------------------------------------------------
/png_figs/fig2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/fig2.png
--------------------------------------------------------------------------------
/png_figs/fig3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/fig3.png
--------------------------------------------------------------------------------
/png_figs/fig4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/fig4.png
--------------------------------------------------------------------------------
/png_figs/fig5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/fig5.png
--------------------------------------------------------------------------------
/png_figs/table2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/table2.png
--------------------------------------------------------------------------------
/png_figs/table3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/table3.png
--------------------------------------------------------------------------------
/png_figs/table4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/table4.png
--------------------------------------------------------------------------------
/png_figs/table5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/table5.png
--------------------------------------------------------------------------------
/png_figs/table6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/table6.png
--------------------------------------------------------------------------------
/png_figs/table7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/table7.png
--------------------------------------------------------------------------------
/png_figs/formula1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/formula1.png
--------------------------------------------------------------------------------
/png_figs/formula2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/formula2.png
--------------------------------------------------------------------------------
/png_figs/formula3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/formula3.png
--------------------------------------------------------------------------------
/png_figs/formula4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/formula4.png
--------------------------------------------------------------------------------
/png_figs/formula5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/formula5.png
--------------------------------------------------------------------------------
/resource/Attention_Survey.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/resource/Attention_Survey.pdf
--------------------------------------------------------------------------------
/png_figs/fig1_with_caption.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/fig1_with_caption.png
--------------------------------------------------------------------------------
/png_figs/fig2_with_caption.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/fig2_with_caption.png
--------------------------------------------------------------------------------
/png_figs/fig3_with_caption.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/fig3_with_caption.png
--------------------------------------------------------------------------------
/png_figs/fig4_with_caption.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/fig4_with_caption.png
--------------------------------------------------------------------------------
/png_figs/table2_with_caption.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/table2_with_caption.png
--------------------------------------------------------------------------------
/png_figs/table3_with_caption.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/table3_with_caption.png
--------------------------------------------------------------------------------
/png_figs/table4_with_caption.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/table4_with_caption.png
--------------------------------------------------------------------------------
/png_figs/table5_with_caption.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/table5_with_caption.png
--------------------------------------------------------------------------------
/png_figs/table6_with_caption.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/table6_with_caption.png
--------------------------------------------------------------------------------
/png_figs/table7_with_caption.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/attention-survey/Efficient_Attention_Survey/HEAD/png_figs/table7_with_caption.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# A Survey of Efficient Attention Methods

**Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention**
**PDF**: https://attention-survey.github.io/files/Attention_Survey.pdf
**Webpage**: https://attention-survey.github.io



This paper provides a comprehensive survey of **Efficient Attention Methods**, categorizing them into four classes: hardware-efficient, sparse, compact, and linear attention.

-----

## Updates

- **[2025/8/19]** 🎉 Our survey paper is now publicly available on [GitHub](https://attention-survey.github.io/files/Attention_Survey.pdf)! If you find our resources helpful, please [cite our paper](#citation).

-----

## Class 1: Hardware-efficient Attention

💡 **Core Idea**: Accelerate attention by leveraging hardware characteristics.

📝 **Overall Formulations**:

Hardware-efficient attention for the prefilling stage can be formulated as:


where $\Psi(\cdot)$ and $\Theta(\cdot)$ are preprocessing functions that accelerate computation, e.g., the quantization functions in SageAttention.
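The formulation itself is included as an image in the repository's `png_figs` folder. As a rough sketch in this notation (our rendering, not necessarily the paper's exact formula), the prefilling form looks like:

$$O \;=\; \mathrm{softmax}\!\left(\frac{\Psi(Q)\,\Theta(K)^{\top}}{\sqrt{d}}\right) V,$$

i.e., the standard attention computation applied to preprocessed (e.g., quantized) $Q$ and $K$.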

Hardware-efficient attention for the decoding stage can be formulated as:


where $\Psi(\cdot)$ and $\Theta(\cdot)$ are KV-cache preprocessing functions.
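Again, the exact formula is provided as an image; a plausible sketch, assuming $\Psi$ acts on the cached keys and $\Theta$ on the cached values, is:

$$o_t \;=\; \mathrm{softmax}\!\left(\frac{q_t\,\Psi(K_{\le t})^{\top}}{\sqrt{d}}\right)\Theta(V_{\le t}).$$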

---

An example is FlashAttention, which tiles $Q, K, V$ to compute the attention output $O$ progressively. This strategy avoids reading and writing the intermediate $N \times N$ matrices $S$ (attention scores) and $P$ (softmax probabilities).
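As an illustration only (not the survey's or FlashAttention's reference code), here is a minimal NumPy sketch of the tiling idea with an online softmax; the block size and shapes are arbitrary choices:

```python
import numpy as np

def flash_attention_forward(Q, K, V, block_size=64):
    """Tiled attention with an online softmax: the full N x N matrices
    S and P are never materialized; only one tile is held at a time."""
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)              # output accumulator
    m = np.full(N, -np.inf)           # running row-wise max of the scores
    l = np.zeros(N)                   # running softmax denominator
    for start in range(0, N, block_size):       # loop over K/V tiles
        K_blk = K[start:start + block_size]
        V_blk = V[start:start + block_size]
        S_blk = (Q @ K_blk.T) * scale           # scores for this tile only
        m_new = np.maximum(m, S_blk.max(axis=1))
        P_blk = np.exp(S_blk - m_new[:, None])  # unnormalized probabilities
        correction = np.exp(m - m_new)          # rescale earlier partial sums
        l = l * correction + P_blk.sum(axis=1)
        O = O * correction[:, None] + P_blk @ V_blk
        m = m_new
    return O / l[:, None]

# Sanity check against the naive implementation that materializes S and P.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 64)) for _ in range(3))
S = (Q @ K.T) / np.sqrt(64)
P = np.exp(S - S.max(axis=1, keepdims=True))
reference = (P / P.sum(axis=1, keepdims=True)) @ V
assert np.allclose(flash_attention_forward(Q, K, V), reference)
```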



---

The table below summarizes various hardware-efficient attention methods. 👇




-----

## Class 2: Compact Attention

💡 **Core Idea**: Compressing the attention KV cache through weight sharing or low-rank decomposition, while keeping the computational cost the same as with a full-sized KV cache.

📝 **Overall Formulations**:


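The general formulation is provided as an image. As one concrete, hedged illustration of the weight-sharing flavor (a grouped-query-attention-style sketch; head counts and shapes are arbitrary assumptions), several query heads can read from a single cached K/V head:

```python
import numpy as np

def grouped_query_attention(Q, K, V):
    """Weight-sharing sketch: n_q query heads share n_kv (< n_q) cached K/V
    heads, shrinking the KV cache by a factor of n_q / n_kv while the
    per-head attention computation itself is unchanged."""
    n_q, n_kv = Q.shape[0], K.shape[0]
    d = Q.shape[-1]
    group = n_q // n_kv
    out = np.empty((n_q, Q.shape[1], V.shape[-1]))
    for h in range(n_q):
        kv = h // group                              # shared K/V head for query head h
        S = Q[h] @ K[kv].T / np.sqrt(d)
        P = np.exp(S - S.max(axis=1, keepdims=True))
        out[h] = (P / P.sum(axis=1, keepdims=True)) @ V[kv]
    return out

rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 128, 64))   # (query heads, tokens, head dim)
K = rng.standard_normal((2, 128, 64))   # only 2 K/V heads need to be cached
V = rng.standard_normal((2, 128, 64))
print(grouped_query_attention(Q, K, V).shape)  # (8, 128, 64)
```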
---

The table below summarizes various compact attention approaches. 👇




-----

## Class 3: Sparse Attention

💡 **Core Idea**: Selectively performing a subset of the computations in attention while omitting the others.

📝 **Overall Formulations**:


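The general formulation is provided as an image. A minimal sketch of one concrete instance, a causal sliding-window pattern (the window size is an arbitrary choice), shows how only a subset of the score entries is ever computed:

```python
import numpy as np

def sliding_window_attention(Q, K, V, window=64):
    """Sparse attention sketch: each query attends only to the `window`
    most recent keys, so only O(N * window) of the N x N scores exist."""
    N, d = Q.shape
    O = np.zeros_like(Q)
    for i in range(N):
        lo = max(0, i - window + 1)               # causal local window [lo, i]
        s = Q[i] @ K[lo:i + 1].T / np.sqrt(d)     # scores for kept positions only
        p = np.exp(s - s.max())
        O[i] = (p / p.sum()) @ V[lo:i + 1]
    return O

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 64)) for _ in range(3))
print(sliding_window_attention(Q, K, V).shape)  # (256, 64)
```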
---

The table below summarizes various sparse attention methods. 👇



-----

## Class 4: Linear Attention

💡 **Core Idea**: Redesigning the computational formulation of attention to achieve $\mathcal{O}(N)$ time complexity.

📝 **Overall Formulations**:


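The survey's formulation is provided as an image; as a hedged sketch (not the paper's exact formula), one common way to write causal, normalized linear attention with a feature map $\phi(\cdot)$ is:

$$O_i \;=\; \frac{\phi(Q_i)\left(\sum_{j\le i}\phi(K_j)^{\top} V_j\right)}{\phi(Q_i)\left(\sum_{j\le i}\phi(K_j)^{\top}\right)},$$

where the two running sums are constant-size states that can be updated token by token, giving $\mathcal{O}(N)$ total time.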
---

### Computational Forms

Linear attention can be implemented in three forms: **parallel**, **recurrent**, and **chunkwise**.
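As a hedged sketch of the first two forms (unnormalized, causal, no gates; the chunkwise form, which is recurrent across chunks and parallel within each chunk, is omitted here), the parallel and recurrent computations produce identical outputs:

```python
import numpy as np

def linear_attention_parallel(Q, K, V):
    """Parallel (training-time) form: O = (Q K^T * causal_mask) V."""
    N = Q.shape[0]
    mask = np.tril(np.ones((N, N)))
    return (Q @ K.T * mask) @ V

def linear_attention_recurrent(Q, K, V):
    """Recurrent (decoding-time) form: carry a constant-size state
    S_t = S_{t-1} + k_t^T v_t and read out o_t = q_t S_t."""
    d, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d, d_v))
    O = np.empty((Q.shape[0], d_v))
    for t in range(Q.shape[0]):
        S = S + np.outer(K[t], V[t])   # state update, O(d * d_v) per token
        O[t] = Q[t] @ S
    return O

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((128, 32)) for _ in range(3))
assert np.allclose(linear_attention_parallel(Q, K, V),
                   linear_attention_recurrent(Q, K, V))
```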



---

### Gating Mechanisms

Many linear attention methods incorporate **forget gates** and **select gates**.
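As a rough sketch in generic notation (the exact parameterization varies by method, and this is our reading rather than the survey's formula): in the recurrent form with state $S_t$, a forget gate $\alpha_t \in (0, 1)$ decays the previous state, and a select gate $\beta_t$ modulates what is written into it:

$$S_t \;=\; \alpha_t\, S_{t-1} \;+\; \beta_t\, k_t^{\top} v_t, \qquad o_t \;=\; q_t\, S_t.$$

Setting $\alpha_t = 1$ and $\beta_t = 1$ recovers ungated (naive) linear attention.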


Based on the presence of these gates, we can classify linear attention methods as follows:

1. **Naive Linear Attention (No Gates)**

📝 The table below summarizes naive linear attention methods. 👇




2. **Linear Attention with a Forget Gate**

📝 This table compares methods that use a forget gate. 👇




3. **Linear Attention with Forget and Select Gates**

📝 This table compares methods that use both a forget gate and a select gate. 👇




### A Special Case: Test-Time Training (TTT)

A unique approach, **Test-Time Training (TTT)**, treats the hidden states of linear attention as learnable parameters that are updated at test time.
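A hedged sketch of this idea (our simplification, not the paper's implementation): the recurrent state is a weight matrix $W$ that takes one gradient step per token on a self-supervised reconstruction loss, and the output is read from the updated weights:

```python
import numpy as np

def ttt_linear_sketch(Q, K, V, lr=0.1):
    """Test-Time Training sketch: the hidden state is a weight matrix W,
    updated per token by a gradient step on 0.5 * ||k_t W - v_t||^2."""
    d, d_v = K.shape[1], V.shape[1]
    W = np.zeros((d, d_v))
    O = np.empty((Q.shape[0], d_v))
    for t in range(K.shape[0]):
        err = K[t] @ W - V[t]              # reconstruction error on token t
        W = W - lr * np.outer(K[t], err)   # gradient step: dL/dW = k_t^T (k_t W - v_t)
        O[t] = Q[t] @ W                    # read out with the updated fast weights
    return O

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 16)) for _ in range(3))
print(ttt_linear_sketch(Q, K, V).shape)  # (64, 16)
```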



-----

## Citation

If you find our work helpful, please cite our paper:

```
@article{zhangsurvey,
  title={Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention},
  author={Zhang, Jintao and Su, Rundong and Liu, Chunyu and Wei, Jia and Wang, Ziteng and Zhang, Pengle and Wang, Haoxu and Jiang, Huiqiang and Huang, Haofeng and Xiang, Chendong and others}
}
```

--------------------------------------------------------------------------------