├── png_figs
│   ├── fig1.png
│   ├── fig2.png
│   ├── fig3.png
│   ├── fig4.png
│   ├── fig5.png
│   ├── table2.png
│   ├── table3.png
│   ├── table4.png
│   ├── table5.png
│   ├── table6.png
│   ├── table7.png
│   ├── formula1.png
│   ├── formula2.png
│   ├── formula3.png
│   ├── formula4.png
│   ├── formula5.png
│   ├── fig1_with_caption.png
│   ├── fig2_with_caption.png
│   ├── fig3_with_caption.png
│   ├── fig4_with_caption.png
│   ├── table2_with_caption.png
│   ├── table3_with_caption.png
│   ├── table4_with_caption.png
│   ├── table5_with_caption.png
│   ├── table6_with_caption.png
│   └── table7_with_caption.png
├── resource
│   └── Attention_Survey.pdf
└── README.md

/README.md:
--------------------------------------------------------------------------------

# A Survey of Efficient Attention Methods

**Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention**

**PDF**: https://attention-survey.github.io/files/Attention_Survey.pdf

**Paper webpage**: https://attention-survey.github.io

![](./png_figs/fig2.png)

This paper provides a comprehensive survey of **Efficient Attention Methods**, categorizing them into four classes: hardware-efficient, compact, sparse, and linear attention.

-----

## Updates

- **[2025/8/19]** 🎉 Our survey paper is now publicly available on [GitHub](https://attention-survey.github.io/files/Attention_Survey.pdf)! If you find our resources helpful, please [cite our paper](#citation).

-----

## Class 1: Hardware-efficient Attention

💡 **Core Idea**: Accelerate attention by leveraging hardware characteristics, e.g., the GPU memory hierarchy and low-precision compute units.

📝 **Overall Formulations**:

Hardware-efficient attention for the prefilling stage can be formulated as:

$$O = \mathrm{softmax}\left(\frac{\Psi(Q)\,\Theta(K)^{\top}}{\sqrt{d}}\right)V,$$

where $\Psi(\cdot), \Theta(\cdot)$ are preprocessing functions that accelerate computation, e.g., quantization functions in SageAttention.

Hardware-efficient attention for the decoding stage can be formulated as:

$$o = \mathrm{softmax}\left(\frac{q\,\Psi(K_{\mathrm{cache}})^{\top}}{\sqrt{d}}\right)\Theta(V_{\mathrm{cache}}),$$

where $\Psi(\cdot), \Theta(\cdot)$ are KV-cache preprocessing functions.

---

An example is FlashAttention, which tiles $Q, K, V$ into blocks and computes the attention output $O$ progressively. This strategy avoids reading and writing the intermediate $S$ and $P$ matrices of shape $N \times N$ to global memory.
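To make the tiling idea concrete, below is a minimal, unoptimized PyTorch sketch of block-wise attention with an online softmax. It is only illustrative: the block size is arbitrary, it handles a single head without a causal mask, and real FlashAttention fuses these steps into one GPU kernel that keeps each tile in on-chip SRAM.

```python
import torch

def tiled_attention(Q, K, V, block_size=64):
    """Block-wise attention with online softmax (FlashAttention-style sketch).

    K/V are processed in tiles, so the full N x N score matrix S and the
    softmax matrix P are never materialized at once.
    """
    N, d = Q.shape
    scale = d ** -0.5
    O = torch.zeros_like(Q)                       # running (unnormalized) output
    m = torch.full((N, 1), float("-inf"))         # running row-wise max of scores
    l = torch.zeros(N, 1)                         # running softmax normalizer

    for start in range(0, N, block_size):
        Kb = K[start:start + block_size]          # one K tile
        Vb = V[start:start + block_size]          # one V tile
        S = (Q @ Kb.T) * scale                    # scores for this tile only
        m_new = torch.maximum(m, S.max(dim=-1, keepdim=True).values)
        alpha = torch.exp(m - m_new)              # rescale previous statistics
        P = torch.exp(S - m_new)                  # tile-local softmax numerator
        l = alpha * l + P.sum(dim=-1, keepdim=True)
        O = alpha * O + P @ Vb
        m = m_new

    return O / l                                  # normalize once at the end

# Matches the reference softmax(Q K^T / sqrt(d)) V on random inputs.
Q, K, V = (torch.randn(128, 32) for _ in range(3))
ref = torch.softmax((Q @ K.T) * 32 ** -0.5, dim=-1) @ V
assert torch.allclose(tiled_attention(Q, K, V), ref, atol=1e-4)
```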
---

The table below summarizes various hardware-efficient attention methods. 👇

![](./png_figs/table2.png)

-----

## Class 2: Compact Attention

💡 **Core Idea**: Compress the KV cache of attention through weight sharing or low-rank decomposition, while the computational cost stays essentially the same as with a full-sized KV cache.

📝 **Overall Formulations**:

---

The table below summarizes various compact attention approaches. 👇

![](./png_figs/table3.png)
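As a rough illustration of the weight-sharing route (not a formulation taken from the paper), the sketch below implements grouped-query-style attention in PyTorch: several query heads share each key/value head, so the cached $K, V$ tensors shrink by the group factor while the attention computation itself is unchanged. The head counts, shapes, and names here are illustrative.

```python
import torch

def grouped_query_attention(x, Wq, Wk, Wv, n_q_heads=8, n_kv_heads=2):
    """Grouped-query attention sketch: n_q_heads query heads share n_kv_heads
    KV heads, so the KV cache is n_q_heads / n_kv_heads times smaller than in
    full multi-head attention."""
    N, d_model = x.shape
    d_head = d_model // n_q_heads
    group = n_q_heads // n_kv_heads

    q = (x @ Wq).view(N, n_q_heads, d_head)      # (N, Hq, d)
    k = (x @ Wk).view(N, n_kv_heads, d_head)     # (N, Hkv, d) -- this is what gets cached
    v = (x @ Wv).view(N, n_kv_heads, d_head)     # (N, Hkv, d) -- this is what gets cached

    # Broadcast each KV head to its group of query heads.
    k = k.repeat_interleave(group, dim=1)        # (N, Hq, d)
    v = v.repeat_interleave(group, dim=1)        # (N, Hq, d)

    scores = torch.einsum("nhd,mhd->hnm", q, k) / d_head ** 0.5
    attn = torch.softmax(scores, dim=-1)
    out = torch.einsum("hnm,mhd->nhd", attn, v)
    return out.reshape(N, d_model)

d_model, N = 256, 16
x = torch.randn(N, d_model)
Wq = torch.randn(d_model, d_model) / d_model ** 0.5
Wk = torch.randn(d_model, d_model // 4) / d_model ** 0.5   # KV projections are 4x smaller
Wv = torch.randn(d_model, d_model // 4) / d_model ** 0.5
y = grouped_query_attention(x, Wq, Wk, Wv)
print(y.shape)  # torch.Size([16, 256])
```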
-----

## Class 3: Sparse Attention

💡 **Core Idea**: Selectively perform a subset of the attention computation while omitting the rest.

📝 **Overall Formulations**:

---

The table below summarizes various sparse attention methods. 👇

![](./png_figs/table4.png)

-----

## Class 4: Linear Attention

💡 **Core Idea**: Redesign the computational formulation of attention to achieve $\mathcal{O}(N)$ time complexity.

📝 **Overall Formulations**:

---

### Computational Forms

Linear attention can be implemented in three forms: **parallel**, **recurrent**, and **chunkwise**.

![](./png_figs/fig3.png)
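To illustrate why the recurrent form runs in $\mathcal{O}(N)$ time, here is a minimal PyTorch sketch of ungated linear attention in its parallel and recurrent forms. It is a generic illustration (with an assumed ELU+1 feature map), not any particular method from the survey; the recurrent form carries only a constant-size state, independent of sequence length.

```python
import torch

def phi(x):
    # A simple positive feature map; the kernel choice varies across methods.
    return torch.nn.functional.elu(x) + 1

def linear_attention_parallel(Q, K, V):
    """Parallel form: (phi(Q) phi(K)^T) V with row-wise normalization, no softmax."""
    Qf, Kf = phi(Q), phi(K)
    scores = Qf @ Kf.T                            # (N, N) kernelized scores
    return (scores @ V) / (scores.sum(dim=-1, keepdim=True) + 1e-6)

def linear_attention_recurrent(Q, K, V):
    """Recurrent form: a (d, d_v) state S and a (d,) normalizer z are updated
    per token, so time and memory grow linearly with sequence length."""
    N, d = Q.shape
    Qf, Kf = phi(Q), phi(K)
    S = torch.zeros(d, V.shape[-1])               # running sum of phi(k_t) v_t^T
    z = torch.zeros(d)                            # running sum of phi(k_t)
    out = []
    for t in range(N):
        S = S + torch.outer(Kf[t], V[t])          # constant-size state update
        z = z + Kf[t]
        out.append((Qf[t] @ S) / (Qf[t] @ z + 1e-6))
    return torch.stack(out)

Q, K, V = (torch.randn(64, 16) for _ in range(3))
# The recurrent form is causal by construction, while this parallel form has no
# mask, so the two coincide only at the last position.
last_parallel = linear_attention_parallel(Q, K, V)[-1]
last_recurrent = linear_attention_recurrent(Q, K, V)[-1]
assert torch.allclose(last_parallel, last_recurrent, atol=1e-4)
```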
---

### Gating Mechanisms

Many linear attention methods incorporate **forget gates** and **select gates**.

Based on the presence of these gates, we can classify linear attention methods as follows:

1. **Naive Linear Attention (No Gates)**

   📝 The table below summarizes naive linear attention methods. 👇

   ![](./png_figs/table5.png)

2. **Linear Attention with a Forget Gate**

   📝 The table below compares methods that use a forget gate. 👇

   ![](./png_figs/table6.png)

3. **Linear Attention with Forget and Select Gates**

   📝 The table below compares methods that use both a forget gate and a select gate. 👇

   ![](./png_figs/table7.png)

### A Special Case: Test-Time Training (TTT)

A unique approach, **Test-Time Training (TTT)**, treats the hidden state of linear attention as learnable parameters, which are updated online as the sequence is processed.

-----

## Citation

If you find our work helpful, please cite our paper:

```
@article{zhangsurvey,
  title={Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention},
  author={Zhang, Jintao and Su, Rundong and Liu, Chunyu and Wei, Jia and Wang, Ziteng and Zhang, Pengle and Wang, Haoxu and Jiang, Huiqiang and Huang, Haofeng and Xiang, Chendong and others}
}
```

--------------------------------------------------------------------------------