├── LICENSE
└── README.md
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2024 Trae1ounG
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Awesome Parametric Knowledge in LLMs
2 |
3 |
4 |
5 |
6 | [License](https://github.com/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs/blob/main/LICENSE)
7 |
8 | [Last Commit](https://github.com/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs/commits/main)
9 | [PRs Welcome](https://github.com/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs/pulls)
10 | [Stars](https://github.com/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs)
11 |
12 |
13 |
14 | This repo collects papers on parametric knowledge in LLMs, currently organized into two main categories: parametric knowledge detection and parametric knowledge application! 👻
15 |
16 | We believe that parametric knowledge in LLMs is still a largely unexplored area, and we hope this repository provides you with some valuable insights! 😶‍🌫️
17 |
18 | # Parametric Knowledge Detection
19 | ## Knowledge in Transformer-based Model——Analysis🧠
20 | ### 2025
21 | 1. **[Decoding specialised feature neurons in LLMs with the final projection layer](http://arxiv.org/abs/2501.02688)**
22 |
23 | [Logits Lens, Analysis of Query Neuron]
24 | ### 2024
25 | 1. **[What does the knowledge neuron thesis have to do with knowledge?](https://arxiv.org/abs/2405.02421)**
26 |
27 | *Jingcheng Niu, Andrew Liu, Zining Zhu, Gerald Penn.* ICLR'24(Spotlight)
28 |
29 | 2. **[Knowledge Mechanisms in Large Language Models: A Survey and Perspective](https://arxiv.org/abs/2407.15017)**
30 |
31 | *Mengru Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, Peng Wang, Xiang Chen, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang.* EMNLP'24 Findings
32 |
33 | 3. **[Disentangling Memory and Reasoning Ability in Large Language Models](https://arxiv.org/abs/2411.13504v2)** [[code](https://github.com/MingyuJ666/Disentangling-Memory-and-Reasoning)]
34 |
35 | *Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang.* preprint'24
36 |
37 | 4. **[Linguistic collapse: Neural collapse in (large) language models](https://arxiv.org/abs/2405.17767)** [[code](https://github.com/rhubarbwu/linguistic-collapse)]
38 |
39 | *Robert Wu, Vardan Papyan.* NIPS'24
40 |
41 | 5. **[Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models](https://arxiv.org/abs/2410.08414)** [[code](https://github.com/sitaocheng/Knowledge_Interplay)]
42 |
43 | *Sitao Cheng, Liangming Pan, Xunjian Yin, Xinyi Wang, William Yang Wang.* Preprint'24
44 |
45 | 6. **[Evaluating the External and Parametric Knowledge Fusion of Large Language Models](https://arxiv.org/abs/2405.19010)**
46 |
47 | *Hao Zhang, Yuyang Zhang, Xiaoguang Li, Wenxuan Shi, Haonan Xu, Huanshuo Liu, Yasheng Wang, Lifeng Shang, Qun Liu, Yong Liu, Ruiming Tang.* Preprint'24
48 |
49 | 7. **[Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts](https://arxiv.org/abs/2305.13300)** [[code](https://github.com/OSU-NLP-Group/LLM-Knowledge-Conflict)]
50 |
51 | *Jian Xie, Kai Zhang, Jiangjie Chen, Renze Lou, Yu Su.* ICLR'24 Spotlight
52 |
53 | 8. **[Knowledge entropy decay during language model pretraining hinders new knowledge acquisition](https://arxiv.org/abs/2410.01380)**
54 |
55 | *Jiyeon Kim, Hyunji Lee, Hyowon Cho, Joel Jang, Hyeonbin Hwang, Seungpil Won, Youbin Ahn, Dohaeng Lee, Minjoon Seo.* Preprint'24
56 |
57 | 9. **[When Context Leads but Parametric Memory Follows in Large Language Models](https://arxiv.org/abs/2409.08435)** [[code](https://github.com/PortNLP/WikiAtomic)]
58 |
59 | *Yufei Tao, Adam Hiatt, Erik Haake, Antonie J. Jetter, Ameeta Agrawal.* EMNLP'24
60 |
61 | 10. **[Neuron-level knowledge attribution in large language models](https://arxiv.org/abs/2312.12141)** [[code](https://github.com/zepingyu0512/neuron-attribution)]
62 |
63 | *Zeping Yu, Sophia Ananiadou.* EMNLP'24
64 |
65 | 11. **[Dissecting recall of factual associations in auto-regressive language models](http://arxiv.org/abs/2304.14767)**[[code](https://github.com/google-research/google-research/tree/master/dissecting_factual_predictions)]
66 |
67 | *Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson.* EMNLP'23
68 |
69 | 12. **[INSIDE: LLMs' internal states retain the power of hallucination detection](http://arxiv.org/abs/2402.03744)**
70 |
71 | [Hallucination Detection, Sequence-level] ICLR'24
72 | ### 2021
73 | 1. **[Transformer Feed-Forward Layers Are Key-Value Memories](https://arxiv.org/abs/2012.14913)**
74 |
75 | *Mor Geva, Roei Schuster, Jonathan Berant, Omer Levy.* EMNLP'21
76 | ## Knowledge in Transformer-based Model——Causal Tracing🦾
77 | ### 2024
78 | 1. **[Does knowledge localization hold true? Surprising differences between entity and relation perspectives in language models](https://arxiv.org/pdf/2409.00617)**
79 |
80 | *Yifan Wei, Xiaoyan Yu, Yixuan Weng, Huanhuan Ma, Yuanzhe Zhang, Jun Zhao, Kang Liu.* CIKM'24
81 |
82 | ### 2022
83 | 1. **[Locating and Editing Factual Associations in GPT](https://arxiv.org/abs/2202.05262)**
84 |
85 | *Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov.* NIPS'22
86 | ## Knowledge in Transformer-based Model——Gradient Attribution👀
87 | ### 2024
88 | 1. **[Identifying query-relevant neurons in large language models for long-form texts](https://arxiv.org/abs/2406.10868)**
89 |
90 | *Lihu Chen, Adam Dejl, Francesca Toni.* Preprint'24
91 |
92 | 2. **[Revealing the parametric knowledge of language models: A unified framework for attribution methods](https://arxiv.org/abs/2404.18655)**
93 |
94 | *Haeun Yu, Pepa Atanasova, Isabelle Augenstein.* ACL'24
95 | 3. **[Does Large Language Model Contain Task-Specific Neurons?](https://aclanthology.org/2024.emnlp-main.403/)**
96 |
97 | *Ran Song, Shizhu He, Shuting Jiang, Yantuan Xian, Shengxiang Gao, Kang Liu, Zhengtao Yu.* EMNLP'24
98 |
99 | 4. **[Journey to the center of the knowledge neurons: Discoveries of language-independent knowledge neurons and degenerate knowledge neurons](http://arxiv.org/abs/2308.13198)** [[code](https://github.com/heng840/AMIG)]
100 |
101 | *Yuheng Chen, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao.* AAAI'24
102 | ### 2022
103 | 1. **[Knowledge Neurons in Pretrained Transformers](https://arxiv.org/abs/2104.08696)** [[code](https://github.com/Hunter-DDM/knowledge-neurons)]
104 |
105 | *Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, Furu Wei.* ACL'22
106 |
107 | ## Knowledge in Transformer-based Model——Activation🫀
108 | ### 2025
109 | 1. **[Improving instruction-following in language models through activation steering](http://arxiv.org/abs/2410.12877)**
110 | [Activation Steering, Instruction Following] ICLR'25
111 |
112 | 2. **[Activation-informed merging of large language models](http://arxiv.org/abs/2502.02421)**
113 | [Activation-based Model Merging] Arxiv'25
114 |
115 | ### 2024
116 | 1. **[Separating tongue from thought: Activation patching reveals language-agnostic concept representations in transformers](https://arxiv.org/abs/2411.08745)** [[code](https://github.com/Butanium/llm-lang-agnostic)]
117 |
118 | *Clément Dumas, Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West.* ICLR'24 Spotlight
119 |
120 | 2. **[From yes-men to truth-tellers: Addressing sycophancy in large language models with pinpoint tuning](https://arxiv.org/pdf/2409.01658v2)**
121 |
122 | *Wei Chen, Zhen Huang, Liang Xie, Binbin Lin, Houqiang Li, Le Lu, Xinmei Tian, Deng Cai, Yonggang Zhang, Wenxiao Wang, Xu Shen, Jieping Ye.* ICML'24
123 |
124 | 3. **[Language-specific neurons: The key to multilingual capabilities in large language models.](https://arxiv.org/abs/2402.16438)**
125 |
126 | *Tianyi Tang, Wenyang Luo, Haoyang Huang, Dongdong Zhang, Xiaolei Wang, Xin Zhao, Furu Wei, Ji-Rong Wen.* ACL'24
127 |
128 | 4. **[Multi-property Steering of Large Language Models with Dynamic Activation Composition](https://arxiv.org/abs/2406.17563)** [[code](https://github.com/DanielSc4/Dynamic-Activation-Composition)]
129 |
130 | *Daniel Scalena, Gabriele Sarti, Malvina Nissim.* ACL'24 BlackboxNLP Workshop
131 |
132 | 5. **[Exploring the benefit of activation sparsity in pre-training](http://arxiv.org/abs/2410.03440)** [[code](https://github.com/thunlp/moefication)]
133 |
134 | [MoE, Activation Sparsity, Activation Pattern, Inference Speedup]
135 | *Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou.* ICML'24
136 |
137 |
138 |
139 | ### 2023
140 | 1. **[Activation Addition: Steering Language Models Without Optimization](https://arxiv.org/abs/2308.10248v4)**
141 |
142 | *Alexander Matt Turner, Lisa Thiergart, Gavin Leech, David Udell, Juan J. Vazquez, Ulisse Mini, Monte MacDiarmid.* Preprint'23
143 |
144 | 2. **[Deja vu: Contextual sparsity for efficient LLMs at inference time](http://arxiv.org/abs/2310.17157)**
145 |
146 | [Sparsity, Inference Speedup]
147 | *Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, Beidi Chen.* ICML'23
148 |
149 |
150 |
151 |
152 | # Parametric Knowledge Application
153 | ## Knowledge Editing 🧑‍⚕️
154 | ### 2024
155 | 1. **[A Comprehensive Study of Knowledge Editing for Large Language Models](https://arxiv.org/abs/2401.01286)**
156 |
157 | *Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni, Siyuan Cheng, Ziwen Xu, Xin Xu, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, Huajun Chen.* Preprint'24
158 |
159 | 2. **[FAME: Towards Factual Multi-Task Model Editing](https://arxiv.org/abs/2410.10859)** [[code](https://github.com/BITHLP/FAME)]
160 | *Li Zeng, Yingyu Shan, Zeming Liu, Jiashu Yao, Yuhang Guo.* EMNLP'24
161 |
162 | 3. **[To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models](https://arxiv.org/abs/2407.01920)** [[code](https://github.com/zjunlp/KnowUnDo)]
163 |
164 | *Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang.* EMNLP'24 findings
165 |
166 | 4. **[Understanding the Collapse of LLMs in Model Editing](https://arxiv.org/abs/2406.11263)**
167 |
168 | *Wanli Yang, Fei Sun, Jiajun Tan, Xinyu Ma, Du Su, Dawei Yin, Huawei Shen.* EMNLP'24 findings
169 |
170 | 5. **[Is it possible to edit large language models robustly?](https://arxiv.org/pdf/2402.05827)** [[code](https://github.com/xbmxb/edit_analysis)]
171 |
172 | *Xinbei Ma, Tianjie Ju, Jiyang Qiu, Zhuosheng Zhang, Hai Zhao, Lifeng Liu, Yulong Wang.* Preprint'24
173 |
174 | 6. **[Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering](https://arxiv.org/pdf/2403.19631)** [[code](https://github.com/sycny/RAE)]
175 |
176 | *Yucheng Shi, Qiaoyu Tan, Xuansheng Wu, Shaochen Zhong, Kaixiong Zhou, Ninghao Liu.* CIKM'24
177 |
178 | 7. **[Latent paraphrasing: Perturbation on layers improves knowledge injection in language models](https://arxiv.org/abs/2411.00686)**
179 |
180 | *Minki Kang, Sung Ju Hwang, Gibbeum Lee, Jaewoong Cho.* NIPS'24
181 |
182 | 8. **[Learning to edit: Aligning LLMs with knowledge editing](https://arxiv.org/abs/2402.11905)** [[code](https://github.com/YJiangcm/LTE)]
183 |
184 | *Yuxin Jiang, Yufei Wang, Chuhan Wu, Wanjun Zhong, Xingshan Zeng, Jiahui Gao, Liangyou Li, Xin Jiang, Lifeng Shang, Ruiming Tang, Qun Liu, Wei Wang.* ACL'24
185 |
186 | 9. **[Inspecting and Editing Knowledge Representations in Language Models](https://arxiv.org/abs/2304.00740)** [[code](https://github.com/evandez/REMEDI)]
187 |
188 | *Evan Hernandez, Belinda Z. Li, Jacob Andreas.* COLM'24
189 |
190 | 10. **[Forgetting before learning: Utilizing parametric arithmetic for knowledge updating in large language models](https://arxiv.org/abs/2311.08011)**
191 |
192 | *Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang.* ACL'24
193 |
194 | 11. **[Ethos: Rectifying language models in orthogonal parameter space](http://arxiv.org/abs/2403.08994)**
195 |
196 | [Toxic/Bias Unlearning, SVD, Analysis of Parametric Knowledge, Task Vector]
197 |
198 | NAACL'24 findings
199 |
200 | ### 2023
201 | 1. **[Editing Large Language Models: Problems, Methods, and Opportunities](https://arxiv.org/abs/2305.13172)**
202 |
203 | *Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, Ningyu Zhang.* EMNLP'23
204 | ### 2022
205 | 1. **[Locating and Editing Factual Associations in GPT](https://arxiv.org/abs/2202.05262)**
206 |
207 | *Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov.* NIPS'22
208 | 2. **[Memory-Based Model Editing at Scale](https://arxiv.org/abs/2206.06520)**
209 |
210 | *Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, Chelsea Finn.* ICLR'22
211 | ### 2021
212 | 1. **[Editing Factual Knowledge in Language Models](https://arxiv.org/abs/2104.08164)**
213 |
214 | *Nicola De Cao, Wilker Aziz, Ivan Titov.* EMNLP'21
215 | ### 2020
216 | 1. **[Editable neural networks.](https://arxiv.org/abs/2004.00345)**
217 |
218 | *Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitriy Pyrkin, Sergei Popov, Artem Babenko.* ICLR'20
219 | ## Knowledge Transfer🧚‍♀️
220 | ### 2025
221 | 1. **[Unlocking efficient long-to-short LLM reasoning with model merging](http://arxiv.org/abs/2503.20641)**
222 | [Model Merging for L2S] Arxiv'25
223 |
224 | ### 2024
225 | 1. **[Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective](https://arxiv.org/abs/2310.11451)**
226 |
227 | *Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He.* ICLR'24
228 |
229 | 2. **[Initializing models with larger ones](https://arxiv.org/abs/2311.18823)** [[code](https://github.com/OscarXZQ/weight-selection)]
230 |
231 | *Zhiqiu Xu, Yanjie Chen, Kirill Vishniakov, Yida Yin, Zhiqiang Shen, Trevor Darrell, Lingjie Liu, Zhuang Liu.* ICLR'24 **Spotlight**
232 |
233 | 3. **[Cross-model Control: Improving Multiple Large Language Models in One-time Training](https://www.arxiv.org/abs/2410.17599)** [[code](https://github.com/wujwyi/CMC)]
234 |
235 | *Jiayi Wu, Hao Sun, Hengyi Cai, Lixin Su, Shuaiqiang Wang, Dawei Yin, Xiang Li, Ming Gao.* NIPS'24
236 |
237 | 4. **[Knowledge fusion of large language models](https://arxiv.org/abs/2401.10491)** [[code](https://github.com/fanqiwan/FuseLLM)]
238 |
239 | *Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi.* ICLR'24
240 |
241 | 5. **[Tuning language models by proxy](https://arxiv.org/abs/2401.08565)** [[code](https://github.com/alisawuffles/proxy-tuning)]
242 |
243 | *Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith.* COLM'24
244 |
245 | 6. **[Chat vector: A simple approach to equip LLMs with instruction following and model alignment in new languages](http://arxiv.org/abs/2310.04799)** [[code](https://github.com/aqweteddy/ChatVector)]
246 |
247 | [Task Vector, Parametric Knowledge, Knowledge Transfer]
248 |
249 | ACL'24
250 |
251 | 7. **[FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models](https://arxiv.org/abs/2406.02224)**
252 |
253 | [Federated Learning, Knowledge Transfer, Heterogeneous Token Alignment]
254 |
255 | Coling'25
256 |
257 | 8. **[Function vectors in large language models](http://arxiv.org/abs/2310.15213)**
258 |
259 | [Function Vector, Causal Mediation, Mechanism Interpretation]
260 |
261 | ICLR'24
262 |
263 | 9. **[Refine large language model fine-tuning via instruction vector](http://arxiv.org/abs/2406.12227)**
264 |
265 | [Catastrophic Forgetting, Function Vector, Causal Mediation]
266 |
267 | Preprint'24
268 |
269 | 10. **[KlF: Knowledge localization and fusion for language model continual learning](http://arxiv.org/abs/2408.05200)**
270 |
271 | [Catastrophic Forgetting, Continual Learning, Sensitivity-based Localization]
272 |
273 | ACL'24
274 |
275 | 11. **[Language models are super mario: Absorbing abilities from homologous models as a free lunch](http://arxiv.org/abs/2311.03099)**
276 |
277 | [Knowledge Transfer, Model Merging, Efficient Skill] ICML'24
278 |
279 | 12. **[Beyond task vectors: Selective task arithmetic based on importance metrics](http://arxiv.org/abs/2411.16139)**
280 |
281 | [Task Vector, Sensitivity-based Importance Score, Model Merging] Preprint'24
282 |
283 | 13. **[Determine-then-ensemble: Necessity of top-k union for large language model ensembling](http://arxiv.org/abs/2410.03777)**
284 |
285 | [Model Ensemble, Probability-Level, Analysis] ICLR'25 Spotlight
286 |
287 | ### 2023
288 | 1. **[Mutual enhancement of large and small language models with cross-silo knowledge transfer](https://arxiv.org/abs/2312.05842)**
289 |
290 | *Yongheng Deng, Ziqing Qiao, Ju Ren, Yang Liu, Yaoxue Zhang.* Preprint'23
291 |
292 | 2. **[Learning to grow pretrained models for efficient transformer training](https://arxiv.org/abs/2303.00980)** [[code](https://github.com/VITA-Group/LiGO)]
293 |
294 | *Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, Philip Greengard, Leonid Karlinsky, Rogerio Feris, David D. Cox, Zhangyang Wang, Yoon Kim.* ICLR'23
295 |
296 | 3. **[Retrieval-based knowledge transfer: An effective approach for extreme large language model compression](https://arxiv.org/abs/2310.15594)**
297 |
298 | *Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Ran Lucien Wang, Rui Yan.* EMNLP'23 Findings
299 |
300 | 4. **[Editing models with task arithmetic](http://arxiv.org/abs/2212.04089)** [[code](https://github.com/mlfoundations/task_vectors)]
301 |
302 | [Task Vector, Parametric Knowledge, Knowledge Transfer, Multi-task Learning]
303 |
304 | ICLR'23
305 |
306 | 5. **[Task-Specific Skill Localization in Fine-tuned Language Models](http://arxiv.org/abs/2302.06600)**
307 |
308 | [Knowledge Transfer, Model Graft, Skill Parameter Localization]
309 |
310 | ICML'23
311 |
312 | 6. **[Composing parameter-efficient modules with arithmetic operations](http://arxiv.org/abs/2306.14870)**
313 |
314 | [PEFT, Task Vector, Model Merge]
315 |
316 | NIPS'23
317 |
318 | 7. **[Dataless knowledge fusion by merging weights of language models](http://arxiv.org/abs/2212.09849)**
319 |
320 | [Model Merge]
321 |
322 | ICLR'23
323 | ### 2021
324 | 1. **[Weight distillation: Transferring the knowledge in neural network parameters](https://arxiv.org/abs/2009.09152)** [[code](https://github.com/Lollipop321/weight-distillation)]
325 |
326 | *Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu.* ACL'21
327 |
328 | ## Activation Steering
329 | ### 2024
330 | 1. **[Multi-property Steering of Large Language Models with Dynamic Activation Composition](https://arxiv.org/abs/2406.17563)** [[code](https://github.com/DanielSc4/Dynamic-Activation-Composition)]
331 |
332 | *Daniel Scalena, Gabriele Sarti, Malvina Nissim.* ACL'24 BlackboxNLP Workshop
333 |
334 | 2. **[Word embeddings are steers for language models](http://arxiv.org/abs/2305.12798)**
335 |
336 | [Word Embedding Steering, Generation Control] ACL'24
337 |
338 | ### 2023
339 | 1. **[Activation Addition: Steering Language Models Without Optimization](https://arxiv.org/abs/2308.10248v4)**
340 |
341 | *Alexander Matt Turner, Lisa Thiergart, Gavin Leech, David Udell, Juan J. Vazquez, Ulisse Mini, Monte MacDiarmid.* Preprint'23
342 | ## Knowledge Distillation
343 | ### 2024
344 | 1. **[PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning](https://arxiv.org/abs/2402.12842)** [[code](https://github.com/gmkim-ai/PromptKD)] (Note: not parametric)
345 |
346 | *Gyeongman Kim, Doohyuk Jang, Eunho Yang.* EMNLP'24 findings
347 |
348 | 2. **[From Instance Training to Instruction Learning: Task Adapters Generation from Instructions](https://arxiv.org/abs/2406.12382)** [[code](https://github.com/Xnhyacinth/TAGI/)]
349 |
350 | *Huanxuan Liao, Yao Xu, Shizhu He, Yuanzhe Zhang, Yanchao Hao, Shengping Liu, Kang Liu, Jun Zhao.* NIPS'24
351 |
352 | 3. **[When babies teach babies: Can student knowledge sharing outperform teacher-guided distillation on small datasets?](https://arxiv.org/abs/2411.16487v1)**
353 |
354 | *Srikrishna Iyer.* EMNLP'24 CoNLL Workshop
355 |
356 | ## Parametric Quantization
357 |
358 | ### 2024
359 |
360 | 1. **[OneBit: Towards extremely low-bit large language models](https://arxiv.org/abs/2402.11295)** [[code](https://github.com/xuyuzhuang11/OneBit)]
361 |
362 |
363 | *Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che.* NIPS'24
364 |
365 | ### 2023
366 |
367 | 1. **[The cost of compression: Investigating the impact of compression on parametric knowledge in language models](https://arxiv.org/abs/2312.00960)** [[code](https://github.com/NamburiSrinath/LLMCompression)]
368 |
369 | *Satya Sai Srinath Namburi, Makesh Sreedhar, Srinath Srinivasan, Frederic Sala.* EMNLP'23 findings
370 |
371 | ## Knowledge Injection
372 | ### 2024
373 | 1. **[Awakening augmented generation: Learning to awaken internal knowledge of large language models for question answering](http://arxiv.org/abs/2403.15268)** [[code](https://github.com/Xnhyacinth/IAG)]
374 |
375 | [HyperNet, RAG, Context Compression]
376 |
377 | *Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Kang Liu, Shengping Liu, Jun Zhao.* AAAI'25
378 |
379 | 2. **[Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass](https://arxiv.org/abs/2411.05877)**
380 |
381 | [Hypernetwork, Temporal Knowledge, Context Compression] ICLR'25
382 | ### 2023
383 | 1. **[Memory injections: Correcting multi-hop reasoning failures during inference in transformer-based language models](http://arxiv.org/abs/2309.05605)** [[code](https://github.com/msakarvadia/memory_injections)]
384 |
385 | *Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard, Ian Foster.* Oral Presentation at BlackboxNLP Workshop at EMNLP'23
386 |
387 | 2. **[Decouple knowledge from parameters for plug-and-play language modeling](http://arxiv.org/abs/2305.11564)** [[code](https://github.com/Hannibal046/PlugLM)]
388 |
389 | *Xin Cheng, Yankai Lin, Xiuying Chen, Dongyan Zhao, Rui Yan.* ACL'23 findings
390 |
391 | 3. **[In-Parameter Knowledge Injection: Integrating Temporary Contextual Information into Model Parameters](https://openreview.net/forum?id=sl4hOq9wm9)**
392 |
393 | Submitted to ICLR'25
394 | ### 2022
395 | 1. **[Kformer: Knowledge injection in transformer feed-forward layers](http://arxiv.org/abs/2201.05742)** [[code](https://github.com/zjunlp/Kformer)]
396 |
397 | *Yunzhi Yao, Shaohan Huang, Li Dong, Furu Wei, Huajun Chen, Ningyu Zhang.* NLPCC'22
398 |
399 | ## Parameter-Efficient Fine-tuning (PEFT)
400 | ### 2024
401 | 1. **[KaSA: Knowledge-aware singular-value adaptation of large language models](http://arxiv.org/abs/2412.06071)** [[code](https://github.com/juyongjiang/KaSA)]
402 |
403 | [Knowledge-aware LoRA, SVD]
404 |
405 | *Fan Wang, Juyong Jiang, Chansung Park, Sunghun Kim, Jing Tang.* Preprint'24
406 |
407 | 2. **[CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning](https://arxiv.org/abs/2406.05223)** [[code](https://github.com/iboing/CorDA)]
408 |
409 | [Knowledge-aware LoRA, SVD]
410 |
411 | *Yibo Yang, Xiaojie Li, Zhongzhu Zhou, Shuaiwen Leon Song, Jianlong Wu, Liqiang Nie, Bernard Ghanem.* NIPS'24
412 |
413 | 3. **[DoRA: Weight-Decomposed Low-Rank Adaptation](https://arxiv.org/abs/2402.09353)** [[code](https://github.com/NVlabs/DoRA)]
414 |
415 | [Weight-Decomposed LoRA, SVD, Analysis of FT and LoRA]
416 | *Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen.* ICML'24 Oral
417 |
418 | 4. **[Low-rank adaptation with task-relevant feature enhancement for fine-tuning language models](http://arxiv.org/abs/2412.09827)**
419 |
420 | [Task-aware LoRA, Hidden Representation Enhancement] AAAI'25 CoLoRAI Workshop
421 |
422 | 5. **[Train small, infer large: Memory-efficient LoRA training for large language models](http://arxiv.org/abs/2502.13533)**
423 |
424 | [Memory-efficient LoRA Training, Pruning Methods, High memory efficiency]
425 | ## Continual Learning
426 | ### 2024
427 | 1. **[Learn more, but bother less: Parameter efficient continual learning](https://neurips.cc/virtual/2024/poster/94599)**
428 |
429 | [Continual Learning, Parameter Efficient, Knowledge Transfer] NIPS'24
430 |
431 | 2. **[What will my model forget? Forecasting forgotten examples in language model refinement](http://arxiv.org/abs/2402.01865)**
432 |
433 | [Catastrophic Forgetting, Forecasting Forgetting, Analysis] ICML'24 Spotlight
434 | ## RAG
435 |
436 | ### 2025
437 | 1. **[Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement](https://arxiv.org/abs/2503.23895)** [[code](https://github.com/Trae1ounG/DyPRAG)] [[Homepage](https://trae1ounG.github.io/DyPRAG/)]
438 |
439 | [Dynamic Parametric RAG, Test-time Knowledge Enhancement, Reducing RAG Hallucination, Plug-and-Play RAG Improvement] Arxiv'25
440 |
441 | ### 2024
442 | 1. **[xRAG: Extreme context compression for retrieval-augmented generation with one token](http://arxiv.org/abs/2405.13792)**
443 |
444 | [Context Compression, RAG, Multimodal Fusion] NIPS'24
445 |
446 | 2. **[Parametric retrieval augmented generation](http://arxiv.org/abs/2501.15915)**
447 |
448 | [Parametric RAG, Document Parameterization, Offline Method]
449 |
450 | 3. **[RAGTruth: A hallucination corpus for developing trustworthy retrieval-augmented language models](http://arxiv.org/abs/2401.00396)**
451 |
452 | [RAG, Hallucination, Benchmark] ACL'24
453 |
454 |
455 |
456 | ## Long Context Extension
457 | ### 2024
458 |
459 | 1. **[LongEmbed: Extending embedding models for long context retrieval](http://arxiv.org/abs/2404.12096)**
460 |
461 | [Long Context, Embedding Model, Benchmark] EMNLP'24
462 |
463 | 2. **[LLM maybe LongLM: Self-extend LLM context window without tuning](http://arxiv.org/abs/2401.01325)**
464 |
465 | [Long Context Extension, Plug-and-Play Method] ICML'24 Spotlight
466 |
467 | 3. **[Two stones hit one bird: Bilevel positional encoding for better length extrapolation](http://arxiv.org/abs/2401.16421)**
468 |
469 | [Long Context Extension, Absolute PE + Relative PE, Plug-and-Play but Training-based Method] ICML'24
470 | ### 2023
471 | 1. **[YaRN: Efficient context window extension of large language models](http://arxiv.org/abs/2309.00071)**
472 |
473 | [Long Context Extension, Variation of RoPE] ICLR'24
474 | ### 2022
475 | 1. **[Train short, test long: Attention with linear biases enables input length extrapolation](http://arxiv.org/abs/2108.12409)**
476 |
477 | [Alibi, Long Context Extrapolate, Training-based Method] ICLR'22
478 |
479 | ### 2021
480 | 1. **[RoFormer: Enhanced Transformer with Rotary Position Embedding.](https://arxiv.org/abs/2104.09864)**
481 |
482 | [Rotary Position Embedding, Classic]
483 |
484 | ## Long2Short Compression
485 | ### 2025
486 | 1. **[TokenSkip: Controllable chain-of-thought compression in LLMs](http://arxiv.org/abs/2502.12067)**
487 | [L2S, LLMLingua-based Compression, Fine-tune] Arxiv'25
488 |
489 | 2. **[CoT-valve: Length-compressible chain-of-thought tuning](http://arxiv.org/abs/2502.09601)**
490 | [L2S, Task-vector-based Compression, Fine-tune] Arxiv'25
491 |
492 | ### 2024
493 | 1. **[LLMLingua-2: Data distillation for efficient and faithful task-agnostic prompt compression](http://arxiv.org/abs/2403.12968)**
494 | [Prompt Compression] ACL'24 Findings
495 | ### 2023
496 | 1. **[LLMLingua: Compressing prompts for accelerated inference of large language models](http://arxiv.org/abs/2310.05736)**
497 | [Prompt Compression, Bert Model] EMNLP'23
498 |
499 | ## Star History
500 |
501 | [Star History Chart](https://star-history.com/#Trae1ounG/Awesome-parametric-Knowledge-in-LLMs&Date)
--------------------------------------------------------------------------------