├── LICENSE
└── README.md

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2024 Trae1ounG

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------

# Awesome Parametric Knowledge in LLMs

[![LICENSE](https://img.shields.io/github/license/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs)](https://github.com/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs/blob/main/LICENSE)
![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)
[![commit](https://img.shields.io/github/last-commit/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs?color=blue)](https://github.com/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs/commits/main)
[![PR](https://img.shields.io/badge/PRs-Welcome-red)](https://github.com/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs/pulls)
[![GitHub Repo stars](https://img.shields.io/github/stars/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs)](https://github.com/Trae1ounG/Awesome-parametric-Knowledge-in-LLMs)

This repo collects papers on parametric knowledge in LLMs, currently organized into two main categories: parametric knowledge detection and parametric knowledge application!👻

We believe that parametric knowledge in LLMs is still a largely unexplored area, and we hope this repository will provide you with some valuable insights!😶‍🌫️

# Parametric Knowledge Detection
## Knowledge in Transformer-based Models: Analysis🧠
### 2025
1. **[Decoding specialised feature neurons in LLMs with the final projection layer](http://arxiv.org/abs/2501.02688)**

    [Logits Lens, Analysis of Query Neuron]
### 2024
1. **[What does the knowledge neuron thesis have to do with knowledge?](https://arxiv.org/abs/2405.02421)**

    *Jingcheng Niu, Andrew Liu, Zining Zhu, Gerald Penn.* ICLR'24 Spotlight

2. **[Knowledge Mechanisms in Large Language Models: A Survey and Perspective](https://arxiv.org/abs/2407.15017)**

    *Mengru Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, Peng Wang, Xiang Chen, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang.* EMNLP'24 Findings

3. **[Disentangling Memory and Reasoning Ability in Large Language Models](https://arxiv.org/abs/2411.13504v2)** [![github repo stars](https://img.shields.io/github/stars/MingyuJ666/Disentangling-Memory-and-Reasoning)](https://github.com/MingyuJ666/Disentangling-Memory-and-Reasoning)

    *Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang.* Preprint'24

4. **[Linguistic collapse: Neural collapse in (large) language models](https://arxiv.org/abs/2405.17767)**[![github repo stars](https://img.shields.io/github/stars/rhubarbwu/linguistic-collapse)](https://github.com/rhubarbwu/linguistic-collapse)

    *Robert Wu, Vardan Papyan.* NIPS'24

5. **[Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models](https://arxiv.org/abs/2410.08414)**[![github repo stars](https://img.shields.io/github/stars/sitaocheng/Knowledge_Interplay)](https://github.com/sitaocheng/Knowledge_Interplay)

    *Sitao Cheng, Liangming Pan, Xunjian Yin, Xinyi Wang, William Yang Wang.* Preprint'24

6. **[Evaluating the External and Parametric Knowledge Fusion of Large Language Models](https://arxiv.org/abs/2405.19010)**

    *Hao Zhang, Yuyang Zhang, Xiaoguang Li, Wenxuan Shi, Haonan Xu, Huanshuo Liu, Yasheng Wang, Lifeng Shang, Qun Liu, Yong Liu, Ruiming Tang.* Preprint'24

7. **[Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts](https://arxiv.org/abs/2305.13300)**[![github repo stars](https://img.shields.io/github/stars/OSU-NLP-Group/LLM-Knowledge-Conflict)](https://github.com/OSU-NLP-Group/LLM-Knowledge-Conflict)

    *Jian Xie, Kai Zhang, Jiangjie Chen, Renze Lou, Yu Su.* ICLR'24 Spotlight

8. **[Knowledge entropy decay during language model pretraining hinders new knowledge acquisition](https://arxiv.org/abs/2410.01380)**

    *Jiyeon Kim, Hyunji Lee, Hyowon Cho, Joel Jang, Hyeonbin Hwang, Seungpil Won, Youbin Ahn, Dohaeng Lee, Minjoon Seo.* Preprint'24

9. **[When Context Leads but Parametric Memory Follows in Large Language Models](https://arxiv.org/abs/2409.08435)**[![github repo stars](https://img.shields.io/github/stars/PortNLP/WikiAtomic)](https://github.com/PortNLP/WikiAtomic)

    *Yufei Tao, Adam Hiatt, Erik Haake, Antonie J. Jetter, Ameeta Agrawal.* EMNLP'24

10. **[Neuron-level knowledge attribution in large language models](https://arxiv.org/abs/2312.12141)**[![github repo stars](https://img.shields.io/github/stars/zepingyu0512/neuron-attribution)](https://github.com/zepingyu0512/neuron-attribution)

    *Zeping Yu, Sophia Ananiadou.* EMNLP'24

11. **[Dissecting recall of factual associations in auto-regressive language models](http://arxiv.org/abs/2304.14767)**[[code](https://github.com/google-research/google-research/tree/master/dissecting_factual_predictions)]

    *Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson.* EMNLP'23

12. **[INSIDE: LLMs' internal states retain the power of hallucination detection](http://arxiv.org/abs/2402.03744)**

    [Hallucination Detection, Sequence-level] ICLR'24
### 2021
1. **[Transformer Feed-Forward Layers Are Key-Value Memories](https://arxiv.org/abs/2012.14913)**

    *Mor Geva, Roei Schuster, Jonathan Berant, Omer Levy.* EMNLP'21

## Knowledge in Transformer-based Models: Causal Tracing🦾
### 2024
1. **[Does knowledge localization hold true? Surprising differences between entity and relation perspectives in language models](https://arxiv.org/pdf/2409.00617)**

    *Yifan Wei, Xiaoyan Yu, Yixuan Weng, Huanhuan Ma, Yuanzhe Zhang, Jun Zhao, Kang Liu.* CIKM'24

### 2022
1. **[Locating and Editing Factual Associations in GPT](https://arxiv.org/abs/2202.05262)**

    *Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov.* NIPS'22

## Knowledge in Transformer-based Models: Gradient Attribution👀
### 2024
1. **[Identifying query-relevant neurons in large language models for long-form texts](https://arxiv.org/abs/2406.10868)**

    *Lihu Chen, Adam Dejl, Francesca Toni.* Preprint'24

2. **[Revealing the parametric knowledge of language models: A unified framework for attribution methods](https://arxiv.org/abs/2404.18655)**

    *Haeun Yu, Pepa Atanasova, Isabelle Augenstein.* ACL'24

3. **[Does Large Language Model Contain Task-Specific Neurons?](https://aclanthology.org/2024.emnlp-main.403/)**

    *Ran Song, Shizhu He, Shuting Jiang, Yantuan Xian, Shengxiang Gao, Kang Liu, and Zhengtao Yu.* EMNLP'24

4. **[Journey to the center of the knowledge neurons: Discoveries of language-independent knowledge neurons and degenerate knowledge neurons](http://arxiv.org/abs/2308.13198)**[![github repo stars](https://img.shields.io/github/stars/heng840/AMIG)](https://github.com/heng840/AMIG)

    *Yuheng Chen, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao.* AAAI'24
### 2022
1. **[Knowledge Neurons in Pretrained Transformers](https://arxiv.org/abs/2104.08696)**[![github repo stars](https://img.shields.io/github/stars/Hunter-DDM/knowledge-neurons)](https://github.com/Hunter-DDM/knowledge-neurons)

    *Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, Furu Wei.* ACL'22

## Knowledge in Transformer-based Models: Activation🫀
### 2025
1. **[Improving instruction-following in language models through activation steering](http://arxiv.org/abs/2410.12877)**

    [Activation Steering, Instruction Following] ICLR'25

2. **[Activation-informed merging of large language models](http://arxiv.org/abs/2502.02421)**

    [Activation-based Model Merging] Arxiv'25

### 2024
1. **[Separating tongue from thought: Activation patching reveals language-agnostic concept representations in transformers](https://arxiv.org/abs/2411.08745)** [![github repo stars](https://img.shields.io/github/stars/Butanium/llm-lang-agnostic)](https://github.com/Butanium/llm-lang-agnostic)

    *Clément Dumas, Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West.* ICLR'24 Spotlight

2. **[From yes-men to truth-tellers: Addressing sycophancy in large language models with pinpoint tuning](https://arxiv.org/pdf/2409.01658v2)**

    *Wei Chen, Zhen Huang, Liang Xie, Binbin Lin, Houqiang Li, Le Lu, Xinmei Tian, Deng Cai, Yonggang Zhang, Wenxiao Wang, Xu Shen, Jieping Ye.* ICML'24

3. **[Language-specific neurons: The key to multilingual capabilities in large language models](https://arxiv.org/abs/2402.16438)**

    *Tianyi Tang, Wenyang Luo, Haoyang Huang, Dongdong Zhang, Xiaolei Wang, Xin Zhao, Furu Wei, Ji-Rong Wen.* ACL'24

4. **[Multi-property Steering of Large Language Models with Dynamic Activation Composition](https://arxiv.org/abs/2406.17563)** [![github repo stars](https://img.shields.io/github/stars/DanielSc4/Dynamic-Activation-Composition)](https://github.com/DanielSc4/Dynamic-Activation-Composition)

    *Daniel Scalena, Gabriele Sarti, Malvina Nissim.* ACL'24 BlackboxNLP Workshop

5. **[Exploring the benefit of activation sparsity in pre-training](http://arxiv.org/abs/2410.03440)**[![github repo stars](https://img.shields.io/github/stars/thunlp/moefication)](https://github.com/thunlp/moefication)

    [MoE, Activation Sparsity, Activation Pattern, Inference Speedup]

    *Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou.* ICML'24

### 2023
1. **[Activation Addition: Steering Language Models Without Optimization](https://arxiv.org/abs/2308.10248v4)**

    *Alexander Matt Turner, Lisa Thiergart, Gavin Leech, David Udell, Juan J. Vazquez, Ulisse Mini, Monte MacDiarmid.* Preprint'23

2. **[Deja vu: Contextual sparsity for efficient LLMs at inference time](http://arxiv.org/abs/2310.17157)**

    [Sparsity, Inference Speedup]

    *Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, Beidi Chen.* ICML'23

# Parametric Knowledge Application
## Knowledge Editing 🧑‍⚕️
### 2024
1. **[A Comprehensive Study of Knowledge Editing for Large Language Models](https://arxiv.org/abs/2401.01286)**

    *Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni, Siyuan Cheng, Ziwen Xu, Xin Xu, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, Huajun Chen.* Preprint'24

2. **[FAME: Towards Factual Multi-Task Model Editing](https://arxiv.org/abs/2410.10859)**[![GitHub Repo stars](https://img.shields.io/github/stars/BITHLP/FAME)](https://github.com/BITHLP/FAME)

    *Li Zeng, Yingyu Shan, Zeming Liu, Jiashu Yao, Yuhang Guo.* EMNLP'24

3. **[To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models](https://arxiv.org/abs/2407.01920)**[![github repo stars](https://img.shields.io/github/stars/zjunlp/KnowUnDo)](https://github.com/zjunlp/KnowUnDo)

    *Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang.* EMNLP'24 Findings

4. **[Understanding the Collapse of LLMs in Model Editing](https://arxiv.org/abs/2406.11263)**

    *Wanli Yang, Fei Sun, Jiajun Tan, Xinyu Ma, Du Su, Dawei Yin, Huawei Shen.* EMNLP'24 Findings

5. **[Is it possible to edit large language models robustly?](https://arxiv.org/pdf/2402.05827)**[![github repo stars](https://img.shields.io/github/stars/xbmxb/edit_analysis)](https://github.com/xbmxb/edit_analysis)

    *Xinbei Ma, Tianjie Ju, Jiyang Qiu, Zhuosheng Zhang, Hai Zhao, Lifeng Liu, Yulong Wang.* Preprint'24

6. **[Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering](https://arxiv.org/pdf/2403.19631)**[![github repo stars](https://img.shields.io/github/stars/sycny/RAE)](https://github.com/sycny/RAE)

    *Yucheng Shi, Qiaoyu Tan, Xuansheng Wu, Shaochen Zhong, Kaixiong Zhou, Ninghao Liu.* CIKM'24

7. **[Latent paraphrasing: Perturbation on layers improves knowledge injection in language models](https://arxiv.org/abs/2411.00686)**

    *Minki Kang, Sung Ju Hwang, Gibbeum Lee, Jaewoong Cho.* NIPS'24

8. **[Learning to edit: Aligning LLMs with knowledge editing](https://arxiv.org/abs/2402.11905)**[![github repo stars](https://img.shields.io/github/stars/YJiangcm/LTE)](https://github.com/YJiangcm/LTE)

    *Yuxin Jiang, Yufei Wang, Chuhan Wu, Wanjun Zhong, Xingshan Zeng, Jiahui Gao, Liangyou Li, Xin Jiang, Lifeng Shang, Ruiming Tang, Qun Liu, Wei Wang.* ACL'24

9. **[Inspecting and Editing Knowledge Representations in Language Models](https://arxiv.org/abs/2304.00740)**[![github repo stars](https://img.shields.io/github/stars/evandez/REMEDI)](https://github.com/evandez/REMEDI)

    *Evan Hernandez, Belinda Z. Li, Jacob Andreas.* COLM'24

10. **[Forgetting before learning: Utilizing parametric arithmetic for knowledge updating in large language models](https://arxiv.org/abs/2311.08011)**

    *Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang.* ACL'24

11. **[Ethos: Rectifying language models in orthogonal parameter space](http://arxiv.org/abs/2403.08994)**

    [Toxic/Bias Unlearning, SVD, Analysis of Parametric Knowledge, Task Vector]

    NAACL'24 Findings

### 2023
1. **[Editing Large Language Models: Problems, Methods, and Opportunities](https://arxiv.org/abs/2305.13172)**

    *Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, Ningyu Zhang.* EMNLP'23
### 2022
1. **[Locating and Editing Factual Associations in GPT](https://arxiv.org/abs/2202.05262)**

    *Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov.* NIPS'22

2. **[Memory-Based Model Editing at Scale](https://arxiv.org/abs/2206.06520)**

    *Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, Chelsea Finn.* ICML'22
### 2021
1. **[Editing Factual Knowledge in Language Models](https://arxiv.org/abs/2104.08164)**

    *Nicola De Cao, Wilker Aziz, Ivan Titov.* EMNLP'21
### 2020
1. **[Editable neural networks](https://arxiv.org/abs/2004.00345)**

    *Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitriy Pyrkin, Sergei Popov, Artem Babenko.* ICLR'20

## Knowledge Transfer🧚‍♀️
### 2025
1. **[Unlocking efficient long-to-short LLM reasoning with model merging](http://arxiv.org/abs/2503.20641)**

    [Model Merging for L2S] Arxiv'25

### 2024
1. **[Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective](https://arxiv.org/abs/2310.11451)**

    *Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He.* ICLR'24

2. **[Initializing models with larger ones](https://arxiv.org/abs/2311.18823)**[![github repo stars](https://img.shields.io/github/stars/OscarXZQ/weight-selection)](https://github.com/OscarXZQ/weight-selection)

    *Zhiqiu Xu, Yanjie Chen, Kirill Vishniakov, Yida Yin, Zhiqiang Shen, Trevor Darrell, Lingjie Liu, Zhuang Liu.* ICLR'24 **Spotlight**

3. **[Cross-model Control: Improving Multiple Large Language Models in One-time Training](https://www.arxiv.org/abs/2410.17599)**[![github repo stars](https://img.shields.io/github/stars/wujwyi/CMC)](https://github.com/wujwyi/CMC)

    *Jiayi Wu, Hao Sun, Hengyi Cai, Lixin Su, Shuaiqiang Wang, Dawei Yin, Xiang Li, Ming Gao.* NIPS'24

4. **[Knowledge fusion of large language models](https://arxiv.org/abs/2401.10491)**[![github repo stars](https://img.shields.io/github/stars/fanqiwan/FuseLLM)](https://github.com/fanqiwan/FuseLLM)

    *Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi.* ICLR'24

5. **[Tuning language models by proxy](https://arxiv.org/abs/2401.08565)**[![github repo stars](https://img.shields.io/github/stars/alisawuffles/proxy-tuning)](https://github.com/alisawuffles/proxy-tuning)

    *Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith.* COLM'24

6. **[Chat vector: A simple approach to equip LLMs with instruction following and model alignment in new languages](http://arxiv.org/abs/2310.04799)**[![github repo stars](https://img.shields.io/github/stars/aqweteddy/ChatVector)](https://github.com/aqweteddy/ChatVector)

    [Task Vector, Parametric Knowledge, Knowledge Transfer]

    ACL'24

7. **[FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models](https://arxiv.org/abs/2406.02224)**

    [Federated Learning, Knowledge Transfer, Heterogeneous Token Alignment]

    Coling'25

8. **[Function vectors in large language models](http://arxiv.org/abs/2310.15213)**

    [Function Vector, Causal Mediation, Mechanism Interpretation]

    ICLR'24

9. **[Refine large language model fine-tuning via instruction vector](http://arxiv.org/abs/2406.12227)**

    [Catastrophic Forgetting, Function Vector, Causal Mediation]

    Preprint'24

10. **[KlF: Knowledge localization and fusion for language model continual learning](http://arxiv.org/abs/2408.05200)**

    [Catastrophic Forgetting, Continual Learning, Sensitivity-based Localization]

    ACL'24

11. **[Language models are super mario: Absorbing abilities from homologous models as a free lunch](http://arxiv.org/abs/2311.03099)**

    [Knowledge Transfer, Model Merging, Efficient Skill] ICML'24

12. **[Beyond task vectors: Selective task arithmetic based on importance metrics](http://arxiv.org/abs/2411.16139)**

    [Task Vector, Sensitivity-based Importance Score, Model Merging] Preprint'24

13. **[Determine-then-ensemble: Necessity of top-k union for large language model ensembling](http://arxiv.org/abs/2410.03777)**

    [Model Ensemble, Probability-Level, Analysis] ICLR'25 Spotlight

### 2023
1. **[Mutual enhancement of large and small language models with cross-silo knowledge transfer](https://arxiv.org/abs/2312.05842)**

    *Yongheng Deng, Ziqing Qiao, Ju Ren, Yang Liu, Yaoxue Zhang.* Preprint'23

2. **[Learning to grow pretrained models for efficient transformer training](https://arxiv.org/abs/2303.00980)**[![github repo stars](https://img.shields.io/github/stars/VITA-Group/LiGO)](https://github.com/VITA-Group/LiGO)

    *Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, Philip Greengard, Leonid Karlinsky, Rogerio Feris, David D. Cox, Zhangyang Wang, Yoon Kim.* ICLR'23

3. **[Retrieval-based knowledge transfer: An effective approach for extreme large language model compression](https://arxiv.org/abs/2310.15594)**

    *Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Ran Lucien Wang, Rui Yan.* EMNLP'23 Findings

4. **[Editing models with task arithmetic](http://arxiv.org/abs/2212.04089)**[![github repo stars](https://img.shields.io/github/stars/mlfoundations/task_vectors)](https://github.com/mlfoundations/task_vectors)

    [Task Vector, Parametric Knowledge, Knowledge Transfer, Multi-task Learning]

    ICLR'23

5. **[Task-Specific Skill Localization in Fine-tuned Language Models](http://arxiv.org/abs/2302.06600)**

    [Knowledge Transfer, Model Graft, Skill Parameter Localization]

    ICML'23

6. **[Composing parameter-efficient modules with arithmetic operations](http://arxiv.org/abs/2306.14870)**

    [PEFT, Task Vector, Model Merge]

    NIPS'23

7. **[Dataless knowledge fusion by merging weights of language models](http://arxiv.org/abs/2212.09849)**

    [Model Merge]

    ICLR'23
### 2021
1. **[Weight distillation: Transferring the knowledge in neural network parameters](https://arxiv.org/abs/2009.09152)**[![github repo stars](https://img.shields.io/github/stars/Lollipop321/weight-distillation)](https://github.com/Lollipop321/weight-distillation)

    *Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu.* ACL'21

## Activation Steering
### 2024
1. **[Multi-property Steering of Large Language Models with Dynamic Activation Composition](https://arxiv.org/abs/2406.17563)** [![github repo stars](https://img.shields.io/github/stars/DanielSc4/Dynamic-Activation-Composition)](https://github.com/DanielSc4/Dynamic-Activation-Composition)

    *Daniel Scalena, Gabriele Sarti, Malvina Nissim.* ACL'24 BlackboxNLP Workshop

2. **[Word embeddings are steers for language models](http://arxiv.org/abs/2305.12798)**

    [Word Embedding Steering, Generation Control] ACL'24

### 2023
1. **[Activation Addition: Steering Language Models Without Optimization](https://arxiv.org/abs/2308.10248v4)**

    *Alexander Matt Turner, Lisa Thiergart, Gavin Leech, David Udell, Juan J. Vazquez, Ulisse Mini, Monte MacDiarmid.* Preprint'23

## Knowledge Distillation
### 2024
1. **[PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning](https://arxiv.org/abs/2402.12842)**[![github repo stars](https://img.shields.io/github/stars/gmkim-ai/PromptKD)](https://github.com/gmkim-ai/PromptKD) (Note: not parametric)

    *Gyeongman Kim, Doohyuk Jang, Eunho Yang.* EMNLP'24 Findings

2. **[From Instance Training to Instruction Learning: Task Adapters Generation from Instructions](https://arxiv.org/abs/2406.12382)**[![github repo stars](https://img.shields.io/github/stars/Xnhyacinth/TAGI)](https://github.com/Xnhyacinth/TAGI/)

    *Huanxuan Liao, Yao Xu, Shizhu He, Yuanzhe Zhang, Yanchao Hao, Shengping Liu, Kang Liu, Jun Zhao.* NIPS'24

3. **[When babies teach babies: Can student knowledge sharing outperform teacher-guided distillation on small datasets?](https://arxiv.org/abs/2411.16487v1)**

    *Srikrishna Iyer.* EMNLP'24 CoNLL Workshop

## Parametric Quantization

### 2024

1. **[OneBit: Towards extremely low-bit large language models](https://arxiv.org/abs/2402.11295)** [![github repo stars](https://img.shields.io/github/stars/xuyuzhuang11/OneBit)](https://github.com/xuyuzhuang11/OneBit)

    *Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che.* NIPS'24

### 2023

1. **[The cost of compression: Investigating the impact of compression on parametric knowledge in language models](https://arxiv.org/abs/2312.00960)** [![github repo stars](https://img.shields.io/github/stars/NamburiSrinath/LLMCompression)](https://github.com/NamburiSrinath/LLMCompression)

    *Satya Sai Srinath Namburi, Makesh Sreedhar, Srinath Srinivasan, Frederic Sala.* EMNLP'23 Findings

## Knowledge Injection
### 2024
1. **[Awakening augmented generation: Learning to awaken internal knowledge of large language models for question answering](http://arxiv.org/abs/2403.15268)**[![github repo stars](https://img.shields.io/github/stars/Xnhyacinth/IAG)](https://github.com/Xnhyacinth/IAG)

    [HyperNet, RAG, Context Compression]

    *Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Kang Liu, Shengping Liu, Jun Zhao.* AAAI'25

2. **[Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass](https://arxiv.org/abs/2411.05877)**

    [Hypernetwork, Temporal Knowledge, Context Compression] ICLR'25
### 2023
1. **[Memory injections: Correcting multi-hop reasoning failures during inference in transformer-based language models](http://arxiv.org/abs/2309.05605)**[![github repo stars](https://img.shields.io/github/stars/msakarvadia/memory_injections)](https://github.com/msakarvadia/memory_injections)

    *Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard, Ian Foster.* Oral Presentation at BlackboxNLP Workshop at EMNLP'23

2. **[Decouple knowledge from parameters for plug-and-play language modeling](http://arxiv.org/abs/2305.11564)**[![github repo stars](https://img.shields.io/github/stars/Hannibal046/PlugLM)](https://github.com/Hannibal046/PlugLM)

    *Xin Cheng, Yankai Lin, Xiuying Chen, Dongyan Zhao, Rui Yan.* ACL'23 Findings

3. **[In-Parameter Knowledge Injection: Integrating Temporary Contextual Information into Model Parameters](https://openreview.net/forum?id=sl4hOq9wm9)**

    Submitted to ICLR'25
### 2022
1. **[Kformer: Knowledge injection in transformer feed-forward layers](http://arxiv.org/abs/2201.05742)**[![github repo stars](https://img.shields.io/github/stars/zjunlp/Kformer)](https://github.com/zjunlp/Kformer)

    *Yunzhi Yao, Shaohan Huang, Li Dong, Furu Wei, Huajun Chen, Ningyu Zhang.* NLPCC'22

## Parameter-Efficient Fine-tuning (PEFT)
### 2024
1. **[KaSA: Knowledge-aware singular-value adaptation of large language models](http://arxiv.org/abs/2412.06071)**[![github repo stars](https://img.shields.io/github/stars/juyongjiang/KaSA)](https://github.com/juyongjiang/KaSA)

    [Knowledge-aware LoRA, SVD]

    *Fan Wang, Juyong Jiang, Chansung Park, Sunghun Kim, Jing Tang.* Preprint'24

2. **[CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning](https://arxiv.org/abs/2406.05223)**[![github repo stars](https://img.shields.io/github/stars/iboing/CorDA)](https://github.com/iboing/CorDA)

    [Knowledge-aware LoRA, SVD]

    *Yibo Yang, Xiaojie Li, Zhongzhu Zhou, Shuaiwen Leon Song, Jianlong Wu, Liqiang Nie, Bernard Ghanem.* NIPS'24

3. **[DoRA: Weight-Decomposed Low-Rank Adaptation](https://arxiv.org/abs/2402.09353)**[![github repo stars](https://img.shields.io/github/stars/NVlabs/DoRA)](https://github.com/NVlabs/DoRA)

    [Weight-Decomposed LoRA, SVD, Analysis of FT and LoRA]

    *Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen.* ICML'24 Oral

4. **[Low-rank adaptation with task-relevant feature enhancement for fine-tuning language models](http://arxiv.org/abs/2412.09827)**

    [Task-aware LoRA, Hidden Representation Enhancement] AAAI'25 CoLoRAI Workshop

5. **[Train small, infer large: Memory-efficient LoRA training for large language models](http://arxiv.org/abs/2502.13533)**

    [Memory-efficient LoRA Training, Pruning Methods, High Memory Efficiency]

## Continual Learning
### 2024
1. **[Learn more, but bother less: Parameter efficient continual learning](https://neurips.cc/virtual/2024/poster/94599)**

    [Continual Learning, Parameter Efficient, Knowledge Transfer] NIPS'24

2. **[What will my model forget? Forecasting forgotten examples in language model refinement](http://arxiv.org/abs/2402.01865)**

    [Catastrophic Forgetting, Forecasting Forgetting, Analysis] ICML'24 Spotlight

## RAG

### 2025
1. **[Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement](https://arxiv.org/abs/2503.23895)** [[code](https://github.com/Trae1ounG/DyPRAG)] [[Homepage](https://trae1ounG.github.io/DyPRAG/)]

    [Dynamic Parametric RAG, Test-time Knowledge Enhancement, Reducing RAG Hallucination, Plug-and-Play RAG Improvement] Arxiv'25

### 2024
1. **[xRAG: Extreme context compression for retrieval-augmented generation with one token](http://arxiv.org/abs/2405.13792)**

    [Context Compression, RAG, Multimodal Fusion] NIPS'24

2. **[Parametric retrieval augmented generation](http://arxiv.org/abs/2501.15915)**

    [Parametric RAG, Document Parameterization, Offline Method]

3. **[RAGTruth: A hallucination corpus for developing trustworthy retrieval-augmented language models](http://arxiv.org/abs/2401.00396)**

    [RAG, Hallucination, Benchmark] ACL'24

## Long Context Extension
### 2024

1. **[LongEmbed: Extending embedding models for long context retrieval](http://arxiv.org/abs/2404.12096)**

    [Long Context, Embedding Model, Benchmark] EMNLP'24

2. **[LLM maybe LongLM: Self-extend LLM context window without tuning](http://arxiv.org/abs/2401.01325)**

    [Long Context Extension, Plug-and-Play Method] ICML'24 Spotlight

3. **[Two stones hit one bird: Bilevel positional encoding for better length extrapolation](http://arxiv.org/abs/2401.16421)**

    [Long Context Extension, Absolute PE + Relative PE, Plug-and-Play but Training-based Method] ICML'24
### 2023
1. **[YaRN: Efficient context window extension of large language models](http://arxiv.org/abs/2309.00071)**

    [Long Context Extension, Variation of RoPE] ICLR'24
### 2022
1. **[Train short, test long: Attention with linear biases enables input length extrapolation](http://arxiv.org/abs/2108.12409)**

    [ALiBi, Long Context Extrapolation, Training-based Method] ICLR'22

### 2021
1. **[RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864)**

    [Rotary Position Embedding, Classic]

## Long2Short Compression
### 2025
1. **[TokenSkip: Controllable chain-of-thought compression in LLMs](http://arxiv.org/abs/2502.12067)**

    [L2S, LLMLingua-based Compression, Fine-tune] Arxiv'25

2. **[CoT-Valve: Length-compressible chain-of-thought tuning](http://arxiv.org/abs/2502.09601)**

    [L2S, Task-vector-based Compression, Fine-tune] Arxiv'25

### 2024
1. **[LLMLingua-2: Data distillation for efficient and faithful task-agnostic prompt compression](http://arxiv.org/abs/2403.12968)**

    [Prompt Compression] ACL'24 Findings
### 2023
1. **[LLMLingua: Compressing prompts for accelerated inference of large language models](http://arxiv.org/abs/2310.05736)**

    [Prompt Compression, BERT Model] EMNLP'23

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=Trae1ounG/Awesome-parametric-Knowledge-in-LLMs&type=Date)](https://star-history.com/#Trae1ounG/Awesome-parametric-Knowledge-in-LLMs&Date)
--------------------------------------------------------------------------------