├── README.md
└── fig
└── overview.png
/README.md:
--------------------------------------------------------------------------------
1 | # Continual Learning of Large Language Models: A Comprehensive Survey
2 | This repository maintains an up-to-date survey of Continual Learning of Large Language Models (CL-LLMs). It is a constantly updated and extended version of the manuscript "[Continual Learning of Large Language Models: A Comprehensive Survey](https://arxiv.org/abs/2404.16789)", published in [ACM Computing Surveys 2025](https://dl.acm.org/doi/10.1145/3735633).
3 |
4 | ***Welcome to contribute to this survey by submitting a pull request or opening an issue!***
5 |
6 |
7 |
8 |
9 |
10 | ### Update History
11 | - **[05/2025] Our paper has been accepted to [CSUR](https://dl.acm.org/doi/10.1145/3735633)!**
12 | - **[03/2025] 🔥 new papers: 03/2025.**
13 | - **[02/2025] ⭐ new papers: 02/2025.**
14 | - [01/2025] new papers: 01/2025.
15 | - [12/2024] new papers: 12/2024.
16 | - [11/2024] new papers: 11/2024.
17 | - [11/2024] we have an [updated version](https://arxiv.org/abs/2404.16789), which is concise and free of broken links.
18 | - [10/2024] new papers: 10/2024.
19 | - [09/2024] new papers: 07/2024 - 09/2024.
20 | - [07/2024] new papers: 06/2024 - 07/2024.
21 | - [07/2024] the [updated version of the paper](https://arxiv.org/abs/2404.16789) has been released on arXiv.
22 | - [06/2024] new papers: 05/2024 - 06/2024.
23 | - [05/2024] new papers: 02/2024 - 05/2024.
24 | - [04/2024] initial release.
25 |
26 | ### Table of Contents
27 | * [Relevant Survey Papers](#relevant-survey-papers)
28 | * [Continual Pre-Training of LLMs (CPT)](#continual-pre-training-of-llms-cpt)
29 | * [Domain-Adaptive Pre-Training of LLMs (DAP)](#domain-adaptive-pre-training-of-llms-dap)
30 | * [General Domains](#for-general-domains)
31 | * [Legal Domain](#legal-domain)
32 | * [Medical Domain](#medical-domain)
33 | * [Financial Domain](#financial-domain)
34 | * [Scientific Domain](#scientific-domain)
35 | * [Code Domain](#code-domain)
36 | * [Language Domain](#language-domain)
37 | * [Other Domains](#other-domains)
38 | * [Continual Fine-Tuning of LLMs (CFT)](#continual-fine-tuning-of-llms-cft)
39 | * [General Continual Fine-Tuning](#general-continual-fine-tuning)
40 | * [Continual Instruction Tuning (CIT)](#continual-instruction-tuning-cit)
41 | * [Continual Model Refinement (CMR)](#continual-model-refinement-cmr)
42 | * [Continual Model Alignment (CMA)](#continual-model-alignment-cma)
43 | * [Continual Multimodal LLMs (CMLLMs)](#continual-multimodal-llms-cmllms)
44 | * [Continual LLMs Miscs](#continual-llms-miscs)
45 |
46 | ## Relevant Survey Papers
47 | - 🔥 Continual Pre-training of MoEs: How robust is your router? [[paper](https://arxiv.org/abs/2503.05029)]
48 | - 🔥 Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model [[paper](https://arxiv.org/abs/2503.04543)][[code](https://github.com/WenkeHuang/Awesome-MLLM-Tuning)]
49 | - Towards Lifelong Learning of Large Language Models: A Survey [[paper](https://arxiv.org/abs/2406.06391)][[code](https://github.com/qianlima-lab/awesome-lifelong-learning-methods-for-llm)]
50 | - Recent Advances of Foundation Language Models-based Continual Learning: A Survey [[paper](https://arxiv.org/pdf/2405.18653)]
51 | - A Comprehensive Survey of Continual Learning: Theory, Method and Application (TPAMI 2024) [[paper](https://arxiv.org/abs/2302.00487)]
52 | - Continual Learning for Large Language Models: A Survey [[paper](https://arxiv.org/abs/2402.01364)]
53 | - Continual Lifelong Learning in Natural Language Processing: A Survey (COLING 2020) [[paper](https://arxiv.org/abs/2012.09823)]
54 | - Continual Learning of Natural Language Processing Tasks: A Survey [[paper](https://arxiv.org/abs/2211.12701)]
55 | - A Survey on Knowledge Distillation of Large Language Models [[paper](https://arxiv.org/abs/2402.13116)]
56 |
57 |
58 | ## Continual Pre-Training of LLMs (CPT)
59 | - 🔥 Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training [[paper](https://arxiv.org/abs/2503.02844)]
60 | - ⭐ Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models? [[paper](https://arxiv.org/abs/2502.11895)]
61 | - ⭐ LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation [[paper](https://arxiv.org/abs/2502.07365)][[code](https://github.com/RUCAIBox/LongReD)]
62 | - ⭐ FIRE: Flexible Integration of Data Quality Ratings for Effective Pre-Training [[paper](https://arxiv.org/abs/2502.00761)]
63 | - Control LLM: Controlled Evolution for Intelligence Retention in LLM [[paper](https://arxiv.org/abs/2501.10979)][[code](https://github.com/linkedin/ControlLLM)][[huggingface](https://huggingface.co/ControlLLM)]
64 | - TiC-LM: A Multi-Year Benchmark for Continual Pretraining of Language Models [[paper](https://openreview.net/forum?id=PpSDVE5rAy)]
65 | - Gradient Localization Improves Lifelong Pretraining of Language Models [[paper](https://arxiv.org/abs/2411.04448)]
66 | - Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs [[paper](https://arxiv.org/abs/2410.10739)]
67 | - A Learning Rate Path Switching Training Paradigm for Version Updates of Large Language Models [[paper](https://arxiv.org/abs/2410.04103)]
68 | - A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio [[paper](https://arxiv.org/abs/2409.06624)]
69 | - Towards Effective and Efficient Continual Pre-training of Large Language Models [[paper](https://arxiv.org/abs/2407.18743)][[code](https://github.com/RUC-GSAI/Llama-3-SynE)]
70 | - Bilingual Adaptation of Monolingual Foundation Models [[paper](https://arxiv.org/abs/2407.12869)]
72 | - Mix-CPT: A Domain Adaptation Framework via Decoupling Knowledge Learning and Format Alignment [[paper](https://arxiv.org/abs/2407.10804)]
73 | - Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale [[paper](https://arxiv.org/abs/2407.02118)]
74 | - LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training [[paper](https://arxiv.org/abs/2406.16554)][[code](https://github.com/pjlab-sys4nlp/llama-moe)]
75 | - Efficient Continual Pre-training by Mitigating the Stability Gap [[paper](https://arxiv.org/abs/2406.14833)][[huggingface](https://huggingface.co/YiDuo1999/Llama-3-Physician-8B-Instruct)]
76 | - How Do Large Language Models Acquire Factual Knowledge During Pretraining? [[paper](https://arxiv.org/abs/2406.11813)]
77 | - DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion [[paper](https://arxiv.org/abs/2406.06567)]
78 | - MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning [[paper](https://arxiv.org/abs/2405.12130)][[code](https://github.com/kongds/MoRA)]
79 | - Large Language Model Can Continue Evolving From Mistakes [[paper](https://arxiv.org/abs/2404.08707)]
80 | - Rho-1: Not All Tokens Are What You Need [[paper](https://arxiv.org/abs/2404.07965)][[code](https://github.com/microsoft/rho)]
81 | - Simple and Scalable Strategies to Continually Pre-train Large Language Models [[paper](https://arxiv.org/abs/2403.08763)]
82 | - Investigating Continual Pretraining in Large Language Models: Insights and Implications [[paper](https://arxiv.org/abs/2402.17400)]
83 | - Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [[paper](https://arxiv.org/abs/2402.14270)][[code](https://github.com/VITA-Group/HardFocusTraining)]
84 | - TimeLMs: Diachronic Language Models from Twitter (ACL 2022, Demo Track) [[paper](https://arxiv.org/abs/2202.03829)][[code](https://github.com/cardiffnlp/timelms)]
85 | - Continual Pre-Training of Large Language Models: How to (re)warm your model? [[paper](https://arxiv.org/abs/2308.04014)]
86 | - Continual Learning Under Language Shift [[paper](https://arxiv.org/abs/2311.01200)]
87 | - Examining Forgetting in Continual Pre-training of Aligned Large Language Models [[paper](https://arxiv.org/abs/2401.03129)]
88 | - Towards Continual Knowledge Learning of Language Models (ICLR 2022) [[paper](https://arxiv.org/abs/2110.03215)][[code](https://github.com/joeljang/continual-knowledge-learning)]
89 | - Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora (NAACL 2022) [[paper](https://arxiv.org/abs/2110.08534)]
90 | - TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models (EMNLP 2022) [[paper](https://arxiv.org/abs/2204.14211)][[code](https://github.com/joeljang/temporalwiki)]
91 | - Continual Training of Language Models for Few-Shot Learning (EMNLP 2022) [[paper](https://arxiv.org/abs/2210.05549)][[code](https://github.com/UIC-Liu-Lab/CPT)]
92 | - ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding (AAAI 2020) [[paper](https://arxiv.org/abs/1907.12412)][[code](https://github.com/PaddlePaddle/ERNIE)]
93 | - Dynamic Language Models for Continuously Evolving Content (KDD 2021) [[paper](https://arxiv.org/abs/2106.06297)]
94 | - Continual Pre-Training Mitigates Forgetting in Language and Vision [[paper](https://arxiv.org/abs/2205.09357)][[code](https://github.com/AndreaCossu/continual-pretraining-nlp-vision)]
95 | - DEMix Layers: Disentangling Domains for Modular Language Modeling (NAACL 2022) [[paper](https://arxiv.org/abs/2108.05036)][[code](https://github.com/kernelmachine/demix)]
96 | - Time-Aware Language Models as Temporal Knowledge Bases (TACL 2022) [[paper](https://arxiv.org/abs/2106.15110)]
97 | - Recyclable Tuning for Continual Pre-training (ACL 2023 Findings) [[paper](https://arxiv.org/abs/2305.08702)][[code](https://github.com/thunlp/RecyclableTuning)]
98 | - Lifelong Language Pretraining with Distribution-Specialized Experts (ICML 2023) [[paper](https://arxiv.org/abs/2305.12281)]
99 | - ELLE: Efficient Lifelong Pre-training for Emerging Data (ACL 2022 Findings) [[paper](https://arxiv.org/abs/2203.06311)][[code](https://github.com/thunlp/ELLE)]
100 |
101 |
102 | ## Domain-Adaptive Pre-Training of LLMs (DAP)
103 | ### For General Domains
104 | - Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models [[paper](https://arxiv.org/abs/2412.07171)]
105 | - DoPAMine: Domain-specific Pre-training Adaptation from seed-guided data Mining [[paper](https://arxiv.org/abs/2410.00260)]
106 | - Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models [[paper](https://arxiv.org/abs/2408.06663)]
107 | - CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models [[paper](https://arxiv.org/abs/2407.17467)]
108 | - Task Oriented In-Domain Data Augmentation [[paper](https://arxiv.org/abs/2406.16694)]
109 | - Instruction Pre-Training: Language Models are Supervised Multitask Learners [[paper](https://arxiv.org/abs/2406.14491)][[code](https://github.com/microsoft/LMOps)][[huggingface](https://huggingface.co/instruction-pretrain)]
110 | - D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models [[paper](https://arxiv.org/abs/2406.01375)]
111 | - BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [[paper](https://arxiv.org/abs/2403.18365)]
112 | - Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains [[paper](https://arxiv.org/abs/2402.05140)]
113 | - Adapting Large Language Models via Reading Comprehension (ICLR 2024) [[paper](https://arxiv.org/abs/2309.09530)][[code](https://github.com/microsoft/LMOps)]
114 |
115 | ### Legal Domain
116 | - The interplay between domain specialization and model size: a case study in the legal domain [[paper](https://arxiv.org/abs/2501.02068)]
117 | - SaulLM-7B: A pioneering Large Language Model for Law [[paper](https://arxiv.org/abs/2403.03883)][[huggingface](https://huggingface.co/papers/2403.03883)]
118 | - Lawyer LLaMA Technical Report [[paper](https://arxiv.org/abs/2305.15062)]
119 |
120 | ### Medical Domain
121 | - PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications [[paper](https://arxiv.org/abs/2405.19266)]
122 | - Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare [[paper](https://arxiv.org/abs/2404.16621)][[project](https://cyberiada.github.io/Hippocrates/)][[huggingface](https://huggingface.co/emrecanacikgoz)]
123 | - Me LLaMA: Foundation Large Language Models for Medical Applications [[paper](https://arxiv.org/abs/2402.12749)][[code](https://github.com/BIDS-Xu-Lab/Me-LLaMA)]
124 | - BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine [[paper](https://arxiv.org/abs/2308.09442)][[code](https://github.com/PharMolix/OpenBioMed)]
125 | - Continuous Training and Fine-tuning for Domain-Specific Language Models in Medical Question Answering [[paper](https://arxiv.org/abs/2311.00204)]
126 | - PMC-LLaMA: Towards Building Open-source Language Models for Medicine [[paper](https://arxiv.org/abs/2304.14454)][[code](https://github.com/chaoyi-wu/PMC-LLaMA)]
127 | - AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model [[paper](https://arxiv.org/abs/2211.11363)]
128 | - Continual Domain-Tuning for Pretrained Language Models [[paper](https://arxiv.org/abs/2004.02288)]
129 | - HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs [[paper](https://arxiv.org/abs/2311.09774)][[code](https://github.com/FreedomIntelligence/HuatuoGPT-II)]
130 |
131 | ### Financial Domain
132 | - Demystifying Domain-adaptive Post-training for Financial LLMs [[paper](https://arxiv.org/abs/2501.04961)][[code](https://github.com/SalesforceAIResearch/FinDap)]
133 | - Baichuan4-Finance Technical Report [[paper](https://arxiv.org/abs/2412.15270)]
134 | - The Construction of Instruction-tuned LLMs for Finance without Instruction Data Using Continual Pretraining and Model Merging [[paper](https://arxiv.org/abs/2409.19854)][[huggingface](https://huggingface.co/pfnet/nekomata-14b-pfn-qfin-inst-merge)]
135 | - Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications [[paper](https://arxiv.org/abs/2408.11878)]
136 | - Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation [[paper](https://arxiv.org/abs/2406.14971)][[huggingface](https://huggingface.co/arcee-ai/Llama-3-SEC-Base)]
137 | - Construction of Domain-specified Japanese Large Language Model for Finance through Continual Pre-training [[paper](https://arxiv.org/abs/2404.10555)]
138 | - Pretraining and Updating Language- and Domain-specific Large Language Model: A Case Study in Japanese Business Domain [[paper](https://arxiv.org/abs/2404.08262)][[huggingface](https://huggingface.co/stockmark)]
139 | - BBT-Fin: Comprehensive Construction of Chinese Financial Domain Pre-trained Language Model, Corpus and Benchmark [[paper](https://arxiv.org/abs/2302.09432)][[code](https://github.com/ssymmetry/BBT-FinCUGE-Applications)]
140 | - CFGPT: Chinese Financial Assistant with Large Language Model [[paper](https://arxiv.org/abs/2309.10654)][[code](https://github.com/TongjiFinLab/CFGPT)]
141 | - Efficient Continual Pre-training for Building Domain Specific Large Language Models [[paper](https://arxiv.org/abs/2311.08545)]
142 | - WeaverBird: Empowering Financial Decision-Making with Large Language Model, Knowledge Base, and Search Engine [[paper](https://arxiv.org/abs/2308.05361)][[code](https://github.com/ant-research/fin_domain_llm)][[huggingface](https://huggingface.co/weaverbirdllm)][[demo](https://www.youtube.com/watch?v=yofgeqnlrMc)]
143 | - XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters [[paper](https://arxiv.org/abs/2305.12002)][[huggingface](https://huggingface.co/xyz-nlp/XuanYuan2.0)]
144 |
145 | ### Scientific Domain
146 | - MELT: Materials-aware Continued Pre-training for Language Model Adaptation to Materials Science [[paper](https://arxiv.org/abs/2410.15126)][[code](https://github.com/JunhoKim94/MELT)]
147 | - AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy [[paper](https://arxiv.org/abs/2409.19750)][[huggingface](https://huggingface.co/AstroMLab)]
148 | - SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding [[paper](https://arxiv.org/abs/2408.15545)]
149 | - PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes [[paper](https://arxiv.org/abs/2406.13193)][[code](https://github.com/IDEA-XL/PRESTO)]
150 | - ClimateGPT: Towards AI Synthesizing Interdisciplinary Research on Climate Change [[paper](https://arxiv.org/abs/2401.09646)][[huggingface](https://huggingface.co/collections/eci-io/climategpt-65a83cd8a92d5908dfffc849)]
151 | - AstroLLaMA: Towards Specialized Foundation Models in Astronomy [[paper](https://arxiv.org/abs/2309.06126)]
152 | - OceanGPT: A Large Language Model for Ocean Science Tasks [[paper](https://arxiv.org/abs/2310.02031)][[code](https://github.com/zjunlp/KnowLM)]
153 | - K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization [[paper](https://arxiv.org/abs/2306.05064)][[code](https://github.com/davendw49/k2)][[huggingface](https://huggingface.co/daven3/k2-v1)]
154 | - MarineGPT: Unlocking Secrets of "Ocean" to the Public [[paper](https://arxiv.org/abs/2310.13596)][[code](https://github.com/hkust-vgd/MarineGPT)]
155 | - GeoGalactica: A Scientific Large Language Model in Geoscience [[paper](https://arxiv.org/abs/2401.00434)][[code](https://github.com/geobrain-ai/geogalactica)][[huggingface](https://huggingface.co/papers/2401.00434)]
156 | - Llemma: An Open Language Model For Mathematics [[paper](https://arxiv.org/abs/2310.10631)][[code](https://github.com/EleutherAI/math-lm)][[huggingface](https://huggingface.co/EleutherAI/llemma_34b)]
157 | - PLLaMa: An Open-source Large Language Model for Plant Science [[paper](https://arxiv.org/abs/2401.01600)][[code](https://github.com/Xianjun-Yang/PLLaMa)][[huggingface](https://huggingface.co/papers/2401.01600)]
158 |
159 | ### Code Domain
160 | - CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis [[paper](https://arxiv.org/abs/2203.13474)][[code](https://github.com/salesforce/CodeGen)][[huggingface](https://huggingface.co/models?search=salesforce+codegen)]
161 | - Code Needs Comments: Enhancing Code LLMs with Comment Augmentation [[paper](https://arxiv.org/abs/2402.13013)]
162 | - StarCoder: may the source be with you! [[paper](https://arxiv.org/abs/2305.06161)][[code](https://github.com/bigcode-project/starcoder)]
163 | - DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence [[paper](https://arxiv.org/abs/2401.14196)][[code](https://github.com/deepseek-ai/DeepSeek-Coder)][[huggingface](https://huggingface.co/deepseek-ai)]
164 | - IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators [[paper](https://arxiv.org/abs/2403.03894)][[code](https://github.com/UKPLab/arxiv2024-ircoder)]
165 | - Code Llama: Open Foundation Models for Code [[paper](https://arxiv.org/abs/2308.12950)][[code](https://github.com/facebookresearch/codellama)]
166 |
167 | ### Language Domain
168 | - Domain-adaptative Continual Learning for Low-resource Tasks: Evaluation on Nepali [[paper](https://arxiv.org/abs/2412.13860)][[code](https://github.com/sharad461/DAPT-Nepali)]
169 | - Efficient Continual Pre-training of LLMs for Low-resource Languages [[paper](https://arxiv.org/abs/2412.10244)]
170 | - Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement [[paper](https://arxiv.org/abs/2412.04003)]
171 | - Velocitune: A Velocity-based Dynamic Domain Reweighting Method for Continual Pre-training [[paper](https://arxiv.org/abs/2411.17799)]
172 | - Code-Switching Curriculum Learning for Multilingual Transfer in LLMs [[paper](https://arxiv.org/abs/2411.02460)]
173 | - RedWhale: An Adapted Korean LLM Through Efficient Continual Pretraining [[paper](https://arxiv.org/abs/2408.11294)]
174 | - Unlocking the Potential of Model Merging for Low-Resource Languages [[paper](https://arxiv.org/abs/2407.03994)]
176 | - Mitigating Catastrophic Forgetting in Language Transfer via Model Merging [[paper](https://arxiv.org/abs/2407.08699)]
177 | - Enhancing Translation Accuracy of Large Language Models through Continual Pre-Training on Parallel Data [[paper](https://arxiv.org/abs/2407.03145)]
178 | - BAMBINO-LM: (Bilingual-)Human-Inspired Continual Pretraining of BabyLM [[paper](https://arxiv.org/abs/2406.11418)]
179 | - InstructionCP: A fast approach to transfer Large Language Models into target language [[paper](https://arxiv.org/abs/2405.20175)]
180 | - Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities [[paper](https://arxiv.org/abs/2404.17790)]
181 | - Sailor: Open Language Models for South-East Asia [[paper](https://arxiv.org/abs/2404.03608)][[code](https://github.com/sail-sg/sailor-llm)]
182 | - Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order [[paper](https://arxiv.org/abs/2404.00399)][[huggingface](https://huggingface.co/collections/aurora-m/aurora-m-models-65fdfdff62471e09812f5407)]
183 |
184 | ### Other Domains
185 | - Domain Adaptation of Foundation LLMs for e-Commerce [[paper](https://arxiv.org/abs/2501.09706)]
186 | - Adapting Large Language Models to Log Analysis with Interpretable Domain Knowledge [[paper](https://arxiv.org/abs/2412.01377)]
187 | - CPRM: A LLM-based Continual Pre-training Framework for Relevance Modeling in Commercial Search [[paper](https://arxiv.org/abs/2412.01269)]
188 | - LLaMA Pro: Progressive LLaMA with Block Expansion [[paper](https://arxiv.org/abs/2401.02415)][[code](https://github.com/TencentARC/LLaMA-Pro)][[huggingface](https://huggingface.co/TencentARC/LLaMA-Pro-8B)]
189 | - ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning [[paper](https://arxiv.org/abs/2012.15283)][[code](https://github.com/PlusLabNLP/ECONET)]
190 | - Pre-training Text-to-Text Transformers for Concept-centric Common Sense [[paper](https://arxiv.org/abs/2011.07956)][[code](https://github.com/INK-USC/CALM/)][[project](https://inklab.usc.edu/calm-project/)]
191 | - Don't Stop Pretraining: Adapt Language Models to Domains and Tasks (ACL 2020) [[paper](https://arxiv.org/abs/2004.10964)][[code](https://github.com/allenai/dont-stop-pretraining)]
192 | - EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data [[paper](https://arxiv.org/abs/2312.15696)]
193 |
194 | ## Continual Fine-Tuning of LLMs (CFT)
195 |
196 | ### General Continual Fine-Tuning
197 | - ⭐ PCL: Prompt-based Continual Learning for User Modeling in Recommender Systems [[paper](https://arxiv.org/abs/2502.19628)]
198 | - ⭐ Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging [[paper](https://arxiv.org/abs/2502.12217)]
199 | - ⭐ SPARC: Subspace-Aware Prompt Adaptation for Robust Continual Learning in LLMs [[paper](https://arxiv.org/abs/2502.02909)]
200 | - Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Model [[paper](https://openreview.net/forum?id=lUl3Iz4k64)]
201 | - Preserving Generalization of Language models in Few-shot Continual Relation Extraction [[paper](https://arxiv.org/abs/2410.00334)]
202 | - MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning [[paper](https://arxiv.org/abs/2407.20999)]
203 | - Learn it or Leave it: Module Composition and Pruning for Continual Learning [[paper](https://arxiv.org/abs/2406.18708)]
204 | - Unlocking Continual Learning Abilities in Language Models [[paper](https://arxiv.org/abs/2406.17245)][[code](https://github.com/wenyudu/MIGU)]
205 | - Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning (NeurIPS 2021) [[paper](https://arxiv.org/abs/2112.02706)][[code](https://github.com/ZixuanKe/PyContinual)]
206 | - Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study (ICLR 2023) [[paper](https://arxiv.org/abs/2303.01081)][[code](https://github.com/kobayashikanna01/plms_are_lifelong_learners)]
207 | - CIRCLE: Continual Repair across Programming Languages (ISSTA 2022) [[paper](https://arxiv.org/abs/2205.10956)]
208 | - ConPET: Continual Parameter-Efficient Tuning for Large Language Models [[paper](https://arxiv.org/abs/2309.14763)][[code](https://github.com/Raincleared-Song/ConPET)]
209 | - Enhancing Continual Learning with Global Prototypes: Counteracting Negative Representation Drift [[paper](https://arxiv.org/abs/2205.12186)]
210 | - Investigating Forgetting in Pre-Trained Representations Through Continual Learning [[paper](https://arxiv.org/abs/2305.05968)]
211 | - Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models [[paper](https://arxiv.org/abs/2312.07887)][[code](https://github.com/zzz47zzz/pretrained-lm-for-incremental-learning)]
212 | - LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5 (ICLR 2022) [[paper](https://arxiv.org/abs/2110.07298)][[code](https://github.com/qcwthu/Lifelong-Fewshot-Language-Learning)]
213 | - On the Usage of Continual Learning for Out-of-Distribution Generalization in Pre-trained Language Models of Code [[paper](https://arxiv.org/abs/2305.04106)]
214 | - Overcoming Catastrophic Forgetting in Massively Multilingual Continual Learning (ACL 2023 Findings) [[paper](https://arxiv.org/abs/2305.16252)]
215 | - Parameterizing Context: Unleashing the Power of Parameter-Efficient Fine-Tuning and In-Context Tuning for Continual Table Semantic Parsing (NeurIPS 2023) [[paper](https://arxiv.org/abs/2310.04801)][[code](https://github.com/KSESEU/C3)]
216 |
217 | ### Continual Instruction Tuning (CIT)
218 | - Fine-tuned Language Models are Continual Learners [[paper](https://arxiv.org/pdf/2205.12393.pdf)][[code](https://github.com/ThomasScialom/T0_continual_learning)]
219 | - TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [[paper](https://arxiv.org/pdf/2310.06762.pdf)][[code](https://github.com/BeyonderXX/TRACE)]
220 | - Large-scale Lifelong Learning of In-context Instructions and How to Tackle It [[paper](https://aclanthology.org/2023.acl-long.703.pdf)]
221 | - CITB: A Benchmark for Continual Instruction Tuning [[paper](https://arxiv.org/pdf/2310.14510.pdf)][[code](https://github.com/hyintell/CITB)]
222 | - Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal [[paper](https://arxiv.org/pdf/2403.01244.pdf)]
223 | - Don't Half-listen: Capturing Key-part Information in Continual Instruction Tuning [[paper](https://arxiv.org/pdf/2403.10056.pdf)]
224 | - ConTinTin: Continual Learning from Task Instructions [[paper](https://arxiv.org/pdf/2203.08512.pdf)]
225 | - Orthogonal Subspace Learning for Language Model Continual Learning [[paper](https://arxiv.org/pdf/2310.14152.pdf)][[code](https://github.com/cmnfriend/O-LoRA)]
226 | - SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models [[paper](https://arxiv.org/pdf/2401.08295.pdf)]
227 | - InsCL: A Data-efficient Continual Learning Paradigm for Fine-tuning Large Language Models with Instructions [[paper](https://arxiv.org/pdf/2403.11435.pdf)]
228 |
229 | ### Continual Model Refinement (CMR)
230 | - ⭐ Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing [[paper](https://arxiv.org/abs/2502.19416)]
231 | - ⭐ Reinforced Lifelong Editing for Language Models [[paper](https://arxiv.org/abs/2502.05759)]
232 | - Continual Memorization of Factoids in Large Language Models [[paper](https://arxiv.org/abs/2411.07175)][[code](https://github.com/princeton-nlp/continual-factoid-memorization)]
233 | - UniAdapt: A Universal Adapter for Knowledge Calibration [[paper](https://arxiv.org/abs/2410.00454)]
234 | - LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models [[paper](https://arxiv.org/abs/2406.20030)]
235 | - WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models [[paper](https://arxiv.org/abs/2405.14768)][[code](https://github.com/zjunlp/EasyEdit)]
236 | - Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors [[paper](https://arxiv.org/pdf/2211.11031.pdf)][[code](https://github.com/thartvigsen/grace)]
237 | - On Continual Model Refinement in Out-of-Distribution Data Streams [[paper](https://arxiv.org/pdf/2205.02014.pdf)][[code](https://github.com/facebookresearch/cmr)][[project](https://cmr-nlp.github.io/)]
238 | - MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA [[paper](https://arxiv.org/pdf/2312.11795.pdf)][[code](https://github.com/ECNU-ICALK/MELO)]
239 | - Larimar: Large Language Models with Episodic Memory Control [[paper](https://arxiv.org/pdf/2403.11901.pdf)]
240 | - WilKE: Wise-Layer Knowledge Editor for Lifelong Knowledge Editing [[paper](https://arxiv.org/pdf/2402.10987.pdf)]
241 |
242 |
243 | ### Continual Model Alignment (CMA)
244 | - Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [[paper](https://arxiv.org/abs/2407.05342)]
245 | - Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment [[paper](https://arxiv.org/abs/2405.17931)][[code](https://github.com/QwenLM/online_merging_optimizers)]
246 | - Alpaca: A Strong, Replicable Instruction-Following Model [[project](https://crfm.stanford.edu/2023/03/13/alpaca.html)] [[code](https://github.com/tatsu-lab/stanford_alpaca)]
247 | - Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems [[paper](https://arxiv.org/pdf/2108.12589.pdf)] [[code](https://github.com/MiFei/ST-ToD)]
248 | - Training language models to follow instructions with human feedback (NeurIPS 2022) [[paper](https://proceedings.neurips.cc/paper_files/paper/2022/file/b1efde53be364a73914f58805a001731-Paper-Conference.pdf)]
249 | - Direct preference optimization: Your language model is secretly a reward model (NeurIPS 2023) [[paper](https://proceedings.neurips.cc/paper_files/paper/2023/file/a85b405ed65c6477a4fe8302b5e06ce7-Paper-Conference.pdf)]
250 | - COPF: Continual Learning Human Preference through Optimal Policy Fitting [[paper](https://arxiv.org/pdf/2310.15694)]
251 | - CPPO: Continual Learning for Reinforcement Learning with Human Feedback (ICLR 2024) [[paper](https://openreview.net/pdf?id=86zAUE80pP)]
252 | - A Moral Imperative: The Need for Continual Superalignment of Large Language Models [[paper](https://arxiv.org/pdf/2403.14683)]
253 | - Mitigating the Alignment Tax of RLHF [[paper](https://arxiv.org/abs/2309.06256)]
254 |
255 | ### Continual Multimodal LLMs (CMLLMs)
256 | - 🔥 IAP: Improving Continual Learning of Vision-Language Models via Instance-Aware Prompting [[paper](https://arxiv.org/abs/2503.20612)][[code](https://github.com/FerdinandZJU/IAP)]
257 | - 🔥 HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model [[paper](https://arxiv.org/abs/2503.12941)]
258 | - 🔥 Synthetic Data is an Elegant GIFT for Continual Vision-Language Models [[paper](https://arxiv.org/abs/2503.04229)]
259 | - ⭐ Modular Prompt Learning Improves Vision-Language Models [[paper](https://arxiv.org/abs/2502.14125)]
260 | - ⭐ Efficient Few-Shot Continual Learning in Vision-Language Models [[paper](https://arxiv.org/abs/2502.04098)]
261 | - ⭐ DesCLIP: Robust Continual Adaptation via General Attribute Descriptions for Pretrained Vision-Language Models [[paper](https://arxiv.org/abs/2502.00618)]
262 | - A Practitioner's Guide to Continual Multimodal Pretraining [[paper](https://openreview.net/forum?id=gkyosluSbR)]
263 | - ModelGrow: Continual Text-to-Video Pre-training with Model Expansion and Language Understanding Enhancement [[paper](https://arxiv.org/abs/2412.18966)][[project](https://modelgrow.github.io/)]
264 | - Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA [[paper](https://arxiv.org/abs/2412.01004)]
265 | - Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [[paper](https://arxiv.org/abs/2411.06764)]
266 | - LLMs Can Evolve Continually on Modality for X-Modal Reasoning [[paper](https://arxiv.org/abs/2410.20178)][[code](https://github.com/JiazuoYu/PathWeave)]
267 | - Improving Multimodal Large Language Models Using Continual Learning [[paper](https://arxiv.org/abs/2410.19925)]
268 | - ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy [[paper](https://arxiv.org/abs/2410.10923)][[code](https://github.com/lihong2303/ATLAS)]
269 | - Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models [[paper](https://arxiv.org/abs/2410.03955)][[code](https://github.com/GangLii/DevSafety)]
270 | - CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning [[paper](https://arxiv.org/abs/2407.15793)]
271 | - Continually Learn to Map Visual Concepts to Large Language Models in Resource-constrained Environments [[paper](https://arxiv.org/abs/2407.08279)]
272 | - Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [[paper](https://arxiv.org/abs/2407.05342)]
273 | - CLIP model is an Efficient Online Lifelong Learner [[paper](https://arxiv.org/abs/2405.15155)]
274 | - CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models [[paper](https://arxiv.org/abs/2403.19137)][[code](https://github.com/srvCodes/clap4clip)]
275 | - Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters (CVPR 2024) [[paper](https://arxiv.org/abs/2403.11549)][[code](https://github.com/JiazuoYu/MoE-Adapters4CL)]
276 | - CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning [[paper](https://arxiv.org/abs/2403.10245)]
277 | - Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models [[paper](https://arxiv.org/abs/2403.09296)]
278 | - Investigating the Catastrophic Forgetting in Multimodal Large Language Models (PMLR 2024) [[paper](https://arxiv.org/abs/2309.10313)]
279 | - MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models [[paper](https://arxiv.org/abs/2304.10592)] [[code](https://github.com/Vision-CAIR/MiniGPT-4)]
280 | - Visual Instruction Tuning (NeurIPS 2023, Oral) [[paper](https://arxiv.org/abs/2304.08485)] [[code](https://github.com/haotian-liu/LLaVA)]
281 | - Continual Instruction Tuning for Large Multimodal Models [[paper](https://arxiv.org/abs/2311.16206)]
282 | - CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model [[paper](https://arxiv.org/abs/2403.08350)] [[code](https://github.com/zackschen/coin)]
283 | - Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models [[paper](https://arxiv.org/abs/2402.12048)]
284 | - Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration [[paper](https://arxiv.org/abs/2403.11373)] [[code](https://github.com/Tree-Shu-Zhao/RebQ.pytorch)]
285 |
286 | ## Continual LLMs Miscs
287 | - ⭐ How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training [[paper](https://arxiv.org/abs/2502.11196)][[code](https://github.com/zjunlp/DynamicKnowledgeCircuits)]
288 | - Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle [[paper](https://arxiv.org/abs/2411.08324)][[project](https://agenticlearning.ai/daily-oracle/)][[data](https://drive.google.com/drive/folders/1zMmV5RRxBIcwavxhLvAz-0ZlkVeQPQRG)]
289 | - Scalable Data Ablation Approximations for Language Models through Modular Training and Merging [[paper](https://arxiv.org/abs/2410.15661)]
290 | - How Do Large Language Models Acquire Factual Knowledge During Pretraining? [[paper](https://arxiv.org/abs/2406.11813)]
291 | - Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [[paper](https://arxiv.org/abs/2403.16952)][[code](https://github.com/yegcjs/mixinglaws)]
292 | - Evaluating the External and Parametric Knowledge Fusion of Large Language Models [[paper](https://arxiv.org/abs/2405.19010)]
293 | - Demystifying Forgetting in Language Model Fine-Tuning with Statistical Analysis of Example Associations [[paper](https://arxiv.org/abs/2406.14026)]
294 | - AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees [[paper](https://arxiv.org/abs/2404.08417)]
295 | - COPAL: Continual Pruning in Large Language Generative Models [[paper](https://arxiv.org/abs/2405.02347)]
296 | - HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models [[paper](https://arxiv.org/abs/2405.14831)][[code](https://github.com/OSU-NLP-Group/HippoRAG)]
297 | - Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training [[paper](https://arxiv.org/abs/2403.09613)][[code](https://github.com/Agentic-Learning-AI-Lab/anticipatory-recovery-public)]
298 |
299 | ## Reference
300 | If you find our survey or this collection of papers useful, please consider citing our work with the following BibTeX entry:
301 | ```bib
302 | @article{shi2024continual,
303 | title={Continual Learning of Large Language Models: A Comprehensive Survey},
304 | author={Shi, Haizhou and
305 | Xu, Zihao and
306 | Wang, Hengyi and
307 | Qin, Weiyi and
308 | Wang, Wenyuan and
309 | Wang, Yibin and
310 | Wang, Zifeng and
311 | Ebrahimi, Sayna and
312 | Wang, Hao},
313 | journal={arXiv preprint arXiv:2404.16789},
314 | year={2024}
315 | }
316 | ```
317 |
--------------------------------------------------------------------------------
/fig/overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Wang-ML-Lab/llm-continual-learning-survey/7e8c47434bcd8bd459ce8b2cd9789c733880b296/fig/overview.png
--------------------------------------------------------------------------------