# LLM4REC

## OVERVIEW

1. LLMs enhance Recommendation
   - Feature Engineering
     - data augmentation
       - generate open-world knowledge for user/item
       - generate interaction data
     - data condensation
     - feature selection
     - feature imputation
   - Feature Encoder
     - encode text information
     - encode ID information

2. LLMs as Recommenders
   - prompt learning
   - instruction tuning
   - reinforcement learning
   - knowledge distillation
   - Pipeline Controller
     - pipeline design
     - CoT, ToT, SI
   - Incremental Learning

3. Other Related Work
   - Self-distillation in LLM
   - DPO in LLM
   - LLM4CTR

## 1. LLMs enhance Recommendation

### Feature Engineering
| Title | Model | Time | Motivation | Description |
|:-------:|:-------:|:-------:|:-------:|:-------:|
| Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models | CIKGRec | AAAI25 | Structures the user-side world knowledge in LLMs to enhance knowledge-aware, graph-based recommendation | ![pic](https://github.com/user-attachments/assets/c84c6627-2d75-4195-bdcc-fcf3bc04a6e4) |
| Towards Open-World Recommendation with Knowledge Augmentation from Large Language Models | KAR | RecSys24 | Uses the open-world knowledge of LLMs to enrich user and item information | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/2d2eea2e-c88e-4d9e-ab3e-7dc8f67c90fe) |
| A First Look at LLM-Powered Generative News Recommendation | ONCE (GENRE+DIRE) | arXiv23 | Uses open-source LLMs as feature encoders, and closed-source LLMs, via prompting, to enrich the training data | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/da1e9be8-d334-4240-8359-4deca3417c96) |
| LLMRec: Large Language Models with Graph Augmentation for Recommendation | LLMRec | WSDM24 | Uses LLMs for graph data augmentation, selecting liked and disliked items from item candidates | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/d241c52f-611f-47bf-8fb4-7a5095b0a1f4) |
| Integrating Large Language Models into Recommendation via Mutual Augmentation and Adaptive Aggregation | Llama4Rec | arXiv24 | Consists of mutual augmentation and adaptive aggregation; mutual augmentation covers both data augmentation and prompt augmentation | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/395b7bac-23c9-45cd-bfbd-6ceb463edb27) |
| Data-efficient Fine-tuning for LLM-based Recommendation | DEALRec | SIGIR24 | Designs an influence score and an effort score to distill the data for LLM4Rec and select influential samples | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/6a7b308c-090d-475d-b272-19243c3bd44c) |
| Distillation is All You Need for Practically Using Different Pre-trained Recommendation Models | PRM-KD | arXiv24 | Uses different types of pre-trained recommendation models as teachers and extracts in-batch negative item scores for joint knowledge distillation | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/981489e3-b420-4f8a-99b0-9a36706b0fcb) |
| CoRAL: Collaborative Retrieval-Augmented Large Language Models Improve Long-tail Recommendation | CoRAL | KDD24 | Uses reinforcement learning to inject collaborative information into the LLM via prompts, improving long-tail recommendation performance | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/ef14e721-86cf-4946-9cca-707a0f6e6eb1) |
| Harnessing Large Language Models for Text-Rich Sequential Recommendation | | WWW24 | Focuses on data compression for LLM4Rec: splits the user's interaction history into chunks, summarizes each chunk with an LLM, and designs a prompt that combines the summarized user preferences, recent user interactions, and candidate items | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/4d3a8083-c269-4086-9250-2d515cd16738) |
| Large Language Models Enhanced Collaborative Filtering | LLM-CF | CIKM24 | Distills the world knowledge and reasoning capabilities of LLMs into collaborative filtering via ICL and CoT | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/bf79cde0-f342-49c5-a3e7-e9e94eb9051f) |
| Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation | | SIGIR24 | LLMs cannot tailor their output to a user's background and historical preferences; uses reinforcement learning plus knowledge distillation to select the personal information that best augments the LLM | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/42f3cff0-3569-40be-b183-79b29464deb6) |
| Large Language Models for Next Point-of-Interest Recommendation | | SIGIR24 | Existing next-POI methods focus on short trajectories and cold-start users (little data, short trajectories) and do not fully exploit rich LBSN data; the natural-language understanding of LLMs can handle all types of LBSN data and make better use of contextual information | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/ffc1b435-34bd-4483-aa5f-dc5d313f2882) |
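The sketch below illustrates the shared prompt-based augmentation idea behind entries such as KAR and LLMRec: ask an LLM to infer open-world user knowledge and to pick liked/disliked pseudo-labels from a candidate set. It is a minimal sketch, not any paper's exact method; `call_llm` and the prompt wording are placeholders you would replace with your own LLM client and template.

```python
# Minimal sketch of LLM-based data augmentation for recommendation.
# `call_llm` is a hypothetical placeholder for whatever LLM endpoint you use.

def call_llm(prompt: str) -> str:
    """Placeholder: replace with a real LLM call (API client, local model, ...)."""
    raise NotImplementedError

def build_augmentation_prompt(history: list[str], candidates: list[str]) -> str:
    # Describe the user's history, then ask for preference knowledge and pseudo-labels.
    return (
        "The user has watched the following movies:\n"
        + "\n".join(f"- {title}" for title in history)
        + "\n\nBriefly describe the user's preferences, then pick one movie the user "
        "would LIKE and one the user would DISLIKE from these candidates:\n"
        + "\n".join(f"- {title}" for title in candidates)
    )

if __name__ == "__main__":
    prompt = build_augmentation_prompt(
        history=["The Matrix", "Inception", "Interstellar"],
        candidates=["Blade Runner 2049", "The Notebook"],
    )
    print(prompt)                      # inspect the prompt
    # augmented = call_llm(prompt)     # parse the response into pseudo-labels / side info
```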
### Feature Encoder
| Title | Model | Time | Motivation | Description |
|:-------:|:-------:|:-------:|:-------:|:-------:|
| U-BERT: Pre-training user representations for improved recommendation | U-BERT | AAAI21 | Early work that mainly uses BERT to encode review text | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/61c9c5b1-74b1-4837-8ed8-2e0815e772b4) |
| Towards universal sequence representation learning for recommender systems | UniSRec | KDD22 | Encodes item text with BERT and applies parametric whitening | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/983492f4-c1c1-43fb-a61b-a3ed3410a8bb) |
| Learning vector-quantized item representation for transferable sequential recommenders | VQ-Rec | WWW23 | First maps text into a vector of discrete indices (the item code), then uses these indices to look up a code embedding table for encoding | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/ffd44ae0-e85f-4982-9287-ad04ddfadd3c) |
| Recommender Systems with Generative Retrieval | TIGER | NeurIPS23 | Uses an LLM to encode meaningful item IDs and directly predicts candidate IDs for end-to-end generative retrieval | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/529b6903-80b5-45ff-95c6-b58cb8b4d3d9) |
| Representation Learning with Large Language Models for Recommendation | RLMRec | WWW24 | Aligns the semantic features encoded by the LLM with the collaborative features of conventional methods through two contrastive-learning objectives | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/a4c0366c-e6f1-483d-992b-49ae4ca8dbad) |
| ReLLa: Retrieval-enhanced large language models for lifelong sequential behavior comprehension in recommendation | ReLLa | WWW24 | For CTR, LLMs perform poorly on long sequences; selects the items most similar to the target item from the long sequence to form a shorter one, and builds item embeddings from text with the LLM | image |
| Breaking the Length Barrier: LLM-Enhanced CTR Prediction in Long Textual User Behaviors | BAHE | SIGIR24 short paper | LLM inference over long sequences is expensive; freezes the shallow layers of the LLM, pre-computes and stores the shallow-layer features of atomic behaviors, and later simply looks them up | image |
| Large Language Models Augmented Rating Prediction in Recommender System | LLM-TRSR | ICASSP24 | Ensembles the outputs of the LLM recommender and the conventional recommender | image |
| Enhancing Content-based Recommendation via Large Language Model | LOID | CIKM24 short paper | There may be a semantic gap between the content of different domains; combines LLM and conventional RS information and proposes a paradigm for aligning ID and content information, using the ID embedding as a key to extract information from the sequence of text embeddings | image |
| Aligning Large Language Models with Recommendation Knowledge | | arXiv24 | Transfers recommendation-specific knowledge, such as MIM and BPR, to the LLM in the form of prompts | image |
| The Elephant in the Room: Rethinking the Usage of Pre-trained Language Model in Sequential Recommendation | Elephant in the Room | RecSys24 | Most parameters in the attention layers of large sequential-recommendation models are unused and highly redundant; uses the item embeddings learned by the LLM to initialize SASRec and then trains SASRec | image |
| Demystifying Embedding Spaces using Large Language Models | | ICLR24 | Uses an LLM to interpret the item embedding space, including items not seen in the training data | image |
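As a minimal sketch of the "LLM as feature encoder" pattern above (UniSRec/RLMRec style), the snippet below encodes item text with a frozen pre-trained LM and mean-pools the hidden states into item features for a downstream recommender. The model name (`bert-base-uncased`) and the pooling choice are illustrative assumptions, not taken from any specific paper.

```python
# Minimal sketch: frozen pre-trained LM as an item text encoder.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def encode_items(texts: list[str]) -> torch.Tensor:
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state          # [B, L, H]
    mask = batch["attention_mask"].unsqueeze(-1).float() # [B, L, 1]
    return (hidden * mask).sum(1) / mask.sum(1)          # mean pooling -> [B, H]

item_emb = encode_items([
    "Title: The Matrix. Genre: sci-fi action.",
    "Title: The Notebook. Genre: romance drama.",
])
print(item_emb.shape)  # torch.Size([2, 768]), fed to the downstream recommender
```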
## 2. LLMs as Recommenders
### Scoring/Ranking
| Title | Model | Time | Motivation | Description |
|:-------:|:-------:|:-------:|:-------:|:-------:|
| Recommendation as language processing (RLP): A unified pretrain, personalized prompt & predict paradigm (P5) | P5 | RecSys22 | Designs multiple prompts for different tasks and re-pretrains on recommendation datasets, ultimately addressing zero-shot recommendation | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/00b0c432-9da6-4790-936a-06943f5483fd) |
| Text Is All You Need: Learning Language Representations for Sequential Recommendation | RecFormer | KDD23 | Flattens key-value pairs into sentence-like prompts and trains with Longformer to produce representations of the user's interaction sequence (interests), then adds contrastive learning for the final recommendation | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/4e29200f-6ff5-468c-89f6-ba9d9648e44f) |
| Recommendation as instruction following: A large language model empowered recommendation approach | InstructRec | arXiv23 | Adopts instruction tuning, formatting active user instructions and passive interaction information into instructions that guide the LLM through multi-task recommendation scenarios | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/69fb19c0-2473-4737-937d-4f5372877c70) |
| A bi-step grounding paradigm for large language models in recommendation systems | BIGRec | arXiv23 | Targets the grounding problem and adopts instruction tuning to achieve "Grounding Language Space to Recommendation Space" | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/bb6708c0-faf0-4b60-bd59-10980a4db6b0) |
| A Multi-facet Paradigm to Bridge Large Language Model and Recommendation | TransRec | arXiv23 | For item indexing, treats ID, title, and attribute as facets of an item; for generation grounding, intersects the generated identifiers with the identifiers of every in-corpus item to select items | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/ec8971b3-a70f-4102-aefa-4eb41a9ebd43) |
| CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation | CoLLM | arXiv23 | Puts the collaborative information captured by a conventional model into the LLM prompt and maps it into the final embedding space | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/77c1bbf2-9765-413e-ba69-bcd2b6fb6f8a) |
| LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking | LlamaRec | CIKM23 | Generating recommendations with an LLM is usually expensive at inference time and still requires grounding; LlamaRec uses a verbalizer to convert the LLM head output (the scores of all tokens) into ranking scores for the candidate items | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/350fd16d-5a75-4d31-a9b9-9d4ee5aa92b4) |
| Large language models are zero-shot rankers for recommender systems | | arXiv23 | Uses LLMs to perform zero-shot ranking over the candidate item set | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/32a8ae0c-4536-40bf-a0c5-ad753400c068) |
| Language models as recommender systems: Evaluations and limitations | LMRecSys | NeurIPS21 | Adopts prompt tuning, splitting the item to be predicted into multiple tokens; the LLM outputs a distribution over each token, which is then used for recommendation | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/de08059f-a75b-47d9-bb12-4799b7ab06ca) |
| Prompt learning for news recommendation | Prompt4NR | SIGIR23 | Designs discrete, continuous, and hybrid prompt templates together with their answer spaces, and uses prompt ensembling to combine the best-performing set of prompt templates | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/4d2c8217-ea18-40af-bd57-660a80884900) |
| Prompt distillation for efficient llm-based recommendation | POD | CIKM23 | Learns continuous prompts used as prefixes via prompt learning, distilling the information of discrete prompts into continuous prompts | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/892c0f54-730b-4a1c-bdde-da26a3cd67a6) |
| Large Language Models as Zero-Shot Conversational Recommenders | | CIKM23 | An empirical study of representative large language models on conversational recommendation in the zero-shot setting | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/05dadeef-c3d5-4d71-b4d2-5a54f708746d) |
| Leveraging Large Language Models (LLMs) to Empower Training-Free Dataset Condensation for Content-Based Recommendation | | arXiv23 | Distills the recommendation data: designs prompts for the LLM to compress item information and extract user preferences, clusters users and selects the top-m by distance, and generates interaction data | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/ffcc1950-3257-426b-9a2d-518e44d63503) |
| Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents | | arXiv23 | Uses GPT models for text ranking and distills the GPT annotations into a smaller model | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/265a4ae2-3627-4d98-9ff8-473ec5fd8626) |
| LLaRA: Aligning Large Language Models with Sequential Recommenders | LLaRA | arXiv23 | Uses hybrid representations in the prompt: text representations plus representations learned by a conventional model | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/cd1a24be-923e-4fb1-8540-13b70b942a9e) |
| Collaborative Contextualization: Bridging the Gap between Collaborative Filtering and Pre-trained Language Model | CollabContext | arXiv23 | Performs bidirectional distillation between the text representations learned by the LLM and the representations of a conventional model | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/810425d8-65f4-4883-b1b8-021bb6ff6e03) |
| Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation | LC-Rec | ICDE24 | For item indexing, designs a semantic mapping that assigns items meaningful, non-conflicting IDs, and proposes a series of specially designed tuning tasks that force the LLM to deeply integrate language and collaborative-filtering semantics | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/73f5626e-2768-4b07-8599-5d0306c6d4ae) |
| Collaborative Large Language Model for Recommender Systems | CLLM4Rec | WWW24 | To narrow the gap between natural-language and recommendation semantics, extends the vocabulary so each user and item is bound to a unique token, and trains the embeddings of the added tokens with collaborative signals | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/72044ed3-5b33-41b0-8e99-70cd62cfe9cb) |
| Play to Your Strengths: Collaborative Intelligence of Conventional Recommender Models and Large Language Models | Play to Your Strengths | arXiv24 | CTR task; since LLM inference is slow and conventional RS and LLM RS excel on different data, routes different data to different recommenders: samples on which the conventional RS has low confidence are passed to the LLM RS | image |
| GPT4Rec: A generative framework for personalized recommendation and user interests interpretation | GPT4Rec | arXiv23 | Uses GPT-2 to generate queries from the interaction history and retrieves items with BM25 | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/82808d6d-5a7c-409c-a8ec-42e626fa95e1) |
| Unsupervised Large Language Model Alignment for Information Retrieval via Contrastive Feedback | | SIGIR24 | LLM responses fail to capture the differences between documents with similar content; designs a group-wise method to produce feedback signals and uses unsupervised learning plus reinforcement learning so that LLMs produce context-specific responses | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/0d41df18-d4b7-48bc-bc68-e79e51601ee5) |
| RDRec: Rationale Distillation for LLM-based Recommendation | RDRec | arXiv24 | Current LLM4Rec rarely attends to the rationale behind user interactions; prompts the LLM to extract user preferences and item attributes from reviews and then distills them into a small LM | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/6e9bea91-1568-49be-9cc4-3a9fb79b95a7) |
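The snippet below is a minimal sketch of verbalizer-style scoring/ranking in the spirit of LlamaRec and the zero-shot rankers above: label each candidate with a letter, ask the LM which letter the user will prefer, and read the next-token logits of the letter tokens as ranking scores. The model name (`gpt2`) and the prompt wording are illustrative assumptions.

```python
# Minimal sketch: next-token logits over option letters as candidate ranking scores.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def rank_candidates(history: list[str], candidates: list[str]) -> list[tuple[str, float]]:
    letters = [chr(ord("A") + i) for i in range(len(candidates))]
    prompt = (
        "A user watched: " + ", ".join(history) + ".\n"
        + "\n".join(f"({l}) {c}" for l, c in zip(letters, candidates))
        + "\nThe user will most likely watch ("
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    logits = model(**inputs).logits[0, -1]                          # next-token logits
    scores = [logits[tokenizer.encode(l)[0]].item() for l in letters]
    return sorted(zip(candidates, scores), key=lambda x: -x[1])     # higher logit = higher rank

print(rank_candidates(["The Matrix", "Inception"], ["Interstellar", "The Notebook"]))
```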
### Pipeline Controller
| Title | Model | Time | Motivation | Description |
|:-------:|:-------:|:-------:|:-------:|:-------:|
| Recmind: Large language model powered agent for recommendation | RecMind | arXiv23 | An LLM-powered recommendation agent that can reason, interact, and memorize to provide precise personalized recommendations | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/97c194d5-fd08-4e60-9b9b-8bb33538084d) |
| Can Small Language Models be Good Reasoners for Sequential Recommendation? | SLIM | WWW24 | Distills the step-by-step reasoning ability of a large LLM into a smaller one | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/30e5222a-d8ba-4526-841c-ea3c56578279) |
| Preliminary Study on Incremental Learning for Large Language Model-based Recommender Systems | | arXiv23 | Experiments show that neither full retraining nor fine-tuning-based incremental learning significantly improves LLM4Rec; designs a long-term LoRA (frozen) and a short-term LoRA (hot) to capture long- and short-term user preferences respectively | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/dfde5894-f18e-453e-a0d0-4fa615309074) |
| Scaling Law of Large Sequential Recommendation Models | | arXiv23 | Experiments show that scaling up model size greatly improves performance on challenging recommendation tasks such as cold start, robustness, and long-term preference | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/daee6696-aa64-499c-b83d-98b355ac8dae) |
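Below is a minimal sketch of the "LLM as pipeline controller" idea (RecMind-style at a very high level): the LLM is asked, step by step, which tool to invoke next. The tools are trivial stubs and `call_llm` is a hypothetical placeholder, so this only illustrates the control flow, not any paper's exact agent design.

```python
# Minimal sketch: an LLM chooses which recommendation tool to call at each step.

def call_llm(prompt: str) -> str:
    """Placeholder: replace with a real LLM call; expected to return a tool name."""
    raise NotImplementedError

TOOLS = {
    "retrieve": lambda state: state | {"candidates": ["item_1", "item_2"]},
    "rank":     lambda state: state | {"ranking": sorted(state["candidates"])},
    "finish":   lambda state: state,
}

def run_pipeline(user_query: str, max_steps: int = 5) -> dict:
    state = {"query": user_query}
    for _ in range(max_steps):
        tool = call_llm(
            f"State: {state}\nAvailable tools: {list(TOOLS)}\n"
            "Reply with the single tool to call next."
        ).strip()
        state = TOOLS.get(tool, TOOLS["finish"])(state)  # fall back to finish on unknown tools
        if tool == "finish":
            break
    return state
```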
## 3. Other Related Work
### Self-distillation in LLM
| Title | Model | Time | Motivation | Description |
|:-------:|:-------:|:-------:|:-------:|:-------:|
| SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions | SELF-INSTRUCT | arXiv22 | For instruction tuning, human-written instruction data is costly, limited in diversity, and hard to generalize to a broad range of scenarios; the LLM can instead generate its own instructions | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/fff27b3d-90ca-4c4b-98cd-dc811334611f) |
| Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision | SELF-ALIGN | NeurIPS23 | Supervised fine-tuning (SFT) with human annotation and reinforcement learning from human feedback (RLHF) are costly and uneven in reliability and diversity; combines principle-driven reasoning with the generative power of LLMs to achieve self-alignment of AI agents with minimal human supervision | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/30381c3c-dbe7-4cc1-8404-e6dbe16ae299) |
| RLCD: Reinforcement Learning from Contrastive Distillation for LM Alignment | RLCD | ICLR24 | Designs a contrastive scheme that, without human feedback, aligns language models with principles expressed in natural language: preference pairs are created from model outputs using a positive prompt that encourages following the given principle and a negative prompt that encourages violating it | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/6b576c9e-21b8-45fb-a348-1047e9d7f938) |
| Impossible Distillation for Paraphrasing and Summarization: How to Make High-quality Lemonade out of Small, Low-quality Models | | arXiv23 | Extracts a high-quality dataset and model from a low-quality teacher (a model that cannot itself perform the target task); the student LM is then further refined via self-distillation, i.e., training on its own high-quality data | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/d0236de9-b876-4552-8573-07e7a0e49200) |
| Large Language Models Can Self-Improve | | arXiv23 | Fine-tuning an LLM requires large amounts of supervised data, whereas human reflection needs no external input; lets the LLM reflect on unlabeled data, using chain-of-thought prompting and self-consistency to produce "high-confidence" answers | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/274a2c13-810b-4e2f-a0a7-62d3980fd655) |
| Reinforced Self-Training (ReST) for Language Modeling | ReST | arXiv23 | RLHF improves LLMs by aligning them with human preferences, but its online training is expensive when handling new samples; offline RL can address the time cost, yet its quality largely depends on the dataset, so a high-quality offline dataset is needed to improve effectiveness | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/c812024d-7c11-40db-8464-ec19c3151658) |
| Self-Rewarding Language Models | | arXiv24 | Current RLHF trains reward models on human preferences, which is bounded by human performance; moreover, these frozen reward models cannot improve during LLM training; the LLM should instead update its own reward function and keep improving it during training | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/4b06b606-dcfd-49ce-b954-2426941b35d8) |
| Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data | Baize | arXiv23 | Strong chat models such as ChatGPT are access-restricted (API only); training an open-source model of comparable ability requires high-quality data. Letting ChatGPT converse with itself automatically generates a high-quality multi-turn chat corpus; self-distillation with feedback further improves Baize using ChatGPT feedback | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/2b0869b7-27e5-43c9-ba98-6164e1183836) |
| STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning | STaR | arXiv22 | Chain-of-thought improves LLMs on complex reasoning, but such methods either need large amounts of CoT data, which is expensive, or use only a little and lose some reasoning ability; the idea is to let the LLM learn from its own generated rationales, although self-generated rationales may lead to wrong answers and need correction | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/3bf3a853-e292-4180-b966-23e3fc66e067) |
| Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning | | arXiv24 | Fine-tuning an LLM for a specific task faces the challenge of balancing task performance against general instruction-following ability; the LLM rewrites the task-specific responses to reduce the gap between the two distributions | ![image](https://github.com/istarryn/LLM4REC/assets/149132603/43606734-793d-4a49-ac1c-b4b6751b7711) |
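As a minimal sketch of the STaR-style self-distillation loop described in the table: the model generates a rationale and answer for each question, only rationales that yield the correct answer are kept, and the kept set becomes the next fine-tuning dataset. `generate` and `fine_tune` are hypothetical placeholders; this is not the paper's released code.

```python
# Minimal sketch: keep only self-generated rationales that reach the correct answer.

def generate(question: str) -> tuple[str, str]:
    """Placeholder: LLM returns (rationale, answer) for the question."""
    raise NotImplementedError

def fine_tune(examples: list[dict]) -> None:
    """Placeholder: fine-tune the model on (question, rationale, answer) triples."""
    raise NotImplementedError

def star_iteration(dataset: list[dict]) -> list[dict]:
    kept = []
    for ex in dataset:                         # ex = {"question": ..., "answer": ...}
        rationale, answer = generate(ex["question"])
        if answer.strip() == ex["answer"]:     # filter out rationales with wrong answers
            kept.append({**ex, "rationale": rationale})
    fine_tune(kept)                            # train on the self-generated rationales
    return kept
```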
### Direct Preference Optimization in LLM (DPO)
| Title | Model | Time | Motivation | Description/Loss Function |
|:-------:|:-------:|:-------:|:-------:|:-------:|
| Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO | NeurIPS23 | Removes the reward-model construction of RLHF and optimizes the model directly on preference data | ![image](https://github.com/sssth/LLM4REC/assets/105367602/2ba5ce0c-966a-421c-9c7f-deb3f46b96f9) |
| Statistical Rejection Sampling Improves Preference Optimization | RSO | ICLR24 | Points out that DPO's preference data is not sampled from the optimal policy; introduces an explicit reward model and statistical rejection sampling so that data generated from the SFT model can approximate the optimal policy's distribution | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/bed64e50-1975-4775-9965-6e0f5643d13c) |
| KTO: Model Alignment as Prospect Theoretic Optimization | KTO | arXiv24 | Modifies DPO to optimize over labeled data rather than preference pairs | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/59cfc51b-ae2f-4a5f-9780-f3ad39182874) |
| Curry-DPO: Enhancing Alignment using Curriculum Learning & Ranked Preferences | Curry-DPO | arXiv24 | For multiple responses to the same prompt, builds pairwise data according to the reward gap and trains from easy to hard via curriculum learning | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/f30f96f5-972f-4e03-9234-e195976368d3) |
| LiPO: Listwise Preference Optimization through Learning-to-Rank | LiPO | arXiv24 | Modifies the DPO loss to optimize directly over listwise data | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/49c4daa2-5fa3-4bb7-8bbe-523a4e50c2a7) |
| ULMA: Unified Language Model Alignment with Human Demonstration and Point-wise Preference | ULMA | arXiv23 | Modifies the DPO loss to optimize directly over pointwise data | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/e7c0a711-34da-4a04-a63e-847fdc6c6f70) |
| Reinforcement Learning from Human Feedback with Active Queries | ADPO | arXiv24 | An active-learning paradigm that drops pairs with small reward gaps | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/af0fbd5b-3f02-425e-8018-67cf4d529f39) |
| RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models | RS-DPO | arXiv24 | Introduces an explicit reward model and statistical rejection sampling, dropping pairs with small reward gaps to improve sample efficiency | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/89f0420e-5a0e-4d3f-92d7-3de9ee5655f3) |
| Direct Preference Optimization with an Offset | ODPO | arXiv24 | Introduces an offset to represent how strongly the preferred response is favored over the dispreferred one in the preference dataset | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/55327f61-ebb6-47ab-9e18-f18e45a4651f) |
| BRAIN: Bayesian Reward-conditioned Amortized INference for natural language generation from feedback | BRAIN | arXiv24 | Reintroduces a reward model to represent how strongly the preferred response is favored over the dispreferred one | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/c578f54f-e299-4887-a6fc-063473d948a6) |
| D2PO: Discriminator-Guided DPO with Response Evaluation Models | D2PO | arXiv24 | An online training scheme that also trains a reward model; new samples are generated iteratively by the current policy and the reward model during training | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/0903b4e4-5f3c-42af-985f-3bf42c3303d5) |
| Learn Your Reference Model for Real Good Alignment | TR-DPO | arXiv24 | Updates the reference model during training with either a soft or a hard update | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/f96f57fe-94bd-490e-9f77-c6589c353fdc) |
| sDPO: Don't Use Your Data All at Once | sDPO | arXiv24 | Uses the training data in batches and updates the reference model during training | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/0ceac2e8-c996-4cc1-9956-e1959d646b47) |
| Direct Language Model Alignment from Online AI Feedback | OAIF | arXiv24 | Uses a stronger model to generate new preference pairs during training | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/44c94172-934b-4b51-a298-e9c669b36ba8) |
| A General Theoretical Paradigm to Understand Learning from Human Preferences | IPO | PMLR24 | Adds a regularization term to the DPO loss to avoid rapid overfitting during training | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/6b099d1e-505c-4cd6-9a7f-a95740d50b1c) |
| Provably Robust DPO: Aligning Language Models with Noisy Feedback | rDPO | arXiv24 | Modifies the DPO loss to be robust to random flips in the preference labels | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/eb4c9e58-b18f-43ae-b5e0-23bff0367dab) |
| Zephyr: Direct Distillation of LM Alignment | Zephyr | arXiv23 | Uses a large model (GPT-4) to generate preference data and then fine-tunes a 7B model with DPO | ![image](https://github.com/istarryn/LLM4REC/assets/105367602/35c51729-90cc-4de4-945d-72da8011891a) |
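For reference, a minimal sketch of the DPO objective shared by the variants above, assuming the per-sequence log-probabilities of the chosen/rejected responses under the policy and the frozen reference model have already been computed; batching and log-probability computation are left out.

```python
# Minimal sketch of the DPO loss:
# -log sigmoid(beta * ((log pi/pi_ref)_chosen - (log pi/pi_ref)_rejected))
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    chosen_ratio = policy_chosen_logps - ref_chosen_logps        # log pi(y_w|x) - log pi_ref(y_w|x)
    rejected_ratio = policy_rejected_logps - ref_rejected_logps  # log pi(y_l|x) - log pi_ref(y_l|x)
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with random per-sequence log-probabilities:
lp = lambda: torch.randn(4)
print(dpo_loss(lp(), lp(), lp(), lp()))
```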
### LLM4CTR
| Title | Model | Time | Description |
|:-------:|:-------:|:-------:|:-------:|
| CTR-BERT: Cost-effective knowledge distillation for billion-parameter teacher models | CTR-BERT | NeurIPS WS'21 | CTR-BERT proposes a cost-effective knowledge distillation method for billion-parameter teacher models. |
| DCAF-BERT: A Distilled Cachable Adaptable Factorized Model For Improved Ads CTR Prediction | DCAF-BERT | WWW'22 | DCAF-BERT proposes a distilled, cacheable, adaptable factorized model to improve ads CTR prediction. |
| Learning Supplementary NLP Features for CTR Prediction in Sponsored Search | - | KDD'22 | Explores learning supplementary NLP features for CTR prediction in sponsored search. |
| Practice on Effectively Extracting NLP Features for Click-Through Rate Prediction | - | CIKM'23 | A practical study on effectively extracting NLP features for CTR prediction. |
| BERT4CTR: An Efficient Framework to Combine Pre-trained Language Model with Non-textual Features for CTR Prediction | BERT4CTR | KDD'23 | BERT4CTR proposes an efficient framework that combines pre-trained language models with non-textual features for CTR prediction. |
| M6-rec: Generative pretrained language models are open-ended recommender systems | M6-rec | arXiv'22 | M6-rec uses generative pre-trained language models as open-ended recommender systems. |
| Ctrl: Connect tabular and language model for ctr prediction | Ctrl | arXiv'23 | Ctrl connects tabular data and language models for CTR prediction. |
| FLIP: Towards Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction | FLIP | arXiv'23 | FLIP targets fine-grained alignment between ID-based models and pre-trained language models for CTR prediction. |
| TBIN: Modeling Long Textual Behavior Data for CTR Prediction | TBIN | arXiv'23 | TBIN models long textual behavior data for CTR prediction. |
| An Unified Search and Recommendation Foundation Model for Cold-Start Scenario | - | CIKM'23 | A unified search and recommendation foundation model for cold-start scenarios. |
| A Unified Framework for Multi-Domain CTR Prediction via Large Language Models | - | arXiv'23 | A unified framework for multi-domain CTR prediction via large language models. |
| UFIN: Universal Feature Interaction Network for Multi-Domain Click-Through Rate Prediction | UFIN | arXiv'23 | UFIN proposes a universal feature interaction network for multi-domain CTR prediction. |
| ClickPrompt: CTR Models are Strong Prompt Generators for Adapting Language Models to CTR Prediction | ClickPrompt | WWW'24 | ClickPrompt uses CTR models as strong prompt generators for adapting language models to CTR prediction. |
| PRINT: Personalized Relevance Incentive Network for CTR Prediction in Sponsored Search | PRINT | WWW'24 | PRINT proposes a personalized relevance incentive network for CTR prediction in sponsored search. |
| Breaking the Length Barrier: LLM-Enhanced CTR Prediction in Long Textual User Behaviors | - | arXiv'24 | An LLM-enhanced approach to CTR prediction over long textual user behaviors. |
| KELLMRec: Knowledge-Enhanced Large Language Models for Recommendation | KELLMRec | arXiv'24 | KELLMRec proposes knowledge-enhanced large language models for recommendation. |
| Enhancing sequential recommendation via llm-based semantic embedding learning | - | WWW'24 | Enhances sequential recommendation via LLM-based semantic embedding learning. |
| Heterogeneous knowledge fusion: A novel approach for personalized recommendation via llm | - | RecSys'23 | Heterogeneous knowledge fusion: a novel approach for personalized recommendation via LLMs. |
| Play to Your Strengths: Collaborative Intelligence of Conventional Recommender Models and Large Language Models | - | arXiv'24 | Leverages the collaborative intelligence of conventional recommender models and large language models so each plays to its strengths. |
| Generative Explore-Exploit: Training-free Optimization of Generative Recommender Systems using LLM Optimizers | - | arXiv'24 | Training-free optimization of generative recommender systems with LLM optimizers, realizing an explore-exploit strategy. |
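The snippet below is a minimal sketch of the common LLM4CTR pattern running through the table (BERT4CTR/Ctrl/FLIP at a very high level): a frozen text embedding of the item is concatenated with learned ID embeddings and fed to an MLP that predicts the click probability. The class name, dimensions, and architecture are illustrative assumptions, not any paper's released model.

```python
# Minimal sketch: text-aware CTR model combining PLM features with ID embeddings.
import torch
import torch.nn as nn

class TextAwareCTR(nn.Module):
    def __init__(self, n_users: int, n_items: int, id_dim: int = 32, text_dim: int = 768):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, id_dim)
        self.item_emb = nn.Embedding(n_items, id_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * id_dim + text_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, user_ids, item_ids, item_text_emb):
        # Concatenate ID features with the (precomputed, frozen) text embedding.
        x = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids), item_text_emb], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)  # predicted click probability

model = TextAwareCTR(n_users=100, n_items=500)
ctr = model(torch.tensor([1, 2]), torch.tensor([10, 20]), torch.randn(2, 768))
print(ctr)  # two click probabilities in (0, 1)
```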
#### Feature Selection
| Title | Model | Time | Description |
|:-------:|:-------:|:-------:|:-------:|
| ICE-SEARCH: A Language Model-Driven Feature Selection Approach | ICE-SEARCH | arXiv'24 | ICE-SEARCH proposes a language-model-driven feature selection approach. |
| Large Language Model Pruning | - | arXiv'24 | Studies pruning techniques for large language models. |
| Dynamic and Adaptive Feature Generation with LLM | - | arXiv'24 | Dynamic and adaptive feature generation with LLMs. |
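Finally, a minimal sketch of prompt-based feature selection in the spirit of the entries above: describe the prediction task and the candidate features, and ask the LLM to return the most relevant subset. `call_llm`, the prompt wording, and the parsing are hypothetical placeholders.

```python
# Minimal sketch: ask an LLM to pick the most predictive features for a task.

def call_llm(prompt: str) -> str:
    """Placeholder: replace with a real LLM call; expected to return a comma-separated list."""
    raise NotImplementedError

def select_features(task: str, features: list[str], k: int = 5) -> list[str]:
    prompt = (
        f"Task: {task}\n"
        f"Candidate features: {', '.join(features)}\n"
        f"Return the {k} features most predictive for this task, comma-separated."
    )
    response = call_llm(prompt)
    return [f.strip() for f in response.split(",")][:k]
```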