Repository structure:

```
├── README.ipynb
├── README.md
├── assets
│   ├── classifier_ft.txt
│   ├── indep-benchmarks-llama.png
│   ├── indep-benchmarks.png
│   ├── judge_template.json
│   ├── llm-router-flowchart_1.png
│   ├── llm-router-flowchart_2.png
│   ├── output_24_2.png
│   ├── output_26_0.png
│   ├── output_51_0.png
│   └── system_ft.txt
├── configs
│   ├── ft_config_a10.yaml
│   ├── ft_config_a100.yaml
│   └── ft_job.yaml
├── requirements.txt
└── src
    ├── __init__.py
    ├── ft.py
    ├── offline_inference.py
    ├── online_inference.py
    └── utils.py
```

# Building an LLM Router for High-Quality and Cost-Effective Responses

## TLDR
1. We introduce a framework for training state-of-the-art *LLM routers*: systems that dynamically direct queries to either high-quality closed LLMs or cost-effective open-source LLMs based on query complexity, optimizing both response quality and cost.

2. This tutorial provides an in-depth guide to building an LLM router *based on a causal-LLM classifier*: generating labeled data, finetuning an LLM-based classifier with Anyscale's API, and finally running offline evaluations.

3. In collaboration with the Berkeley LMSys group, we release an [arXiv paper](https://arxiv.org/pdf/2406.18665) presenting extensive evaluations of this router alongside other models. Overall, our LLM routers can match our baselines' performance with up to a 70% cost reduction on MT Bench, a 30% cost reduction on MMLU, and a 40% cost reduction on GSM8K.

# Background
When developing applications using Large Language Models (LLMs), achieving high-quality responses while staying within budget is a key challenge. Closed models like GPT-4 provide superior quality but are costly, especially with a high volume of queries. Conversely, Open Source Software (OSS) models are more economical but may not match that quality, especially for complex or domain-specific queries.
An **LLM Router** helps balance these aspects by deciding which queries are routed to a closed LLM and which to an OSS LLM, based on the query's complexity or domain specificity. Below is a schematic representation of an LLM Router:

<div align="center">
  <img src="assets/llm-router-flowchart_1.png" alt="LLM Router flowchart">
</div>
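The routing decision in the schematic can be sketched as a small function over a classifier's score. This is a minimal illustration, not the tutorial's implementation: the model names, the threshold, and the idea that the score is a probability that the strong model is needed are all assumptions for the sake of the example.

```python
# Minimal sketch of the routing decision. A (hypothetical) classifier
# assigns each query a score in [0, 1]; higher means the query likely
# needs the strong closed model. Names and threshold are illustrative.

STRONG_MODEL = "gpt-4"       # high-quality, expensive closed model
WEAK_MODEL = "mixtral-8x7b"  # cost-effective OSS model

def route(score: float, threshold: float = 0.5) -> str:
    """Pick the serving model from the router's predicted score."""
    return STRONG_MODEL if score >= threshold else WEAK_MODEL

# An easy query stays on the OSS model; a hard one goes to the closed model.
print(route(0.12))  # mixtral-8x7b
print(route(0.87))  # gpt-4
```

Lowering the threshold trades cost for quality: more queries reach the strong model.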
| | prompt | answers | turns | source | good_natured |
|---|---|---|---|---|---|
| 0 | \n\nHuman: 0.002 = 1000 \n1 = x?\n\nAssistant: | [{'answer': 'To find the value of x, we can set up a proportion using the given information: 0.002/1000 = 1/x ... x = 500,000', 'model': 'gpt-3.5-turbo', 'rank': 3.0}, {'answer': 'If 0.002 equals 1000, then 1 would equal 500,000.', 'model': 'gpt-4', 'rank': 4.0}, {'answer': 'I apologize, but the equation you provided is not correct. ...', 'model': 'llama-2-7b-chat', 'rank': 5.0}, {'answer': '0.001 = x', 'model': 'gpt-3.5-turbo-instruct', 'rank': 6.0}, {'answer': 'It seems like you are asking for the value of x in the equation x = 1/0.002. To solve this equation, you can divide both sides by 0.002 to get: x = 1/0.002, x = 500. Therefore, x = 500.', 'model': 'mistral-7b-instruct-v0.1', 'rank': 7.0}] | 1 | [sharegpt] | True |
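Rows like the one above can be filtered down to single-turn, good-natured prompts that carry a GPT-4 reference answer. The sketch below shows the filtering logic on an inline sample record; the Hugging Face `load_dataset` call and the `berkeley-nest/Nectar` dataset id are left commented out as assumptions.

```python
# Sketch: keep single-turn, good-natured Nectar prompts and extract the
# GPT-4 answer as the reference response.

def gpt4_answer(row):
    """Return the GPT-4 answer from a Nectar `answers` list, or None."""
    for a in row["answers"]:
        if a["model"] == "gpt-4":
            return a["answer"]
    return None

def keep(row):
    return row["turns"] == 1 and row["good_natured"] and gpt4_answer(row)

# from datasets import load_dataset                       # assumed source
# rows = load_dataset("berkeley-nest/Nectar", split="train")
rows = [{  # inline stand-in mirroring the sample row above
    "prompt": "\n\nHuman: 0.002 = 1000 \n1 = x?\n\nAssistant:",
    "answers": [{"answer": "x = 500,000", "model": "gpt-4", "rank": 4.0}],
    "turns": 1, "source": ["sharegpt"], "good_natured": True,
}]

labeled = [
    {"prompt": r["prompt"], "gpt4_response": gpt4_answer(r)}
    for r in rows if keep(r)
]
```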
| | prompt | source | gpt4_response |
|---|---|---|---|
| 6062 | Based on the features mentioned, which hotel d... | [evol_instruct] | Based on the features mentioned, Hotel A seems... |
| 113830 | Provide step-by-step instructions on how to cr... | [ultrachat] | Sure, here's a simple step-by-step guide on ho... |
| 138869 | What are the 10 largest cities in the US by po... | [lmsys-chat-1m] | As of the most recent data available, the 10 l... |
| 169249 | Write a comparison essay of at least 500 words... | [ultrachat] | Title: A Comparative Analysis of Driving a Car... |
| 116934 | Q: You are provided with an "Event", "Intent" ... | [flan_v2_niv2] | PersonX might feel satisfied or content using ... |
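A `mixtral_response` column can be filled in by sending each prompt to an OpenAI-compatible endpoint. This is a hedged sketch, not the repo's `offline_inference.py`: the base URL, API key, model id, and temperature are placeholder assumptions.

```python
# Sketch: generate an OSS-model response per prompt via an
# OpenAI-compatible chat completions endpoint.

def to_messages(prompt: str) -> list:
    """Wrap a raw prompt in the chat-completions message format."""
    return [{"role": "user", "content": prompt}]

def generate(client, prompt: str,
             model: str = "mistralai/Mixtral-8x7B-Instruct-v0.1") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=to_messages(prompt),
        temperature=0.0,  # deterministic-ish labels for training data
    )
    return resp.choices[0].message.content

# from openai import OpenAI                              # assumed client
# client = OpenAI(base_url="https://<endpoint>/v1", api_key="...")
# df["mixtral_response"] = [generate(client, p) for p in df["prompt"]]
```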
| | prompt | source | gpt4_response | mixtral_response |
|---|---|---|---|---|
| 6062 | Based on the features mentioned, which hotel d... | [evol_instruct] | Based on the features mentioned, Hotel A seems... | Based on the information provided, I would sa... |
| 113830 | Provide step-by-step instructions on how to cr... | [ultrachat] | Sure, here's a simple step-by-step guide on ho... | Sure, I'd be happy to help you make a homemad... |
| 138869 | What are the 10 largest cities in the US by po... | [lmsys-chat-1m] | As of the most recent data available, the 10 l... | Sure, I'd be happy to help with that! Here ar... |
| 169249 | Write a comparison essay of at least 500 words... | [ultrachat] | Title: A Comparative Analysis of Driving a Car... | Title: The Great Debate: Driving a Car vs. Ri... |
| 116934 | Q: You are provided with an "Event", "Intent" ... | [flan_v2_niv2] | PersonX might feel satisfied or content using ... | PersonX probably feels comfortable and focuse... |
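The `mixtral_score` values seen in the next step come from an LLM judge that compares the Mixtral answer against the GPT-4 reference on a 1–5 scale. The judge prompt wording and the `[[score]]` reply format below are illustrative assumptions (the repo's actual template lives in `assets/judge_template.json`); only the score-parsing helper is exercised here.

```python
import re

# Sketch of the LLM-as-judge scoring step. The template text and the
# double-bracket output convention are assumptions for illustration.
JUDGE_TEMPLATE = (
    "Rate the candidate answer against the reference on a 1-5 scale.\n"
    "Question: {prompt}\nReference: {ref}\nCandidate: {cand}\n"
    "Reply with the score in double brackets, e.g. [[4]]."
)

def parse_score(judge_reply: str):
    """Extract a 1-5 score from a judge reply like 'Quality: [[4]]'."""
    m = re.search(r"\[\[([1-5])\]\]", judge_reply)
    return int(m.group(1)) if m else None

# judge_reply = generate(client, JUDGE_TEMPLATE.format(...))  # via GPT-4
# row["mixtral_score"] = parse_score(judge_reply)
```

Returning `None` on an unparsable reply lets malformed judgments be dropped rather than silently mis-scored.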
| | prompt | source | gpt4_response | mixtral_response | mixtral_score |
|---|---|---|---|---|---|
| 6062 | Based on the features mentioned, which hotel d... | [evol_instruct] | Based on the features mentioned, Hotel A seems... | Based on the information provided, I would sa... | 5 |
| 113830 | Provide step-by-step instructions on how to cr... | [ultrachat] | Sure, here's a simple step-by-step guide on ho... | Sure, I'd be happy to help you make a homemad... | 3 |
| 138869 | What are the 10 largest cities in the US by po... | [lmsys-chat-1m] | As of the most recent data available, the 10 l... | Sure, I'd be happy to help with that! Here ar... | 5 |
| 169249 | Write a comparison essay of at least 500 words... | [ultrachat] | Title: A Comparative Analysis of Driving a Car... | Title: The Great Debate: Driving a Car vs. Ri... | 4 |
| 116934 | Q: You are provided with an "Event", "Intent" ... | [flan_v2_niv2] | PersonX might feel satisfied or content using ... | PersonX probably feels comfortable and focuse... | 5 |
| | prompt | source | gpt4_response | mixtral_response | mixtral_score |
|---|---|---|---|---|---|
| 0 | I'll give you a review, can you extract the fo... | [lmsys-chat-1m] | Sure, here's the analysis of the review:\n\n1.... | Food aspects and opinion words:\n\n1. Made to ... | 4 |
| 1 | Answer the following question: Claim: "Joker m... | [flan_v2_cot] | The answer is no.\nChain of thoughts: Stan Lee... | The answer is no.\n\nChain of thoughts: While ... | 5 |
| 2 | TASK DEFINITION: In this task you will be give... | [flan_v2_niv2] | ZdoublexpropheciesS | ZdoublexpropheciesS\n\nIn this task, you are a... | 5 |
| 3 | Detailed Instructions: In this task, you need ... | [flan_v2_niv2] | Yes | No, 'station' is not the longest word in the s... | 5 |
| 4 | A guy pick up a woman Then he puts her down Ex... | [sharegpt] | This phrase could be interpreted as a joke bec... | This joke is a play on words and relies on the... | 5 |
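With `mixtral_score` in place, each row can be mapped to a routing label for classifier training: a high score means the OSS model's answer was good enough, so the query can be routed to the weak model. The threshold of 4 below is an assumed cutoff for illustration, not necessarily the tutorial's exact choice.

```python
# Sketch: turn judge scores into routing labels for the classifier.
# Threshold is an illustrative assumption: score >= 4 means the OSS
# answer was good enough, so route such queries to the weak model.

def routing_label(mixtral_score: int, threshold: int = 4) -> str:
    return "weak" if mixtral_score >= threshold else "strong"

scores = [5, 3, 5, 4, 5]  # the mixtral_score column from the sample rows
labels = [routing_label(s) for s in scores]
print(labels)  # ['weak', 'strong', 'weak', 'weak', 'weak']
```

These labels become the targets when finetuning the causal-LLM classifier in the next stage.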