├── .env_sample
├── .gitignore
├── CODE_OF_CONDUCT.md
├── LICENSE
├── README.md
├── SECURITY.md
├── SUPPORT.md
├── case-studies
│   ├── document-processing
│   │   ├── document-processing.ipynb
│   │   └── document_with_errors.txt
│   └── retrieval-augmented-generation
│       ├── product_information.txt
│       ├── retrieval-augmented-generation.ipynb
│       └── troubleshooting_information.txt
├── notebooks-with-techniques
│   ├── avoid-rewriting-documents
│   │   ├── avoid-rewriting-documents.ipynb
│   │   └── document_with_errors.txt
│   ├── generation-token-compression
│   │   ├── generation-token-compression.ipynb
│   │   └── sales_report.txt
│   ├── inconclusive-techniques
│   │   └── optimise_max_tokens_parameter
│   │       └── optimise_max_tokens_parameter.ipynb
│   ├── load-balancing
│   │   └── load-balancing.ipynb
│   ├── multilingual-optimization
│   │   └── multilingual-optimization.ipynb
│   ├── parallelization
│   │   └── parallelization.ipynb
│   ├── semantic-caching
│   │   └── semantic-caching.ipynb
│   └── use-models-with-faster-time-between-tokens
│       └── use-models-with-faster-time-between-tokens.ipynb
├── pip_freeze_sample.txt
└── requirements.txt

/.env_sample:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/.env_sample
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/.gitignore
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/CODE_OF_CONDUCT.md
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/README.md
--------------------------------------------------------------------------------
/SECURITY.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/SECURITY.md
--------------------------------------------------------------------------------
/SUPPORT.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/SUPPORT.md
--------------------------------------------------------------------------------
/case-studies/document-processing/document-processing.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/case-studies/document-processing/document-processing.ipynb
--------------------------------------------------------------------------------
/case-studies/document-processing/document_with_errors.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/case-studies/document-processing/document_with_errors.txt
--------------------------------------------------------------------------------
/case-studies/retrieval-augmented-generation/product_information.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/case-studies/retrieval-augmented-generation/product_information.txt
--------------------------------------------------------------------------------
/case-studies/retrieval-augmented-generation/retrieval-augmented-generation.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/case-studies/retrieval-augmented-generation/retrieval-augmented-generation.ipynb
--------------------------------------------------------------------------------
/case-studies/retrieval-augmented-generation/troubleshooting_information.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/case-studies/retrieval-augmented-generation/troubleshooting_information.txt
--------------------------------------------------------------------------------
/notebooks-with-techniques/avoid-rewriting-documents/avoid-rewriting-documents.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/notebooks-with-techniques/avoid-rewriting-documents/avoid-rewriting-documents.ipynb
--------------------------------------------------------------------------------
/notebooks-with-techniques/avoid-rewriting-documents/document_with_errors.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/notebooks-with-techniques/avoid-rewriting-documents/document_with_errors.txt
--------------------------------------------------------------------------------
/notebooks-with-techniques/generation-token-compression/generation-token-compression.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/notebooks-with-techniques/generation-token-compression/generation-token-compression.ipynb
--------------------------------------------------------------------------------
/notebooks-with-techniques/generation-token-compression/sales_report.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/notebooks-with-techniques/generation-token-compression/sales_report.txt
--------------------------------------------------------------------------------
/notebooks-with-techniques/inconclusive-techniques/optimise_max_tokens_parameter/optimise_max_tokens_parameter.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/notebooks-with-techniques/inconclusive-techniques/optimise_max_tokens_parameter/optimise_max_tokens_parameter.ipynb
--------------------------------------------------------------------------------
/notebooks-with-techniques/load-balancing/load-balancing.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/notebooks-with-techniques/load-balancing/load-balancing.ipynb
--------------------------------------------------------------------------------
/notebooks-with-techniques/multilingual-optimization/multilingual-optimization.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/notebooks-with-techniques/multilingual-optimization/multilingual-optimization.ipynb
--------------------------------------------------------------------------------
/notebooks-with-techniques/parallelization/parallelization.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/notebooks-with-techniques/parallelization/parallelization.ipynb
--------------------------------------------------------------------------------
/notebooks-with-techniques/semantic-caching/semantic-caching.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/notebooks-with-techniques/semantic-caching/semantic-caching.ipynb
--------------------------------------------------------------------------------
/notebooks-with-techniques/use-models-with-faster-time-between-tokens/use-models-with-faster-time-between-tokens.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/notebooks-with-techniques/use-models-with-faster-time-between-tokens/use-models-with-faster-time-between-tokens.ipynb
--------------------------------------------------------------------------------
/pip_freeze_sample.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/pip_freeze_sample.txt
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/HEAD/requirements.txt
--------------------------------------------------------------------------------