├── LICENSE ├── RAG_Experimentation_Framework_final.ipynb └── README.md /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 
61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 
179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /RAG_Experimentation_Framework_final.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "machine_shape": "hm", 8 | "gpuType": "L4", 9 | "authorship_tag": "ABX9TyPKWVIVeEtb8/A7NP4NfWMT", 10 | "include_colab_link": true 11 | }, 12 | "kernelspec": { 13 | "name": "python3", 14 | "display_name": "Python 3" 15 | }, 16 | "language_info": { 17 | "name": "python" 18 | }, 19 | "accelerator": "GPU" 20 | }, 21 | "cells": [ 22 | { 23 | "cell_type": "markdown", 24 | "metadata": { 25 | "id": "view-in-github", 26 | "colab_type": "text" 27 | }, 28 | "source": [ 29 | "\"Open" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "source": [ 35 | "If you use this code, please cite:\n", 36 | "\n", 37 | "{\n", 38 | " title = {RAG Experimentation Framework},\n", 39 | "\n", 40 | " author = {Bill Leece},\n", 41 | "\n", 42 | " year = {2024}\n", 43 | "}" 44 | ], 45 | "metadata": { 46 | "id": "wZ0kV_UtQn5O" 47 | } 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "source": [ 52 | "#Setup" 53 | ], 54 | "metadata": { 55 | "id": "_lHNBLR-92Zk" 56 | } 57 | }, 58 | { 59 | "cell_type": "code", 60 | "source": [ 61 | "!pip install -U transformers --quiet\n", 62 | "#!pip install -U optimum --quiet\n", 63 | "!pip install -U accelerate --quiet\n", 64 | "!pip install -U bitsandbytes --quiet\n", 65 | "!pip install -U torch --quiet\n", 66 | "!pip install -U sentencepiece --quiet\n", 67 | "!pip install -U llama-index --quiet\n", 68 | "!pip install -U llama-index-llms-mistralai --quiet\n", 69 | "!pip install -U llama-index-embeddings-mistralai --quiet\n", 70 | "!pip install -U llama-index-llms-langchain --quiet\n", 71 | "!pip install -U langchain --quiet\n", 72 | "!pip install -U langchain-community --quiet\n", 73 | "!pip install -U langchain-mistralai --quiet\n", 74 | "!pip install -U langchain_huggingface --quiet\n", 75 | "!pip install -U faiss-gpu --quiet" 76 | ], 77 | "metadata": { 78 | "id": "4g_Vs7wgZW-8" 79 | }, 80 | "execution_count": null, 81 | "outputs": [] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "source": [ 86 | "import os\n", 87 | "import json\n", 88 | "import numpy as np\n", 89 | "import faiss\n", 90 | "import transformers\n", 91 
| "import torch\n", 92 | "import gc\n", 93 | "import openai\n", 94 | "import json\n", 95 | "import tiktoken\n", 96 | "import textwrap\n", 97 | "import time\n", 98 | "from google.colab import drive, userdata\n", 99 | "from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline\n", 100 | "from langchain.prompts import PromptTemplate\n", 101 | "from langchain_huggingface import HuggingFacePipeline\n", 102 | "from langchain_core.output_parsers import StrOutputParser\n", 103 | "from langchain_mistralai.chat_models import ChatMistralAI\n", 104 | "from llama_index.embeddings.mistralai import MistralAIEmbedding\n", 105 | "from llama_index.core import SimpleDirectoryReader, Settings\n", 106 | "from llama_index.core.node_parser import SemanticSplitterNodeParser\n", 107 | "import time\n", 108 | "from typing import List, Dict, Tuple\n", 109 | "from contextlib import contextmanager\n", 110 | "from langchain.schema.runnable import RunnableSequence\n", 111 | "from langchain.schema.output_parser import StrOutputParser\n", 112 | "from langchain_text_splitters.markdown import MarkdownHeaderTextSplitter\n", 113 | "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", 114 | "from datetime import datetime\n", 115 | "from typing import Dict, List, Any" 116 | ], 117 | "metadata": { 118 | "id": "Ao7eaSfq-TKs" 119 | }, 120 | "execution_count": null, 121 | "outputs": [] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "source": [ 126 | "os.environ[\"HF_TOKEN\"] = userdata.get('HF_TOKEN')\n", 127 | "os.environ[\"MISTRAL_API_KEY\"] = userdata.get('MISTRAL_API_KEY')\n", 128 | "api_key = userdata.get('OPENAI_API_KEY')" 129 | ], 130 | "metadata": { 131 | "id": "YvGHY024-OXK" 132 | }, 133 | "execution_count": null, 134 | "outputs": [] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "source": [ 139 | "device = 'cuda' if torch.cuda.is_available() else 'cpu' #Use GPUs when possible" 140 | ], 141 | "metadata": { 142 | "id": "mxAHV7T_-Xlh" 143 | }, 144 | "execution_count": null, 145 | "outputs": [] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "source": [ 150 | "#Experiment Configurations" 151 | ], 152 | "metadata": { 153 | "id": "jkqEV8M_HUKG" 154 | } 155 | }, 156 | { 157 | "cell_type": "code", 158 | "source": [ 159 | "# Setup configurations\n", 160 | "MODEL_CONFIGS = {\n", 161 | " \"models\": [\n", 162 | " # {\n", 163 | " # \"name\": \"open-mixtral-8x7b\",\n", 164 | " # \"type\": \"mistral_api\",\n", 165 | " # \"tokenizer\": None, # Not needed for API models\n", 166 | " # },\n", 167 | "\n", 168 | " {\n", 169 | " \"name\": \"mistral-large-latest\",\n", 170 | " \"type\": \"mistral_api\",\n", 171 | " \"tokenizer\": None, # Not needed for API models\n", 172 | " },\n", 173 | "\n", 174 | " {\n", 175 | " \"name\": \"open-mistral-nemo\",\n", 176 | " \"type\": \"mistral_api\",\n", 177 | " \"tokenizer\": None, # Not needed for API models\n", 178 | " },\n", 179 | "# {\n", 180 | "# \"name\": \"ministral-8b-latest\",\n", 181 | "# \"type\": \"mistral_api\",\n", 182 | "# \"tokenizer\": None, # Not needed for API models\n", 183 | "# },\n", 184 | " # {\n", 185 | " # \"name\": \"meta-llama/Llama-3.1-8B-Instruct\",\n", 186 | " # \"type\": \"huggingface\",\n", 187 | " # \"tokenizer\": \"meta-llama/Llama-3.1-8B-Instruct\"\n", 188 | " # },\n", 189 | "\n", 190 | " # {\n", 191 | " # \"name\": \"wjleece/quantized-mistral-7b\",\n", 192 | " # \"type\": \"huggingface\",\n", 193 | " # \"tokenizer\": \"mistralai/Mixtral-8x7B-v0.1\", # The same tokenizer that works on the base model will work on the quantized model 
- there is no 'quantized tokenizer'\n", 194 | " # \"quantization_config\": { #Quantization config left here as a reference, but not used in the code (as we're using an already quantized model from HuggingFace)\n", 195 | " # \"load_in_4bit\": True,\n", 196 | " # \"bnb_4bit_compute_dtype\": \"float16\",\n", 197 | " # \"bnb_4bit_quant_type\": \"nf4\",\n", 198 | " # \"bnb_4bit_use_double_quant\": False\n", 199 | " # }\n", 200 | " # },\n", 201 | " {\n", 202 | " \"name\": \"wjleece/quantized-mistral-nemo-12b\",\n", 203 | " \"type\": \"huggingface\",\n", 204 | " \"tokenizer\": \"mistralai/Mistral-Nemo-Instruct-2407\", # The same tokenizer that works on the base model will work on the quantized model - there is no 'quantized tokenizer'\n", 205 | " \"quantization_config\": { #Quantization config left here as a reference, but not used in the code (as we're using an already quantized model from HuggingFace)\n", 206 | " \"load_in_4bit\": True,\n", 207 | " \"bnb_4bit_compute_dtype\": \"float16\",\n", 208 | " \"bnb_4bit_quant_type\": \"nf4\",\n", 209 | " \"bnb_4bit_use_double_quant\": False\n", 210 | " }\n", 211 | " },\n", 212 | " # {\n", 213 | " # \"name\": \"wjleece/quantized-mistral-8b\",\n", 214 | " # \"type\": \"huggingface\",\n", 215 | " # \"tokenizer\": \"mistralai/Ministral-8B-Instruct-2410\", # The same tokenizer that works on the base model will work on the quantized model - there is no 'quantized tokenizer'\n", 216 | " # \"quantization_config\": { #Quantization config left here as a reference, but not used in the code (as we're using an already quantized model from HuggingFace)\n", 217 | " # \"load_in_4bit\": True,\n", 218 | " # \"bnb_4bit_compute_dtype\": \"float16\",\n", 219 | " # \"bnb_4bit_quant_type\": \"nf4\",\n", 220 | " # \"bnb_4bit_use_double_quant\": False\n", 221 | " # }\n", 222 | " # },\n", 223 | " # {\n", 224 | " # \"name\": \"wjleece/quantized-llama-3.1-8b\",\n", 225 | " # \"type\": \"huggingface\",\n", 226 | " # \"tokenizer\": \"meta-llama/Llama-3.1-8B-Instruct\", # The same tokenizer that works on the base model will work on the quantized model - there is no 'quantized tokenizer'\n", 227 | " # \"quantization_config\": { #Quantization config left here as a reference, but not used in the code (as we're using an already quantized model from HuggingFace)\n", 228 | " # \"load_in_4bit\": True,\n", 229 | " # \"bnb_4bit_compute_dtype\": \"float16\",\n", 230 | " # \"bnb_4bit_quant_type\": \"nf4\",\n", 231 | " # \"bnb_4bit_use_double_quant\": False\n", 232 | " # }\n", 233 | " # }\n", 234 | " ]\n", 235 | "}\n", 236 | "\n", 237 | "\n", 238 | "CHUNKING_CONFIGS = {\n", 239 | " \"strategies\": [\"paragraph\", \"header\"],\n", 240 | " \"semantic_config\": {\n", 241 | " \"enabled\": True,\n", 242 | " \"thresholds\": [85, 95] if True else []\n", 243 | " },\n", 244 | " \"max_chunk_size\": 2048,\n", 245 | " \"chunk_overlap\": 100,\n", 246 | " \"min_chunk_size\": 35 #we'll ignore any chunk ~5 words or less\n", 247 | "}\n", 248 | "\n", 249 | "QUESTION_CONFIGS = {\n", 250 | " \"questions\": [\n", 251 | " \"What were cloud revenues in the most recent quarter?\",\n", 252 | " \"What were the main drivers of revenue growth in the most recent quarter?\",\n", 253 | " \"How much did YouTube ad revenues grow in the most recent quarter in APAC?\",\n", 254 | " \"Can you summarize recent key antitrust matters?\",\n", 255 | " \"Compare the revenue growth across all geographic regions and explain the main factors for each region.\",\n", 256 | " \"Summarize all mentioned risk factors related to international 
operations.\",\n", 257 | " \"What were the major changes in operating expenses across all categories and their stated reasons?\",\n", 258 | " ] #These quetsions should relate to the RAG document --> these are your 'business use cases'\n", 259 | "}\n", 260 | "\n", 261 | "FILE_CONFIGS = {\n", 262 | " \"save_directory\": '/content/drive/My Drive/AI/Model_Analysis'\n", 263 | "}" 264 | ], 265 | "metadata": { 266 | "id": "YDjgk_JhHWkj" 267 | }, 268 | "execution_count": null, 269 | "outputs": [] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "source": [ 274 | "#Load RAG Document" 275 | ], 276 | "metadata": { 277 | "id": "e_wxgOGc95sf" 278 | } 279 | }, 280 | { 281 | "cell_type": "code", 282 | "source": [ 283 | "drive.mount('/content/drive')\n", 284 | "documents = SimpleDirectoryReader(input_files=[\"/content/drive/My Drive/AI/Datasets/Google-10-q/goog-10-q-q3-2024.pdf\"]).load_data()" 285 | ], 286 | "metadata": { 287 | "id": "gS4Lemk09v9Y" 288 | }, 289 | "execution_count": null, 290 | "outputs": [] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "source": [ 295 | "#RAG Pipeline Class" 296 | ], 297 | "metadata": { 298 | "id": "MgLxma5M-bZA" 299 | } 300 | }, 301 | { 302 | "cell_type": "code", 303 | "source": [ 304 | "# Global singleton instance\n", 305 | "_GLOBAL_RAG_PIPELINE = None\n", 306 | "\n", 307 | "class RAGPipeline:\n", 308 | " def __init__(self):\n", 309 | " self.chunk_cache = {}\n", 310 | " self.embedding_cache = {}\n", 311 | " self.embedding_model = None\n", 312 | "\n", 313 | " @classmethod\n", 314 | " def get_instance(cls):\n", 315 | " \"\"\"Get or create singleton instance\"\"\"\n", 316 | " global _GLOBAL_RAG_PIPELINE\n", 317 | " if _GLOBAL_RAG_PIPELINE is None:\n", 318 | " _GLOBAL_RAG_PIPELINE = cls()\n", 319 | " return _GLOBAL_RAG_PIPELINE\n", 320 | "\n", 321 | "\n", 322 | " def initialize_embedding_model(self):\n", 323 | " \"\"\"Initialize the embedding model if not already initialized\"\"\"\n", 324 | " if self.embedding_model is None:\n", 325 | " mistral_api_key = userdata.get('MISTRAL_API_KEY')\n", 326 | " self.embedding_model = MistralAIEmbedding(\n", 327 | " model_name=\"mistral-embed\",\n", 328 | " api_key=mistral_api_key\n", 329 | " )\n", 330 | " return self.embedding_model\n", 331 | "\n", 332 | " def convert_to_markdown_headers(self, text):\n", 333 | " \"\"\"Convert document section titles to markdown headers\"\"\"\n", 334 | " import re\n", 335 | "\n", 336 | " patterns = [\n", 337 | " (r'^(?:ITEM|Section)\\s+\\d+[.:]\\s*(.+)$', '# '),\n", 338 | " (r'^\\d+\\.\\d+\\s+(.+)$', '## '),\n", 339 | " (r'^\\([a-z]\\)\\s+(.+)$', '### ')\n", 340 | " ]\n", 341 | "\n", 342 | " lines = text.split('\\n')\n", 343 | " markdown_lines = []\n", 344 | "\n", 345 | " for line in lines:\n", 346 | " line = line.strip()\n", 347 | " converted = False\n", 348 | "\n", 349 | " for pattern, header_mark in patterns:\n", 350 | " if re.match(pattern, line, re.IGNORECASE):\n", 351 | " markdown_lines.append(f\"{header_mark}{line}\")\n", 352 | " converted = True\n", 353 | " break\n", 354 | "\n", 355 | " if not converted:\n", 356 | " markdown_lines.append(line)\n", 357 | "\n", 358 | " return '\\n'.join(markdown_lines)\n", 359 | "\n", 360 | "\n", 361 | " def create_chunks(self, documents: List, threshold: int, chunk_strategy: str = \"semantic\") -> Dict:\n", 362 | " \"\"\"Create or retrieve chunks based on specified strategy\"\"\"\n", 363 | "\n", 364 | " MAX_CHUNK_SIZE = CHUNKING_CONFIGS['max_chunk_size']\n", 365 | " CHUNK_OVERLAP = CHUNKING_CONFIGS['chunk_overlap']\n", 366 | " MIN_CHUNK_SIZE = 
CHUNKING_CONFIGS['min_chunk_size']\n", 367 | "\n", 368 | "\n", 369 | " if chunk_strategy == \"semantic\":\n", 370 | " cache_key = f\"{chunk_strategy}_{threshold}\"\n", 371 | " print(f\"Using semantic cache key: {cache_key} with threshold: {threshold}\")\n", 372 | " else:\n", 373 | " cache_key = f\"{chunk_strategy}_{MAX_CHUNK_SIZE}\"\n", 374 | " print(f\"Using non-semantic cache key: {cache_key}\")\n", 375 | "\n", 376 | "\n", 377 | " if cache_key not in self.chunk_cache:\n", 378 | " print(\"\\nStarting new chunk creation:\")\n", 379 | " texts = []\n", 380 | "\n", 381 | " try:\n", 382 | " if chunk_strategy == \"semantic\":\n", 383 | " print(\"Processing semantic chunking...\")\n", 384 | " if self.embedding_model is None:\n", 385 | " print(\"Initializing embedding model\")\n", 386 | " self.initialize_embedding_model()\n", 387 | "\n", 388 | " splitter = SemanticSplitterNodeParser(\n", 389 | " buffer_size=1,\n", 390 | " breakpoint_percentile_threshold=threshold,\n", 391 | " embed_model=self.embedding_model\n", 392 | " )\n", 393 | " nodes = splitter.get_nodes_from_documents(documents)\n", 394 | " texts = [node.text for node in nodes]\n", 395 | " print(f\"Generated {len(texts)} semantic chunks\")\n", 396 | "\n", 397 | " elif chunk_strategy == \"paragraph\":\n", 398 | " print(\"Processing paragraph chunking...\")\n", 399 | " text_splitter = RecursiveCharacterTextSplitter(\n", 400 | " separators=[\"\\n\\n\", \"\\n\", \". \", \" \", \"\"],\n", 401 | " chunk_size=MAX_CHUNK_SIZE,\n", 402 | " chunk_overlap=CHUNK_OVERLAP,\n", 403 | " length_function=len\n", 404 | " )\n", 405 | "\n", 406 | " for idx, doc in enumerate(documents):\n", 407 | " print(f\"\\nProcessing document {idx + 1}/{len(documents)}\")\n", 408 | " print(f\"Document length: {len(doc.text)} characters\")\n", 409 | " doc_chunks = text_splitter.split_text(doc.text)\n", 410 | " print(f\"Initial chunks from document: {len(doc_chunks)}\")\n", 411 | " if doc_chunks:\n", 412 | " print(f\"Sample chunk lengths: {[len(c) for c in doc_chunks[:3]]}\")\n", 413 | " texts.extend(doc_chunks)\n", 414 | "\n", 415 | " elif chunk_strategy == \"header\":\n", 416 | " print(\"Processing header chunking...\")\n", 417 | " text_splitter = RecursiveCharacterTextSplitter(\n", 418 | " separators=[\"\\n\\n\", \"\\n\", \". \", \" \", \"\"],\n", 419 | " chunk_size=MAX_CHUNK_SIZE,\n", 420 | " chunk_overlap=CHUNK_OVERLAP,\n", 421 | " length_function=len\n", 422 | " )\n", 423 | "\n", 424 | " for idx, doc in enumerate(documents):\n", 425 | " print(f\"\\nProcessing document {idx + 1}/{len(documents)}\")\n", 426 | " md_text = self.convert_to_markdown_headers(doc.text)\n", 427 | " print(\"Headers identified. 
First 100 chars of markdown text:\")\n", 428 | " print(md_text[:100] + \"...\")\n", 429 | "\n", 430 | " headers_to_split_on = [\n", 431 | " (\"#\", \"Header 1\"),\n", 432 | " (\"##\", \"Header 2\"),\n", 433 | " (\"###\", \"Header 3\"),\n", 434 | " ]\n", 435 | "\n", 436 | " header_splitter = MarkdownHeaderTextSplitter(\n", 437 | " headers_to_split_on=headers_to_split_on\n", 438 | " )\n", 439 | "\n", 440 | " splits = header_splitter.split_text(md_text)\n", 441 | " print(f\"Generated {len(splits)} header sections\")\n", 442 | " if splits:\n", 443 | " print(\"Sample section lengths:\", [len(s.page_content) for s in splits[:3]])\n", 444 | "\n", 445 | " for split in splits:\n", 446 | " if len(split.page_content) > MAX_CHUNK_SIZE:\n", 447 | " print(f\"Splitting large section: {len(split.page_content)} chars\")\n", 448 | " subsections = text_splitter.split_text(split.page_content)\n", 449 | " print(f\"Created {len(subsections)} subsections\")\n", 450 | " texts.extend(subsections)\n", 451 | " else:\n", 452 | " texts.append(split.page_content)\n", 453 | "\n", 454 | " print(\"\\nCleaning and filtering chunks...\")\n", 455 | " initial_count = len(texts)\n", 456 | " cleaned_texts = []\n", 457 | " for idx, text in enumerate(texts):\n", 458 | " if not isinstance(text, str):\n", 459 | " print(f\"Warning: Non-string chunk found at index {idx}\")\n", 460 | " continue\n", 461 | "\n", 462 | " cleaned_text = text.strip()\n", 463 | " if len(cleaned_text) >= MIN_CHUNK_SIZE:\n", 464 | " cleaned_texts.append(cleaned_text)\n", 465 | " else:\n", 466 | " print(f\"Filtered out small chunk: {len(cleaned_text)} chars\")\n", 467 | "\n", 468 | " texts = cleaned_texts\n", 469 | " print(f\"Chunks after cleaning: {len(texts)} (removed {initial_count - len(texts)})\")\n", 470 | "\n", 471 | " if not texts:\n", 472 | " print(\"WARNING: No valid chunks generated!\")\n", 473 | " return {\n", 474 | " 'texts': [],\n", 475 | " 'strategy': chunk_strategy,\n", 476 | " 'chunk_stats': {\n", 477 | " 'num_chunks': 0,\n", 478 | " 'avg_chunk_size': 0,\n", 479 | " 'min_chunk_size': 0,\n", 480 | " 'max_chunk_size': 0\n", 481 | " }\n", 482 | " }\n", 483 | "\n", 484 | " # Calculate chunk statistics\n", 485 | " chunk_lengths = [len(t) for t in texts]\n", 486 | " chunk_stats = {\n", 487 | " 'num_chunks': len(texts),\n", 488 | " 'avg_chunk_size': sum(chunk_lengths)/len(texts),\n", 489 | " 'min_chunk_size': min(chunk_lengths),\n", 490 | " 'max_chunk_size': max(chunk_lengths)\n", 491 | " }\n", 492 | "\n", 493 | " print(\"\\nFinal Chunk Statistics:\")\n", 494 | " print(f\"Total chunks: {chunk_stats['num_chunks']}\")\n", 495 | " print(f\"Average chunk size: {chunk_stats['avg_chunk_size']:.2f} chars\")\n", 496 | " print(f\"Minimum chunk size: {chunk_stats['min_chunk_size']} chars\")\n", 497 | " print(f\"Maximum chunk size: {chunk_stats['max_chunk_size']} chars\")\n", 498 | "\n", 499 | " print(\"\\nSample of first chunk:\")\n", 500 | " if texts:\n", 501 | " print(texts[0][:200] + \"...\")\n", 502 | "\n", 503 | " # Store in cache\n", 504 | " self.chunk_cache[cache_key] = {\n", 505 | " 'texts': texts,\n", 506 | " 'strategy': chunk_strategy,\n", 507 | " 'chunk_stats': chunk_stats\n", 508 | " }\n", 509 | " print(f\"\\nStored chunks in cache with key: {cache_key}\")\n", 510 | "\n", 511 | " except Exception as e:\n", 512 | " print(\"\\nERROR in chunk creation:\")\n", 513 | " print(f\"Error type: {type(e).__name__}\")\n", 514 | " print(f\"Error message: {str(e)}\")\n", 515 | " import traceback\n", 516 | " print(\"\\nTraceback:\")\n", 517 | " 
print(traceback.format_exc())\n", 518 | " return {\n", 519 | " 'texts': [],\n", 520 | " 'strategy': chunk_strategy,\n", 521 | " 'chunk_stats': {\n", 522 | " 'num_chunks': 0,\n", 523 | " 'avg_chunk_size': 0,\n", 524 | " 'min_chunk_size': 0,\n", 525 | " 'max_chunk_size': 0\n", 526 | " }\n", 527 | " }\n", 528 | " else:\n", 529 | " print(f\"\\nRetrieving {len(self.chunk_cache[cache_key]['texts'])} existing chunks from cache\")\n", 530 | "\n", 531 | " result = self.chunk_cache[cache_key]\n", 532 | " print(f\"\\nFinal Output:\")\n", 533 | " print(f\"Number of chunks: {len(result['texts'])}\")\n", 534 | " print(f\"Strategy: {result['strategy']}\")\n", 535 | " print(\"=\"*50)\n", 536 | " return result\n", 537 | "\n", 538 | " def run_cosine_search(self, query: str, threshold: int, chunk_strategy: str = \"semantic\", k: int = 5) -> List[Dict]:\n", 539 | " \"\"\"Run cosine similarity search with enhanced error handling and debugging\"\"\"\n", 540 | " print(\"\\n\" + \"=\"*50)\n", 541 | " print(\"COSINE SEARCH DEBUG LOG\")\n", 542 | " print(\"=\"*50)\n", 543 | " print(f\"Query: {query}\")\n", 544 | " print(f\"Strategy: {chunk_strategy}\")\n", 545 | " print(f\"Threshold: {threshold}\")\n", 546 | " print(f\"Requested k: {k}\")\n", 547 | "\n", 548 | " if chunk_strategy == \"semantic\":\n", 549 | " cache_key = f\"{chunk_strategy}_{threshold}\"\n", 550 | " else:\n", 551 | " cache_key = f\"{chunk_strategy}_{CHUNKING_CONFIGS['max_chunk_size']}\"\n", 552 | "\n", 553 | " print(\"\\nCache Status:\")\n", 554 | " print(f\"Cache key: {cache_key}\")\n", 555 | " print(f\"Available cache keys: {list(self.chunk_cache.keys())}\")\n", 556 | " print(f\"Chunks cache hit: {cache_key in self.chunk_cache}\")\n", 557 | " print(f\"Embeddings cache hit: {cache_key in self.embedding_cache}\")\n", 558 | "\n", 559 | " # First, ensure we have chunks\n", 560 | " if cache_key not in self.chunk_cache:\n", 561 | " print(f\"\\nERROR: No chunks found in cache for {cache_key}\")\n", 562 | " print(\"This suggests chunk creation failed or wasn't called\")\n", 563 | " return []\n", 564 | "\n", 565 | " chunks_data = self.chunk_cache[cache_key]\n", 566 | " if not chunks_data['texts']:\n", 567 | " print(\"\\nERROR: Chunks list is empty\")\n", 568 | " print(\"This suggests chunk creation succeeded but produced no chunks\")\n", 569 | " return []\n", 570 | "\n", 571 | " print(f\"\\nFound {len(chunks_data['texts'])} chunks to search\")\n", 572 | " print(f\"Sample chunk (first 100 chars): {chunks_data['texts'][0][:100]}...\")\n", 573 | "\n", 574 | " try:\n", 575 | " if self.embedding_model is None:\n", 576 | " print(\"\\nInitializing embedding model\")\n", 577 | " self.initialize_embedding_model()\n", 578 | "\n", 579 | " if cache_key not in self.embedding_cache:\n", 580 | " print(\"\\nGenerating embeddings for chunks...\")\n", 581 | " chunk_embeddings = []\n", 582 | "\n", 583 | " # Process in batches\n", 584 | " batch_size = 32\n", 585 | " total_batches = (len(chunks_data['texts']) + batch_size - 1) // batch_size\n", 586 | "\n", 587 | " for i in range(0, len(chunks_data['texts']), batch_size):\n", 588 | " batch = chunks_data['texts'][i:i + batch_size]\n", 589 | " print(f\"\\nProcessing batch {i//batch_size + 1}/{total_batches}\")\n", 590 | " print(f\"Batch size: {len(batch)} chunks\")\n", 591 | "\n", 592 | " batch_embeddings = [self.embedding_model.get_text_embedding(text) for text in batch]\n", 593 | " chunk_embeddings.extend(batch_embeddings)\n", 594 | " print(f\"Total embeddings so far: {len(chunk_embeddings)}\")\n", 595 | "\n", 596 | " 
print(\"\\nConverting to numpy array...\")\n", 597 | " embeddings_array = np.array(chunk_embeddings).astype('float32')\n", 598 | " print(f\"Embeddings shape: {embeddings_array.shape}\")\n", 599 | "\n", 600 | " print(\"Normalizing embeddings...\")\n", 601 | " norms = np.linalg.norm(embeddings_array, axis=1)[:, np.newaxis]\n", 602 | " norms[norms == 0] = 1 # Prevent division by zero\n", 603 | " normalized_embeddings = embeddings_array / norms\n", 604 | "\n", 605 | " print(\"Creating FAISS index...\")\n", 606 | " dimension = embeddings_array.shape[1]\n", 607 | " index = faiss.IndexFlatIP(dimension)\n", 608 | " index.add(normalized_embeddings)\n", 609 | "\n", 610 | " self.embedding_cache[cache_key] = {\n", 611 | " 'embeddings': embeddings_array,\n", 612 | " 'index': index\n", 613 | " }\n", 614 | " print(\"Embeddings cached successfully\")\n", 615 | "\n", 616 | " print(\"\\nProcessing query...\")\n", 617 | " query_embedding = self.embedding_model.get_text_embedding(query)\n", 618 | " query_embedding = np.array([query_embedding]).astype('float32')\n", 619 | "\n", 620 | " print(\"Normalizing query embedding...\")\n", 621 | " query_norm = np.linalg.norm(query_embedding)\n", 622 | " if query_norm == 0:\n", 623 | " print(\"ERROR: Zero query vector\")\n", 624 | " return []\n", 625 | " query_normalized = query_embedding / query_norm\n", 626 | "\n", 627 | " print(f\"\\nSearching for top {k} matches...\")\n", 628 | " distances, indices = self.embedding_cache[cache_key]['index'].search(\n", 629 | " query_normalized, k\n", 630 | " )\n", 631 | "\n", 632 | " print(\"\\nFormatting results...\")\n", 633 | " results = []\n", 634 | " for score, idx in zip(distances[0], indices[0]):\n", 635 | " if idx >= 0 and idx < len(chunks_data['texts']):\n", 636 | " results.append({\n", 637 | " 'text': chunks_data['texts'][idx],\n", 638 | " 'distance': float(score),\n", 639 | " 'strategy': chunk_strategy\n", 640 | " })\n", 641 | " print(f\"\\nMatch {len(results)}:\")\n", 642 | " print(f\"Score: {float(score):.4f}\")\n", 643 | " print(f\"Text preview: {chunks_data['texts'][idx][:100]}...\")\n", 644 | "\n", 645 | " print(f\"\\nTotal matches found: {len(results)}\")\n", 646 | " print(\"=\"*50)\n", 647 | " return results\n", 648 | "\n", 649 | " except Exception as e:\n", 650 | " print(\"\\nERROR in cosine search:\")\n", 651 | " print(f\"Error type: {type(e).__name__}\")\n", 652 | " print(f\"Error message: {str(e)}\")\n", 653 | " import traceback\n", 654 | " print(\"\\nTraceback:\")\n", 655 | " print(traceback.format_exc())\n", 656 | " print(\"=\"*50)\n", 657 | " return []\n", 658 | "\n", 659 | " def generate_response(self, query: str, context_rag: list, model: Dict) -> dict:\n", 660 | " \"\"\"Generate response using provided context with source tracking\"\"\"\n", 661 | " try:\n", 662 | " if not context_rag:\n", 663 | " return {\n", 664 | " \"response_text\": \"No relevant context found.\",\n", 665 | " \"sources\": [],\n", 666 | " \"source_tracking\": {\n", 667 | " \"num_sources_provided\": 0,\n", 668 | " \"source_ids\": [],\n", 669 | " \"verification_status\": \"no_context\"\n", 670 | " },\n", 671 | " \"strategy\": None\n", 672 | " }\n", 673 | "\n", 674 | " print(\"\\n=== DEBUG: Context Chunks Passed to LLM ===\")\n", 675 | " print(f\"Query: {query}\")\n", 676 | " print(f\"Number of chunks: {len(context_rag)}\")\n", 677 | "\n", 678 | " # Generate unique IDs for each source chunk\n", 679 | " context_with_ids = []\n", 680 | " for idx, doc in enumerate(context_rag):\n", 681 | " source_id = f\"src_{idx}\"\n", 682 | " 
context_with_ids.append({\n", 683 | " \"text\": doc['text'],\n", 684 | " \"id\": source_id,\n", 685 | " \"distance\": doc.get('distance', 0)\n", 686 | " })\n", 687 | " print(f\"\\nChunk {source_id}:\")\n", 688 | " print(f\"Distance: {doc.get('distance', 'N/A')}\")\n", 689 | " print(\"Text:\", doc['text'])\n", 690 | " print(\"=\"*50)\n", 691 | "\n", 692 | " # Format context with source IDs\n", 693 | " formatted_context = \"\\n\\n\".join([\n", 694 | " f\"[{doc['id']}] {doc['text']}\"\n", 695 | " for doc in context_with_ids\n", 696 | " ])\n", 697 | "\n", 698 | " prompt = PromptTemplate(template=\"\"\"\n", 699 | " Instructions:\n", 700 | "\n", 701 | " You are a helpful assistant who answers questions strictly from the provided context.\n", 702 | " Given the context information, provide a direct and concise answer to the question: {query}\n", 703 | "\n", 704 | " Important rules:\n", 705 | " 1. Only use information present in the context\n", 706 | " 2. If you don't know or can't find the information, say \"I don't know\"\n", 707 | " 3. You must cite the source IDs [src_X] for every piece of information you use\n", 708 | " 4. Do not make assumptions or use external knowledge\n", 709 | "\n", 710 | " You must format your response as a JSON string object, starting with \"LLM_Response:\"\n", 711 | "\n", 712 | " Your answer must follow this exact format:\n", 713 | "\n", 714 | " LLM_Response:\n", 715 | " {{\n", 716 | " \"response_text\": \"Your detailed answer here with [src_X] citations inline\",\n", 717 | " \"sources\": [\n", 718 | " \"Copy and paste here the exact text segments you used, with their source IDs\"\n", 719 | " ],\n", 720 | " \"source_ids_used\": [\"List of all source IDs referenced in your answer\"]\n", 721 | " }}\n", 722 | "\n", 723 | " Context (with source IDs):\n", 724 | " ---------------\n", 725 | " {context}\n", 726 | " ---------------\n", 727 | " \"\"\")\n", 728 | "\n", 729 | " model_type = model['type']\n", 730 | " llm = model['llm']\n", 731 | "\n", 732 | " chain = prompt | llm | StrOutputParser()\n", 733 | "\n", 734 | " response = chain.invoke({\n", 735 | " \"query\": query,\n", 736 | " \"context\": formatted_context\n", 737 | " })\n", 738 | "\n", 739 | " response_text = response.split(\"LLM_Response:\")[-1].strip()\n", 740 | "\n", 741 | " try:\n", 742 | " if '{' in response_text and '}' in response_text:\n", 743 | " json_str = response_text[response_text.find('{'):response_text.rfind('}')+1]\n", 744 | " parsed_response = json.loads(json_str)\n", 745 | "\n", 746 | " # Verify source usage\n", 747 | " claimed_sources = set(parsed_response.get(\"source_ids_used\", []))\n", 748 | " available_sources = {doc[\"id\"] for doc in context_with_ids}\n", 749 | "\n", 750 | " verification_status = {\n", 751 | " \"status\": \"verified\" if claimed_sources.issubset(available_sources) else \"source_mismatch\",\n", 752 | " \"claimed_sources\": list(claimed_sources),\n", 753 | " \"available_sources\": list(available_sources),\n", 754 | " \"unauthorized_sources\": list(claimed_sources - available_sources)\n", 755 | " }\n", 756 | "\n", 757 | " return {\n", 758 | " \"response_text\": parsed_response.get(\"response_text\", response_text),\n", 759 | " \"sources\": parsed_response.get(\"sources\", []),\n", 760 | " \"source_tracking\": {\n", 761 | " \"num_sources_provided\": len(context_with_ids),\n", 762 | " \"source_ids\": [doc[\"id\"] for doc in context_with_ids],\n", 763 | " \"verification_status\": verification_status\n", 764 | " },\n", 765 | " \"strategy\": context_rag[0]['strategy'] if context_rag 
else None\n", 766 | " }\n", 767 | " else:\n", 768 | " return {\n", 769 | " \"response_text\": response_text,\n", 770 | " \"sources\": [],\n", 771 | " \"source_tracking\": {\n", 772 | " \"num_sources_provided\": len(context_with_ids),\n", 773 | " \"source_ids\": [doc[\"id\"] for doc in context_with_ids],\n", 774 | " \"verification_status\": {\n", 775 | " \"status\": \"parsing_failed\",\n", 776 | " \"error\": \"Response not in JSON format\"\n", 777 | " }\n", 778 | " },\n", 779 | " \"strategy\": context_rag[0]['strategy'] if context_rag else None\n", 780 | " }\n", 781 | "\n", 782 | " except json.JSONDecodeError:\n", 783 | " return {\n", 784 | " \"response_text\": response_text,\n", 785 | " \"sources\": [],\n", 786 | " \"source_tracking\": {\n", 787 | " \"num_sources_provided\": len(context_with_ids),\n", 788 | " \"source_ids\": [doc[\"id\"] for doc in context_with_ids],\n", 789 | " \"verification_status\": {\n", 790 | " \"status\": \"parsing_failed\",\n", 791 | " \"error\": \"JSON decode error\"\n", 792 | " }\n", 793 | " },\n", 794 | " \"strategy\": context_rag[0]['strategy'] if context_rag else None\n", 795 | " }\n", 796 | "\n", 797 | " except Exception as e:\n", 798 | " print(f\"An error occurred: {str(e)}\")\n", 799 | " return {\n", 800 | " \"response_text\": \"An error occurred while generating the response.\",\n", 801 | " \"sources\": [],\n", 802 | " \"source_tracking\": {\n", 803 | " \"num_sources_provided\": 0,\n", 804 | " \"source_ids\": [],\n", 805 | " \"verification_status\": {\n", 806 | " \"status\": \"error\",\n", 807 | " \"error\": str(e)\n", 808 | " }\n", 809 | " },\n", 810 | " \"strategy\": None\n", 811 | " }" 812 | ], 813 | "metadata": { 814 | "id": "YY5rnivk-bAh" 815 | }, 816 | "execution_count": null, 817 | "outputs": [] 818 | }, 819 | { 820 | "cell_type": "markdown", 821 | "source": [ 822 | "#ModelConfig Class" 823 | ], 824 | "metadata": { 825 | "id": "ljqi1Qg8j9F8" 826 | } 827 | }, 828 | { 829 | "cell_type": "code", 830 | "source": [ 831 | "class ModelConfig:\n", 832 | " \"\"\"Handles model configuration and management\"\"\"\n", 833 | " def __init__(self,\n", 834 | " models: List[Dict],\n", 835 | " temperature: float = 0.3):\n", 836 | " self.models = models\n", 837 | " self.temperature = temperature\n", 838 | " self.current_model = None\n", 839 | " self.current_model_name = None\n", 840 | "\n", 841 | "\n", 842 | " @contextmanager\n", 843 | " def load_model(self, model_config: Dict):\n", 844 | " \"\"\"Context manager for lazy loading and proper cleanup of models\"\"\"\n", 845 | " try:\n", 846 | " model_name = model_config[\"name\"]\n", 847 | " model_type = model_config[\"type\"]\n", 848 | "\n", 849 | " # Clear any existing model\n", 850 | " self.cleanup_current_model()\n", 851 | "\n", 852 | " if model_type == \"mistral_api\":\n", 853 | " mistral_api_key = userdata.get('MISTRAL_API_KEY')\n", 854 | " self.current_model = {\n", 855 | " 'llm': ChatMistralAI(\n", 856 | " model=model_name,\n", 857 | " temperature=self.temperature,\n", 858 | " api_key=mistral_api_key\n", 859 | " ),\n", 860 | " 'type': 'mistral_api'\n", 861 | " }\n", 862 | " else: # huggingface\n", 863 | " print(f\"Loading huggingface model: {model_name}\")\n", 864 | "\n", 865 | " # Empty CUDA cache before loading new model\n", 866 | " torch.cuda.empty_cache()\n", 867 | " gc.collect()\n", 868 | "\n", 869 | " tokenizer = AutoTokenizer.from_pretrained(\n", 870 | " pretrained_model_name_or_path=model_config[\"tokenizer\"],\n", 871 | " trust_remote_code=True,\n", 872 | " use_fast=True,\n", 873 | " 
padding_side=\"left\"\n", 874 | " )\n", 875 | "\n", 876 | " model = AutoModelForCausalLM.from_pretrained(\n", 877 | " pretrained_model_name_or_path=model_name,\n", 878 | " device_map=\"auto\",\n", 879 | " trust_remote_code=True,\n", 880 | " torch_dtype=torch.float16,\n", 881 | " use_cache=True,\n", 882 | " low_cpu_mem_usage=True,\n", 883 | " )\n", 884 | "\n", 885 | " pipe = pipeline(\n", 886 | " \"text-generation\",\n", 887 | " model=model,\n", 888 | " tokenizer=tokenizer,\n", 889 | " max_new_tokens=512,\n", 890 | " temperature=self.temperature,\n", 891 | " top_p=0.95,\n", 892 | " top_k=50,\n", 893 | " do_sample=True,\n", 894 | " device_map=\"auto\"\n", 895 | " )\n", 896 | "\n", 897 | " self.current_model = {\n", 898 | " 'llm': HuggingFacePipeline(pipeline=pipe),\n", 899 | " 'type': 'huggingface',\n", 900 | " 'model': model, # Keep reference for cleanup\n", 901 | " 'pipe': pipe # Keep reference for cleanup\n", 902 | " }\n", 903 | "\n", 904 | " self.current_model_name = model_name\n", 905 | " yield self.current_model\n", 906 | "\n", 907 | " finally:\n", 908 | " # Cleanup will happen in cleanup_current_model()\n", 909 | " pass\n", 910 | "\n", 911 | " def cleanup_current_model(self):\n", 912 | " \"\"\"Clean up the current model and free memory\"\"\"\n", 913 | " if self.current_model is not None:\n", 914 | " if self.current_model['type'] == 'huggingface':\n", 915 | " # Delete model components explicitly\n", 916 | " del self.current_model['llm']\n", 917 | " del self.current_model['model']\n", 918 | " del self.current_model['pipe']\n", 919 | "\n", 920 | " # Clear CUDA cache\n", 921 | " torch.cuda.empty_cache()\n", 922 | "\n", 923 | " # Run garbage collection\n", 924 | " gc.collect()\n", 925 | "\n", 926 | " self.current_model = None\n", 927 | " self.current_model_name = None" 928 | ], 929 | "metadata": { 930 | "id": "tCzG7OE0IiDT" 931 | }, 932 | "execution_count": null, 933 | "outputs": [] 934 | }, 935 | { 936 | "cell_type": "markdown", 937 | "source": [ 938 | "#ExperimentRunner Class" 939 | ], 940 | "metadata": { 941 | "id": "05gTul4pIW6S" 942 | } 943 | }, 944 | { 945 | "cell_type": "code", 946 | "source": [ 947 | "class ExperimentRunner:\n", 948 | " \"\"\"Handles experiment execution\"\"\"\n", 949 | " def __init__(self,\n", 950 | " model_config: ModelConfig,\n", 951 | " questions: List[str],\n", 952 | " chunk_strategies: List[str],\n", 953 | " semantic_enabled: bool = False,\n", 954 | " semantic_thresholds: List[int] = None,\n", 955 | " rag_pipeline: RAGPipeline = None):\n", 956 | " self.model_config = model_config\n", 957 | " self.questions = questions\n", 958 | " self.chunk_strategies = chunk_strategies\n", 959 | " self.semantic_enabled = semantic_enabled\n", 960 | " self.semantic_thresholds = semantic_thresholds if semantic_enabled else []\n", 961 | "\n", 962 | " # Use existing RAG pipeline or create new one\n", 963 | " global _GLOBAL_RAG_PIPELINE\n", 964 | " if rag_pipeline:\n", 965 | " self.rag_pipeline = rag_pipeline\n", 966 | " elif _GLOBAL_RAG_PIPELINE:\n", 967 | " self.rag_pipeline = _GLOBAL_RAG_PIPELINE\n", 968 | " else:\n", 969 | " print(\"Initializing new RAG pipeline\")\n", 970 | " _GLOBAL_RAG_PIPELINE = RAGPipeline()\n", 971 | " self.rag_pipeline = _GLOBAL_RAG_PIPELINE\n", 972 | "\n", 973 | " def run_experiments(self) -> Dict:\n", 974 | " results = {\n", 975 | " \"metadata\": {\n", 976 | " \"timestamp\": time.strftime(\"%Y%m%d-%H%M%S\"),\n", 977 | " \"models_tested\": [model[\"name\"] for model in self.model_config.models],\n", 978 | " \"semantic_enabled\": 
self.semantic_enabled,\n", 979 | " \"semantic_thresholds\": self.semantic_thresholds if self.semantic_enabled else [],\n", 980 | " \"chunk_strategies\": self.chunk_strategies,\n", 981 | " \"temperature\": self.model_config.temperature\n", 982 | " },\n", 983 | " \"results\": []\n", 984 | " }\n", 985 | "\n", 986 | " for model_config in self.model_config.models:\n", 987 | " model_name = model_config[\"name\"]\n", 988 | " print(f\"\\nTesting model: {model_name}\")\n", 989 | "\n", 990 | " with self.model_config.load_model(model_config) as model:\n", 991 | " for strategy in self.chunk_strategies:\n", 992 | " # Handle thresholds based on strategy type\n", 993 | " if strategy == \"semantic\" and self.semantic_enabled:\n", 994 | " thresholds_to_test = self.semantic_thresholds\n", 995 | " else:\n", 996 | " thresholds_to_test = [None]\n", 997 | "\n", 998 | " for threshold in thresholds_to_test:\n", 999 | " chunks_data = self.rag_pipeline.create_chunks(\n", 1000 | " documents,\n", 1001 | " threshold=threshold,\n", 1002 | " chunk_strategy=strategy\n", 1003 | " )\n", 1004 | "\n", 1005 | " chunk_stats = {\n", 1006 | " \"strategy\": strategy,\n", 1007 | " \"threshold\": threshold,\n", 1008 | " \"stats\": chunks_data[\"chunk_stats\"]\n", 1009 | " }\n", 1010 | "\n", 1011 | " for question in self.questions:\n", 1012 | " print(f\"Processing question: {question}\")\n", 1013 | "\n", 1014 | " context = self.rag_pipeline.run_cosine_search(\n", 1015 | " query=question,\n", 1016 | " threshold=threshold,\n", 1017 | " chunk_strategy=strategy\n", 1018 | " )\n", 1019 | "\n", 1020 | " answer = self.rag_pipeline.generate_response(\n", 1021 | " query=question,\n", 1022 | " context_rag=context,\n", 1023 | " model=model\n", 1024 | " )\n", 1025 | "\n", 1026 | " results[\"results\"].append({\n", 1027 | " \"model\": model_name,\n", 1028 | " \"threshold\": threshold if strategy == \"semantic\" else None,\n", 1029 | " \"chunk_strategy\": strategy,\n", 1030 | " \"question\": question,\n", 1031 | " \"response\": answer,\n", 1032 | " \"chunk_stats\": chunk_stats[\"stats\"]\n", 1033 | " })\n", 1034 | "\n", 1035 | " return results" 1036 | ], 1037 | "metadata": { 1038 | "id": "8hFyd9G1kC8M" 1039 | }, 1040 | "execution_count": null, 1041 | "outputs": [] 1042 | }, 1043 | { 1044 | "cell_type": "markdown", 1045 | "source": [ 1046 | "#Evaluator Class" 1047 | ], 1048 | "metadata": { 1049 | "id": "EpjD-Qz54mfu" 1050 | } 1051 | }, 1052 | { 1053 | "cell_type": "code", 1054 | "source": [ 1055 | "class ExperimentEvaluator:\n", 1056 | " \"\"\"Handles pure evaluation logic\"\"\"\n", 1057 | " def __init__(self, api_key: str):\n", 1058 | " self.client = openai.OpenAI(api_key=api_key)\n", 1059 | " self.encoder = tiktoken.encoding_for_model(\"gpt-4o\")\n", 1060 | "\n", 1061 | " def _get_baseline_answers(self, questions: List[str], source_docs: List) -> Dict[str, str]:\n", 1062 | " \"\"\"Get GPT-4o's own answers to the questions as baseline\"\"\"\n", 1063 | " print(\"\\n=== DEBUG: _get_baseline_answers ===\")\n", 1064 | " print(f\"Questions received: {questions}\")\n", 1065 | " print(f\"Number of document parts: {len(source_docs)}\")\n", 1066 | "\n", 1067 | " # Concatenate all document parts\n", 1068 | " full_document = \"\\n\\n\".join([doc.text for doc in source_docs])\n", 1069 | " print(f\"\\nFull document length: {len(full_document)} characters\")\n", 1070 | "\n", 1071 | " # Print sample from document\n", 1072 | " print(\"\\nSampling from document:\")\n", 1073 | " print(\"Start:\", full_document[:200], \"...\")\n", 1074 | " print(\"Middle:\", 
full_document[len(full_document)//2:len(full_document)//2 + 200], \"...\")\n", 1075 | " print(\"End:\", full_document[-200:], \"...\")\n", 1076 | "\n", 1077 | " baseline_prompt = f\"\"\"Source Document:\n", 1078 | " {full_document}\n", 1079 | "\n", 1080 | " Using ONLY the information from the source document above, answer these questions.\n", 1081 | " - If the exact information is found, provide it with specific numbers\n", 1082 | " - If information is not found, explicitly state that\n", 1083 | " - If there are metrics, make sure to include appropriate units\n", 1084 | "\n", 1085 | " Format your response as a valid JSON object with questions as keys and answers as values.\n", 1086 | " Keep answers concise and factual.\n", 1087 | "\n", 1088 | " Questions to answer:\n", 1089 | " {json.dumps(questions, indent=2)}\"\"\"\n", 1090 | "\n", 1091 | " try:\n", 1092 | " print(\"\\n--- Getting Baseline Answers ---\")\n", 1093 | " response = self.client.chat.completions.create(\n", 1094 | " model=\"gpt-4o\",\n", 1095 | " messages=[\n", 1096 | " {\"role\": \"system\", \"content\": \"You are a helpful assistant that provides JSON-formatted answers based on source documents.\"},\n", 1097 | " {\"role\": \"user\", \"content\": baseline_prompt}\n", 1098 | " ],\n", 1099 | " temperature=0.1\n", 1100 | " )\n", 1101 | "\n", 1102 | " content = response.choices[0].message.content\n", 1103 | " print(\"\\nRaw GPT-4 Response:\")\n", 1104 | " print(content)\n", 1105 | "\n", 1106 | " if '{' in content and '}' in content:\n", 1107 | " json_str = content[content.find('{'):content.rfind('}')+1]\n", 1108 | " baseline_answers = json.loads(json_str)\n", 1109 | " print(\"\\nParsed Baseline Answers:\")\n", 1110 | " print(baseline_answers)\n", 1111 | " return baseline_answers\n", 1112 | " print(\"\\nWarning: No JSON structure found in response\")\n", 1113 | " return {\"error\": \"No JSON structure found\", \"questions\": questions}\n", 1114 | "\n", 1115 | " except Exception as e:\n", 1116 | " print(f\"\\nError in _get_baseline_answers: {str(e)}\")\n", 1117 | " return {\"error\": str(e), \"questions\": questions}\n", 1118 | "\n", 1119 | " def evaluate_experiments(self, experiment_results: Dict, *, source_docs: List) -> Dict: # Updated signature\n", 1120 | " \"\"\"Core evaluation logic\"\"\"\n", 1121 | " try:\n", 1122 | " print(\"\\n=== DEBUG: evaluate_experiments ===\")\n", 1123 | " print(\"Getting questions...\")\n", 1124 | " questions = list(set(result[\"question\"] for result in experiment_results[\"results\"]))\n", 1125 | " print(f\"Questions extracted: {questions}\")\n", 1126 | "\n", 1127 | " print(\"\\nGetting baseline answers...\")\n", 1128 | " baseline_answers = self._get_baseline_answers(questions, source_docs) # Pass source_docs\n", 1129 | " print(f\"Baseline answers received: {baseline_answers}\")\n", 1130 | "\n", 1131 | " model_strategy_combinations = set(\n", 1132 | " (result[\"model\"],\n", 1133 | " result[\"chunk_strategy\"],\n", 1134 | " result[\"threshold\"] if result[\"chunk_strategy\"] == \"semantic\" else None)\n", 1135 | " for result in experiment_results[\"results\"]\n", 1136 | " )\n", 1137 | "\n", 1138 | " all_evaluations = []\n", 1139 | "\n", 1140 | " for model, strategy, threshold in model_strategy_combinations:\n", 1141 | " relevant_results = [r for r in experiment_results[\"results\"]\n", 1142 | " if r[\"model\"] == model and\n", 1143 | " r[\"chunk_strategy\"] == strategy and\n", 1144 | " (r[\"threshold\"] == threshold if strategy == \"semantic\" else True)]\n", 1145 | "\n", 1146 | " for result in 
relevant_results:\n", 1147 | " print(f\"\\nEvaluating response for: {result['question']}\")\n", 1148 | " baseline = baseline_answers.get(result[\"question\"], \"No baseline available\")\n", 1149 | " print(f\"Using baseline answer: {baseline}\")\n", 1150 | "\n", 1151 | " evaluation = self._evaluate_single_response(result, baseline)\n", 1152 | " all_evaluations.append(evaluation)\n", 1153 | "\n", 1154 | " return {\n", 1155 | " \"metadata\": {\n", 1156 | " \"timestamp\": datetime.now().isoformat(),\n", 1157 | " \"model_used\": \"gpt-4o\",\n", 1158 | " \"num_combinations_evaluated\": len(model_strategy_combinations),\n", 1159 | " \"num_questions_evaluated\": len(questions),\n", 1160 | " \"evaluation_status\": \"success\"\n", 1161 | " },\n", 1162 | " \"evaluations\": all_evaluations,\n", 1163 | " \"summary\": self._generate_summary(all_evaluations)\n", 1164 | " }\n", 1165 | "\n", 1166 | " except Exception as e:\n", 1167 | " print(f\"\\nCritical error in evaluate_experiments: {str(e)}\")\n", 1168 | " return self._create_default_evaluation(experiment_results)\n", 1169 | "\n", 1170 | " def _evaluate_single_response(self, result: Dict, baseline: str) -> Dict:\n", 1171 | " \"\"\"Evaluate a single response with clearer scoring criteria\"\"\"\n", 1172 | " evaluation_prompt = f\"\"\"Compare and evaluate this response. You must evaluate three separate aspects:\n", 1173 | "\n", 1174 | " 1. ACCURACY - Compare the model's answer against the baseline (ground truth)\n", 1175 | " 2. SOURCE ATTRIBUTION - Check if the model's answer matches its cited sources\n", 1176 | " 3. CONCISESNESS - Check if the model's answer is clear and direct\n", 1177 | "\n", 1178 | " Question: {result[\"question\"]}\n", 1179 | "\n", 1180 | " Baseline (Ground Truth): {baseline}\n", 1181 | "\n", 1182 | " Model Response: {result.get(\"response\", {}).get(\"response_text\", \"\")}\n", 1183 | " Sources Cited: {json.dumps(result.get(\"response\", {}).get(\"sources\", []), indent=2)}\n", 1184 | "\n", 1185 | " Scoring Criteria:\n", 1186 | "\n", 1187 | " 1. ACCURACY (0-100):\n", 1188 | " - Compare ONLY the model's answer against the baseline\n", 1189 | " - 100: Exact match with baseline (including numbers and units)\n", 1190 | " - 50: Partially correct but with some errors\n", 1191 | " - 0: Completely different from baseline or wrong\n", 1192 | "\n", 1193 | " 2. SOURCE ATTRIBUTION (0-100):\n", 1194 | " - Compare ONLY the model's answer against its cited sources\n", 1195 | " - 100: Answer exactly matches what appears in cited sources INCLUDING UNITS\n", 1196 | " - 50: Answer partially matches cited sources\n", 1197 | " - 0: Answer doesn't match cited sources or no sources cited\n", 1198 | "\n", 1199 | " Note: For large numbers, different formats are acceptable (e.g., $19,000 million = $19 billion)\n", 1200 | " BUT the units must match what appears in the source document for full attribution score.\n", 1201 | " The units in the source document are authoritative.\n", 1202 | "\n", 1203 | " 3. 
CONCISENESS (0-100):\n", 1204 | " - 100: Clear, direct answer without extra information\n", 1205 | " - 50: Contains some irrelevant information\n", 1206 | " - 0: Verbose or unclear\n", 1207 | "\n", 1208 | " Note: A response can have perfect source attribution (100) even if the answer is wrong,\n", 1209 | " as long as it accurately reflects what's in its cited sources.\n", 1210 | "\n", 1211 | " Provide your evaluation in this exact JSON format:\n", 1212 | " {{\n", 1213 | " \"model\": \"{result[\"model\"]}\",\n", 1214 | " \"chunk_strategy\": \"{result[\"chunk_strategy\"]}\",\n", 1215 | " \"threshold\": {result[\"threshold\"] if result[\"chunk_strategy\"] == \"semantic\" else \"null\"},\n", 1216 | " \"question\": \"{result[\"question\"]}\",\n", 1217 | " \"baseline_answer\": \"{baseline}\",\n", 1218 | " \"model_response\": {json.dumps(result.get(\"response\", {}), indent=2)},\n", 1219 | " \"chunk_stats\": {json.dumps(result.get(\"chunk_stats\", {}), indent=2)},\n", 1220 | " \"scores\": {{\n", 1221 | " \"accuracy\": ,\n", 1222 | " \"source_attribution\": ,\n", 1223 | " \"conciseness\": \n", 1224 | " }},\n", 1225 | " \"composite_score\": ,\n", 1226 | " \"detailed_analysis\": {{\n", 1227 | " \"accuracy_analysis\": \"Explain ONLY how the answer compares to baseline. Explicitly state if numbers match or differ.\",\n", 1228 | " \"attribution_analysis\": \"Explain ONLY how well the answer matches its cited sources, regardless of accuracy.\",\n", 1229 | " \"conciseness_analysis\": \"Explain how clear and direct the answer is\"\n", 1230 | " }}\n", 1231 | " }}\n", 1232 | "\n", 1233 | " Examples:\n", 1234 | "\n", 1235 | " Bad Response (Perfect Attribution, Wrong Answer):\n", 1236 | " - If baseline is \"$10,347M\" but model answers \"$19,921M [src_2]\" and src_2 contains \"$19,921M\"\n", 1237 | " - Accuracy: 0 (completely different from baseline)\n", 1238 | " - Attribution: 100 (perfectly matches its cited source)\n", 1239 | "\n", 1240 | " Good Response (Perfect Both):\n", 1241 | " - If baseline is \"$10,347M\" and model answers \"$10,347M [src_2]\" and src_2 contains \"$10,347M\"\n", 1242 | " - Accuracy: 100 (matches baseline)\n", 1243 | " - Attribution: 100 (matches source)\n", 1244 | " \"\"\"\n", 1245 | "\n", 1246 | " try:\n", 1247 | " response = self.client.chat.completions.create(\n", 1248 | " model=\"gpt-4o\",\n", 1249 | " messages=[\n", 1250 | " {\"role\": \"system\", \"content\": \"You are an expert at evaluating response accuracy against both baseline answers and source data.\"},\n", 1251 | " {\"role\": \"user\", \"content\": evaluation_prompt}\n", 1252 | " ],\n", 1253 | " temperature=0.7,\n", 1254 | " max_tokens=1000\n", 1255 | " )\n", 1256 | "\n", 1257 | " content = response.choices[0].message.content\n", 1258 | " if '{' in content and '}' in content:\n", 1259 | " json_str = content[content.find('{'):content.rfind('}')+1]\n", 1260 | " return json.loads(json_str)\n", 1261 | " return self._create_default_single_evaluation(result, baseline)\n", 1262 | "\n", 1263 | " except Exception as e:\n", 1264 | " print(f\"Error evaluating response: {str(e)}\")\n", 1265 | " return self._create_default_single_evaluation(result, baseline)\n", 1266 | "\n", 1267 | " def _create_default_single_evaluation(self, result: Dict, baseline: str) -> Dict:\n", 1268 | " \"\"\"Create a default evaluation for a single response when evaluation fails\"\"\"\n", 1269 | " return {\n", 1270 | " \"model\": result[\"model\"],\n", 1271 | " \"chunk_strategy\": result[\"chunk_strategy\"],\n", 1272 | " \"threshold\": result[\"threshold\"] 
if result[\"chunk_strategy\"] == \"semantic\" else None,\n", 1273 | " \"question\": result[\"question\"],\n", 1274 | " \"baseline_answer\": baseline,\n", 1275 | " \"model_response\": result.get(\"response\", {}),\n", 1276 | " \"scores\": {\n", 1277 | " \"source_accuracy\": 0,\n", 1278 | " \"source_attribution\": 0,\n", 1279 | " \"conciseness\": 0\n", 1280 | " },\n", 1281 | " \"composite_score\": 0,\n", 1282 | " \"detailed_analysis\": {\n", 1283 | " \"accuracy_analysis\": \"Evaluation failed\",\n", 1284 | " \"attribution_analysis\": \"Evaluation failed\",\n", 1285 | " \"conciseness_analysis\": \"Evaluation failed\"\n", 1286 | " }\n", 1287 | " }\n", 1288 | "\n", 1289 | " def _generate_summary(self, evaluations: List[Dict]) -> Dict:\n", 1290 | " \"\"\"Generate summary statistics from evaluations with ordered results\"\"\"\n", 1291 | " if not evaluations:\n", 1292 | " return {\n", 1293 | " \"overall_performance\": \"No evaluations available\",\n", 1294 | " \"optimal_permutation\": \"Not available\",\n", 1295 | " \"performance_analysis\": \"Evaluation process failed\",\n", 1296 | " \"chunking_statistics\": {}\n", 1297 | " }\n", 1298 | "\n", 1299 | " # Create ordered list of expected configurations\n", 1300 | " ordered_configs = []\n", 1301 | " if CHUNKING_CONFIGS[\"semantic_config\"][\"enabled\"]:\n", 1302 | " for threshold in CHUNKING_CONFIGS[\"semantic_config\"][\"thresholds\"]:\n", 1303 | " ordered_configs.append((\"semantic\", threshold))\n", 1304 | "\n", 1305 | " for strategy in [s for s in CHUNKING_CONFIGS[\"strategies\"] if s != \"semantic\"]:\n", 1306 | " ordered_configs.append((strategy, None))\n", 1307 | "\n", 1308 | " # Get unique models from evaluations\n", 1309 | " unique_models = sorted(set(eval[\"model\"] for eval in evaluations))\n", 1310 | "\n", 1311 | " # Track chunk statistics and performance scores\n", 1312 | " chunking_statistics = {}\n", 1313 | " performance_scores = {}\n", 1314 | " ordered_analysis = {}\n", 1315 | "\n", 1316 | " # Get document name from the documents list\n", 1317 | " document_name = os.path.basename(documents[0].metadata.get('file_path', 'Unknown Document'))\n", 1318 | "\n", 1319 | " # Initialize tracking for all model-strategy combinations\n", 1320 | " for model in unique_models:\n", 1321 | " for strategy, threshold in ordered_configs:\n", 1322 | " key = (model, strategy, threshold)\n", 1323 | " performance_scores[key] = {\n", 1324 | " \"count\": 0,\n", 1325 | " \"total_composite\": 0\n", 1326 | " }\n", 1327 | "\n", 1328 | " # First pass: calculate scores and collect statistics\n", 1329 | " best_score = 0\n", 1330 | " best_config = None\n", 1331 | "\n", 1332 | " for eval in evaluations:\n", 1333 | " model = eval[\"model\"]\n", 1334 | " strategy = eval[\"chunk_strategy\"]\n", 1335 | " threshold = eval[\"threshold\"] if strategy == \"semantic\" else None\n", 1336 | " key = (model, strategy, threshold)\n", 1337 | "\n", 1338 | " # Track performance scores\n", 1339 | " if key in performance_scores:\n", 1340 | " performance_scores[key][\"count\"] += 1\n", 1341 | " performance_scores[key][\"total_composite\"] += eval[\"composite_score\"]\n", 1342 | "\n", 1343 | " # Track chunk statistics (only need one entry per strategy/threshold combination)\n", 1344 | " chunk_key = (strategy, threshold)\n", 1345 | " if chunk_key not in chunking_statistics:\n", 1346 | " chunk_stats = eval.get(\"chunk_stats\", {})\n", 1347 | " if chunk_stats:\n", 1348 | " if threshold is not None:\n", 1349 | " config_str = f\"{document_name} with {strategy} chunking (threshold: 
{threshold})\"\n", 1350 | " else:\n", 1351 | " config_str = f\"{document_name} with {strategy} chunking\"\n", 1352 | "\n", 1353 | " chunking_statistics[chunk_key] = {\n", 1354 | " \"config_str\": config_str,\n", 1355 | " \"stats\": {\n", 1356 | " \"number_of_chunks\": chunk_stats.get(\"num_chunks\", \"N/A\"),\n", 1357 | " \"average_chunk_size\": round(chunk_stats.get(\"avg_chunk_size\", 0), 2),\n", 1358 | " \"min_chunk_size\": chunk_stats.get(\"min_chunk_size\", \"N/A\"),\n", 1359 | " \"max_chunk_size\": chunk_stats.get(\"max_chunk_size\", \"N/A\")\n", 1360 | " }\n", 1361 | " }\n", 1362 | "\n", 1363 | " # Second pass: create ordered performance analysis and chunk statistics\n", 1364 | " ordered_chunking_stats = {}\n", 1365 | " for strategy, threshold in ordered_configs:\n", 1366 | " # Add chunk statistics\n", 1367 | " chunk_key = (strategy, threshold)\n", 1368 | " if chunk_key in chunking_statistics:\n", 1369 | " config_str = chunking_statistics[chunk_key][\"config_str\"]\n", 1370 | " ordered_chunking_stats[config_str] = chunking_statistics[chunk_key][\"stats\"]\n", 1371 | "\n", 1372 | " # Add performance analysis for each model\n", 1373 | " for model in unique_models:\n", 1374 | " key = (model, strategy, threshold)\n", 1375 | " scores = performance_scores[key]\n", 1376 | "\n", 1377 | " if scores[\"count\"] > 0:\n", 1378 | " avg_composite = round(scores[\"total_composite\"] / scores[\"count\"], 2)\n", 1379 | "\n", 1380 | " if threshold is not None:\n", 1381 | " perf_key = f\"{model} with {strategy} chunking (threshold: {threshold})\"\n", 1382 | " else:\n", 1383 | " perf_key = f\"{model} with {strategy} chunking\"\n", 1384 | "\n", 1385 | " ordered_analysis[perf_key] = avg_composite\n", 1386 | "\n", 1387 | " if avg_composite > best_score:\n", 1388 | " best_score = avg_composite\n", 1389 | " best_config = perf_key\n", 1390 | "\n", 1391 | " # Calculate overall average score\n", 1392 | " total_score = sum(eval[\"composite_score\"] for eval in evaluations)\n", 1393 | " avg_score = round(total_score / len(evaluations), 2) if evaluations else 0\n", 1394 | "\n", 1395 | " return {\n", 1396 | " \"overall_performance\": f\"Average composite score across all evaluations: {avg_score:.2f}/100\",\n", 1397 | " \"optimal_permutation\": f\"Best performance: {best_config} (score: {best_score:.2f}/100)\",\n", 1398 | " \"performance_analysis\": ordered_analysis,\n", 1399 | " \"chunking_statistics\": ordered_chunking_stats\n", 1400 | " }\n", 1401 | "\n", 1402 | "\n", 1403 | " def _create_default_evaluation(self, experiment_results: Dict) -> Dict:\n", 1404 | " \"\"\"Create a default evaluation result when the evaluation process fails\"\"\"\n", 1405 | " return {\n", 1406 | " \"metadata\": {\n", 1407 | " \"timestamp\": datetime.now().isoformat(),\n", 1408 | " \"model_used\": \"gpt-4o\",\n", 1409 | " \"num_combinations_evaluated\": 0,\n", 1410 | " \"num_questions_evaluated\": 0,\n", 1411 | " \"evaluation_status\": \"failed\"\n", 1412 | " },\n", 1413 | " \"evaluations\": [\n", 1414 | " self._create_default_single_evaluation(result, \"Evaluation failed\")\n", 1415 | " for result in experiment_results[\"results\"]\n", 1416 | " ],\n", 1417 | " \"summary\": {\n", 1418 | " \"overall_performance\": \"Evaluation failed\",\n", 1419 | " \"optimal_permutation\": \"Not available\",\n", 1420 | " \"performance_analysis\": \"Evaluation process failed\",\n", 1421 | " \"chunking_statistics\": {}\n", 1422 | " }\n", 1423 | " }" 1424 | ], 1425 | "metadata": { 1426 | "id": "1rAK93yw4qCx" 1427 | }, 1428 | "execution_count": null, 1429 | 
"outputs": [] 1430 | }, 1431 | { 1432 | "cell_type": "markdown", 1433 | "source": [ 1434 | "#Results Manager Class" 1435 | ], 1436 | "metadata": { 1437 | "id": "lJmy7VbumFk4" 1438 | } 1439 | }, 1440 | { 1441 | "cell_type": "code", 1442 | "source": [ 1443 | "class ResultsManager:\n", 1444 | " \"\"\"Handles formatting, saving, and displaying evaluation results\"\"\"\n", 1445 | " def __init__(self, save_directory: str):\n", 1446 | " self.save_directory = save_directory\n", 1447 | " os.makedirs(save_directory, exist_ok=True)\n", 1448 | "\n", 1449 | " def format_results(self, experiment_results: Dict, evaluation_results: Dict) -> Tuple[Dict, Dict]:\n", 1450 | " \"\"\"Format experiment and evaluation results into structured output\"\"\"\n", 1451 | " print(\"\\n=== Starting Results Formatting ===\")\n", 1452 | "\n", 1453 | " # Format experiment results\n", 1454 | " formatted_experiment = {\n", 1455 | " \"metadata\": experiment_results.get(\"metadata\", {}),\n", 1456 | " \"results\": [{\n", 1457 | " \"model\": result[\"model\"],\n", 1458 | " \"chunk_strategy\": result[\"chunk_strategy\"],\n", 1459 | " \"threshold\": result[\"threshold\"],\n", 1460 | " \"question\": result[\"question\"],\n", 1461 | " \"response\": {\n", 1462 | " \"answer\": result[\"response\"].get(\"response_text\", \"\"),\n", 1463 | " \"sources\": result[\"response\"].get(\"sources\", [])\n", 1464 | " }\n", 1465 | " } for result in experiment_results[\"results\"]]\n", 1466 | " }\n", 1467 | "\n", 1468 | " # Format evaluation results with baseline answer\n", 1469 | " formatted_evaluation = {\n", 1470 | " \"metadata\": evaluation_results[\"metadata\"],\n", 1471 | " \"evaluations\": [{\n", 1472 | " \"model\": eval.get(\"model\"),\n", 1473 | " \"chunk_strategy\": eval.get(\"chunk_strategy\"),\n", 1474 | " \"threshold\": eval.get(\"threshold\"),\n", 1475 | " \"question\": eval.get(\"question\"),\n", 1476 | " \"baseline_answer\": eval.get(\"baseline_answer\", \"No baseline available\"), # Include baseline answer\n", 1477 | " \"model_response\": eval.get(\"model_response\", {}),\n", 1478 | " \"scores\": eval.get(\"scores\", {}),\n", 1479 | " \"composite_score\": eval.get(\"composite_score\"),\n", 1480 | " \"detailed_analysis\": eval.get(\"detailed_analysis\", {})\n", 1481 | " } for eval in evaluation_results.get(\"evaluations\", [])],\n", 1482 | " \"overall_summary\": evaluation_results.get(\"summary\", {})\n", 1483 | " }\n", 1484 | "\n", 1485 | " return formatted_experiment, formatted_evaluation\n", 1486 | "\n", 1487 | " def save_results(self, formatted_experiment: Dict, formatted_evaluation: Dict) -> Tuple[str, str]:\n", 1488 | " \"\"\"Save formatted results to JSON files\"\"\"\n", 1489 | " timestamp = time.strftime(\"%Y%m%d-%H%M%S\")\n", 1490 | "\n", 1491 | " experiment_file = f\"{self.save_directory}/experiment_results_{timestamp}.json\"\n", 1492 | " evaluation_file = f\"{self.save_directory}/evaluation_results_{timestamp}.json\"\n", 1493 | "\n", 1494 | " for filepath, data in [\n", 1495 | " (experiment_file, formatted_experiment),\n", 1496 | " (evaluation_file, formatted_evaluation)\n", 1497 | " ]:\n", 1498 | " with open(filepath, 'w', encoding='utf-8') as f:\n", 1499 | " json.dump(data, f, indent=2, ensure_ascii=False)\n", 1500 | "\n", 1501 | " return experiment_file, evaluation_file\n", 1502 | "\n", 1503 | " def display_results(self, evaluation_results: Dict):\n", 1504 | " \"\"\"Display evaluation results in a clear, formatted manner\"\"\"\n", 1505 | " print(\"\\n\" + \"=\"*80)\n", 1506 | " print(\"EVALUATION RESULTS\")\n", 1507 | 
" print(\"=\"*80)\n", 1508 | "\n", 1509 | " # Display metadata\n", 1510 | " metadata = evaluation_results.get(\"metadata\", {})\n", 1511 | " print(\"\\nMETADATA:\")\n", 1512 | " print(\"-\"*80)\n", 1513 | " print(f\"Timestamp: {metadata.get('timestamp', 'Not available')}\")\n", 1514 | " print(f\"Model Used: {metadata.get('model_used', 'Not available')}\")\n", 1515 | " print(f\"Combinations: {metadata.get('num_combinations_evaluated', 'Not available')}\")\n", 1516 | " print(f\"Questions: {metadata.get('num_questions_evaluated', 'Not available')}\")\n", 1517 | " print(f\"Evaluation Status: {metadata.get('evaluation_status', 'Not available')}\")\n", 1518 | "\n", 1519 | " # Display evaluations\n", 1520 | " evaluations = evaluation_results.get(\"evaluations\", [])\n", 1521 | " if evaluations:\n", 1522 | " print(\"\\nDETAILED EVALUATIONS:\")\n", 1523 | " print(\"-\"*80)\n", 1524 | " for eval in evaluations:\n", 1525 | " print(f\"\\nQuestion: {eval.get('question', 'No question provided')}\")\n", 1526 | " print(f\"Model: {eval.get('model', 'No model specified')}\")\n", 1527 | " print(f\"Strategy: {eval.get('chunk_strategy', 'No strategy specified')}\")\n", 1528 | " if eval.get('threshold'):\n", 1529 | " print(f\"Threshold: {eval.get('threshold')}\")\n", 1530 | "\n", 1531 | " # Display baseline answer\n", 1532 | " print(\"\\nBaseline Answer:\")\n", 1533 | " baseline = eval.get('baseline_answer', 'No baseline answer available')\n", 1534 | " print(textwrap.fill(str(baseline), width=80))\n", 1535 | "\n", 1536 | " print(\"\\nModel Response:\")\n", 1537 | " response = eval.get('model_response', {})\n", 1538 | " response_text = response.get('response_text', 'No response available')\n", 1539 | " if response_text:\n", 1540 | " print(textwrap.fill(str(response_text), width=80))\n", 1541 | " else:\n", 1542 | " print(\"No response available\")\n", 1543 | "\n", 1544 | " print(\"\\nSource Data:\")\n", 1545 | " sources = response.get('sources', [])\n", 1546 | " if sources:\n", 1547 | " for source in sources:\n", 1548 | " if source: # Check if source is not empty\n", 1549 | " print(textwrap.fill(str(source), width=80))\n", 1550 | " else:\n", 1551 | " print(\"No source data available\")\n", 1552 | "\n", 1553 | " print(\"\\nScores:\")\n", 1554 | " scores = eval.get('scores', {})\n", 1555 | " for metric, score in scores.items():\n", 1556 | " print(f\"- {metric.replace('_', ' ').capitalize()}: {score}/100\")\n", 1557 | " print(f\"Composite Score: {eval.get('composite_score', 0)}/100\")\n", 1558 | "\n", 1559 | " print(\"\\nDetailed Analysis:\")\n", 1560 | " analysis = eval.get('detailed_analysis', {})\n", 1561 | " for aspect, details in analysis.items():\n", 1562 | " if details: # Check if details is not empty\n", 1563 | " print(f\"\\n{aspect.replace('_', ' ').capitalize()}:\")\n", 1564 | " print(textwrap.fill(str(details), width=80))\n", 1565 | "\n", 1566 | " # Display summary\n", 1567 | " summary = evaluation_results.get(\"overall_summary\", {})\n", 1568 | " if summary:\n", 1569 | " print(\"\\nOVERALL SUMMARY:\")\n", 1570 | " print(\"-\"*80)\n", 1571 | "\n", 1572 | " if \"overall_performance\" in summary:\n", 1573 | " print(\"\\nOverall Performance:\")\n", 1574 | " print(textwrap.fill(str(summary[\"overall_performance\"]), width=80))\n", 1575 | "\n", 1576 | " if \"optimal_permutation\" in summary:\n", 1577 | " print(\"\\nOptimal Configuration:\")\n", 1578 | " print(textwrap.fill(str(summary[\"optimal_permutation\"]), width=80))\n", 1579 | "\n", 1580 | " if \"chunking_statistics\" in summary:\n", 1581 | " 
print(\"\\nChunking Statistics:\")\n", 1582 | " chunk_stats = summary[\"chunking_statistics\"]\n", 1583 | " for config, stats in chunk_stats.items():\n", 1584 | " print(f\"\\n{config}:\")\n", 1585 | " print(f\" Number of Chunks: {stats['number_of_chunks']}\")\n", 1586 | " print(f\" Average Chunk Size: {stats['average_chunk_size']}\")\n", 1587 | " print(f\" Min Chunk Size: {stats['min_chunk_size']}\")\n", 1588 | " print(f\" Max Chunk Size: {stats['max_chunk_size']}\")\n", 1589 | "\n", 1590 | " if \"performance_analysis\" in summary:\n", 1591 | " print(\"\\nPerformance Analysis:\")\n", 1592 | " analysis = summary[\"performance_analysis\"]\n", 1593 | " if isinstance(analysis, dict):\n", 1594 | " for config, score in analysis.items():\n", 1595 | " print(f\"{config}: {score:.2f}\")\n", 1596 | " else:\n", 1597 | " print(textwrap.fill(str(analysis), width=80))" 1598 | ], 1599 | "metadata": { 1600 | "id": "vnMb5d8cmKQU" 1601 | }, 1602 | "execution_count": null, 1603 | "outputs": [] 1604 | }, 1605 | { 1606 | "cell_type": "markdown", 1607 | "source": [ 1608 | "#Main" 1609 | ], 1610 | "metadata": { 1611 | "id": "koQ5ZObJC2ek" 1612 | } 1613 | }, 1614 | { 1615 | "cell_type": "code", 1616 | "execution_count": null, 1617 | "metadata": { 1618 | "id": "6qdI5iaXYsun", 1619 | "collapsed": true 1620 | }, 1621 | "outputs": [], 1622 | "source": [ 1623 | "def main():\n", 1624 | " # Initialize configurations with semantic settings from config\n", 1625 | " semantic_enabled = CHUNKING_CONFIGS[\"semantic_config\"][\"enabled\"]\n", 1626 | " semantic_thresholds = CHUNKING_CONFIGS[\"semantic_config\"][\"thresholds\"]\n", 1627 | "\n", 1628 | " # Update strategies list if semantic is enabled\n", 1629 | " strategies = CHUNKING_CONFIGS[\"strategies\"]\n", 1630 | " if semantic_enabled:\n", 1631 | " strategies = [\"semantic\"] + strategies\n", 1632 | "\n", 1633 | " model_config = ModelConfig(\n", 1634 | " models=MODEL_CONFIGS[\"models\"],\n", 1635 | " temperature=0.3\n", 1636 | " )\n", 1637 | "\n", 1638 | " # Initialize experiment runner with flexible configuration\n", 1639 | " experiment_runner = ExperimentRunner(\n", 1640 | " model_config=model_config,\n", 1641 | " questions=QUESTION_CONFIGS[\"questions\"],\n", 1642 | " chunk_strategies=strategies,\n", 1643 | " semantic_enabled=semantic_enabled,\n", 1644 | " semantic_thresholds=semantic_thresholds\n", 1645 | " )\n", 1646 | "\n", 1647 | " print(\"Starting experiment with configurations:\")\n", 1648 | " print(f\"Models: {[model['name'] for model in model_config.models]}\")\n", 1649 | " if semantic_enabled:\n", 1650 | " print(f\"Semantic thresholds: {semantic_thresholds}\")\n", 1651 | " print(f\"Chunk strategies: {strategies}\")\n", 1652 | " print(f\"Number of questions: {len(QUESTION_CONFIGS['questions'])}\")\n", 1653 | "\n", 1654 | " # Rest of the main function remains the same\n", 1655 | " experiment_results = experiment_runner.run_experiments()\n", 1656 | "\n", 1657 | " print(\"\\nInitializing GPT-4o evaluation...\")\n", 1658 | " evaluator = ExperimentEvaluator(api_key=userdata.get('OPENAI_API_KEY'))\n", 1659 | "\n", 1660 | " evaluation_results = evaluator.evaluate_experiments(\n", 1661 | " experiment_results=experiment_results,\n", 1662 | " source_docs=documents\n", 1663 | " )\n", 1664 | "\n", 1665 | " results_manager = ResultsManager(save_directory=FILE_CONFIGS['save_directory'])\n", 1666 | "\n", 1667 | " formatted_experiment, formatted_evaluation = results_manager.format_results(\n", 1668 | " experiment_results=experiment_results,\n", 1669 | " 
evaluation_results=evaluation_results\n", 1670 | " )\n", 1671 | "\n", 1672 | " experiment_file, evaluation_file = results_manager.save_results(\n", 1673 | " formatted_experiment=formatted_experiment,\n", 1674 | " formatted_evaluation=formatted_evaluation\n", 1675 | " )\n", 1676 | "\n", 1677 | " results_manager.display_results(evaluation_results=formatted_evaluation)\n", 1678 | "\n", 1679 | " print(\"\\nExperiment complete!\")\n", 1680 | " print(f\"Results saved to:\")\n", 1681 | " print(f\" Experiment results: {experiment_file}\")\n", 1682 | " print(f\" Evaluation results: {evaluation_file}\")\n", 1683 | "\n", 1684 | " torch.cuda.empty_cache()\n", 1685 | " gc.collect()\n", 1686 | "\n", 1687 | " return formatted_experiment, formatted_evaluation\n", 1688 | "\n", 1689 | "\n", 1690 | "if __name__ == \"__main__\":\n", 1691 | " results, evaluation = main()" 1692 | ] 1693 | } 1694 | ] 1695 | } -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # rag-experimentation-framework 2 | Testing LLMs and RAG configurations at scale using an OpenAI Evaluator / Reflector 3 | 4 | [Motivation for this Project & Results Overview](https://docs.google.com/presentation/d/13QGNKmmOQhmpwAxXuc4k98_ITSvrORRJw6J7J74dx7M/edit#slide=id.g318d9a5244c_0_0) 5 | 6 | [A Systematic Framework for RAG Optimization: Data-Driven Design Through Controlled Experimentation](https://medium.com/@bill.leece/a-systematic-framework-for-rag-optimization-data-driven-design-through-controlled-experimentation-5e7d99643816) 7 | --------------------------------------------------------------------------------
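
For readers who want the shape of the pipeline without opening the notebook JSON, the sketch below condenses the `main()` flow from the final cells: build a `ModelConfig`, run every model × chunking permutation with `ExperimentRunner`, score the results against GPT-4o baseline answers with `ExperimentEvaluator`, then format, save, and print everything with `ResultsManager`. It assumes the earlier notebook cells have already defined the config dictionaries (`MODEL_CONFIGS`, `CHUNKING_CONFIGS`, `QUESTION_CONFIGS`, `FILE_CONFIGS`), loaded the source `documents`, and defined the classes; treat it as an illustration of the wiring, not a standalone script.

```python
# Condensed view of the notebook's end-to-end flow (illustrative only).
# Config dictionaries, document loading, and class definitions live in
# earlier notebook cells; this cell only shows how they are wired together.
from google.colab import userdata  # the notebook reads the API key from Colab secrets

# Chunking strategies, with semantic chunking prepended when enabled in the config
strategies = CHUNKING_CONFIGS["strategies"]
if CHUNKING_CONFIGS["semantic_config"]["enabled"]:
    strategies = ["semantic"] + strategies

model_config = ModelConfig(
    models=MODEL_CONFIGS["models"],
    temperature=0.3,
)

# Run all model / chunking permutations over the configured questions
runner = ExperimentRunner(
    model_config=model_config,
    questions=QUESTION_CONFIGS["questions"],
    chunk_strategies=strategies,
    semantic_enabled=CHUNKING_CONFIGS["semantic_config"]["enabled"],
    semantic_thresholds=CHUNKING_CONFIGS["semantic_config"]["thresholds"],
)
experiment_results = runner.run_experiments()

# Score each response against GPT-4o baseline answers derived from the source documents
evaluator = ExperimentEvaluator(api_key=userdata.get("OPENAI_API_KEY"))
evaluation_results = evaluator.evaluate_experiments(
    experiment_results=experiment_results,
    source_docs=documents,
)

# Format, persist, and display the combined results
results_manager = ResultsManager(save_directory=FILE_CONFIGS["save_directory"])
formatted_experiment, formatted_evaluation = results_manager.format_results(
    experiment_results=experiment_results,
    evaluation_results=evaluation_results,
)
results_manager.save_results(
    formatted_experiment=formatted_experiment,
    formatted_evaluation=formatted_evaluation,
)
results_manager.display_results(evaluation_results=formatted_evaluation)
```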