├── .gitignore ├── README.md ├── dspy_tutorial.ipynb └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | share/python-wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | MANIFEST 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .nox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | *.py,cover 50 | .hypothesis/ 51 | .pytest_cache/ 52 | cover/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | .pybuilder/ 76 | target/ 77 | 78 | # Jupyter Notebook 79 | .ipynb_checkpoints 80 | 81 | # IPython 82 | profile_default/ 83 | ipython_config.py 84 | 85 | # pyenv 86 | # For a library or package, you might want to ignore these files since the code is 87 | # intended to run in multiple environments; otherwise, check them in: 88 | # .python-version 89 | 90 | # pipenv 91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 
92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 94 | # install all needed dependencies. 95 | #Pipfile.lock 96 | 97 | # poetry 98 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 99 | # This is especially recommended for binary packages to ensure reproducibility, and is more 100 | # commonly ignored for libraries. 101 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 102 | #poetry.lock 103 | 104 | # pdm 105 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 106 | #pdm.lock 107 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 108 | # in version control. 109 | # https://pdm.fming.dev/#use-with-ide 110 | .pdm.toml 111 | 112 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 113 | __pypackages__/ 114 | 115 | # Celery stuff 116 | celerybeat-schedule 117 | celerybeat.pid 118 | 119 | # SageMath parsed files 120 | *.sage.py 121 | 122 | # Environments 123 | .env 124 | .venv 125 | env/ 126 | venv/ 127 | ENV/ 128 | env.bak/ 129 | venv.bak/ 130 | 131 | # Spyder project settings 132 | .spyderproject 133 | .spyproject 134 | 135 | # Rope project settings 136 | .ropeproject 137 | 138 | # mkdocs documentation 139 | /site 140 | 141 | # mypy 142 | .mypy_cache/ 143 | .dmypy.json 144 | dmypy.json 145 | 146 | # Pyre type checker 147 | .pyre/ 148 | 149 | # pytype static type analyzer 150 | .pytype/ 151 | 152 | # Cython debug symbols 153 | cython_debug/ 154 | 155 | # PyCharm 156 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 157 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 158 | # and can be added to the global gitignore or merged into 
this file. For a more nuclear 159 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 160 | #.idea/ 161 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # llm_dspy_tutorial 2 | Tutorial for DSPy -------------------------------------------------------------------------------- /dspy_tutorial.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Preparation" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "We make use of OpenRouter in our tutorial, through which we can access GPT 3.5 in blocked regions" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 2, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "from dotenv import load_dotenv\n", 24 | "import os\n", 25 | "\n", 26 | "load_dotenv()\n", 27 | "OPENROUTER_API_KEY = os.environ.get('OPENROUTER_API_KEY')" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "For the below cell, you can use `dspy.OpenAI` class directly if it works for you" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 3, 40 | "metadata": {}, 41 | "outputs": [ 42 | { 43 | "name": "stderr", 44 | "output_type": "stream", 45 | "text": [ 46 | "c:\\Users\\Yip\\Desktop\\code\\llm_dspy_tutorial\\venv\\lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", 47 | " from .autonotebook import tqdm as notebook_tqdm\n" 48 | ] 49 | } 50 | ], 51 | "source": [ 52 | "import dspy\n", 53 | "\n", 54 | "# Configure the LLM; for this example we use OpenAI's GPT-3.5-turbo via OpenRouter\n", 55 | "lm = dspy.Databricks(api_key=OPENROUTER_API_KEY,\n", 56 | "\t\tapi_base=\"https://openrouter.ai/api/v1\",\n", 57 | "\t\tmodel=\"openai/gpt-3.5-turbo\")\n", 58 | "\n", 59 | "# Configure DSPy to use this language model by default\n", 60 | "dspy.settings.configure(lm = lm)" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "# Basic Concept of DSPy: Signatures and Modules\n", 68 | "\n", 69 | "They are the building blocks of prompt programming in DSPy. Let's dive in to see what they are about!" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "## Signatures: Specification of input/output" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "A signature is the most fundamental building block in DSPy's prompt programming: a declarative specification of the input/output behavior of a DSPy module. Signatures allow you to tell the LM **what** it needs to do, rather than specifying how we should ask the LM to do it.\n", 84 | "\n", 85 | "In its most basic form, a signature is a single string that separates the inputs and outputs with a `->`" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 4, 91 | "metadata": {}, 92 | "outputs": [ 93 | { 94 | "data": { 95 | "text/plain": [ 96 | "\"I'm sorry, but I am unable to determine the sentiment of the sentence without additional context or information. 
If you provide me with more details or specific criteria for determining sentiment, I would be happy to assist you further.\"" 97 | ] 98 | }, 99 | "execution_count": 4, 100 | "metadata": {}, 101 | "output_type": "execute_result" 102 | } 103 | ], 104 | "source": [ 105 | "# Define signature\n", 106 | "signature = 'sentence -> sentiment'\n", 107 | "classify = dspy.Predict(signature)\n", 108 | "\n", 109 | "# Run\n", 110 | "sentence = \"it's a charming and often affecting journey.\"\n", 111 | "classify(sentence=sentence).sentiment\n" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "The prediction is not a good one, but for instructional purposes let's inspect the prompt that was issued." 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 5, 124 | "metadata": {}, 125 | "outputs": [ 126 | { 127 | "name": "stdout", 128 | "output_type": "stream", 129 | "text": [ 130 | "\n", 131 | "\n", 132 | "\n", 133 | "\n", 134 | "Given the fields `sentence`, produce the fields `sentiment`.\n", 135 | "\n", 136 | "---\n", 137 | "\n", 138 | "Follow the following format.\n", 139 | "\n", 140 | "Sentence: ${sentence}\n", 141 | "Sentiment: ${sentiment}\n", 142 | "\n", 143 | "---\n", 144 | "\n", 145 | "Sentence: it's a charming and often affecting journey.\n", 146 | "Sentiment:\u001b[32m I'm sorry, but I am unable to determine the sentiment of the sentence without additional context or information. 
If you provide me with more details or specific criteria for determining sentiment, I would be happy to assist you further.\u001b[0m\n", 147 | "\n", 148 | "\n", 149 | "\n" 150 | ] 151 | } 152 | ], 153 | "source": [ 154 | "lm.inspect_history(n=1)" 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": {}, 160 | "source": [ 161 | "We can see this prompt is assembled from the `sentence -> sentiment` signature.\n", 162 | "\n", 163 | "As seen from the code below, when we feed the signature into `dspy.Predict()`, the signature will be parsed into the `signature` attribute of the `classify` object, and subsequently assembled as a prompt. The `instructions` is the default one in DSPy." 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": 6, 169 | "metadata": {}, 170 | "outputs": [ 171 | { 172 | "data": { 173 | "text/plain": [ 174 | "{'stage': 'f64b0ae92bc37733',\n", 175 | " 'signature': StringSignature(sentence -> sentiment\n", 176 | " instructions='Given the fields `sentence`, produce the fields `sentiment`.'\n", 177 | " sentence = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Sentence:', 'desc': '${sentence}'})\n", 178 | " sentiment = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'output', 'prefix': 'Sentiment:', 'desc': '${sentiment}'})\n", 179 | " ),\n", 180 | " 'config': {},\n", 181 | " 'lm': None,\n", 182 | " 'traces': [],\n", 183 | " 'train': [],\n", 184 | " 'demos': []}" 185 | ] 186 | }, 187 | "execution_count": 6, 188 | "metadata": {}, 189 | "output_type": "execute_result" 190 | } 191 | ], 192 | "source": [ 193 | "vars(classify)" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": {}, 199 | "source": [ 200 | "What if we want to provide a more detailed description of our objective to the LLM, beyond the basic `sentence -> sentiment` signature? 
To do so we need to provide a more verbose signature, in the form of **Class-based DSPy Signatures**.\n", 201 | "\n", 202 | "Notice that we provide no explicit instruction as to how the LLM should obtain the sentiment. We are just describing the task at hand and the expected output." 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 7, 208 | "metadata": {}, 209 | "outputs": [ 210 | { 211 | "data": { 212 | "text/plain": [ 213 | "\"Sentence: It's a charming and often affecting journey.\\nSentiment: joy\"" 214 | ] 215 | }, 216 | "execution_count": 7, 217 | "metadata": {}, 218 | "output_type": "execute_result" 219 | } 220 | ], 221 | "source": [ 222 | "# Define signature in Class-based form\n", 223 | "class Emotion(dspy.Signature):\n", 224 | " # Describe the task\n", 225 | " \"\"\"Classify emotions in a sentence.\"\"\"\n", 226 | " \n", 227 | " sentence = dspy.InputField()\n", 228 | " # Adding description to the output field\n", 229 | " sentiment = dspy.OutputField(desc=\"Possible choices: sadness, joy, love, anger, fear, surprise.\")\n", 230 | "\n", 231 | "classify_class_based = dspy.Predict(Emotion)\n", 232 | "\n", 233 | "# Issue prediction\n", 234 | "classify_class_based(sentence=sentence).sentiment" 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "It is now outputting a much better prediction! 
Again we see the descriptions we made when defining the class-based DSPy signatures are assembled into a prompt" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 8, 247 | "metadata": {}, 248 | "outputs": [ 249 | { 250 | "name": "stdout", 251 | "output_type": "stream", 252 | "text": [ 253 | "\n", 254 | "\n", 255 | "\n", 256 | "\n", 257 | "Classify emotions in a sentence.\n", 258 | "\n", 259 | "---\n", 260 | "\n", 261 | "Follow the following format.\n", 262 | "\n", 263 | "Sentence: ${sentence}\n", 264 | "Sentiment: Possible choices: sadness, joy, love, anger, fear, surprise.\n", 265 | "\n", 266 | "---\n", 267 | "\n", 268 | "Sentence: it's a charming and often affecting journey.\n", 269 | "Sentiment:\u001b[32m Sentence: It's a charming and often affecting journey.\n", 270 | "Sentiment: joy\u001b[0m\n", 271 | "\n", 272 | "\n", 273 | "\n" 274 | ] 275 | } 276 | ], 277 | "source": [ 278 | "lm.inspect_history(n=1)" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": {}, 284 | "source": [ 285 | "This might do for simple tasks, but advanced applications might require sophisticated prompting techniques like Chain of Thought or ReAct. In DSPy these are implemented as `Modules`" 286 | ] 287 | }, 288 | { 289 | "cell_type": "markdown", 290 | "metadata": {}, 291 | "source": [ 292 | "## Modules: Abstracting prompting techniques\n", 293 | "\n", 294 | "We are used to hardcoding phrases like `let's think step by step` in our prompt. In DSPy these prompting techniques are abstracted as **Modules**. 
Below is an example of applying our class-based signature to the `dspy.ChainOfThought` module\n" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": 9, 300 | "metadata": {}, 301 | "outputs": [ 302 | { 303 | "name": "stdout", 304 | "output_type": "stream", 305 | "text": [ 306 | "\n", 307 | "\n", 308 | "\n", 309 | "\n", 310 | "Classify emotions in a sentence.\n", 311 | "\n", 312 | "---\n", 313 | "\n", 314 | "Follow the following format.\n", 315 | "\n", 316 | "Sentence: ${sentence}\n", 317 | "Reasoning: Let's think step by step in order to ${produce the sentiment}. We ...\n", 318 | "Sentiment: Possible choices: sadness, joy, love, anger, fear, surprise.\n", 319 | "\n", 320 | "---\n", 321 | "\n", 322 | "Sentence: it's a charming and often affecting journey.\n", 323 | "Reasoning: Let's think step by step in order to\u001b[32m Sentence: It's a charming and often affecting journey.\n", 324 | "Reasoning: Let's think step by step in order to determine the sentiment. The use of the words \"charming\" and \"affecting\" suggests positive emotions associated with enjoyment and emotional impact. We can infer that the overall tone is positive and heartwarming, evoking feelings of joy and possibly love.\n", 325 | "Sentiment: Joy, love\u001b[0m\n", 326 | "\n", 327 | "\n", 328 | "\n" 329 | ] 330 | } 331 | ], 332 | "source": [ 333 | "# Apply the class-based `Emotion` signature to Chain of Thought\n", 334 | "classify_cot = dspy.ChainOfThought(Emotion)\n", 335 | "\n", 336 | "# Run\n", 337 | "classify_cot(sentence=sentence).sentiment\n", 338 | "\n", 339 | "# Inspect prompt\n", 340 | "lm.inspect_history(n=1)\n" 341 | ] 342 | }, 343 | { 344 | "cell_type": "markdown", 345 | "metadata": {}, 346 | "source": [ 347 | "Notice how the \"Reasoning: Let's think step by step...\" phrase is added to our prompt, and the quality of our prediction is even better now." 
348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "metadata": {}, 353 | "source": [ 354 | "At the time of writing, DSPy provides the following prompting techniques in the form of Modules. Notice that the `dspy.Predict` we used in the initial example is also a Module, representing no prompting technique!\n", 355 | "\n", 356 | "1. `dspy.Predict`: Basic predictor. Does not modify the signature. Handles the key forms of learning (i.e., storing the instructions and demonstrations and updates to the LM).\n", 357 | "2. `dspy.ChainOfThought`: Teaches the LM to think step-by-step before committing to the signature's response.\n", 358 | "3. `dspy.ProgramOfThought`: Teaches the LM to output code, whose execution results will dictate the response.\n", 359 | "4. `dspy.ReAct`: An agent that can use tools to implement the given signature.\n", 360 | "5. `dspy.MultiChainComparison`: Can compare multiple outputs from ChainOfThought to produce a final prediction.\n", 361 | "\n", 362 | "It also has some function-style modules:\n", 363 | "\n", 364 | "6. `dspy.majority`: Can do basic voting to return the most popular response from a set of predictions.\n", 365 | "\n", 366 | "You can check out further examples in [each module's respective guide](https://dspy-docs.vercel.app/api/category/modules)." 367 | ] 368 | }, 369 | { 370 | "cell_type": "markdown", 371 | "metadata": {}, 372 | "source": [ 373 | "### Chaining the modules\n", 374 | "On the other hand, what about RAG? 
We can chain the modules together to deal with bigger problems!\n", 375 | "\n", 376 | "First we define a retriever, for our example we use a ColBERT retriever getting information from Wikipedia Abstracts 2017" 377 | ] 378 | }, 379 | { 380 | "cell_type": "code", 381 | "execution_count": 12, 382 | "metadata": {}, 383 | "outputs": [], 384 | "source": [ 385 | "# Configure retriever\n", 386 | "rm = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')\n", 387 | "dspy.settings.configure(rm = rm)" 388 | ] 389 | }, 390 | { 391 | "cell_type": "markdown", 392 | "metadata": {}, 393 | "source": [ 394 | "Then we define the `RAG` class inherited from `dspy.Module`. It needs two methods:\n", 395 | "- The `__init__` method will simply declare the sub-modules it needs: `dspy.Retrieve` and `dspy.ChainOfThought`. The latter is defined to implement our `context, question -> answer` signature.\n", 396 | "- The `forward` method will describe the control flow of answering the question using the modules we have.\n", 397 | "\n", 398 | "Note: Code and description borrowed from [DSPy's introduction notebook](https://github.com/stanfordnlp/dspy/blob/main/intro.ipynb)" 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "execution_count": 13, 404 | "metadata": {}, 405 | "outputs": [], 406 | "source": [ 407 | "# Define a class-based signature\n", 408 | "class GenerateAnswer(dspy.Signature):\n", 409 | " \"\"\"Answer questions with short factoid answers.\"\"\"\n", 410 | "\n", 411 | " context = dspy.InputField(desc=\"may contain relevant facts\")\n", 412 | " question = dspy.InputField()\n", 413 | " answer = dspy.OutputField(desc=\"often between 1 and 5 words\")\n", 414 | "\n", 415 | "# Chain different modules together to retrieve information from Wikipedia Abstracts 2017, then pass it as context for Chain of Thought to generate an answer\n", 416 | "class RAG(dspy.Module):\n", 417 | " def __init__(self, num_passages=3):\n", 418 | " super().__init__()\n", 419 | " self.retrieve = 
dspy.Retrieve(k=num_passages)\n", 420 | " self.generate_answer = dspy.ChainOfThought(GenerateAnswer)\n", 421 | " \n", 422 | " def forward(self, question):\n", 423 | " context = self.retrieve(question).passages\n", 424 | " answer = self.generate_answer(context=context, question=question)\n", 425 | " return answer" 426 | ] 427 | }, 428 | { 429 | "cell_type": "markdown", 430 | "metadata": {}, 431 | "source": [ 432 | "Then we use the class to perform RAG" 433 | ] 434 | }, 435 | { 436 | "cell_type": "code", 437 | "execution_count": 14, 438 | "metadata": {}, 439 | "outputs": [ 440 | { 441 | "data": { 442 | "text/plain": [ 443 | "'1930'" 444 | ] 445 | }, 446 | "execution_count": 14, 447 | "metadata": {}, 448 | "output_type": "execute_result" 449 | } 450 | ], 451 | "source": [ 452 | "# Initialize our RAG class\n", 453 | "rag = RAG()\n", 454 | "\n", 455 | "# Define a question and pass it into the RAG class\n", 456 | "my_question = \"When was the first FIFA World Cup held?\"\n", 457 | "rag(question=my_question).answer" 458 | ] 459 | }, 460 | { 461 | "cell_type": "markdown", 462 | "metadata": {}, 463 | "source": [ 464 | "Inspecting the prompt, we see that 3 passages retrieved from Wikipedia Abstracts 2017 are interspersed as context for Chain of Thought generation" 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "execution_count": 15, 470 | "metadata": {}, 471 | "outputs": [ 472 | { 473 | "name": "stdout", 474 | "output_type": "stream", 475 | "text": [ 476 | "\n", 477 | "\n", 478 | "\n", 479 | "\n", 480 | "Answer questions with short factoid answers.\n", 481 | "\n", 482 | "---\n", 483 | "\n", 484 | "Follow the following format.\n", 485 | "\n", 486 | "Context: may contain relevant facts\n", 487 | "\n", 488 | "Question: ${question}\n", 489 | "\n", 490 | "Reasoning: Let's think step by step in order to ${produce the answer}. 
We ...\n", 491 | "\n", 492 | "Answer: often between 1 and 5 words\n", 493 | "\n", 494 | "---\n", 495 | "\n", 496 | "Context:\n", 497 | "[1] «History of the FIFA World Cup | The FIFA World Cup was first held in 1930, when FIFA president Jules Rimet decided to stage an international football tournament. The inaugural edition, held in 1930, was contested as a final tournament of only thirteen teams invited by the organization. Since then, the World Cup has experienced successive expansions and format remodeling to its current 32-team final tournament preceded by a two-year qualifying process, involving over 200 teams from around the world.»\n", 498 | "[2] «1950 FIFA World Cup | The 1950 FIFA World Cup, held in Brazil from 24 June to 16 July 1950, was the fourth FIFA World Cup. It was the first World Cup since 1938, the planned 1942 and 1946 competitions having been cancelled owing to World War II. It was won by Uruguay, who had won the inaugural competition in 1930, clinching the cup by beating the hosts Brazil 2–1 in the deciding match of the four-team final group (this was the only tournament not decided by a one-match final). It was also the first tournament where the trophy was referred to as the Jules Rimet Cup, to mark the 25th anniversary of Jules Rimet's presidency of FIFA.»\n", 499 | "[3] «1970 FIFA World Cup | The 1970 FIFA World Cup was the ninth FIFA World Cup, the quadrennial international football championship for men's national teams. Held from 31 May to 21 June in Mexico, it was the first World Cup tournament staged in North America, and the first held outside Europe and South America. Teams representing 75 nations from all six populated continents entered the competition, and its qualification rounds began in May 1968. Fourteen teams qualified from this process to join host nation Mexico and defending champions England in the sixteen-team final tournament. 
El Salvador, Israel, and Morocco made their first appearances at the final stage, and Peru their first since 1930.»\n", 500 | "\n", 501 | "Question: When was the first FIFA World Cup held?\n", 502 | "\n", 503 | "Reasoning: Let's think step by step in order to Answer: 1930\n", 504 | "\n", 505 | "Answer:\u001b[32m 1930\u001b[0m\n", 506 | "\n", 507 | "\n", 508 | "\n" 509 | ] 510 | } 511 | ], 512 | "source": [ 513 | "lm.inspect_history(n=1)" 514 | ] 515 | }, 516 | { 517 | "cell_type": "markdown", 518 | "metadata": {}, 519 | "source": [ 520 | "The above examples might not seem like much. In its most basic application, DSPy seems to do nothing that can't be done with an f-string, but it actually presents a paradigm shift for prompt writing, as it brings **modularity** to prompt composition!\n", 521 | "\n", 522 | "First we describe our objective with a `Signature`, then we apply different prompting techniques with `Modules`. To test different prompting techniques for a given problem, we can simply switch the modules used and compare their results, rather than hardcoding the \"let's think step by step...\" (for Chain of Thought) or \"you will interleave Thought, Action, and Observation steps\" (for ReAct) phrases.\n", 523 | "\n", 524 | "The power of DSPy is not limited to modularity: it can also optimize our prompt based on training samples and test it systematically. We will explore this in the next section!" 525 | ] 526 | }, 527 | { 528 | "cell_type": "markdown", 529 | "metadata": {}, 530 | "source": [ 531 | "# Optimizer: Train our prompt as with machine learning\n", 532 | "In this section we attempt to optimize our prompt for a RAG application.\n", 533 | "\n", 534 | "Taking Chain of Thought as an example, beyond just adding the \"let's think step by step\" phrase, we can boost its performance with a few tweaks:\n", 535 | "1. Adding suitable examples (aka **few-shot learning**).\n", 536 | "2. 
Furthermore, we can **bootstrap demonstrations of reasoning** to teach the LMs to apply proper reasoning to deal with the task at hand. \n", 537 | "\n", 538 | "Doing this manually would be highly time-consuming and would not generalize to different problems, but with DSPy this can be done automatically. Let's dive in!" 539 | ] 540 | }, 541 | { 542 | "cell_type": "markdown", 543 | "metadata": {}, 544 | "source": [ 545 | "## Preparation\n", 546 | "As in machine learning, to train our prompt we need to prepare our training and test datasets. On its first run this cell will take around 20 minutes." 547 | ] 548 | }, 549 | { 550 | "cell_type": "code", 551 | "execution_count": 10, 552 | "metadata": {}, 553 | "outputs": [ 554 | { 555 | "name": "stderr", 556 | "output_type": "stream", 557 | "text": [ 558 | "c:\\Users\\Yip\\Desktop\\code\\llm_dspy_tutorial\\venv\\lib\\site-packages\\datasets\\table.py:1421: FutureWarning: promote has been superseded by promote_options='default'.\n", 559 | " table = cls._concat_blocks(blocks, axis=0)\n" 560 | ] 561 | }, 562 | { 563 | "data": { 564 | "text/plain": [ 565 | "(20, 20)" 566 | ] 567 | }, 568 | "execution_count": 10, 569 | "metadata": {}, 570 | "output_type": "execute_result" 571 | } 572 | ], 573 | "source": [ 574 | "from dspy.datasets.hotpotqa import HotPotQA\n", 575 | "\n", 576 | "# For demonstration purposes we use a small subset of the HotPotQA dataset: 20 examples each for training and testing\n", 577 | "dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=20, test_size=0)\n", 578 | "trainset = [x.with_inputs('question') for x in dataset.train]\n", 579 | "testset = [x.with_inputs('question') for x in dataset.dev]\n", 580 | "\n", 581 | "len(trainset), len(testset)" 582 | ] 583 | }, 584 | { 585 | "cell_type": "markdown", 586 | "metadata": {}, 587 | "source": [ 588 | "Let's inspect our dataset, which is basically a set of question-and-answer pairs" 589 | ] 590 | }, 591 | { 592 | "cell_type": "code", 593 | "execution_count": 
17, 594 | "metadata": {}, 595 | "outputs": [ 596 | { 597 | "data": { 598 | "text/plain": [ 599 | "Example({'question': 'At My Window was released by which American singer-songwriter?', 'answer': 'John Townes Van Zandt'}) (input_keys={'question'})" 600 | ] 601 | }, 602 | "execution_count": 17, 603 | "metadata": {}, 604 | "output_type": "execute_result" 605 | } 606 | ], 607 | "source": [ 608 | "trainset[0]" 609 | ] 610 | }, 611 | { 612 | "cell_type": "markdown", 613 | "metadata": {}, 614 | "source": [ 615 | "To facilitate understanding of the optimization process, we launch **Phoenix** to observe our DSPy application, which is a great tool for LLM observability in general!\n", 616 | "\n", 617 | "Note: If you are on Windows, please also install Windows C++ Build Tools [here](https://visualstudio.microsoft.com/visual-cpp-build-tools/), which is necessary for Phoenix" 618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": 18, 623 | "metadata": {}, 624 | "outputs": [ 625 | { 626 | "name": "stdout", 627 | "output_type": "stream", 628 | "text": [ 629 | "🌍 To view the Phoenix app in your browser, visit http://localhost:6006/\n", 630 | "📺 To view the Phoenix app in a notebook, run `px.active_session().view()`\n", 631 | "📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix\n" 632 | ] 633 | } 634 | ], 635 | "source": [ 636 | "# Phoenix by default uses the 6006 port for the UI. 
\n", 637 | "# If you have a port conflict, you can close the port by uncommenting the following code\n", 638 | "\n", 639 | "import phoenix as px\n", 640 | "# import psutil\n", 641 | "# \n", 642 | "# def close_port(port):\n", 643 | "# for conn in psutil.net_connections(kind='inet'):\n", 644 | "# if conn.laddr.port == port:\n", 645 | "# print(f\"Closing port {port} by terminating PID {conn.pid}\")\n", 646 | "# process = psutil.Process(conn.pid)\n", 647 | "# process.terminate()\n", 648 | "\n", 649 | "# close_port(6006)\n", 650 | "\n", 651 | "phoenix_session = px.launch_app()" 652 | ] 653 | }, 654 | { 655 | "cell_type": "markdown", 656 | "metadata": {}, 657 | "source": [ 658 | "Configure our OpenTelemetry exporter, which will export spans and traces to Phoenix, and run the DSPy instrumentor to wrap calls to the relevant DSPy components." 659 | ] 660 | }, 661 | { 662 | "cell_type": "code", 663 | "execution_count": 19, 664 | "metadata": {}, 665 | "outputs": [], 666 | "source": [ 667 | "from openinference.instrumentation.dspy import DSPyInstrumentor\n", 668 | "from opentelemetry import trace as trace_api\n", 669 | "from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter\n", 670 | "from opentelemetry.sdk import trace as trace_sdk\n", 671 | "from opentelemetry.sdk.resources import Resource\n", 672 | "from opentelemetry.sdk.trace.export import SimpleSpanProcessor\n", 673 | "\n", 674 | "endpoint = \"http://127.0.0.1:6006/v1/traces\"\n", 675 | "resource = Resource(attributes={})\n", 676 | "tracer_provider = trace_sdk.TracerProvider(resource=resource)\n", 677 | "span_otlp_exporter = OTLPSpanExporter(endpoint=endpoint)\n", 678 | "tracer_provider.add_span_processor(SimpleSpanProcessor(span_exporter=span_otlp_exporter))\n", 679 | "\n", 680 | "trace_api.set_tracer_provider(tracer_provider=tracer_provider)\n", 681 | "DSPyInstrumentor().instrument()" 682 | ] 683 | }, 684 | { 685 | "cell_type": "markdown", 686 | "metadata": {}, 687 | "source": [ 688 | "## 
Prompt Optimization\n", 689 | "Now we are ready to see what this optimization is about! To \"train\" our prompt, we need 3 things:\n", 690 | "\n", 691 | "1. A training set. We'll just use our 20 question–answer examples from `trainset`.\n", 692 | "2. A metric for validation. Here we use the native `dspy.evaluate.answer_exact_match`, which checks if the predicted answer exactly matches the right answer (a questionable criterion, but it suffices for demonstration). For real-life applications you can define your own evaluation criteria.\n", 693 | "3. A specific **Optimizer** (formerly teleprompter). The DSPy library includes a number of optimization strategies and you can check them out [here](https://dspy-docs.vercel.app/docs/building-blocks/optimizers). For our example we use `BootstrapFewShot`.\n", 694 | "\n", 695 | "Now we train our prompt. Successful execution of the below cell should show \"Bootstrapped 4 full traces after n examples in round 0\"" 696 | ] 697 | }, 698 | { 699 | "cell_type": "code", 700 | "execution_count": 20, 701 | "metadata": {}, 702 | "outputs": [ 703 | { 704 | "name": "stderr", 705 | "output_type": "stream", 706 | "text": [ 707 | " 70%|███████ | 14/20 [00:00<00:00, 41.20it/s]" 708 | ] 709 | }, 710 | { 711 | "name": "stdout", 712 | "output_type": "stream", 713 | "text": [ 714 | "Bootstrapped 4 full traces after 15 examples in round 0.\n" 715 | ] 716 | }, 717 | { 718 | "name": "stderr", 719 | "output_type": "stream", 720 | "text": [ 721 | "\n" 722 | ] 723 | } 724 | ], 725 | "source": [ 726 | "from dspy.teleprompt import BootstrapFewShot\n", 727 | "\n", 728 | "# Simple optimizer example. 
I am explicitly stating the default value of max_bootstrapped_demos for demonstration purposes\n", 729 | "optimizer = BootstrapFewShot(metric=dspy.evaluate.answer_exact_match, max_bootstrapped_demos=4)\n", 730 | "\n", 731 | "# Compile!\n", 732 | "compiled_rag = optimizer.compile(RAG(), trainset=trainset)" 733 | ] 734 | }, 735 | { 736 | "cell_type": "markdown", 737 | "metadata": {}, 738 | "source": [ 739 | "Before using the `compiled_rag` to answer a question, let's see what went on behind the scenes during the training process (aka compile). We open the Phoenix console by visiting `http://localhost:6006/` in a browser" 740 | ] 741 | }, 742 | { 743 | "cell_type": "markdown", 744 | "metadata": {}, 745 | "source": [ 746 | "In my run, 14 calls were made using the `RAG` class; in each of those calls we post a question to the LM to obtain a prediction. \n", 747 | "\n", 748 | "Below is the summary of my run. We see that 4 correct answers are obtained from 14 samples, thus reaching our `max_bootstrapped_demos` parameter and stopping the calls.\n", 749 | "\n", 750 | "|#|Question|Predicted Answer|Real Answer|Correct|\n", 751 | "|---|---|---|---|---|\n", 752 | "|1|At My Window was released by which American singer-songwriter?|Townes Van Zandt|John Townes Van Zandt|❌|\n", 753 | "|2|which American actor was Candace Kita guest starred with|Context: Candace Kita has appeared in various television shows and films throughout her career, including guest-starring roles.\\n\\nQuestion: Which American actor was Candace Kita guest-starred with?\\n\\nReasoning: Let's think step by step in order to identify the American actor who appeared alongside Candace Kita in one of her guest-starring roles.\\n\\nAnswer: Bill Murray|Bill Murray|❌|\n", 754 | "|3|Which of these publications was most recently published, Who Put the Bomp or Self?|Context: The publication \\\"Who Put the Bomp\\\" was a rock music fanzine edited and published by Greg Shaw from 1970 to 1979.\\n\\nQuestion: Which 
of these publications was most recently published, Who Put the Bomp or Self?\\n\\nReasoning: \\\"Who Put the Bomp\\\" was last published in 1979.\\n\\nAnswer: Who Put the|Self|❌|\n", 755 | "|4|The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?|Answer: 1950|1950|❌|\n", 756 | "|5|Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?|Tae Kwon Do Times|Tae Kwon Do Times|✔️|\n", 757 | "|6|In what year was the club founded that played Manchester City in the 1972 FA Charity Shield|Context: The 1972 FA Charity Shield was contested between Manchester City and Aston Villa.\\n\\nQuestion: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield?\\n\\nReasoning: Let's think step by step in order to find the answer. The club that played Manchester City in the 1972 FA Charity Shield was Aston Villa, and|1874|❌|\n", 758 | "|7|Which is taller, the Empire State Building or the Bank of America Tower?|Answer: the Empire State Building|The Empire State Building|❌|\n", 759 | "|8|Which American actress who made their film debut in the 1995 teen drama \\\"Kids\\\" was the co-founder of Voto Latino?|Context: Rosario Dawson is an American actress, producer, singer, comic book writer, and political activist who made her film debut in the 1995 teen drama \\\"Kids.\\\"\\n\\nQuestion: Which American actress who made their film debut in the 1995 teen drama \\\"Kids\\\" was the co-founder of Voto Latino?\\n\\nReasoning: Let's think step by step in|Rosario Dawson|❌|\n", 760 | "|9|Tombstone stared an actor born May 17, 1955 known as who?|Bill Paxton|Bill Paxton|✔️|\n", 761 | "|10|What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division?|Operation Citadel|Operation Citadel|✔️|\n", 762 | "|11|Who acted in the 
shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of \"Hamlet.\"?|Question: Who acted in the short film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of \"Hamlet.\" ?\\n\\nReasoning: Let's think step by step in order to find the answer. We know that this actress played Octavia of the Julii in the HBO/BBC series \"Rome\" and voiced F|Kerry Condon|❌|\n", 763 | "|12|Which company distributed this 1977 American animated film produced by Walt Disney Productions for which Sherman Brothers wrote songs?|Buena Vista Distribution Company|Buena Vista Distribution|❌|\n", 764 | "|13|Samantha Cristoforetti and Mark Shuttleworth are both best known for being first in their field to go where?|Context:\\n[1] Samantha Cristoforetti and Mark Shuttleworth are both known for their achievements in space exploration.\\n\\nQuestion: Samantha Cristoforetti and Mark Shuttleworth are both best known for being first in their field to go where?\\n\\nReasoning: Let's think step by step in order to Answer: Space\\n\\nAnswer: Space|space|❌|\n", 765 | "|14|Having the combination of excellent foot speed and bat speed helped Eric Davis, create what kind of outfield for the Los Angeles Dodgers?|Outfield of Dreams|Outfield of Dreams|✔️|" 766 | ] 767 | }, 768 | { 769 | "cell_type": "markdown", 770 | "metadata": {}, 771 | "source": [ 772 | "But what are the prompts DSPy issued to obtain the bootstrapped demos? Here's the prompt for question #14. 
We can see that when DSPy tries to generate a bootstrapped demo, it randomly adds samples from our `trainset` for few-shot learning" 773 | ] 774 | }, 775 | { 776 | "cell_type": "code", 777 | "execution_count": 21, 778 | "metadata": {}, 779 | "outputs": [ 780 | { 781 | "name": "stdout", 782 | "output_type": "stream", 783 | "text": [ 784 | "\n", 785 | "\n", 786 | "\n", 787 | "\n", 788 | "Answer questions with short factoid answers.\n", 789 | "\n", 790 | "---\n", 791 | "\n", 792 | "Question: Samantha Cristoforetti and Mark Shuttleworth are both best known for being first in their field to go where?\n", 793 | "Answer: space\n", 794 | "\n", 795 | "Question: which American actor was Candace Kita guest starred with\n", 796 | "Answer: Bill Murray\n", 797 | "\n", 798 | "Question: Tombstone stared an actor born May 17, 1955 known as who?\n", 799 | "Answer: Bill Paxton\n", 800 | "\n", 801 | "Question: The Organisation that allows a community to influence their operation or use and to enjoy the benefits arisingwas founded in what year?\n", 802 | "Answer: 2010\n", 803 | "\n", 804 | "Question: Which is taller, the Empire State Building or the Bank of America Tower?\n", 805 | "Answer: The Empire State Building\n", 806 | "\n", 807 | "Question: This American guitarist best known for her work with the Iron Maidens is an ancestor of a composer who was known as what?\n", 808 | "Answer: The Waltz King\n", 809 | "\n", 810 | "Question: Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?\n", 811 | "Answer: Tae Kwon Do Times\n", 812 | "\n", 813 | "Question: Which American actress who made their film debut in the 1995 teen drama \"Kids\" was the co-founder of Voto Latino?\n", 814 | "Answer: Rosario Dawson\n", 815 | "\n", 816 | "Question: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield\n", 817 | "Answer: 1874\n", 818 | "\n", 819 | "Question: What is the code name for the German offensive that started this 
Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division?\n", 820 | "Answer: Operation Citadel\n", 821 | "\n", 822 | "Question: The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?\n", 823 | "Answer: 1950\n", 824 | "\n", 825 | "Question: Which of these publications was most recently published, Who Put the Bomp or Self?\n", 826 | "Answer: Self\n", 827 | "\n", 828 | "Question: Which company distributed this 1977 American animated film produced by Walt Disney Productions for which Sherman Brothers wrote songs?\n", 829 | "Answer: Buena Vista Distribution\n", 830 | "\n", 831 | "Question: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko?\n", 832 | "Answer: Aleksandr Danilovich Aleksandrov\n", 833 | "\n", 834 | "Question: At My Window was released by which American singer-songwriter?\n", 835 | "Answer: John Townes Van Zandt\n", 836 | "\n", 837 | "---\n", 838 | "\n", 839 | "Follow the following format.\n", 840 | "\n", 841 | "Context: may contain relevant facts\n", 842 | "\n", 843 | "Question: ${question}\n", 844 | "\n", 845 | "Reasoning: Let's think step by step in order to ${produce the answer}. We ...\n", 846 | "\n", 847 | "Answer: often between 1 and 5 words\n", 848 | "\n", 849 | "---\n", 850 | "\n", 851 | "Context:\n", 852 | "[1] «Eric Davis (baseball) | Eric Keith Davis (born May 29, 1962) is a former center fielder for several Major League Baseball teams. Davis was 21 years old when he broke into the big leagues on May 19, 1984 with the Cincinnati Reds, the team for which he is most remembered. Blessed with a rare combination of excellent foot speed and bat speed, Davis became the first major league player to hit at least 30 home runs and steal at least 50 bases in the same season in 1987.»\n", 853 | "[2] «Willie Davis (baseball) | William Henry Davis, Jr. 
(April 15, 1940 – March 9, 2010) was a center fielder in Major League Baseball who played most of his career for the Los Angeles Dodgers. At the end of his career he ranked seventh in major league history in putouts (5449) and total chances (5719) in the outfield, and third in games in center field (2237). He was ninth in National League history in total outfield games (2274), and won Gold Glove Awards from 1971 to 1973. He had 13 seasons of 20 or more stolen bases, led the NL in triples twice, and retired with the fourth most triples (138) by any major leaguer since 1945. He holds Los Angeles club records (1958–present) for career hits (2091), runs (1004), triples (110), at bats (7495), total bases (3094) and extra base hits (585). His 31-game hitting streak in 1969 remains the longest by a Dodger. At one point during the streak, when the team was playing at home, the big message board at Dodger Stadium quoted a message from a telegram sent to Davis and the team from Zack Wheat, the team's former record holder, at his home in Missouri.»\n", 854 | "[3] «1992 Los Angeles Dodgers season | The 1992 Los Angeles Dodgers season was a poor one for the team as it finished last in the Western Division of the National League with a record of 63 wins and 99 losses. Despite boasting what was nicknamed the \"Outfield of Dreams\", being manned by Eric Davis, Brett Butler, and Darryl Strawberry, injuries to key players and slumps from others contributed to the franchise's worst season since moving to Los Angeles. Additionally, the Dodgers cancelled four home games during the season due to the L.A. Riots. Despite the poor finish, the Dodgers had some hope for the future as first baseman Eric Karros won the National League Rookie of the Year Award, the first of five consecutive Dodger players to do so. 
The 1992 season also saw the Dodgers drop television station KTTV Ch.11 as their chief broadcaster of Dodger baseball, ending a 34 year-35 consecutive season association with that station. Additionally, it was the first time the Dodgers lost 90 games in a season since 1944.»\n", 855 | "\n", 856 | "Question: Having the combination of excellent foot speed and bat speed helped Eric Davis, create what kind of outfield for the Los Angeles Dodgers?\n", 857 | "\n", 858 | "Reasoning: Let's think step by step in order to Answer: \"Outfield of Dreams\"\n", 859 | "\n", 860 | "Answer:\u001b[32m \"Outfield of Dreams\"\u001b[0m\n", 861 | "\n", 862 | "\n", 863 | "\n" 864 | ] 865 | } 866 | ], 867 | "source": [ 868 | "lm.inspect_history(n=1)" 869 | ] 870 | }, 871 | { 872 | "cell_type": "markdown", 873 | "metadata": {}, 874 | "source": [ 875 | "Time to put the `compiled_rag` to test! Here we raise a question which was answered wrongly in our summary table, and see if we can get the right answer this time." 876 | ] 877 | }, 878 | { 879 | "cell_type": "code", 880 | "execution_count": 22, 881 | "metadata": {}, 882 | "outputs": [ 883 | { 884 | "data": { 885 | "text/plain": [ 886 | "Prediction(\n", 887 | " rationale='Answer: Self',\n", 888 | " answer='Self'\n", 889 | ")" 890 | ] 891 | }, 892 | "execution_count": 22, 893 | "metadata": {}, 894 | "output_type": "execute_result" 895 | } 896 | ], 897 | "source": [ 898 | "compiled_rag(question=\"Which of these publications was most recently published, Who Put the Bomp or Self?\")" 899 | ] 900 | }, 901 | { 902 | "cell_type": "markdown", 903 | "metadata": {}, 904 | "source": [ 905 | "We now get the right answer!\n", 906 | "\n", 907 | "Again let's inspect the prompt issued. Notice how the compiled prompt is different from the ones that were used during bootstrapping. Apart from the few-shot examples, **bootstrapped Context-Question-Reasoning-Answer demonstrations are added to the prompt**, improving the LM's capability." 
908 | ] 909 | }, 910 | { 911 | "cell_type": "code", 912 | "execution_count": 23, 913 | "metadata": {}, 914 | "outputs": [ 915 | { 916 | "name": "stdout", 917 | "output_type": "stream", 918 | "text": [ 919 | "\n", 920 | "\n", 921 | "\n", 922 | "\n", 923 | "Answer questions with short factoid answers.\n", 924 | "\n", 925 | "---\n", 926 | "\n", 927 | "Question: At My Window was released by which American singer-songwriter?\n", 928 | "Answer: John Townes Van Zandt\n", 929 | "\n", 930 | "Question: \"Everything Has Changed\" is a song from an album released under which record label ?\n", 931 | "Answer: Big Machine Records\n", 932 | "\n", 933 | "Question: The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?\n", 934 | "Answer: 1950\n", 935 | "\n", 936 | "Question: Which Pakistani cricket umpire who won 3 consecutive ICC umpire of the year awards in 2009, 2010, and 2011 will be in the ICC World Twenty20?\n", 937 | "Answer: Aleem Sarwar Dar\n", 938 | "\n", 939 | "Question: Samantha Cristoforetti and Mark Shuttleworth are both best known for being first in their field to go where?\n", 940 | "Answer: space\n", 941 | "\n", 942 | "Question: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko?\n", 943 | "Answer: Aleksandr Danilovich Aleksandrov\n", 944 | "\n", 945 | "Question: The Organisation that allows a community to influence their operation or use and to enjoy the benefits arisingwas founded in what year?\n", 946 | "Answer: 2010\n", 947 | "\n", 948 | "Question: Which American actress who made their film debut in the 1995 teen drama \"Kids\" was the co-founder of Voto Latino?\n", 949 | "Answer: Rosario Dawson\n", 950 | "\n", 951 | "Question: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield\n", 952 | "Answer: 1874\n", 953 | "\n", 954 | "Question: which American actor was Candace Kita guest starred with\n", 955 | "Answer: Bill Murray\n", 956 | "\n", 957 | "Question: 
Which is taller, the Empire State Building or the Bank of America Tower?\n", 958 | "Answer: The Empire State Building\n", 959 | "\n", 960 | "Question: Who acted in the shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of \"Hamlet.\" ?\n", 961 | "Answer: Kerry Condon\n", 962 | "\n", 963 | "---\n", 964 | "\n", 965 | "Follow the following format.\n", 966 | "\n", 967 | "Context: may contain relevant facts\n", 968 | "\n", 969 | "Question: ${question}\n", 970 | "\n", 971 | "Reasoning: Let's think step by step in order to ${produce the answer}. We ...\n", 972 | "\n", 973 | "Answer: often between 1 and 5 words\n", 974 | "\n", 975 | "---\n", 976 | "\n", 977 | "Context:\n", 978 | "[1] «Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. \"Tae Kwon Do Times\" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.»\n", 979 | "[2] «Kwon Tae-man | Kwon Tae-man (born 1941) was an early Korean hapkido practitioner and a pioneer of the art, first in Korea and then in the United States. He formed one of the earliest dojang's for hapkido in the United States in Torrance, California, and has been featured in many magazine articles promoting the art.»\n", 980 | "[3] «Hee Il Cho | Cho Hee Il (born October 13, 1940) is a prominent Korean-American master of taekwondo, holding the rank of 9th \"dan\" in the martial art. He has written 11 martial art books, produced 70 martial art training videos, and has appeared on more than 70 martial arts magazine covers. 
Cho won several national and international competitions as a taekwondo competitor, and has appeared in several films, including \"Fight to Win\", \"Best of the Best\", \"Bloodsport II\", and \"Bloodsport III\". He founded the Action International Martial Arts Association (AIMAA) in 1980, and is its President. Cho is a member of both \"Black Belt\" magazine's Hall of Fame and \"Tae Kwon Do Times\" magazine's Hall of Fame.»\n", 981 | "\n", 982 | "Question: Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?\n", 983 | "\n", 984 | "Reasoning: Let's think step by step in order to Answer: Tae Kwon Do Times\n", 985 | "\n", 986 | "Answer: Tae Kwon Do Times\n", 987 | "\n", 988 | "---\n", 989 | "\n", 990 | "Context:\n", 991 | "[1] «Michael Biehn | Michael Connell Biehn (born July 31, 1956) is an American actor, primarily known for his military roles in science fiction films directed by James Cameron; as Sgt. Kyle Reese in \"The Terminator\" (1984), Cpl. Dwayne Hicks in \"Aliens\" (1986) and Lt. Coffey in \"The Abyss\" (1989). He was nominated for the Saturn Award for Best Actor for \"Aliens.\" His other films include \"The Fan\" (1981), \"K2\" (1991), \"Tombstone\" (1993), \"The Rock\" (1996), \"\" (2001) and \"Planet Terror\" (2007). On television, he has appeared in \"Hill Street Blues\" (1984) and \"Adventure Inc.\" (2002-03).»\n", 992 | "[2] «Quintin Sondergaard | Quentin Charles Sondergaard, known primarily as Quintin Sondergaard (January 11, 1925 – February 15, 1984), was an American actor principally active on television westerns from 1957-70. He had a supporting role with eleven appearances as \"Deputy Quint\" on the series \"Tombstone Territory\", with co-stars Pat Conway, Richard Eastham, and Gilman Rankin. 
\"Tombstone Territory\" began in 1957 on ABC and then switched to syndication in 1959.»\n", 993 | "[3] «John Philbin | John Philbin (born April 27, 1960) is an American actor who is best known for his appearances in the films \"Return of the Living Dead\", \"Point Break\" and \"Tombstone\".»\n", 994 | "\n", 995 | "Question: Tombstone stared an actor born May 17, 1955 known as who?\n", 996 | "\n", 997 | "Reasoning: Let's think step by step in order to Question: Tombstone stared an actor born May 17, 1955 known as who? Reasoning: Let's think step by step in order to identify the actor who starred in the movie \"Tombstone\" and was born on May 17, 1955.\n", 998 | "\n", 999 | "Answer: Bill Paxton\n", 1000 | "\n", 1001 | "---\n", 1002 | "\n", 1003 | "Context:\n", 1004 | "[1] «Battle of Kursk | The Battle of Kursk was a Second World War engagement between German and Soviet forces on the Eastern Front near Kursk (450 km south-west of Moscow) in the Soviet Union during July and August 1943. The battle began with the launch of the German offensive, Operation Citadel (German: \"Unternehmen Zitadelle\" ), on 5 July, which had the objective of pinching off the Kursk salient with attacks on the base of the salient from north and south simultaneously. After the German offensive stalled on the northern side of the salient, on 12 July the Soviets commenced their Kursk Strategic Offensive Operation with the launch of Operation Kutuzov (Russian: Кутузов ) against the rear of the German forces in the northern side. On the southern side, the Soviets also launched powerful counterattacks the same day, one of which led to a large armoured clash, the Battle of Prokhorovka. 
On 3 August, the Soviets began the second phase of the Kursk Strategic Offensive Operation with the launch of Operation Polkovodets Rumyantsev (Russian: Полководец Румянцев ) against the German forces in the southern side of the Kursk salient.»\n", 1005 | "[2] «Operation Mars | Operation Mars, also known as the Second Rzhev-Sychevka Offensive Operation (Russian: Вторая Ржевско-Сычёвская наступательная операция), was the codename for an offensive launched by Soviet forces against German forces during World War II. It took place between 25 November and 20 December 1942 around the Rzhev salient in the vicinity of Moscow.»\n", 1006 | "[3] «Kholm Pocket | The Kholm Pocket (German: \"Kessel von Cholm\" ; Russian: Холмский котёл ) was the name given for the encirclement of German troops by the Red Army around Kholm south of Leningrad, during World War II on the Eastern Front, from 23 January 1942 until 5 May 1942. A much larger pocket was simultaneously surrounded in Demyansk, about 100 km to the northeast. These were the results of German retreat following their defeat during the Battle of Moscow.»\n", 1007 | "\n", 1008 | "Question: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division?\n", 1009 | "\n", 1010 | "Reasoning: Let's think step by step in order to Answer: Operation Citadel\n", 1011 | "\n", 1012 | "Answer: Operation Citadel\n", 1013 | "\n", 1014 | "---\n", 1015 | "\n", 1016 | "Context:\n", 1017 | "[1] «Eric Davis (baseball) | Eric Keith Davis (born May 29, 1962) is a former center fielder for several Major League Baseball teams. Davis was 21 years old when he broke into the big leagues on May 19, 1984 with the Cincinnati Reds, the team for which he is most remembered. 
Blessed with a rare combination of excellent foot speed and bat speed, Davis became the first major league player to hit at least 30 home runs and steal at least 50 bases in the same season in 1987.»\n", 1018 | "[2] «Willie Davis (baseball) | William Henry Davis, Jr. (April 15, 1940 – March 9, 2010) was a center fielder in Major League Baseball who played most of his career for the Los Angeles Dodgers. At the end of his career he ranked seventh in major league history in putouts (5449) and total chances (5719) in the outfield, and third in games in center field (2237). He was ninth in National League history in total outfield games (2274), and won Gold Glove Awards from 1971 to 1973. He had 13 seasons of 20 or more stolen bases, led the NL in triples twice, and retired with the fourth most triples (138) by any major leaguer since 1945. He holds Los Angeles club records (1958–present) for career hits (2091), runs (1004), triples (110), at bats (7495), total bases (3094) and extra base hits (585). His 31-game hitting streak in 1969 remains the longest by a Dodger. At one point during the streak, when the team was playing at home, the big message board at Dodger Stadium quoted a message from a telegram sent to Davis and the team from Zack Wheat, the team's former record holder, at his home in Missouri.»\n", 1019 | "[3] «1992 Los Angeles Dodgers season | The 1992 Los Angeles Dodgers season was a poor one for the team as it finished last in the Western Division of the National League with a record of 63 wins and 99 losses. Despite boasting what was nicknamed the \"Outfield of Dreams\", being manned by Eric Davis, Brett Butler, and Darryl Strawberry, injuries to key players and slumps from others contributed to the franchise's worst season since moving to Los Angeles. Additionally, the Dodgers cancelled four home games during the season due to the L.A. Riots. 
Despite the poor finish, the Dodgers had some hope for the future as first baseman Eric Karros won the National League Rookie of the Year Award, the first of five consecutive Dodger players to do so. The 1992 season also saw the Dodgers drop television station KTTV Ch.11 as their chief broadcaster of Dodger baseball, ending a 34 year-35 consecutive season association with that station. Additionally, it was the first time the Dodgers lost 90 games in a season since 1944.»\n", 1020 | "\n", 1021 | "Question: Having the combination of excellent foot speed and bat speed helped Eric Davis, create what kind of outfield for the Los Angeles Dodgers?\n", 1022 | "\n", 1023 | "Reasoning: Let's think step by step in order to Answer: \"Outfield of Dreams\"\n", 1024 | "\n", 1025 | "Answer: \"Outfield of Dreams\"\n", 1026 | "\n", 1027 | "---\n", 1028 | "\n", 1029 | "Context:\n", 1030 | "[1] «Who Put the Bomp | Who Put The Bomp was a rock music fanzine edited and published by Greg Shaw from 1970 to 1979. Its name came from the hit 1961 doo-wop song by Barry Mann, \"Who Put the Bomp\". Later, the name was shortened to \"Bomp!\"»\n", 1031 | "[2] «Bompiani | Bompiani is an Italian publishing house based in Milan, Italy. It was founded in 1929 by Valentino Bompiani.»\n", 1032 | "[3] «What Color is Your Parachute? | What Color is Your Parachute? by Richard Nelson Bolles is a book for job-seekers that has been in print since 1970 and has been revised every year since 1975, sometimes substantially. Bolles initially self-published the book (December 1, 1970), but it has been commercially published since November 1972 by Ten Speed Press in Berkeley, California. As of September 28, 2010, the book is available in 22 languages, it is used in 26 countries around the world, and over ten million copies have been sold worldwide. It is one of the most highly regarded career advice books in print. 
In the latest edition of the book, the author writes about how to adapt one's job search to the Web 2.0 age.»\n", 1033 | "\n", 1034 | "Question: Which of these publications was most recently published, Who Put the Bomp or Self?\n", 1035 | "\n", 1036 | "Reasoning: Let's think step by step in order to Answer: Self\n", 1037 | "\n", 1038 | "Answer:\u001b[32m Self\u001b[0m\n", 1039 | "\n", 1040 | "\n", 1041 | "\n" 1042 | ] 1043 | } 1044 | ], 1045 | "source": [ 1046 | "lm.inspect_history(n=1)" 1047 | ] 1048 | }, 1049 | { 1050 | "cell_type": "markdown", 1051 | "metadata": {}, 1052 | "source": [ 1053 | "The above example still falls short of what we typically do with machine learning: typically we define a couple of candidate models, see how they perform against the test set, and select the one achieving the highest performance score. This is what we will do next!" 1054 | ] 1055 | }, 1056 | { 1057 | "cell_type": "markdown", 1058 | "metadata": {}, 1059 | "source": [ 1060 | "# Full-fledged example: \"Models\" comparison with LLM\n", 1061 | "\n", 1062 | "## The aim of this example\n", 1063 | "\n", 1064 | "Typically for LM comparison we raise underspecified questions like “how do different LMs compare on a certain problem”. 
With DSPy's modular, composable programs and optimizers, we are now equipped to answer questions like “how they compare on a certain problem with module X when compiled with Optimizer Y”, which is a well-defined and reproducible run, thus reducing the role of artful prompt construction in modern AI.\n", 1065 | "\n", 1066 | "In this section, we want to address the question: \"Given the LM we use (GPT 3.5 Turbo), which module and optimizer combination performs RAG best and gets the right answer?\"\n", 1067 | "\n", 1068 | "The modules under evaluation are:\n", 1069 | "- **Vanilla**: Single-hop RAG to answer a question based on the retrieved context, without key phrases like \"let's think step by step\"\n", 1070 | "- **COT**: Single-hop RAG with Chain of Thought\n", 1071 | "- **ReAct**: Single-hop RAG with ReAct prompting\n", 1072 | "- **BasicMultiHop**: 2-hop RAG with Chain of Thought\n", 1073 | "\n", 1074 | "And the optimizer candidates are:\n", 1075 | "- **None**: No additional instructions apart from the signature\n", 1076 | "- **Labeled few-shot**: Simply constructs few-shot examples from provided labeled Q/A pairs\n", 1077 | "- **Bootstrap few-shot**: As demonstrated above, self-generates complete demonstrations for every stage of our module, and simply uses the generated demonstrations (if they pass the metric) without any further optimization. For `Vanilla` it is equivalent to \"Labeled few-shot\"\n", 1078 | "\n", 1079 | "As for the evaluation metric, we use exact match (`dspy.evaluate.metrics.answer_exact_match`) against the test set.\n", 1080 | "\n", 1081 | "*Note: exact match is a very questionable evaluation criterion, but it suffices to illustrate the idea. Feel free to explore other criteria.*" 1082 | ] 1083 | }, 1084 | { 1085 | "cell_type": "markdown", 1086 | "metadata": {}, 1087 | "source": [ 1088 | "Let's begin! 
First, we define our modules" 1089 | ] 1090 | }, 1091 | { 1092 | "cell_type": "code", 1093 | "execution_count": 11, 1094 | "metadata": {}, 1095 | "outputs": [], 1096 | "source": [ 1097 | "# Vanilla\n", 1098 | "class Vanilla(dspy.Module):\n", 1099 | " def __init__(self, num_passages=3):\n", 1100 | " super().__init__()\n", 1101 | " self.retrieve = dspy.Retrieve(k=num_passages)\n", 1102 | " self.generate_answer = dspy.Predict(\"context, question -> answer\")\n", 1103 | " \n", 1104 | " def forward(self, question):\n", 1105 | " context = self.retrieve(question).passages\n", 1106 | " answer = self.generate_answer(context=context, question=question)\n", 1107 | " return answer\n", 1108 | " \n", 1109 | "vanilla = Vanilla()\n", 1110 | "\n", 1111 | "# COT\n", 1112 | "class COT(dspy.Module):\n", 1113 | " def __init__(self, num_passages=3):\n", 1114 | " super().__init__()\n", 1115 | " self.retrieve = dspy.Retrieve(k=num_passages)\n", 1116 | " self.generate_answer = dspy.ChainOfThought(\"context, question -> answer\")\n", 1117 | " \n", 1118 | " def forward(self, question):\n", 1119 | " context = self.retrieve(question).passages\n", 1120 | " answer = self.generate_answer(context=context, question=question)\n", 1121 | " return answer\n", 1122 | " \n", 1123 | "cot = COT()\n", 1124 | "\n", 1125 | "# ReAct\n", 1126 | "react = dspy.ReAct(\"question-> answer\", tools=[dspy.Retrieve(k=3)], max_iters=5)\n", 1127 | "\n", 1128 | "# BasicMultiHop\n", 1129 | "class BasicMultiHop(dspy.Module):\n", 1130 | " def __init__(self, passages_per_hop=3):\n", 1131 | " self.retrieve = dspy.Retrieve(k=passages_per_hop)\n", 1132 | " self.generate_query = dspy.ChainOfThought(\"context, question-> search_query\")\n", 1133 | " self.generate_answer = dspy.ChainOfThought(\"context, question-> answer\")\n", 1134 | "\n", 1135 | " def forward(self, question):\n", 1136 | " context = []\n", 1137 | "\n", 1138 | " for hop in range(2):\n", 1139 | " query = self.generate_query(context=context, 
question=question).search_query\n", 1140 | " context += self.retrieve(query).passages\n", 1141 | "\n", 1142 | " return self.generate_answer(context=context, question=question)\n", 1143 | " \n", 1144 | "multihop = BasicMultiHop(passages_per_hop=3)" 1145 | ] 1146 | }, 1147 | { 1148 | "cell_type": "markdown", 1149 | "metadata": {}, 1150 | "source": [ 1151 | "Then define permutations for our model candidates" 1152 | ] 1153 | }, 1154 | { 1155 | "cell_type": "code", 1156 | "execution_count": 12, 1157 | "metadata": {}, 1158 | "outputs": [], 1159 | "source": [ 1160 | "from dspy.teleprompt import LabeledFewShot, BootstrapFewShot\n", 1161 | "\n", 1162 | "metric = dspy.evaluate.metrics.answer_exact_match\n", 1163 | "\n", 1164 | "modules = {\n", 1165 | " 'vanilla': vanilla,\n", 1166 | " 'cot': cot,\n", 1167 | " 'react': react,\n", 1168 | " 'multihop': multihop,\n", 1169 | "}\n", 1170 | "\n", 1171 | "optimizers = {\n", 1172 | " 'none': None,\n", 1173 | " 'labeled_few_shot': LabeledFewShot(),\n", 1174 | " 'bootstrap_few_shot': BootstrapFewShot(metric=metric, max_errors=20),\n", 1175 | "}" 1176 | ] 1177 | }, 1178 | { 1179 | "cell_type": "markdown", 1180 | "metadata": {}, 1181 | "source": [ 1182 | "Here we define a helper class to facilitate the evaluation" 1183 | ] 1184 | }, 1185 | { 1186 | "cell_type": "code", 1187 | "execution_count": 29, 1188 | "metadata": {}, 1189 | "outputs": [], 1190 | "source": [ 1191 | "from dspy.evaluate.evaluate import Evaluate\n", 1192 | "import pandas as pd\n", 1193 | "\n", 1194 | "class ModelSelection():\n", 1195 | "\n", 1196 | " # Compile our models\n", 1197 | " def __init__(self, modules, optimizers, metric, trainset):\n", 1198 | " self.models = []\n", 1199 | " self.metric = metric\n", 1200 | " \n", 1201 | " for module_name, module in modules.items():\n", 1202 | " print(f'Compiling models for {module_name}...')\n", 1203 | " models_for_a_program = {'module_name': module_name, 'optimizers': []}\n", 1204 | "\n", 1205 | " for optimizer_name, optimizer 
in optimizers.items():\n", 1206 | " print(f'...{optimizer_name}')\n", 1207 | " if optimizer is None:\n", 1208 | " compiled_model = module\n", 1209 | " else:\n", 1210 | " compiled_model = optimizer.compile(student=module, trainset=trainset)\n", 1211 | "\n", 1212 | " optimizer = {\n", 1213 | " 'name': optimizer_name,\n", 1214 | " 'compiled_model': compiled_model\n", 1215 | " }\n", 1216 | "\n", 1217 | " models_for_a_program['optimizers'].append(optimizer)\n", 1218 | "\n", 1219 | " self.models.append(models_for_a_program)\n", 1220 | "\n", 1221 | " # Evaluate our models against the testset. After evaluation, we will have a matrix of models and their scores under the evaluation_matrix attribute\n", 1222 | " def evaluate(self, testset):\n", 1223 | " evaluator = Evaluate(devset=testset, metric=self.metric, num_threads=3, return_outputs=True)\n", 1224 | " for module in self.models:\n", 1225 | " print(f\"\"\"Evaluating models for {module['module_name']}...\"\"\")\n", 1226 | " for optimizer in module['optimizers']:\n", 1227 | " compiled_model = optimizer['compiled_model']\n", 1228 | " evaluation_score, outputs = evaluator(compiled_model)\n", 1229 | " optimizer['score'] = evaluation_score\n", 1230 | "\n", 1231 | " # read dict into a dataframe\n", 1232 | " df = pd.DataFrame(self.models)\n", 1233 | "\n", 1234 | " # unnest optimizers column\n", 1235 | " df = df.explode('optimizers')\n", 1236 | "\n", 1237 | " # extract name/score column from optimizers\n", 1238 | " df['optimizer'] = df['optimizers'].apply(lambda x: x['name'])\n", 1239 | " df['score'] = df['optimizers'].apply(lambda x: x['score'])\n", 1240 | "\n", 1241 | " df.drop(columns=['optimizers'], inplace=True)\n", 1242 | " self.evaluation_matrix = df\n", 1243 | "\n", 1244 | " # Raise a question against the compiled model\n", 1245 | " def question_for_model(self, module_name, optimizer_name, question):\n", 1246 | " for model in self.models:\n", 1247 | " if model['module_name'] == module_name:\n", 1248 | " for s in 
model['optimizers']:\n", 1249 | " if s['name'] == optimizer_name:\n", 1250 | " return s['compiled_model'](question=question)" 1251 | ] 1252 | }, 1253 | { 1254 | "cell_type": "markdown", 1255 | "metadata": {}, 1256 | "source": [ 1257 | "We are now ready to start the evaluation. It takes around 20 minutes to complete." 1258 | ] 1259 | }, 1260 | { 1261 | "cell_type": "code", 1262 | "execution_count": 30, 1263 | "metadata": {}, 1264 | "outputs": [ 1265 | { 1266 | "name": "stdout", 1267 | "output_type": "stream", 1268 | "text": [ 1269 | "Compiling models for vanilla...\n", 1270 | "...none\n", 1271 | "...labeled_few_shot\n", 1272 | "...bootstrap_few_shot\n" 1273 | ] 1274 | }, 1275 | { 1276 | "name": "stderr", 1277 | "output_type": "stream", 1278 | "text": [ 1279 | "100%|██████████| 20/20 [01:26<00:00, 4.32s/it]\n" 1280 | ] 1281 | }, 1282 | { 1283 | "name": "stdout", 1284 | "output_type": "stream", 1285 | "text": [ 1286 | "Bootstrapped 0 full traces after 20 examples in round 0.\n", 1287 | "Compiling models for cot...\n", 1288 | "...none\n", 1289 | "...labeled_few_shot\n", 1290 | "...bootstrap_few_shot\n" 1291 | ] 1292 | }, 1293 | { 1294 | "name": "stderr", 1295 | "output_type": "stream", 1296 | "text": [ 1297 | " 65%|██████▌ | 13/20 [00:59<00:31, 4.55s/it]\n" 1298 | ] 1299 | }, 1300 | { 1301 | "name": "stdout", 1302 | "output_type": "stream", 1303 | "text": [ 1304 | "Bootstrapped 4 full traces after 14 examples in round 0.\n", 1305 | "Compiling models for react...\n", 1306 | "...none\n", 1307 | "...labeled_few_shot\n", 1308 | "...bootstrap_few_shot\n" 1309 | ] 1310 | }, 1311 | { 1312 | "name": "stderr", 1313 | "output_type": "stream", 1314 | "text": [ 1315 | " 40%|████ | 8/20 [01:38<02:24, 12.07s/it]" 1316 | ] 1317 | }, 1318 | { 1319 | "name": "stdout", 1320 | "output_type": "stream", 1321 | "text": [ 1322 | "Failed to run or to evaluate example Example({'question': 'Which American actress who made their film debut in the 1995 teen drama \"Kids\" was the co-founder 
of Voto Latino?', 'answer': 'Rosario Dawson'}) (input_keys={'question'}) with due to 'NoneType' object is not iterable.\n" 1323 | ] 1324 | }, 1325 | { 1326 | "name": "stderr", 1327 | "output_type": "stream", 1328 | "text": [ 1329 | "100%|██████████| 20/20 [04:17<00:00, 12.86s/it]\n" 1330 | ] 1331 | }, 1332 | { 1333 | "name": "stdout", 1334 | "output_type": "stream", 1335 | "text": [ 1336 | "Bootstrapped 3 full traces after 20 examples in round 0.\n", 1337 | "Compiling models for multihop...\n", 1338 | "...none\n", 1339 | "...labeled_few_shot\n", 1340 | "...bootstrap_few_shot\n" 1341 | ] 1342 | }, 1343 | { 1344 | "name": "stderr", 1345 | "output_type": "stream", 1346 | "text": [ 1347 | " 40%|████ | 8/20 [01:57<02:56, 14.74s/it]\n" 1348 | ] 1349 | }, 1350 | { 1351 | "name": "stdout", 1352 | "output_type": "stream", 1353 | "text": [ 1354 | "Bootstrapped 4 full traces after 9 examples in round 0.\n", 1355 | "Evaluating models for vanilla...\n", 1356 | "Average Metric: 0 / 20 (0.0%)\n", 1357 | "Average Metric: 0 / 20 (0.0%)\n", 1358 | "Average Metric: 0 / 20 (0.0%)\n", 1359 | "Evaluating models for cot...\n", 1360 | "Average Metric: 0 / 20 (0.0%)\n", 1361 | "Average Metric: 7 / 20 (35.0%)\n", 1362 | "Average Metric: 10 / 20 (50.0%)\n", 1363 | "Evaluating models for react...\n", 1364 | "Average Metric: 5 / 20 (25.0%)\n", 1365 | "Average Metric: 5 / 20 (25.0%)\n", 1366 | "Average Metric: 7 / 20 (35.0%)\n", 1367 | "Evaluating models for multihop...\n", 1368 | "Error for example in dev set: \t\t 'NoneType' object is not iterable\n", 1369 | "Average Metric: 0.0 / 20 (0.0%)\n", 1370 | "Error for example in dev set: \t\t 'NoneType' object is not iterable\n", 1371 | "Average Metric: 6.0 / 20 (30.0%)\n", 1372 | "Average Metric: 7 / 20 (35.0%)\n" 1373 | ] 1374 | } 1375 | ], 1376 | "source": [ 1377 | "# Compile the models\n", 1378 | "ms = ModelSelection(modules=modules, optimizers=optimizers, metric=metric, trainset=trainset)\n", 1379 | "\n", 1380 | "# Evaluate them\n", 1381 | 
"ms.evaluate(testset=testset)" 1382 | ] 1383 | }, 1384 | { 1385 | "cell_type": "markdown", 1386 | "metadata": {}, 1387 | "source": [ 1388 | "Here's the evaluation result. We can see the `COT` module with `BootstrapFewShot` optimizer yields the best performance" 1389 | ] 1390 | }, 1391 | { 1392 | "cell_type": "code", 1393 | "execution_count": 31, 1394 | "metadata": {}, 1395 | "outputs": [ 1396 | { 1397 | "data": { 1398 | "text/html": [ 1399 | "
\n", 1497 | "
" 1498 | ], 1499 | "text/plain": [ 1500 | " module_name optimizer score\n", 1501 | "0 vanilla none 0.0\n", 1502 | "0 vanilla labeled_few_shot 0.0\n", 1503 | "0 vanilla bootstrap_few_shot 0.0\n", 1504 | "1 cot none 0.0\n", 1505 | "1 cot labeled_few_shot 35.0\n", 1506 | "1 cot bootstrap_few_shot 50.0\n", 1507 | "2 react none 25.0\n", 1508 | "2 react labeled_few_shot 25.0\n", 1509 | "2 react bootstrap_few_shot 35.0\n", 1510 | "3 multihop none 0.0\n", 1511 | "3 multihop labeled_few_shot 30.0\n", 1512 | "3 multihop bootstrap_few_shot 35.0" 1513 | ] 1514 | }, 1515 | "execution_count": 31, 1516 | "metadata": {}, 1517 | "output_type": "execute_result" 1518 | } 1519 | ], 1520 | "source": [ 1521 | "ms.evaluation_matrix" 1522 | ] 1523 | }, 1524 | { 1525 | "cell_type": "markdown", 1526 | "metadata": {}, 1527 | "source": [ 1528 | "But before we conclude the exercise, it is worth inspecting the result more closely: `Multihop with BootstrapFewShot`, which is supposedly equipped with more relevant context than `COT with BootstrapFewShot`, performs worse (35% vs. 50%). That is strange!" 1529 | ] 1530 | }, 1531 | { 1532 | "cell_type": "markdown", 1533 | "metadata": {}, 1534 | "source": [ 1535 | "## Debug and fine-tune our prompt\n", 1536 | "\n", 1537 | "Now head to the Phoenix Console to see what's going on. We pick a random question, `William Hughes Miller was born in a city with how many inhabitants ?`, and inspect how COT, ReAct, and BasicMultiHop with the BootstrapFewShot optimizer came up with their answers. 
You can filter for it by typing this in the search bar: `\"\"\"William Hughes Miller was born in a city with how many inhabitants ?\"\"\" in input.value`\n", 1538 | "\n", 1539 | "The table below shows the answers provided by the three models:\n", 1540 | "|Model|Predicted answer|\n", 1541 | "|---|---|\n", 1542 | "|Multihop with BootstrapFewShot|The answer will vary based on the specific city of William Hughes Miller's birthplace.|\n", 1543 | "|ReAct with BootstrapFewShot|Kosciusko, Mississippi|\n", 1544 | "|COT with BootstrapFewShot|The city of Kosciusko, Mississippi, has a population of approximately 7,402 inhabitants.|\n", 1545 | "\n", 1546 | "The correct answer is `7,402 at the 2010 census`. Both `ReAct with BootstrapFewShot` and `COT with BootstrapFewShot` provided relevant answers, but `Multihop with BootstrapFewShot` simply failed to provide one. Checking the execution trace in Phoenix, it looks like the LM fails to understand what is expected for the `search_query` specified in the signature.\n", 1547 | "\n", 1548 | "Execute the cells below to revise the signatures and re-run the model comparison." 
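To make the scoring of the three answers in the table concrete, here is a minimal sketch of exact-match scoring. It is an illustration only, not DSPy's actual `answer_exact_match` implementation: the `normalize` and `exact_match` helpers are hypothetical, the normalization rules (lowercasing, dropping punctuation and English articles) are an assumption borrowed from common QA evaluation practice, and the gold answer is assumed to be `7,402`.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    """True only if the normalized prediction equals the normalized gold answer."""
    return normalize(prediction) == normalize(gold)


# Assumption: the gold label for this question is the bare number.
gold = "7,402"

# Predicted answers copied from the table above.
predictions = {
    "multihop": "The answer will vary based on the specific city of "
                "William Hughes Miller's birthplace.",
    "react": "Kosciusko, Mississippi",
    "cot": "The city of Kosciusko, Mississippi, has a population of "
           "approximately 7,402 inhabitants.",
}

scores = {name: exact_match(pred, gold) for name, pred in predictions.items()}
for name, matched in scores.items():
    print(f"{name}: {matched}")
```

Under such a strict comparison even the COT answer, which contains the right number, scores zero, which is one reason short factoid answers matter for this metric.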
1549 | ] 1550 | }, 1551 | { 1552 | "cell_type": "code", 1553 | "execution_count": 73, 1554 | "metadata": {}, 1555 | "outputs": [], 1556 | "source": [ 1557 | "# Define class-based signatures\n", 1558 | "class GenerateAnswer(dspy.Signature):\n", 1559 | " \"\"\"Answer questions with short factoid answers.\"\"\"\n", 1560 | "\n", 1561 | " context = dspy.InputField(desc=\"may contain relevant facts\")\n", 1562 | " question = dspy.InputField()\n", 1563 | " answer = dspy.OutputField(desc=\"often between 1 and 5 words\")\n", 1564 | "\n", 1565 | "class BasicQA(dspy.Signature):\n", 1566 | " \"\"\"Answer questions with short factoid answers.\"\"\"\n", 1567 | " \n", 1568 | " question = dspy.InputField()\n", 1569 | " answer = dspy.OutputField(desc=\"often between 1 and 5 words\")\n", 1570 | "\n", 1571 | "class FollowupQuery(dspy.Signature):\n", 1572 | " \"\"\"Generate a query which is conducive to answering the question\"\"\"\n", 1573 | "\n", 1574 | " context = dspy.InputField(desc=\"may contain relevant facts\")\n", 1575 | " question = dspy.InputField()\n", 1576 | " search_query = dspy.OutputField(desc=\"Judge if the context is adequate to answer the question, if not adequate or if it is blank, generate a search query that would help you answer the question.\")" 1577 | ] 1578 | }, 1579 | { 1580 | "cell_type": "code", 1581 | "execution_count": 79, 1582 | "metadata": {}, 1583 | "outputs": [ 1584 | { 1585 | "name": "stdout", 1586 | "output_type": "stream", 1587 | "text": [ 1588 | "Compiling models for vanilla...\n", 1589 | "...none\n", 1590 | "...labeled_few_shot\n", 1591 | "...bootstrap_few_shot\n" 1592 | ] 1593 | }, 1594 | { 1595 | "name": "stderr", 1596 | "output_type": "stream", 1597 | "text": [ 1598 | " 0%| | 0/20 [00:00 due to 'NoneType' object is not iterable.\n" 1649 | ] 1650 | }, 1651 | { 1652 | "name": "stderr", 1653 | "output_type": "stream", 1654 | "text": [ 1655 | "100%|██████████| 20/20 [00:00<00:00, 22.71it/s]\n" 1656 | ] 1657 | }, 1658 | { 1659 | "name": "stdout", 
1660 | "output_type": "stream", 1661 | "text": [ 1662 | "Bootstrapped 3 full traces after 20 examples in round 0.\n", 1663 | "Compiling models for multihop...\n", 1664 | "...none\n", 1665 | "...labeled_few_shot\n", 1666 | "...bootstrap_few_shot\n" 1667 | ] 1668 | }, 1669 | { 1670 | "name": "stderr", 1671 | "output_type": "stream", 1672 | "text": [ 1673 | " 25%|██▌ | 5/20 [00:00<00:00, 33.57it/s]\n" 1674 | ] 1675 | }, 1676 | { 1677 | "name": "stdout", 1678 | "output_type": "stream", 1679 | "text": [ 1680 | "Bootstrapped 4 full traces after 6 examples in round 0.\n", 1681 | "Evaluating models for vanilla...\n", 1682 | "Average Metric: 5 / 20 (25.0%)\n", 1683 | "Average Metric: 8 / 20 (40.0%)\n", 1684 | "Average Metric: 8 / 20 (40.0%)\n", 1685 | "Evaluating models for cot...\n", 1686 | "Average Metric: 10 / 20 (50.0%)\n", 1687 | "Average Metric: 9 / 20 (45.0%)\n", 1688 | "Average Metric: 10 / 20 (50.0%)\n", 1689 | "Evaluating models for react...\n", 1690 | "Average Metric: 5 / 20 (25.0%)\n", 1691 | "Average Metric: 5 / 20 (25.0%)\n", 1692 | "Average Metric: 7 / 20 (35.0%)\n", 1693 | "Evaluating models for multihop...\n", 1694 | "Average Metric: 13 / 20 (65.0%)\n", 1695 | "Average Metric: 13 / 20 (65.0%)\n", 1696 | "Average Metric: 11 / 20 (55.0%)\n" 1697 | ] 1698 | }, 1699 | { 1700 | "data": { 1701 | "text/html": [ 1702 | "
" 1801 | ], 1802 | "text/plain": [ 1803 | " module_name optimizer score\n", 1804 | "0 vanilla none 25.0\n", 1805 | "0 vanilla labeled_few_shot 40.0\n", 1806 | "0 vanilla bootstrap_few_shot 40.0\n", 1807 | "1 cot none 50.0\n", 1808 | "1 cot labeled_few_shot 45.0\n", 1809 | "1 cot bootstrap_few_shot 50.0\n", 1810 | "2 react none 25.0\n", 1811 | "2 react labeled_few_shot 25.0\n", 1812 | "2 react bootstrap_few_shot 35.0\n", 1813 | "3 multihop none 65.0\n", 1814 | "3 multihop labeled_few_shot 65.0\n", 1815 | "3 multihop bootstrap_few_shot 55.0" 1816 | ] 1817 | }, 1818 | "execution_count": 79, 1819 | "metadata": {}, 1820 | "output_type": "execute_result" 1821 | } 1822 | ], 1823 | "source": [ 1824 | "# Revise the modules with the class-based signatures.\n", 1825 | "## Vanilla\n", 1826 | "class VanillaRevised(dspy.Module):\n", 1827 | " def __init__(self, num_passages=3):\n", 1828 | " super().__init__()\n", 1829 | " self.retrieve = dspy.Retrieve(k=num_passages)\n", 1830 | " self.generate_answer = dspy.Predict(GenerateAnswer)\n", 1831 | " \n", 1832 | " def forward(self, question):\n", 1833 | " context = self.retrieve(question).passages\n", 1834 | " answer = self.generate_answer(context=context, question=question)\n", 1835 | " return answer\n", 1836 | " \n", 1837 | "vanilla_revised = VanillaRevised()\n", 1838 | "\n", 1839 | "## COT\n", 1840 | "class COTRevised(dspy.Module):\n", 1841 | " def __init__(self, num_passages=3):\n", 1842 | " super().__init__()\n", 1843 | " self.retrieve = dspy.Retrieve(k=num_passages)\n", 1844 | " self.generate_answer = dspy.ChainOfThought(GenerateAnswer)\n", 1845 | " \n", 1846 | " def forward(self, question):\n", 1847 | " context = self.retrieve(question).passages\n", 1848 | " answer = self.generate_answer(context=context, question=question)\n", 1849 | " return answer\n", 1850 | " \n", 1851 | "cot_revised = COTRevised()\n", 1852 | "\n", 1853 | "## ReAct\n", 1854 | "react_revised = dspy.ReAct(BasicQA, tools=[dspy.Retrieve(k=3)], max_iters=5)\n", 
1855 | "\n", 1856 | "## BasicMultiHop\n", 1857 | "class BasicMultiHopRevised(dspy.Module):\n", 1858 | " def __init__(self, passages_per_hop=3):\n", 1859 | " self.retrieve = dspy.Retrieve(k=passages_per_hop)\n", 1860 | " self.generate_query = dspy.ChainOfThought(FollowupQuery)\n", 1861 | " self.generate_answer = dspy.ChainOfThought(GenerateAnswer)\n", 1862 | "\n", 1863 | " def forward(self, question):\n", 1864 | " context = []\n", 1865 | "\n", 1866 | " for hop in range(2):\n", 1867 | " query = self.generate_query(context=context, question=question).search_query\n", 1868 | " context += self.retrieve(query).passages\n", 1869 | "\n", 1870 | " return self.generate_answer(context=context, question=question)\n", 1871 | " \n", 1872 | "multihop_revised = BasicMultiHopRevised(passages_per_hop=3)\n", 1873 | " \n", 1874 | "modules_revised = {\n", 1875 | " 'vanilla': vanilla_revised,\n", 1876 | " 'cot': cot_revised,\n", 1877 | " 'react': react_revised,\n", 1878 | " 'multihop': multihop_revised,\n", 1879 | "}\n", 1880 | "\n", 1881 | "# Re-compile and evaluate\n", 1882 | "ms_revised = ModelSelection(modules=modules_revised, optimizers=optimizers, metric=metric, trainset=trainset)\n", 1883 | "ms_revised.evaluate(testset=testset)\n", 1884 | "ms_revised.evaluation_matrix" 1885 | ] 1886 | }, 1887 | { 1888 | "cell_type": "markdown", 1889 | "metadata": {}, 1890 | "source": [ 1891 | "Scores now improve for most vanilla, COT, and Multihop configurations, while ReAct is unchanged. Multihop now has the best performance at 65%, both without an optimizer and with LabeledFewShot! This indicates that although DSPy tries to optimize the prompt, **there is still some prompt engineering involved in specifying your objective in the signature**.\n", 1892 | "\n", 1893 | "The best model now produces an exact match for our question!" 
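The before/after comparison can also be spelled out in a few lines. The sketch below hard-codes the scores reported in the two evaluation matrices of this notebook; the variable names are illustrative and not part of the notebook itself.

```python
from itertools import product

MODULES = ["vanilla", "cot", "react", "multihop"]
OPTIMIZERS = ["none", "labeled_few_shot", "bootstrap_few_shot"]

# Scores (in %) copied from the two evaluation matrices, in row order.
before = [0.0, 0.0, 0.0, 0.0, 35.0, 50.0, 25.0, 25.0, 35.0, 0.0, 30.0, 35.0]
after = [25.0, 40.0, 40.0, 50.0, 45.0, 50.0, 25.0, 25.0, 35.0, 65.0, 65.0, 55.0]

# One (module, optimizer) key per row, in the same order as the matrices.
keys = list(product(MODULES, OPTIMIZERS))

# Change in score caused by revising the signatures.
delta = {key: a - b for key, b, a in zip(keys, before, after)}

# Configurations that reach the top score after the revision.
best_score = max(after)
best_configs = [key for key, a in zip(keys, after) if a == best_score]

# Configurations whose score did not move at all.
unchanged = [key for key in keys if delta[key] == 0.0]

for (module, optimizer), change in delta.items():
    print(f"{module:10s} {optimizer:20s} {change:+.1f}")
print("best:", best_score, best_configs)
print("unchanged:", unchanged)
```

Listing the deltas this way makes it easy to spot which configurations actually benefited from the revised signatures and which did not move.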
1894 | ] 1895 | }, 1896 | { 1897 | "cell_type": "code", 1898 | "execution_count": 80, 1899 | "metadata": {}, 1900 | "outputs": [ 1901 | { 1902 | "data": { 1903 | "text/plain": [ 1904 | "Prediction(\n", 1905 | " rationale='Answer: 7,402',\n", 1906 | " answer='7,402'\n", 1907 | ")" 1908 | ] 1909 | }, 1910 | "execution_count": 80, 1911 | "metadata": {}, 1912 | "output_type": "execute_result" 1913 | } 1914 | ], 1915 | "source": [ 1916 | "# The correct answer is 7,402\n", 1917 | "question = \"\"\"`William Hughes Miller was born in a city with how many inhabitants ?\"\"\"\n", 1918 | "ms_revised.question_for_model('multihop','labeled_few_shot',question)" 1919 | ] 1920 | }, 1921 | { 1922 | "cell_type": "markdown", 1923 | "metadata": {}, 1924 | "source": [ 1925 | "As expected, the best prompt contains only few-shot examples, but not the bootstrapped Context-Question-Reasoning-Answer demonstrations." 1926 | ] 1927 | }, 1928 | { 1929 | "cell_type": "code", 1930 | "execution_count": 81, 1931 | "metadata": {}, 1932 | "outputs": [ 1933 | { 1934 | "name": "stdout", 1935 | "output_type": "stream", 1936 | "text": [ 1937 | "\n", 1938 | "\n", 1939 | "\n", 1940 | "\n", 1941 | "Answer questions with short factoid answers.\n", 1942 | "\n", 1943 | "---\n", 1944 | "\n", 1945 | "Question: This American guitarist best known for her work with the Iron Maidens is an ancestor of a composer who was known as what?\n", 1946 | "Answer: The Waltz King\n", 1947 | "\n", 1948 | "Question: Tombstone stared an actor born May 17, 1955 known as who?\n", 1949 | "Answer: Bill Paxton\n", 1950 | "\n", 1951 | "Question: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko?\n", 1952 | "Answer: Aleksandr Danilovich Aleksandrov\n", 1953 | "\n", 1954 | "Question: Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?\n", 1955 | "Answer: Tae Kwon Do Times\n", 1956 | "\n", 1957 | "Question: What is the code name for the German offensive that started this Second World 
War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division?\n", 1958 | "Answer: Operation Citadel\n", 1959 | "\n", 1960 | "Question: which American actor was Candace Kita guest starred with\n", 1961 | "Answer: Bill Murray\n", 1962 | "\n", 1963 | "Question: Which company distributed this 1977 American animated film produced by Walt Disney Productions for which Sherman Brothers wrote songs?\n", 1964 | "Answer: Buena Vista Distribution\n", 1965 | "\n", 1966 | "Question: Which Pakistani cricket umpire who won 3 consecutive ICC umpire of the year awards in 2009, 2010, and 2011 will be in the ICC World Twenty20?\n", 1967 | "Answer: Aleem Sarwar Dar\n", 1968 | "\n", 1969 | "Question: Who acted in the shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of \"Hamlet.\" ?\n", 1970 | "Answer: Kerry Condon\n", 1971 | "\n", 1972 | "Question: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield\n", 1973 | "Answer: 1874\n", 1974 | "\n", 1975 | "Question: Which American actress who made their film debut in the 1995 teen drama \"Kids\" was the co-founder of Voto Latino?\n", 1976 | "Answer: Rosario Dawson\n", 1977 | "\n", 1978 | "Question: On the coast of what ocean is the birthplace of Diogal Sakho?\n", 1979 | "Answer: Atlantic\n", 1980 | "\n", 1981 | "Question: Samantha Cristoforetti and Mark Shuttleworth are both best known for being first in their field to go where?\n", 1982 | "Answer: space\n", 1983 | "\n", 1984 | "Question: Which of these publications was most recently published, Who Put the Bomp or Self?\n", 1985 | "Answer: Self\n", 1986 | "\n", 1987 | "Question: The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?\n", 1988 | "Answer: 1950\n", 1989 | "\n", 1990 | "Question: Which is taller, the Empire State Building or the Bank of 
America Tower?\n", 1991 | "Answer: The Empire State Building\n", 1992 | "\n", 1993 | "---\n", 1994 | "\n", 1995 | "Follow the following format.\n", 1996 | "\n", 1997 | "Context: may contain relevant facts\n", 1998 | "\n", 1999 | "Question: ${question}\n", 2000 | "\n", 2001 | "Reasoning: Let's think step by step in order to ${produce the answer}. We ...\n", 2002 | "\n", 2003 | "Answer: often between 1 and 5 words\n", 2004 | "\n", 2005 | "---\n", 2006 | "\n", 2007 | "Context:\n", 2008 | "[1] «William Hughes Miller | William Hughes Miller (born March 16, 1941, Kosciusko, Mississippi) is a professor at the University of California, Berkeley and a leading researcher in the field of theoretical chemistry.»\n", 2009 | "[2] «William Herbert Miller, Jr. | William Hubert Miller, Jr. (September 1932 – November 4, 1988), of New York City, was an aerophilatelist who published philatelic literature on the subject.»\n", 2010 | "[3] «William Green Miller | William Green Miller (born August 15, 1931 in New York City, New York), served as the United States Ambassador to Ukraine under Bill Clinton, from 1993 to 1998.»\n", 2011 | "[4] «Kosciusko, Mississippi | Kosciusko is a city in Attala County, Mississippi, United States. The population was 7,402 at the 2010 census. It is the county seat of Attala County.»\n", 2012 | "[5] «Attala County, Mississippi | Attala County is a county located in the U.S. state of Mississippi. As of the 2010 census, the population was 19,564. Its county seat is Kosciusko. Attala County is named for Atala, a fictional Native American heroine from an early-19th-century novel of the same name by François-René de Chateaubriand.»\n", 2013 | "[6] «Kosciusko Island | Kosciusko Island is an island in the Alexander Archipelago of southeastern Alaska, United States. It lies near the northwest corner of Prince of Wales Island, just across the El Capitan Passage from the larger island. The island is near Mount Francis, Holbrook Mountain, and Tokeen Peak. 
Kosciusko Island has a land area of 171.585 sq mi (444.403 km²), making it the 38th largest island in the United States. It had a population of 52 persons as of the 2000 census, mostly in Edna Bay, its largest community.»\n", 2014 | "\n", 2015 | "Question: `William Hughes Miller was born in a city with how many inhabitants ?\n", 2016 | "\n", 2017 | "Reasoning: Let's think step by step in order to Answer: 7,402\n", 2018 | "\n", 2019 | "Answer:\u001b[32m 7,402\u001b[0m\n", 2020 | "\n", 2021 | "\n", 2022 | "\n" 2023 | ] 2024 | } 2025 | ], 2026 | "source": [ 2027 | "lm.inspect_history(n=1)" 2028 | ] 2029 | }, 2030 | { 2031 | "cell_type": "markdown", 2032 | "metadata": {}, 2033 | "source": [ 2034 | "This does not mean that Multihop with BootstrapFewShot performs worse **in general**, however. It only means that for our task, if we use GPT 3.5 Turbo both to bootstrap demonstrations (which might be of questionable quality) and to output predictions, we may be better off without bootstrapping altogether, keeping only the few-shot examples.\n", 2035 | "\n", 2036 | "This leads to the question: is it possible to use a more powerful LM, say GPT 4 Turbo (aka `teacher`), to generate demonstrations, while keeping GPT 3.5 Turbo (aka `student`) for prediction?\n", 2037 | "\n", 2038 | "## \"Teacher\" to power up bootstrapping capability\n", 2039 | "\n", 2040 | "The answer is **YES**, as the following cell demonstrates. We will use GPT 4 Turbo as the teacher." 
2041 | ] 2042 | }, 2043 | { 2044 | "cell_type": "code", 2045 | "execution_count": 82, 2046 | "metadata": {}, 2047 | "outputs": [ 2048 | { 2049 | "name": "stdout", 2050 | "output_type": "stream", 2051 | "text": [ 2052 | "Compiling models for vanilla...\n", 2053 | "...bootstrap_few_shot\n" 2054 | ] 2055 | }, 2056 | { 2057 | "name": "stderr", 2058 | "output_type": "stream", 2059 | "text": [ 2060 | " 40%|████ | 8/20 [00:35<00:53, 4.44s/it]\n" 2061 | ] 2062 | }, 2063 | { 2064 | "name": "stdout", 2065 | "output_type": "stream", 2066 | "text": [ 2067 | "Bootstrapped 4 full traces after 9 examples in round 0.\n", 2068 | "Compiling models for cot...\n", 2069 | "...bootstrap_few_shot\n" 2070 | ] 2071 | }, 2072 | { 2073 | "name": "stderr", 2074 | "output_type": "stream", 2075 | "text": [ 2076 | " 30%|███ | 6/20 [00:36<01:26, 6.15s/it]\n" 2077 | ] 2078 | }, 2079 | { 2080 | "name": "stdout", 2081 | "output_type": "stream", 2082 | "text": [ 2083 | "Bootstrapped 4 full traces after 7 examples in round 0.\n", 2084 | "Compiling models for react...\n", 2085 | "...bootstrap_few_shot\n" 2086 | ] 2087 | }, 2088 | { 2089 | "name": "stderr", 2090 | "output_type": "stream", 2091 | "text": [ 2092 | " 40%|████ | 8/20 [02:32<04:06, 20.54s/it]" 2093 | ] 2094 | }, 2095 | { 2096 | "name": "stdout", 2097 | "output_type": "stream", 2098 | "text": [ 2099 | "Failed to run or to evaluate example Example({'question': 'Which American actress who made their film debut in the 1995 teen drama \"Kids\" was the co-founder of Voto Latino?', 'answer': 'Rosario Dawson'}) (input_keys={'question'}) with due to 'NoneType' object is not iterable.\n" 2100 | ] 2101 | }, 2102 | { 2103 | "name": "stderr", 2104 | "output_type": "stream", 2105 | "text": [ 2106 | "100%|██████████| 20/20 [08:05<00:00, 24.30s/it]\n" 2107 | ] 2108 | }, 2109 | { 2110 | "name": "stdout", 2111 | "output_type": "stream", 2112 | "text": [ 2113 | "Bootstrapped 3 full traces after 20 examples in round 0.\n", 2114 | "Compiling models for 
multihop...\n", 2115 | "...bootstrap_few_shot\n" 2116 | ] 2117 | }, 2118 | { 2119 | "name": "stderr", 2120 | "output_type": "stream", 2121 | "text": [ 2122 | " 30%|███ | 6/20 [02:17<05:21, 22.99s/it]\n" 2123 | ] 2124 | }, 2125 | { 2126 | "name": "stdout", 2127 | "output_type": "stream", 2128 | "text": [ 2129 | "Bootstrapped 4 full traces after 7 examples in round 0.\n", 2130 | "Evaluating models for vanilla...\n", 2131 | "Average Metric: 9 / 20 (45.0%)\n", 2132 | "Evaluating models for cot...\n", 2133 | "Average Metric: 10 / 20 (50.0%)\n", 2134 | "Evaluating models for react...\n", 2135 | "Error for example in dev set: \t\t not enough values to unpack (expected 2, got 1)\n", 2136 | "Error for example in dev set: \t\t not enough values to unpack (expected 2, got 1)\n", 2137 | "Average Metric: 8.0 / 20 (40.0%)\n", 2138 | "Evaluating models for multihop...\n", 2139 | "Average Metric: 11 / 20 (55.0%)\n" 2140 | ] 2141 | }, 2142 | { 2143 | "data": { 2144 | "text/html": [ 2145 | "
\n", 2195 | "
" 2196 | ], 2197 | "text/plain": [ 2198 | " module_name optimizer score\n", 2199 | "0 vanilla bootstrap_few_shot 45.0\n", 2200 | "1 cot bootstrap_few_shot 50.0\n", 2201 | "2 react bootstrap_few_shot 40.0\n", 2202 | "3 multihop bootstrap_few_shot 55.0" 2203 | ] 2204 | }, 2205 | "execution_count": 82, 2206 | "metadata": {}, 2207 | "output_type": "execute_result" 2208 | } 2209 | ], 2210 | "source": [ 2211 | "# Define the GPT-4 Turbo model\n", 2212 | "gpt4_turbo = dspy.Databricks(api_key=OPENROUTER_API_KEY,\n", 2213 | "\t\tapi_base=\"https://openrouter.ai/api/v1\",\n", 2214 | "\t\tmodel=\"openai/gpt-4-turbo\")\n", 2215 | "\n", 2216 | "# Define new Optimizer which uses GPT-4 Turbo as a teacher\n", 2217 | "optimizers_gpt4_teacher = {\n", 2218 | " 'bootstrap_few_shot': BootstrapFewShot(metric=metric, max_errors=20, teacher_settings=dict(lm=gpt4_turbo)),\n", 2219 | "}\n", 2220 | "\n", 2221 | "# Compile the models and evaluate them as before\n", 2222 | "ms_gpt4_teacher = ModelSelection(modules=modules_revised, optimizers=optimizers_gpt4_teacher, metric=metric, trainset=trainset)\n", 2223 | "ms_gpt4_teacher.evaluate(testset=testset)\n", 2224 | "ms_gpt4_teacher.evaluation_matrix" 2225 | ] 2226 | }, 2227 | { 2228 | "cell_type": "markdown", 2229 | "metadata": {}, 2230 | "source": [ 2231 | "Using GPT-4 Turbo as the `teacher` does not significantly boost our models' performance, however: compared with the GPT 3.5-bootstrapped runs, vanilla and ReAct each gain 5 points, while COT and Multihop are unchanged. It is still worthwhile to see its effect on our prompt. 
Below is the prompt generated just using GPT 3.5" 2232 | ] 2233 | }, 2234 | { 2235 | "cell_type": "code", 2236 | "execution_count": 84, 2237 | "metadata": {}, 2238 | "outputs": [ 2239 | { 2240 | "name": "stdout", 2241 | "output_type": "stream", 2242 | "text": [ 2243 | "\n", 2244 | "\n", 2245 | "\n", 2246 | "\n", 2247 | "Answer questions with short factoid answers.\n", 2248 | "\n", 2249 | "---\n", 2250 | "\n", 2251 | "Question: Which Pakistani cricket umpire who won 3 consecutive ICC umpire of the year awards in 2009, 2010, and 2011 will be in the ICC World Twenty20?\n", 2252 | "Answer: Aleem Sarwar Dar\n", 2253 | "\n", 2254 | "Question: Tombstone stared an actor born May 17, 1955 known as who?\n", 2255 | "Answer: Bill Paxton\n", 2256 | "\n", 2257 | "Question: Which American actress who made their film debut in the 1995 teen drama \"Kids\" was the co-founder of Voto Latino?\n", 2258 | "Answer: Rosario Dawson\n", 2259 | "\n", 2260 | "Question: Having the combination of excellent foot speed and bat speed helped Eric Davis, create what kind of outfield for the Los Angeles Dodgers?\n", 2261 | "Answer: \"Outfield of Dreams\"\n", 2262 | "\n", 2263 | "Question: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield\n", 2264 | "Answer: 1874\n", 2265 | "\n", 2266 | "Question: The Organisation that allows a community to influence their operation or use and to enjoy the benefits arisingwas founded in what year?\n", 2267 | "Answer: 2010\n", 2268 | "\n", 2269 | "Question: At My Window was released by which American singer-songwriter?\n", 2270 | "Answer: John Townes Van Zandt\n", 2271 | "\n", 2272 | "Question: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division?\n", 2273 | "Answer: Operation Citadel\n", 2274 | "\n", 2275 | "Question: Who acted in the shot film The Shore 
and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of \"Hamlet.\" ?\n", 2276 | "Answer: Kerry Condon\n", 2277 | "\n", 2278 | "Question: Which company distributed this 1977 American animated film produced by Walt Disney Productions for which Sherman Brothers wrote songs?\n", 2279 | "Answer: Buena Vista Distribution\n", 2280 | "\n", 2281 | "Question: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko?\n", 2282 | "Answer: Aleksandr Danilovich Aleksandrov\n", 2283 | "\n", 2284 | "Question: \"Everything Has Changed\" is a song from an album released under which record label ?\n", 2285 | "Answer: Big Machine Records\n", 2286 | "\n", 2287 | "---\n", 2288 | "\n", 2289 | "Follow the following format.\n", 2290 | "\n", 2291 | "Context: may contain relevant facts\n", 2292 | "\n", 2293 | "Question: ${question}\n", 2294 | "\n", 2295 | "Reasoning: Let's think step by step in order to ${produce the answer}. We ...\n", 2296 | "\n", 2297 | "Answer: often between 1 and 5 words\n", 2298 | "\n", 2299 | "---\n", 2300 | "\n", 2301 | "Context:\n", 2302 | "[1] «Candace Kita | Kita's first role was as a news anchor in the 1991 movie \"Stealth Hunters\". Kita's first recurring television role was in Fox's \"Masked Rider\", from 1995 to 1996. She appeared as a series regular lead in all 40 episodes. Kita also portrayed a frantic stewardess in a music video directed by Mark Pellington for the British group, Catherine Wheel, titled, \"Waydown\" in 1995. In 1996, Kita also appeared in the film \"Barb Wire\" (1996) and guest starred on \"The Wayans Bros.\". She also guest starred in \"Miriam Teitelbaum: Homicide\" with \"Saturday Night Live\" alumni Nora Dunn, \"Wall To Wall Records\" with Jordan Bridges, \"Even Stevens\", \"Felicity\" with Keri Russell, \"V.I.P.\" with Pamela Anderson, \"Girlfriends\", \"The Sweet Spot\" with Bill Murray, and \"Movies at Our House\". 
She also had recurring roles on the FX spoof, \"Son of the Beach\" from 2001 to 2002, ABC-Family's \"Dance Fever\" and Oxygen Network's \"Running with Scissors\". Kita also appeared in the films \"Little Heroes\" (2002) and \"Rennie's Landing\" (2001).»\n", 2303 | "[2] «Jilly Kitzinger | Jilly Kitzinger is a fictional character in the science fiction series \"Torchwood\", portrayed by American actress Lauren Ambrose. The character was promoted as one of five new main characters to join \"Torchwood\" in its fourth series, \"\" (2011), as part of a new co-production between \"Torchwood\"' s British network, BBC One, and its American financiers on US premium television network Starz. Ambrose appears in seven of the ten episodes, and is credited as a \"special guest star\" throughout. Whilst reaction to the serial was mixed, Ambrose' portrayal was often singled out by critics for particular praise and in 2012 she received a Saturn Award nomination for Best Supporting Actress on Television.»\n", 2304 | "[3] «Candace Brown | Candace June Brown (born June 15, 1980) is an American actress and comedian best known for her work on shows such as \"Grey's Anatomy\", \"Desperate Housewives\", \"Head Case\", The \"Wizards Of Waverly Place\". In 2011, she joined the guest cast for \"Torchwood\"' s fourth series' \"\", airing on BBC One in the United Kingdom and premium television network Starz.»\n", 2305 | "[4] «Candace Kita | Kita's first role was as a news anchor in the 1991 movie \"Stealth Hunters\". Kita's first recurring television role was in Fox's \"Masked Rider\", from 1995 to 1996. She appeared as a series regular lead in all 40 episodes. Kita also portrayed a frantic stewardess in a music video directed by Mark Pellington for the British group, Catherine Wheel, titled, \"Waydown\" in 1995. In 1996, Kita also appeared in the film \"Barb Wire\" (1996) and guest starred on \"The Wayans Bros.\". 
She also guest starred in \"Miriam Teitelbaum: Homicide\" with \"Saturday Night Live\" alumni Nora Dunn, \"Wall To Wall Records\" with Jordan Bridges, \"Even Stevens\", \"Felicity\" with Keri Russell, \"V.I.P.\" with Pamela Anderson, \"Girlfriends\", \"The Sweet Spot\" with Bill Murray, and \"Movies at Our House\". She also had recurring roles on the FX spoof, \"Son of the Beach\" from 2001 to 2002, ABC-Family's \"Dance Fever\" and Oxygen Network's \"Running with Scissors\". Kita also appeared in the films \"Little Heroes\" (2002) and \"Rennie's Landing\" (2001).»\n", 2306 | "[5] «Kiti Manver | María Isabel Ana Mantecón Vernalte (born 11 May 1953) better known as Kiti Mánver is a Spanish actress. She has appeared in more than 100 films and television shows since 1970. She starred in the 1973 film \"Habla, mudita\", which was entered into the 23rd Berlin International Film Festival.»\n", 2307 | "[6] «Amy Steel | Amy Steel (born Alice Amy Steel; May 3, 1960) is an American film and television actress. She is best known for her roles as Ginny Field in \"Friday the 13th Part 2\" (1981) and Kit Graham in \"April Fool's Day\" (1986). She has starred in films such as \"Exposed\" (1983), \"Walk Like a Man\" (1987), \"What Ever Happened to Baby Jane? \" (1991), and \"Tales of Poe\" (2014). Steel has had numerous guest appearances on several television series, such as \"Family Ties\" (1983), \"The A-Team\" (1983), \"Quantum Leap\" (1990), and \"China Beach\" (1991), as well as a starring role in \"The Powers of Matthew Star\" (1982–83).»\n", 2308 | "\n", 2309 | "Question: which American actor was Candace Kita guest starred with\n", 2310 | "\n", 2311 | "Reasoning: Let's think step by step in order to Answer: Bill Murray\n", 2312 | "\n", 2313 | "Answer: Bill Murray\n", 2314 | "\n", 2315 | "---\n", 2316 | "\n", 2317 | "Context:\n", 2318 | "[1] «Monthly Magazine | The Monthly Magazine (1796–1843) of London began publication in February 1796. 
Richard Phillips was the publisher and a contributor on political issues. The editor for the first ten years was the literary jack-of-all-trades, Dr John Aikin. Other contributors included William Blake, Samuel Taylor Coleridge, George Dyer, Henry Neele and Charles Lamb. The magazine also published the earliest fiction of Charles Dickens, the first of what would become \"Sketches by Boz\".»\n", 2319 | "[2] «Bodega Magazine | Bodega Magazine is an online literary magazine that releases new issues on the first Monday of every month, featuring stories, poems, essays and interviews from a mix of emerging and established writers. It was founded in early spring of 2012 by creative writing MFA graduates from New York University who had previously worked together on the \"Washington Square Review\", and continues to be based out of Manhattan and Brooklyn. The inaugural issue was published on September 4, 2012.»\n", 2320 | "[3] «Who Put the Bomp | Who Put The Bomp was a rock music fanzine edited and published by Greg Shaw from 1970 to 1979. Its name came from the hit 1961 doo-wop song by Barry Mann, \"Who Put the Bomp\". Later, the name was shortened to \"Bomp!\"»\n", 2321 | "[4] «The Most (album) | The Most is the third album released by straight edge hardcore punk band Down to Nothing. It was released on July 17, 2007.»\n", 2322 | "[5] «The Most Incredible Thing | “The Most Incredible Thing\" (Danish: \"Det Utroligste\" ) is a literary fairy tale by Danish poet and author Hans Christian Andersen (1805–1875). The story is about a contest to find the most incredible thing and the wondrous consequences when the winner is chosen. The tale was first published in an English translation by Horace Scudder, an American correspondent of Andersen's, in the United States in September 1870 before being published in the original Danish in Denmark in October 1870. \"The Most Incredible Thing\" was the first of Andersen's tales to be published in Denmark during World War II. 
Andersen considered the tale one of his best.»\n", 2323 | "[6] «Augusta Triumphans | Augusta Triumphans: or, the Way to Make London the Most Flourishing City in the Universe by Daniel Defoe was first published on 16 March 1728. The fictitious speaker of this pamphlet, Andrew Moreton, is a man in his sixties who offers suggestions for the improvement of London. In particular, he fosters the establishment of a university, an academy of music, a hospital for foundlings and licensed institutions for the treatment of mental diseases. Moreover, he encourages the introduction of measures to prevent moral corruption and street robbery.»\n", 2324 | "\n", 2325 | "Question: Which of these publications was most recently published, Who Put the Bomp or Self?\n", 2326 | "\n", 2327 | "Reasoning: Let's think step by step in order to Answer: Self\n", 2328 | "\n", 2329 | "Answer: Self\n", 2330 | "\n", 2331 | "---\n", 2332 | "\n", 2333 | "Context:\n", 2334 | "[1] «The Victorians | The Victorians - Their Story In Pictures is a 2009 British documentary series which focuses on Victorian art and culture. The four-part series is written and presented by Jeremy Paxman and debuted on BBC One at 9:00pm on Sunday 15 February 2009.»\n", 2335 | "[2] «What the Victorians Did for Us | What the Victorians Did for Us is a 2001 BBC documentary series that examines the impact of the Victorian era on modern society. It concentrates primarily on the scientific and social advances of the era, which bore the Industrial Revolution and set the standards for polite society today.»\n", 2336 | "[3] «The Great Victorian Collection | The Great Victorian Collection, published in 1975, is a novel by Northern Irish-Canadian writer Brian Moore. Set in Carmel, California, it tells the story of a man who dreams that the empty parking lot he can see from his hotel window has been transformed by the arrival of a collection of priceless Victoriana on display in a vast open-air market. 
When he awakes he finds that he can no longer distinguish the dream from reality.»\n", 2337 | "[4] «Jeremy Paxman | Jeremy Dickson Paxman (born 11 May 1950) is an English broadcaster, journalist, and author. He is the question master of \"University Challenge\", having succeeded Bamber Gascoigne when the programme was revived in 1994.»\n", 2338 | "[5] «Jeremy I | Jeremy I was king of the Miskito nation, who came to power following the death of his father, Oldman, in 1686 or 1687. according to an English visitor, W. M., in 1699, he was about 60 years old at that time, making his birth year about 1639.»\n", 2339 | "[6] «Jeremy Cheeseman | Jeremy Cheeseman (born June 6, 1990 in Manorville, New York) is a former American professional soccer player. Playing two seasons for the Dayton Dutch Lions in the USL Professional Division before retiring due to injury»\n", 2340 | "\n", 2341 | "Question: The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?\n", 2342 | "\n", 2343 | "Reasoning: Let's think step by step in order to Answer: 1950\n", 2344 | "\n", 2345 | "Answer: 1950\n", 2346 | "\n", 2347 | "---\n", 2348 | "\n", 2349 | "Context:\n", 2350 | "[1] «Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. \"Tae Kwon Do Times\" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.»\n", 2351 | "[2] «Scott Shaw (artist) | Scott Shaw (often spelled Scott Shaw!) is a United States cartoonist and animator, and historian of comics. Among Scott's comic-book work is Hanna-Barbera's \"The Flintstones\" (for Marvel Comics and Harvey Comics), \"Captain Carrot and His Amazing Zoo Crew\" (for DC Comics), and \"Simpsons Comics\" (for Bongo Comics). 
He was also the first artist for Archie Comics' \"Sonic the Hedgehog\" comic book series.»\n", 2352 | "[3] «Scott Shaw | Scott Shaw (born September 23, 1958) is an American actor, author, film director, film producer, journalist, martial artist, musician, photographer, and professor.»\n", 2353 | "[4] «Scott Shaw (artist) | Scott Shaw (often spelled Scott Shaw!) is a United States cartoonist and animator, and historian of comics. Among Scott's comic-book work is Hanna-Barbera's \"The Flintstones\" (for Marvel Comics and Harvey Comics), \"Captain Carrot and His Amazing Zoo Crew\" (for DC Comics), and \"Simpsons Comics\" (for Bongo Comics). He was also the first artist for Archie Comics' \"Sonic the Hedgehog\" comic book series.»\n", 2354 | "[5] «Scott Shaw | Scott Shaw (born September 23, 1958) is an American actor, author, film director, film producer, journalist, martial artist, musician, photographer, and professor.»\n", 2355 | "[6] «Arnold Shaw (author) | Arnold Shaw (1909–1989) was a songwriter and music business executive, primarily in the field of music publishing, who is best known for his comprehensive series of books on 20th century American popular music.»\n", 2356 | "\n", 2357 | "Question: Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?\n", 2358 | "\n", 2359 | "Reasoning: Let's think step by step in order to Answer: Tae Kwon Do Times\n", 2360 | "\n", 2361 | "Answer: Tae Kwon Do Times\n", 2362 | "\n", 2363 | "---\n", 2364 | "\n", 2365 | "Context:\n", 2366 | "[1] «William Hughes Miller | William Hughes Miller (born March 16, 1941, Kosciusko, Mississippi) is a professor at the University of California, Berkeley and a leading researcher in the field of theoretical chemistry.»\n", 2367 | "[2] «William Herbert Miller, Jr. | William Hubert Miller, Jr. 
(September 1932 – November 4, 1988), of New York City, was an aerophilatelist who published philatelic literature on the subject.»\n", 2368 | "[3] «William Rickarby Miller | William Rickarby Miller (May 20, 1818 in Staindrop – July 1893 in New York City) was an American painter, of the Hudson River School.»\n", 2369 | "[4] «Kosciusko, Mississippi | Kosciusko is a city in Attala County, Mississippi, United States. The population was 7,402 at the 2010 census. It is the county seat of Attala County.»\n", 2370 | "[5] «Attala County, Mississippi | Attala County is a county located in the U.S. state of Mississippi. As of the 2010 census, the population was 19,564. Its county seat is Kosciusko. Attala County is named for Atala, a fictional Native American heroine from an early-19th-century novel of the same name by François-René de Chateaubriand.»\n", 2371 | "[6] «Kosciusko Island | Kosciusko Island is an island in the Alexander Archipelago of southeastern Alaska, United States. It lies near the northwest corner of Prince of Wales Island, just across the El Capitan Passage from the larger island. The island is near Mount Francis, Holbrook Mountain, and Tokeen Peak. Kosciusko Island has a land area of 171.585 sq mi (444.403 km²), making it the 38th largest island in the United States. 
It had a population of 52 persons as of the 2000 census, mostly in Edna Bay, its largest community.»\n", 2372 | "\n", 2373 | "Question: `William Hughes Miller was born in a city with how many inhabitants ?\n", 2374 | "\n", 2375 | "Reasoning: Let's think step by step in order to Answer: 7,402\n", 2376 | "\n", 2377 | "Answer:\u001b[32m 7,402\u001b[0m\n", 2378 | "\n", 2379 | "\n", 2380 | "\n" 2381 | ] 2382 | } 2383 | ], 2384 | "source": [ 2385 | "ms_revised.question_for_model('multihop','bootstrap_few_shot',question)\n", 2386 | "lm.inspect_history(n=1)" 2387 | ] 2388 | }, 2389 | { 2390 | "cell_type": "markdown", 2391 | "metadata": {}, 2392 | "source": [ 2393 | "And here's the prompt generated using GPT-4 as `teacher`. Notice how the \"Reasoning\" is much better articulated here!" 2394 | ] 2395 | }, 2396 | { 2397 | "cell_type": "code", 2398 | "execution_count": 85, 2399 | "metadata": {}, 2400 | "outputs": [ 2401 | { 2402 | "name": "stdout", 2403 | "output_type": "stream", 2404 | "text": [ 2405 | "\n", 2406 | "\n", 2407 | "\n", 2408 | "\n", 2409 | "Answer questions with short factoid answers.\n", 2410 | "\n", 2411 | "---\n", 2412 | "\n", 2413 | "Question: Which Pakistani cricket umpire who won 3 consecutive ICC umpire of the year awards in 2009, 2010, and 2011 will be in the ICC World Twenty20?\n", 2414 | "Answer: Aleem Sarwar Dar\n", 2415 | "\n", 2416 | "Question: Tombstone stared an actor born May 17, 1955 known as who?\n", 2417 | "Answer: Bill Paxton\n", 2418 | "\n", 2419 | "Question: Which American actress who made their film debut in the 1995 teen drama \"Kids\" was the co-founder of Voto Latino?\n", 2420 | "Answer: Rosario Dawson\n", 2421 | "\n", 2422 | "Question: Having the combination of excellent foot speed and bat speed helped Eric Davis, create what kind of outfield for the Los Angeles Dodgers?\n", 2423 | "Answer: \"Outfield of Dreams\"\n", 2424 | "\n", 2425 | "Question: which American actor was Candace Kita guest starred with\n", 2426 | "Answer: Bill 
Murray\n", 2427 | "\n", 2428 | "Question: The Organisation that allows a community to influence their operation or use and to enjoy the benefits arisingwas founded in what year?\n", 2429 | "Answer: 2010\n", 2430 | "\n", 2431 | "Question: At My Window was released by which American singer-songwriter?\n", 2432 | "Answer: John Townes Van Zandt\n", 2433 | "\n", 2434 | "Question: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division?\n", 2435 | "Answer: Operation Citadel\n", 2436 | "\n", 2437 | "Question: Who acted in the shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of \"Hamlet.\" ?\n", 2438 | "Answer: Kerry Condon\n", 2439 | "\n", 2440 | "Question: Which company distributed this 1977 American animated film produced by Walt Disney Productions for which Sherman Brothers wrote songs?\n", 2441 | "Answer: Buena Vista Distribution\n", 2442 | "\n", 2443 | "Question: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko?\n", 2444 | "Answer: Aleksandr Danilovich Aleksandrov\n", 2445 | "\n", 2446 | "Question: \"Everything Has Changed\" is a song from an album released under which record label ?\n", 2447 | "Answer: Big Machine Records\n", 2448 | "\n", 2449 | "---\n", 2450 | "\n", 2451 | "Follow the following format.\n", 2452 | "\n", 2453 | "Context: may contain relevant facts\n", 2454 | "\n", 2455 | "Question: ${question}\n", 2456 | "\n", 2457 | "Reasoning: Let's think step by step in order to ${produce the answer}. We ...\n", 2458 | "\n", 2459 | "Answer: often between 1 and 5 words\n", 2460 | "\n", 2461 | "---\n", 2462 | "\n", 2463 | "Context:\n", 2464 | "[1] «Monthly Magazine | The Monthly Magazine (1796–1843) of London began publication in February 1796. Richard Phillips was the publisher and a contributor on political issues. 
The editor for the first ten years was the literary jack-of-all-trades, Dr John Aikin. Other contributors included William Blake, Samuel Taylor Coleridge, George Dyer, Henry Neele and Charles Lamb. The magazine also published the earliest fiction of Charles Dickens, the first of what would become \"Sketches by Boz\".»\n", 2465 | "[2] «Who Put the Bomp | Who Put The Bomp was a rock music fanzine edited and published by Greg Shaw from 1970 to 1979. Its name came from the hit 1961 doo-wop song by Barry Mann, \"Who Put the Bomp\". Later, the name was shortened to \"Bomp!\"»\n", 2466 | "[3] «Desktop Publishing Magazine | Desktop Publishing magazine (ISSN 0884-0873) was founded, edited, and published by Tony Bove and Cheryl Rhodes of TUG/User Publications, Inc., of Redwood City, CA. ) . Its first issue appeared in October, 1985, and was created and produced on a personal computer with desktop publishing software (PageMaker on a Macintosh), preparing output on a prototype PostScript-driven typesetting machine from Mergenthaler Linotype Company. Erik Sandberg-Diment, a columnist at \"The New York Times\", tried to buy the venture outright when he saw an early edition.»\n", 2467 | "[4] «Self (magazine) | Self is an American magazine for women that specializes in health, wellness, beauty, and style. Part of Condé Nast, Self had a circulation of 1,515,880 and a total audience of 5,282,000 readers, according to its corporate media kit n 2013. The editor-in-chief is Carolyn Kylstra. \"Self\" is based in the Condé Nast U.S. headquarters at 1 World Trade Center in New York, NY. In February 2017 the magazine became an online publication.»\n", 2468 | "[5] «Self-Publishing Review | Self-Publishing Review (or \"SPR\") is an online book review magazine for indie authors founded in 2008 by American author Henry Baum.»\n", 2469 | "[6] «Self-publishing | Self-publishing is the publication of any book, album or other media by its author without the involvement of an established publisher. 
A self-published physical book is said to have been privately printed. The author is in control of the entire process including, for a book, the design of the cover and interior, formats, price, distribution, marketing, and public relations. The authors can do it all themselves or may outsource some or all of the work to companies which offer these services.»\n", 2470 | "\n", 2471 | "Question: Which of these publications was most recently published, Who Put the Bomp or Self?\n", 2472 | "\n", 2473 | "Reasoning: Let's think step by step in order to determine which publication was most recently published. According to the context, \"Who Put the Bomp\" was published from 1970 to 1979. On the other hand, \"Self\" magazine became an online publication in February 2017 after being a print publication. Therefore, \"Self\" was most recently published.\n", 2474 | "\n", 2475 | "Answer: Self\n", 2476 | "\n", 2477 | "---\n", 2478 | "\n", 2479 | "Context:\n", 2480 | "[1] «The Victorians | The Victorians - Their Story In Pictures is a 2009 British documentary series which focuses on Victorian art and culture. The four-part series is written and presented by Jeremy Paxman and debuted on BBC One at 9:00pm on Sunday 15 February 2009.»\n", 2481 | "[2] «The Great Victorian Collection | The Great Victorian Collection, published in 1975, is a novel by Northern Irish-Canadian writer Brian Moore. Set in Carmel, California, it tells the story of a man who dreams that the empty parking lot he can see from his hotel window has been transformed by the arrival of a collection of priceless Victoriana on display in a vast open-air market. When he awakes he finds that he can no longer distinguish the dream from reality.»\n", 2482 | "[3] «Victorian (comics) | The Victorian is a 25-issue comic book series published by Penny-Farthing Press and starting in 1999. 
The brainchild of creator Trainor Houghton, the series included a number of notable script writers and illustrators, including Len Wein, Glen Orbik and Howard Chaykin.»\n", 2483 | "[4] «Jeremy Paxman | Jeremy Dickson Paxman (born 11 May 1950) is an English broadcaster, journalist, and author. He is the question master of \"University Challenge\", having succeeded Bamber Gascoigne when the programme was revived in 1994.»\n", 2484 | "[5] «Jeremy I | Jeremy I was king of the Miskito nation, who came to power following the death of his father, Oldman, in 1686 or 1687. according to an English visitor, W. M., in 1699, he was about 60 years old at that time, making his birth year about 1639.»\n", 2485 | "[6] «Jeremy Cheeseman | Jeremy Cheeseman (born June 6, 1990 in Manorville, New York) is a former American professional soccer player. Playing two seasons for the Dayton Dutch Lions in the USL Professional Division before retiring due to injury»\n", 2486 | "\n", 2487 | "Question: The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?\n", 2488 | "\n", 2489 | "Reasoning: Let's think step by step in order to determine the birth year of the author who wrote \"The Victorians - Their Story In Pictures.\" According to context [4], Jeremy Paxman, an English broadcaster and journalist, wrote and presented this documentary series. His birth year is provided in the same context.\n", 2490 | "\n", 2491 | "Answer: 1950\n", 2492 | "\n", 2493 | "---\n", 2494 | "\n", 2495 | "Context:\n", 2496 | "[1] «Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. 
\"Tae Kwon Do Times\" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.»\n", 2497 | "[2] «Kwon Tae-man | Kwon Tae-man (born 1941) was an early Korean hapkido practitioner and a pioneer of the art, first in Korea and then in the United States. He formed one of the earliest dojang's for hapkido in the United States in Torrance, California, and has been featured in many magazine articles promoting the art.»\n", 2498 | "[3] «Scott Shaw (artist) | Scott Shaw (often spelled Scott Shaw!) is a United States cartoonist and animator, and historian of comics. Among Scott's comic-book work is Hanna-Barbera's \"The Flintstones\" (for Marvel Comics and Harvey Comics), \"Captain Carrot and His Amazing Zoo Crew\" (for DC Comics), and \"Simpsons Comics\" (for Bongo Comics). He was also the first artist for Archie Comics' \"Sonic the Hedgehog\" comic book series.»\n", 2499 | "[4] «Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. \"Tae Kwon Do Times\" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.»\n", 2500 | "[5] «Scott Savitt | Scott Savitt is a former foreign correspondent for The Los Angeles Times and United Press International in Beijing. His articles have been published in The Los Angeles Times, Washington Post http://www.washingtonpost.com/wp-dyn/content/article/2008/04/18/AR2008041802635.html, Wall Street Journal, New York Times, and many other publications.»\n", 2501 | "[6] «Scott Poulson-Bryant | Scott Poulson-Bryant is an award-winning American journalist and author. 
One of the co-founding editors of Vibe magazine in 1992 (and the editor who gave the magazine its name), Poulson-Bryant's journalism, profiles, reviews, and essays have appeared in such publications as the \"New York Times\", \"the Village Voice\", \"Rolling Stone\", \"Spin\", \"Essence\", \"Ebony\", and \"The Source\". He is the author of \"HUNG: A Meditation on the Measure of Black Men in America\" (published by Doubleday Books in 2006) and a novel called \"The VIPs\".»\n", 2502 | "\n", 2503 | "Question: Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?\n", 2504 | "\n", 2505 | "Reasoning: Let's think step by step in order to determine which magazine published articles by Scott Shaw. According to the context provided, Scott Shaw has contributed to \"Tae Kwon Do Times,\" which is mentioned in both [1] and [4]. There is no mention of Scott Shaw contributing to \"Southwest Art.\"\n", 2506 | "\n", 2507 | "Answer: Tae Kwon Do Times\n", 2508 | "\n", 2509 | "---\n", 2510 | "\n", 2511 | "Context:\n", 2512 | "[1] «1972 FA Charity Shield | The 1972 FA Charity Shield was contested between Manchester City and Aston Villa.»\n", 2513 | "[2] «1968 FA Charity Shield | The 1968 FA Charity Shield was a football match played on 3 August 1968 between Football League champions Manchester City and FA Cup winners West Bromwich Albion. It was the 46th Charity Shield match and was played at City's home ground, Maine Road. Manchester City won 6–1.»\n", 2514 | "[3] «1973 FA Charity Shield | The 1973 FA Charity Shield was contested between Burnley and Manchester City in a fixture that took place at Maine Road.»\n", 2515 | "[4] «List of Aston Villa F.C. seasons | This is a list of seasons played by Aston Villa Football Club in English and European football, from 1879 (the year of the club's first FA Cup entry) to the most recent completed season. Aston Villa football club was founded in March, 1874, by members of the Villa Cross Wesleyan Chapel in Aston. 
Throughout the 1870s Aston Villa played a small amount of games. At least one game, against Aston Brook St Mary's was played with one half under Rugby rules and the other under football rules. In the 1880s the game became more formalised and in 1888, William McGregor formed the Football League with 11 other clubs.»\n", 2516 | "[5] «List of Aston Villa F.C. records and statistics | Aston Villa Football Club are an English professional association football club based in Aston, Birmingham, who currently play in the EFL Championship. The club was founded in 1874 and have played at their current home ground, Villa Park, since 1897. Aston Villa were founding members of the Football League in 1888 and the Premier League in 1992. They are one of the oldest and most successful football clubs in England, having won the First Division Championship seven times and the FA Cup seven times. In 1982 the club became one of only five English clubs to win the European Cup.»\n", 2517 | "[6] «Aston Villa F.C. | Aston Villa Football Club ( ; nicknamed Villa, The Villa, The Villans and The Lions) is a professional football club in Aston, Birmingham, that plays in the Championship, the second level of English football. Founded in 1874, they have played at their current home ground, Villa Park, since 1897. Aston Villa were one of the founder members of the Football League in 1888 and of the Premier League in 1992.»\n", 2518 | "\n", 2519 | "Question: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield\n", 2520 | "\n", 2521 | "Reasoning: Let's think step by step in order to determine the founding year of the club that played against Manchester City in the 1972 FA Charity Shield. According to context [1], the match was contested between Manchester City and Aston Villa. 
To find the founding year of Aston Villa, we refer to context [4], which states that Aston Villa Football Club was founded in March, 1874.\n", 2522 | "\n", 2523 | "Answer: 1874\n", 2524 | "\n", 2525 | "---\n", 2526 | "\n", 2527 | "Context:\n", 2528 | "[1] «William Hughes Miller | William Hughes Miller (born March 16, 1941, Kosciusko, Mississippi) is a professor at the University of California, Berkeley and a leading researcher in the field of theoretical chemistry.»\n", 2529 | "[2] «William Read Miller | William Read Miller (November 23, 1823November 29, 1887) was the 12th Governor of the State of Arkansas. Born in Batesville, Arkansas; Miller was Arkansas's first native born Governor. Serving two terms in the turbulent period after Reconstruction, Miller's four-year administration marked the beginnings of New Departure Democrats in Arkansas. Running on a platform of economic growth via reconciliation between whites and freedmen, Miller often was opposed by members of his own party during the infancy of the Lost Cause ideology. His plans to pay back a large state debt including the Holford Bonds, valued at $14 million ($ million today), were often interrupted by racial violence, and his support for public schools and universities was often combated by those in his own party.»\n", 2530 | "[3] «William "Willie" Armstrong | William Armstrong was born c1804 in Painter Heugh (or Hugh), (which was an old lane dating from medieval Newcastle, a lane joining lower part of Dean Street to the higher part of Pilgrim Street), the name possibly derived from the fact that ships tied up here in the tidal parts of the Lort Burn (now filled).»\n", 2531 | "[4] «Kosciusko, Mississippi | Kosciusko is a city in Attala County, Mississippi, United States. The population was 7,402 at the 2010 census. It is the county seat of Attala County.»\n", 2532 | "[5] «Attala County, Mississippi | Attala County is a county located in the U.S. state of Mississippi. 
As of the 2010 census, the population was 19,564. Its county seat is Kosciusko. Attala County is named for Atala, a fictional Native American heroine from an early-19th-century novel of the same name by François-René de Chateaubriand.»\n", 2533 | "[6] «Kosciusko Island | Kosciusko Island is an island in the Alexander Archipelago of southeastern Alaska, United States. It lies near the northwest corner of Prince of Wales Island, just across the El Capitan Passage from the larger island. The island is near Mount Francis, Holbrook Mountain, and Tokeen Peak. Kosciusko Island has a land area of 171.585 sq mi (444.403 km²), making it the 38th largest island in the United States. It had a population of 52 persons as of the 2000 census, mostly in Edna Bay, its largest community.»\n", 2534 | "\n", 2535 | "Question: `William Hughes Miller was born in a city with how many inhabitants ?\n", 2536 | "\n", 2537 | "Reasoning: Let's think step by step in order to Answer: 7,402\n", 2538 | "\n", 2539 | "Answer:\u001b[32m 7,402\u001b[0m\n", 2540 | "\n", 2541 | "\n", 2542 | "\n" 2543 | ] 2544 | } 2545 | ], 2546 | "source": [ 2547 | "ms_gpt4_teacher.question_for_model('multihop','bootstrap_few_shot',question)\n", 2548 | "lm.inspect_history(n=1)" 2549 | ] 2550 | } 2551 | ], 2552 | "metadata": { 2553 | "kernelspec": { 2554 | "display_name": "venv", 2555 | "language": "python", 2556 | "name": "python3" 2557 | }, 2558 | "language_info": { 2559 | "codemirror_mode": { 2560 | "name": "ipython", 2561 | "version": 3 2562 | }, 2563 | "file_extension": ".py", 2564 | "mimetype": "text/x-python", 2565 | "name": "python", 2566 | "nbconvert_exporter": "python", 2567 | "pygments_lexer": "ipython3", 2568 | "version": "3.10.4" 2569 | } 2570 | }, 2571 | "nbformat": 4, 2572 | "nbformat_minor": 2 2573 | } 2574 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | 
python-dotenv==1.0.1 2 | pandas==2.2.1 3 | ipykernel==6.25.2 4 | dspy-ai==2.4.5 5 | arize-phoenix==3.22.0 6 | openinference-instrumentation-dspy==0.1.6 7 | opentelemetry-exporter-otlp==1.24.0 --------------------------------------------------------------------------------
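In the two notebook outputs above, the demonstrations bootstrapped with GPT-3.5 alone carry a `Reasoning:` field that collapses straight to `Answer: ...`, while the demonstrations produced with GPT-4 as `teacher` contain full sentences. The sketch below is one way to quantify that difference from the printed prompt text. `summarize_rationales` is a hypothetical helper, not part of DSPy, and it assumes you have already captured the prompt text (for example, what `lm.inspect_history(n=1)` prints) as a string.

```python
# Hypothetical helper (not a DSPy API): counts how many "Reasoning:" lines in a
# prompt contain an actual rationale versus jumping straight to "Answer: ...".
PREFIX = "Reasoning: Let's think step by step in order to"


def summarize_rationales(prompt_text: str) -> dict:
    counts = {"degenerate": 0, "articulated": 0}
    for line in prompt_text.splitlines():
        if not line.startswith(PREFIX):
            continue
        rest = line[len(PREFIX):].strip()
        if rest.startswith("${"):
            # The format template line ("${produce the answer}. We ...") is not a demo.
            continue
        if not rest or rest.startswith("Answer:"):
            counts["degenerate"] += 1
        else:
            counts["articulated"] += 1
    return counts


# Short excerpts copied from the two outputs above.
gpt35_excerpt = (
    "Reasoning: Let's think step by step in order to ${produce the answer}. We ...\n"
    "Reasoning: Let's think step by step in order to Answer: Bill Murray\n"
    "Reasoning: Let's think step by step in order to Answer: Self\n"
)
gpt4_excerpt = (
    "Reasoning: Let's think step by step in order to ${produce the answer}. We ...\n"
    "Reasoning: Let's think step by step in order to determine which publication "
    "was most recently published.\n"
    "Reasoning: Let's think step by step in order to Answer: 7,402\n"
)

gpt35_counts = summarize_rationales(gpt35_excerpt)
gpt4_counts = summarize_rationales(gpt4_excerpt)
print(gpt35_counts)
print(gpt4_counts)
```

Note that the last `Reasoning:` line of each prompt is the model's own completion for the query rather than a bootstrapped demonstration, so it is counted along with the demos. Drop the final match if you only want to score the demonstrations.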