├── README.md └── dspy_with_kg.ipynb /README.md: -------------------------------------------------------------------------------- 1 | The DSPy (Declarative Self-improving Language Programs in Python) framework has excited the developer community with its ability to automatically optimize and enhance language model pipelines, which may reduce the need to manually fine-tune prompt templates. 2 | 3 | We designed a custom DSPy pipeline that integrates with knowledge graphs. The reason? One of the main strengths of knowledge graphs is their ability to provide rich contextual information from interconnected data, which can potentially enhance the system’s retrieval and reasoning abilities. We tested both the vanilla DSPy RAG pipeline and our customized one to see how they work, and you'll find out in the video that we encountered more surprises than expected. 4 | YouTube Video: https://www.youtube.com/watch?v=mHREErgLmi0 5 | -------------------------------------------------------------------------------- /dspy_with_kg.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "id": "22120951-8427-4bf4-b2d6-691c9712ccec", 7 | "metadata": {}, 8 | "outputs": [], 9 | "source": [ 10 | "!pip install dspy-ai\n", 11 | "!pip install -r dspy-requirements-2.txt\n", 12 | "!pip install -U dspy-ai" 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "id": "d739c01f-7c4e-460e-8be5-a000b2c997df", 18 | "metadata": {}, 19 | "source": [ 20 | "# Environment Setup" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 1, 26 | "id": "b3f07590-b09a-4411-b01c-30a65305619b", 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "import dspy\n", 31 | "from dspy.retrieve.chromadb_rm import ChromadbRM\n", 32 | "import json\n", 33 | "from langchain.chains import GraphCypherQAChain\n", 34 | "from langchain_community.graphs import Neo4jGraph\n", 35 | "from
langchain_openai import ChatOpenAI, OpenAIEmbeddings\n", 36 | "import os\n", 37 | "\n", 38 | "os.environ[\"OPENAI_API_KEY\"] = 'open-api-key'" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "id": "7c81e150-8a40-42a9-bbec-57228b6a48a2", 44 | "metadata": {}, 45 | "source": [ 46 | "# Elon Musk's Wikipedia Page as Our Knowledge Source" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "id": "fc09bba3-fe76-4a42-80ad-bd41195e4fd5", 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [ 56 | "from langchain_community.document_loaders import WikipediaLoader\n", 57 | "\n", 58 | "query = \"Elon Musk\"\n", 59 | "raw_documents = WikipediaLoader(query=query).load()" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "id": "59ab852b-305f-4958-a86a-3cda391a3f2f", 65 | "metadata": {}, 66 | "source": [ 67 | "Embed our text data in chromadb vector database" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 5, 73 | "id": "39618be9-74bd-4e8c-8766-e582617e0a12", 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", 78 | "from langchain_community.vectorstores import Chroma\n", 79 | "from langchain_openai import OpenAIEmbeddings\n", 80 | "\n", 81 | "# Split\n", 82 | "text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n", 83 | " chunk_size=512, chunk_overlap=100\n", 84 | ")\n", 85 | "all_splits = text_splitter.split_documents(raw_documents)\n", 86 | "\n", 87 | "import chromadb.utils.embedding_functions as embedding_functions\n", 88 | "embeddings = OpenAIEmbeddings(model=\"text-embedding-ada-002\")\n", 89 | "\n", 90 | "CHROMA_COLLECTION_NAME = \"dspy-rag-chroma\"\n", 91 | "CHROMADB_DIR = \"dspy_rag_chroma/\"\n", 92 | "\n", 93 | "# Index\n", 94 | "vectorstore = Chroma.from_documents(\n", 95 | " documents=all_splits,\n", 96 | " collection_name=CHROMA_COLLECTION_NAME,\n", 97 | " embedding=embeddings,\n", 98 | " 
persist_directory=CHROMADB_DIR\n", 99 | ")\n", 100 | "retriever = vectorstore.as_retriever()" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 31, 106 | "id": "b55641b8-e4ba-4d7d-9985-7eac6ea8f418", 107 | "metadata": {}, 108 | "outputs": [], 109 | "source": [ 110 | "import chromadb\n", 111 | "from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction\n", 112 | "from dspy.retrieve.chromadb_rm import ChromadbRM\n", 113 | "import os\n", 114 | "\n", 115 | "embedding_function = OpenAIEmbeddingFunction(\n", 116 | " api_key=os.environ.get('OPENAI_API_KEY'),\n", 117 | " model_name=\"text-embedding-ada-002\"\n", 118 | ")\n", 119 | "\n", 120 | "rm = ChromadbRM(CHROMA_COLLECTION_NAME, CHROMADB_DIR, embedding_function, k=3)" 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "id": "8281f98a-ba08-4c61-9837-e1b764c7195d", 126 | "metadata": {}, 127 | "source": [ 128 | "# Build Knowledge Graph of Elon Musk with Diffbot's Natural Language API" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 43, 134 | "id": "59a33a80-17a5-4681-8843-e416d818e770", 135 | "metadata": {}, 136 | "outputs": [], 137 | "source": [ 138 | "# DiffbotGraphTransformer calls Diffbot's Natural Language API to extract entities and relationships from the article\n", 139 | "from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer\n", 140 | "import os\n", 141 | "\n", 142 | "diffbot_api_key = \"Diffbot-token\"\n", 143 | "\n", 144 | "diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)\n", 145 | "\n", 146 | "# Diffbot's Natural Language API converts unstructured text data into knowledge graphs\n", 147 | "graph_documents = diffbot_nlp.convert_to_graph_documents(raw_documents)\n", 148 | "\n", 149 | "# add knowledge graph data to the neo4j database (`graph` is the Neo4jGraph connection created in the next cell, so run that cell first)\n", 150 | "graph.add_graph_documents(graph_documents)" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 10, 156 | "id":
"1df2cc1b-2b7f-4bf0-b89a-490f4f168275", 157 | "metadata": {}, 158 | "outputs": [], 159 | "source": [ 160 | "# connect to our neo4j database\n", 161 | "from langchain_community.graphs import Neo4jGraph\n", 162 | "\n", 163 | "url = \"url\"\n", 164 | "username = \"neo4j\"\n", 165 | "password = \"pwd\"\n", 166 | "\n", 167 | "graph = Neo4jGraph(url=url, username=username, password=password)" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "id": "e52d1e1b-3898-4e39-8cbe-1cb0a821d17f", 173 | "metadata": {}, 174 | "source": [ 175 | "# Vanilla DSPy RAG Pipeline" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 84, 181 | "id": "99c30f09-f3bc-4bbc-aa89-ab004b9b7aff", 182 | "metadata": {}, 183 | "outputs": [], 184 | "source": [ 185 | "class GenerateAnswer(dspy.Signature):\n", 186 | " \"\"\"Answer questions with short factoid answers.\"\"\"\n", 187 | "\n", 188 | " context = dspy.InputField(desc=\"may contain relevant facts\")\n", 189 | " question = dspy.InputField()\n", 190 | " answer = dspy.OutputField(desc=\"often between 1 and 5 words\")\n", 191 | "\n", 192 | "vanilla_dspy_rag_lm = dspy.OpenAI(model='gpt-3.5-turbo-instruct')\n", 193 | "dspy.settings.configure(lm=vanilla_dspy_rag_lm, rm=rm)\n", 194 | "\n", 195 | "class vanilla_dspy_rag(dspy.Module):\n", 196 | " \n", 197 | " # we set num_passages=1 to avoid the same passage being repeatedly retrieved for multiple times\n", 198 | " def __init__(self, num_passages=1):\n", 199 | " super().__init__()\n", 200 | "\n", 201 | " self.retrieve = dspy.Retrieve(k=num_passages)\n", 202 | " self.generate_answer = dspy.ChainOfThought(GenerateAnswer)\n", 203 | " \n", 204 | " def forward(self, question):\n", 205 | " context = self.retrieve(question).passages\n", 206 | " prediction = self.generate_answer(context=context, question=question)\n", 207 | " return dspy.Prediction(context=context, answer=prediction.answer)" 208 | ] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "id": 
"7a2ad833-e51b-44af-8f0e-3b4a23db79cd", 213 | "metadata": {}, 214 | "source": [ 215 | "# initial_context_from_knowledge_graph : fetch structured and grounded facts in knowlegde graph" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": 12, 221 | "id": "e3a2ce20-e3c5-4b42-b426-16d4f4ccbc6b", 222 | "metadata": {}, 223 | "outputs": [], 224 | "source": [ 225 | "# function to query neo4j graph database\n", 226 | "\n", 227 | "initial_context_from_knowledge_graph = GraphCypherQAChain.from_llm(\n", 228 | " ChatOpenAI(model=\"gpt-3.5-turbo-instruct\", temperature=0), graph=graph, verbose=True\n", 229 | ")" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "id": "4032bc20-643f-4063-be0c-ce456da73437", 235 | "metadata": {}, 236 | "source": [ 237 | "# Complete Pipeline of DSPy RAG with Knowledge Graph" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": 95, 243 | "id": "725fa0f1-df50-4da6-8d22-cbdaa1f1d423", 244 | "metadata": {}, 245 | "outputs": [], 246 | "source": [ 247 | "# Set up the LM\n", 248 | "lm = dspy.OpenAI(model='gpt-3.5-turbo-instruct')\n", 249 | "dspy.settings.configure(lm=lm, rm=rm)\n", 250 | "\n", 251 | "class EnrichQueryWithKG(dspy.Signature):\n", 252 | " \"\"\"Refines and enriches questions using knowledge graph context.\"\"\"\n", 253 | " question = dspy.InputField()\n", 254 | " kg_context = dspy.InputField(type=str, desc=\"Knowledge graph metadata\") #changed from dict to str\n", 255 | " enriched_query = dspy.OutputField(desc=\"Query enriched with KG context\")\n", 256 | "\n", 257 | "class EnhanceContextWithKG(dspy.Signature):\n", 258 | " \"\"\"Consolidates and summarizes information from enriched queries and passages.\"\"\"\n", 259 | " query = dspy.InputField()\n", 260 | " passages = dspy.InputField(desc=\"Passages retrieved based on the enriched query\")\n", 261 | " enhanced_context = dspy.OutputField(desc=\"Context enhanced with KG data\")\n", 262 | "\n", 263 | "class 
KGEnrichedQueryModule(dspy.Module):\n", 264 | " def __init__(self, temperature: int = 0):\n", 265 | " super().__init__()\n", 266 | " self.temperature = temperature\n", 267 | " self.enrich_query_module = dspy.ChainOfThought(EnrichQueryWithKG)\n", 268 | " self.enhance_context_module = dspy.ChainOfThought(EnhanceContextWithKG)\n", 269 | " self.retrieve = rm\n", 270 | "\n", 271 | " def forward(self, question: str, kg_context: str) -> str: #change from dict to str\n", 272 | " enriched_query = self.enrich_query_module(question=question, kg_context=kg_context).enriched_query\n", 273 | " passages = self.retrieve(enriched_query)\n", 274 | " final_answer = self.enhance_context_module(query=enriched_query, passages=passages).enhanced_context\n", 275 | " return final_answer\n" 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "id": "82c7e6e0-dc85-43ef-8b1c-7f1323798c04", 281 | "metadata": {}, 282 | "source": [ 283 | "# Question 1 : What industry or industries is Neuralink in?\n", 284 | "# A. Vanilla DSPy RAG " 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 73, 290 | "id": "b7231921-ba3b-4ecb-b33f-a77585112db6", 291 | "metadata": {}, 292 | "outputs": [ 293 | { 294 | "data": { 295 | "text/plain": [ 296 | "Prediction(\n", 297 | " context=['=== History ===\\nNeuralink was founded in 2016 by Elon Musk and a founding team of seven scientists and engineers. The group of initial hires consisted of experts in areas such as neuroscience, biochemistry and robotics. The trademark \"Neuralink\" was purchased from its previous owners in January 2017.In April 2017, Neuralink announced that it was aiming to make devices to treat serious brain diseases in the short-term, with the eventual goal of human enhancement, sometimes called transhumanism. Musk had said his interest in the idea partly stemmed from the science fiction concept of \"neural lace\" in the fictional universe in The Culture, a series of 10 novels by Iain M. 
Banks.Musk defined the neural lace as a \"digital layer above the cortex\" that would not necessarily imply extensive surgical insertion but ideally an implant through a vein or artery. He said the long-term goal is to achieve \"symbiosis with artificial intelligence\", which he perceives as an existential threat to humanity if it goes unchecked. He believes the device will be \"something analogous to a video game, like a saved game situation, where you are able to resume and upload your last state\" and \"address brain injuries or spinal injuries and make up for whatever lost capacity somebody has with a chip.\"As of 2020, Neuralink was headquartered in San Francisco\\'s Mission District, sharing the Pioneer building with OpenAI, another company co-founded by Musk.\\nAs of 2022, Neuralink\\'s headquarters were in Fremont, California. Jared Birchall, the head of Musk\\'s family office, was listed as CEO, CFO and president of Neuralink in 2018. As of September 2018, Musk was the majority owner of Neuralink but did not hold an executive position. By August 2020, only three of the eight founding scientists remained at the company, according to an article by Stat News which reported that Neuralink had seen \"years of internal conflict in which rushed timelines have clashed with the slow and incremental pace of science.\"In April 2021, Neuralink demonstrated a monkey playing the game \"Pong\" using the Neuralink implant. While similar technology has existed since 2002, when a research group first demonstrated a monkey moving a computer cursor with neural signals, scientists acknowledged the engineering progress in making the implant wireless and increasing the number of implanted electrodes. In May 2021, co-founder and president Max Hodak announced that he no longer works with the company. 
Only two of the eight co-founders remained at the company by January 2022.'],\n", 298 | " answer='Biotechnology, neuroscience, and artificial intelligence.'\n", 299 | ")" 300 | ] 301 | }, 302 | "execution_count": 73, 303 | "metadata": {}, 304 | "output_type": "execute_result" 305 | } 306 | ], 307 | "source": [ 308 | "question_neuralink_industry = 'What industry or industries is Neuralink in?'\n", 309 | "vanilla_dspy_rag = vanilla_dspy_rag()\n", 310 | "vanilla_dspy_rag_q1 = vanilla_dspy_rag.forward(question_neuralink_industry)\n", 311 | "vanilla_dspy_rag_q1" 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": 76, 317 | "id": "42d7a6e8-1e6d-4017-a1fc-5c7dced2afb1", 318 | "metadata": {}, 319 | "outputs": [ 320 | { 321 | "name": "stdout", 322 | "output_type": "stream", 323 | "text": [ 324 | "\n", 325 | "\n", 326 | "\n", 327 | "\n", 328 | "Answer questions with short factoid answers.\n", 329 | "\n", 330 | "---\n", 331 | "\n", 332 | "Follow the following format.\n", 333 | "\n", 334 | "Context: may contain relevant facts\n", 335 | "\n", 336 | "Question: ${question}\n", 337 | "\n", 338 | "Reasoning: Let's think step by step in order to ${produce the answer}. We ...\n", 339 | "\n", 340 | "Answer: often between 1 and 5 words\n", 341 | "\n", 342 | "---\n", 343 | "\n", 344 | "Context:\n", 345 | "«=== History ===\n", 346 | "Neuralink was founded in 2016 by Elon Musk and a founding team of seven scientists and engineers. The group of initial hires consisted of experts in areas such as neuroscience, biochemistry and robotics. The trademark \"Neuralink\" was purchased from its previous owners in January 2017.In April 2017, Neuralink announced that it was aiming to make devices to treat serious brain diseases in the short-term, with the eventual goal of human enhancement, sometimes called transhumanism. 
Musk had said his interest in the idea partly stemmed from the science fiction concept of \"neural lace\" in the fictional universe in The Culture, a series of 10 novels by Iain M. Banks.Musk defined the neural lace as a \"digital layer above the cortex\" that would not necessarily imply extensive surgical insertion but ideally an implant through a vein or artery. He said the long-term goal is to achieve \"symbiosis with artificial intelligence\", which he perceives as an existential threat to humanity if it goes unchecked. He believes the device will be \"something analogous to a video game, like a saved game situation, where you are able to resume and upload your last state\" and \"address brain injuries or spinal injuries and make up for whatever lost capacity somebody has with a chip.\"As of 2020, Neuralink was headquartered in San Francisco's Mission District, sharing the Pioneer building with OpenAI, another company co-founded by Musk.\n", 347 | "As of 2022, Neuralink's headquarters were in Fremont, California. Jared Birchall, the head of Musk's family office, was listed as CEO, CFO and president of Neuralink in 2018. As of September 2018, Musk was the majority owner of Neuralink but did not hold an executive position. By August 2020, only three of the eight founding scientists remained at the company, according to an article by Stat News which reported that Neuralink had seen \"years of internal conflict in which rushed timelines have clashed with the slow and incremental pace of science.\"In April 2021, Neuralink demonstrated a monkey playing the game \"Pong\" using the Neuralink implant. While similar technology has existed since 2002, when a research group first demonstrated a monkey moving a computer cursor with neural signals, scientists acknowledged the engineering progress in making the implant wireless and increasing the number of implanted electrodes. In May 2021, co-founder and president Max Hodak announced that he no longer works with the company. 
Only two of the eight co-founders remained at the company by January 2022.»\n", 348 | "\n", 349 | "Question: What industry or industries is Neuralink in?\n", 350 | "\n", 351 | "Reasoning: Let's think step by step in order to\u001b[32m determine what industry Neuralink is in. We know that Neuralink was founded by a team of scientists and engineers, and that their goal is to create devices for brain diseases and human enhancement. We also know that they share a building with another company co-founded by Elon Musk.\n", 352 | "\n", 353 | "Answer: Biotechnology, neuroscience, and artificial intelligence.\u001b[0m\n", 354 | "\n", 355 | "\n", 356 | "\n" 357 | ] 358 | } 359 | ], 360 | "source": [ 361 | "# We can check and see what's happening under the hood of the Vanilla DSPy RAG pipeline\n", 362 | "vanilla_dspy_rag_lm.inspect_history()" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "id": "40f8fc5a-247d-47d3-a468-84a6d0879710", 368 | "metadata": {}, 369 | "source": [ 370 | "# Fact-check with Our Knowledge Graph" 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": 17, 376 | "id": "3a4ae2cd-c3ed-4a1d-a824-742cac648a66", 377 | "metadata": {}, 378 | "outputs": [ 379 | { 380 | "name": "stdout", 381 | "output_type": "stream", 382 | "text": [ 383 | "\n", 384 | "\n", 385 | "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n", 386 | "Generated Cypher:\n", 387 | "\u001b[32;1m\u001b[1;3mcypher\n", 388 | "MATCH (o:Organization {name: \"Neuralink\"})-[:INDUSTRY]->(s:Skill)\n", 389 | "RETURN s.name\n", 390 | "\u001b[0m\n", 391 | "Full Context:\n", 392 | "\u001b[32;1m\u001b[1;3m[{'s.name': 'computer interfaces'}, {'s.name': 'neurotechnology'}]\u001b[0m\n", 393 | "\n", 394 | "\u001b[1m> Finished chain.\u001b[0m\n" 395 | ] 396 | } 397 | ], 398 | "source": [ 399 | "# Reminder that our question is: question_neuralink_industry = 'What industry or industries is Neuralink in?'\n", 400 | "kg_context = 
initial_context_from_knowledge_graph.run(question_neuralink_industry)" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": 18, 406 | "id": "b00f9a36-07e3-47db-a8b8-a7a1946db021", 407 | "metadata": {}, 408 | "outputs": [ 409 | { 410 | "data": { 411 | "text/plain": [ 412 | "'Neuralink is in the computer interfaces and neurotechnology industries.'" 413 | ] 414 | }, 415 | "execution_count": 18, 416 | "metadata": {}, 417 | "output_type": "execute_result" 418 | } 419 | ], 420 | "source": [ 421 | "kg_context" 422 | ] 423 | }, 424 | { 425 | "cell_type": "markdown", 426 | "id": "e1eba594-b5dc-4c68-9544-2be324085243", 427 | "metadata": {}, 428 | "source": [ 429 | "# B. DSPy RAG with Knowledge Graphs" 430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "execution_count": 110, 435 | "id": "785f272c-a664-4beb-8f17-7ddcbbdec436", 436 | "metadata": {}, 437 | "outputs": [], 438 | "source": [ 439 | "dspy_w_kg = KGEnrichedQueryModule(temperature=0)" 440 | ] 441 | }, 442 | { 443 | "cell_type": "code", 444 | "execution_count": 73, 445 | "id": "65fbb7ac-aa2d-4d06-86a2-125afbfcb0b6", 446 | "metadata": {}, 447 | "outputs": [ 448 | { 449 | "data": { 450 | "text/plain": [ 451 | "'Neuralink is in the computer interfaces and neurotechnology industries.'" 452 | ] 453 | }, 454 | "execution_count": 73, 455 | "metadata": {}, 456 | "output_type": "execute_result" 457 | } 458 | ], 459 | "source": [ 460 | "kg_context" 461 | ] 462 | }, 463 | { 464 | "cell_type": "code", 465 | "execution_count": 75, 466 | "id": "d6db12e0-5c47-4981-8967-67a2796d4a05", 467 | "metadata": {}, 468 | "outputs": [ 469 | { 470 | "name": "stdout", 471 | "output_type": "stream", 472 | "text": [ 473 | "can confidently say that Neuralink is in the technology industry, specifically in the field of neurotechnology and brain-computer interfaces.\n" 474 | ] 475 | } 476 | ], 477 | "source": [ 478 | "answer = dspy_w_kg.forward(question=question_neuralink_industry, kg_context = kg_context)\n", 479 | 
"print(answer)" 480 | ] 481 | }, 482 | { 483 | "cell_type": "code", 484 | "execution_count": 79, 485 | "id": "ca0b345d-20ea-4ed1-94a0-57038888c824", 486 | "metadata": {}, 487 | "outputs": [ 488 | { 489 | "name": "stdout", 490 | "output_type": "stream", 491 | "text": [ 492 | "\n", 493 | "\n", 494 | "\n", 495 | "\n", 496 | "Consolidates and summarizes information from enriched queries and passages.\n", 497 | "\n", 498 | "---\n", 499 | "\n", 500 | "Follow the following format.\n", 501 | "\n", 502 | "Query: ${query}\n", 503 | "\n", 504 | "Passages: Passages retrieved based on the enriched query\n", 505 | "\n", 506 | "Reasoning: Let's think step by step in order to ${produce the enhanced_context}. We ...\n", 507 | "\n", 508 | "Enhanced Context: Context enhanced with KG data\n", 509 | "\n", 510 | "---\n", 511 | "\n", 512 | "Query: What industry or industries is Neuralink in that are related to computer interfaces and neurotechnology?\n", 513 | "\n", 514 | "Passages: «{'long_text': 'Neuralink Corp. is an American neurotechnology company that is developing implantable brain–computer interfaces (BCIs), based in Fremont, California, as of 2024. Founded by Elon Musk and a team of seven scientists and engineers, Neuralink was launched in 2016 and was first publicly reported in March 2017.Since its founding, the company has hired several high-profile neuroscientists from various universities. By July 2019, it had received $158 million in funding (of which $100 million was from Musk) and was employing a staff of 90 employees. At that time, Neuralink announced that it was working on a \"sewing machine-like\" device capable of implanting very thin (4 to 6 μm in width) threads into the brain, and demonstrated a system that reads information from a lab rat via 1,500 electrodes. They had anticipated starting experiments with humans in 2020, but have since moved that projection to 2023. 
As of May 2023, they have been approved for human trials in the United States.The company has faced criticism for a large amount of euthanization of primates that underwent medical trials. Veterinary records of the monkeys showed a number of complications with electrodes being surgically implanted.\\nOn January 29, 2024, Musk said that Neuralink had successfully implanted a Neuralink device in a human and that the patient was recovering.\\n\\n\\n== Company =='}»\n", 515 | "\n", 516 | "Reasoning: Let's think step by step in order to produce the enhanced context. We start by understanding the query, which is asking for the industry or industries that Neuralink is in. We then look at the specific keywords in the query, such as \"computer interfaces\" and \"neurotechnology\". Based on these keywords, we can infer that Neuralink is likely in the technology industry. To further enhance our understanding, we can look at the passages retrieved based on the enriched query. These passages mention that Neuralink is a neurotechnology company that is developing implantable brain-computer interfaces. This confirms our initial inference that Neuralink is in the technology industry. Additionally, the passages mention that Neuralink has received funding and is employing a staff, further solidifying its position in the industry. 
Finally, we\n", 517 | "\n", 518 | "Enhanced Context:\u001b[32m can confidently say that Neuralink is in the technology industry, specifically in the field of neurotechnology and brain-computer interfaces.\u001b[0m\n", 519 | "\n", 520 | "\n", 521 | "\n" 522 | ] 523 | } 524 | ], 525 | "source": [ 526 | "# We can check and see what's happening under the hood of the DSPy RAG w/ Knowledge Graph pipeline\n", 527 | "lm.inspect_history()" 528 | ] 529 | }, 530 | { 531 | "cell_type": "markdown", 532 | "id": "eeb246aa-8ab7-4705-b8b8-249318006038", 533 | "metadata": {}, 534 | "source": [ 535 | "# Question 2 : List out the companies that Elon Musk co-founded with other founders.\n", 536 | "# A. Vanilla DSPy RAG " 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": 86, 542 | "id": "da6cc58d-f9ee-476a-bdd9-1749890450ee", 543 | "metadata": { 544 | "scrolled": true 545 | }, 546 | "outputs": [ 547 | { 548 | "data": { 549 | "text/plain": [ 550 | "Prediction(\n", 551 | " context=['In October 2002, eBay acquired PayPal for $1.5 billion, and that same year, with $100 million of the money he made, Musk founded SpaceX, a spaceflight services company. In 2004, he became an early investor in electric vehicle manufacturer Tesla Motors, Inc. (now Tesla, Inc.). He became its chairman and product architect, assuming the position of CEO in 2008. In 2006, Musk helped create SolarCity, a solar-energy company that was acquired by Tesla in 2016 and became Tesla Energy. In 2013, he proposed a hyperloop high-speed vactrain transportation system. In 2015, he co-founded OpenAI, a nonprofit artificial intelligence research company. The following year, Musk co-founded Neuralink—a neurotechnology company developing brain–computer interfaces—and the Boring Company, a tunnel construction company. In 2022, he acquired Twitter for $44 billion. He subsequently merged the company into newly created X Corp. and rebranded the service as X the following year. 
In March 2023, he founded xAI, an artificial intelligence company.\\nMusk has expressed views that have made him a polarizing figure. He has been criticized for making unscientific and misleading statements, including COVID-19 misinformation and antisemitic conspiracy theories. His ownership of Twitter has been similarly controversial, being marked by the laying off of a large number of employees, an increase in hate speech and misinformation and disinformation on the website, as well as changes to Twitter Blue verification. In 2018, the U.S. Securities and Exchange Commission (SEC) sued him, alleging that he had falsely announced that he had secured funding for a private takeover of Tesla. To settle the case, Musk stepped down as the chairman of Tesla and paid a $20 million fine.'],\n", 552 | " answer='SpaceX, Tesla, SolarCity, OpenAI, Neuralink, Boring Company, xAI'\n", 553 | ")" 554 | ] 555 | }, 556 | "execution_count": 86, 557 | "metadata": {}, 558 | "output_type": "execute_result" 559 | } 560 | ], 561 | "source": [ 562 | "question_org_elon_cofounded = \"List out the companies that Elon Musk co-founded with other founders.\"\n", 563 | "vanilla_dspy_rag = vanilla_dspy_rag()\n", 564 | "vanilla_dspy_rag_q2 = vanilla_dspy_rag.forward(question=question_org_elon_cofounded)\n", 565 | "vanilla_dspy_rag_q2" 566 | ] 567 | }, 568 | { 569 | "cell_type": "code", 570 | "execution_count": 88, 571 | "id": "7e940330-66e8-4af9-a901-5c45560e14cf", 572 | "metadata": {}, 573 | "outputs": [ 574 | { 575 | "name": "stdout", 576 | "output_type": "stream", 577 | "text": [ 578 | "\n", 579 | "\n", 580 | "\n", 581 | "\n", 582 | "Answer questions with short factoid answers.\n", 583 | "\n", 584 | "---\n", 585 | "\n", 586 | "Follow the following format.\n", 587 | "\n", 588 | "Context: may contain relevant facts\n", 589 | "\n", 590 | "Question: ${question}\n", 591 | "\n", 592 | "Reasoning: Let's think step by step in order to ${produce the answer}. 
We ...\n", 593 | "\n", 594 | "Answer: often between 1 and 5 words\n", 595 | "\n", 596 | "---\n", 597 | "\n", 598 | "Context:\n", 599 | "«In October 2002, eBay acquired PayPal for $1.5 billion, and that same year, with $100 million of the money he made, Musk founded SpaceX, a spaceflight services company. In 2004, he became an early investor in electric vehicle manufacturer Tesla Motors, Inc. (now Tesla, Inc.). He became its chairman and product architect, assuming the position of CEO in 2008. In 2006, Musk helped create SolarCity, a solar-energy company that was acquired by Tesla in 2016 and became Tesla Energy. In 2013, he proposed a hyperloop high-speed vactrain transportation system. In 2015, he co-founded OpenAI, a nonprofit artificial intelligence research company. The following year, Musk co-founded Neuralink—a neurotechnology company developing brain–computer interfaces—and the Boring Company, a tunnel construction company. In 2022, he acquired Twitter for $44 billion. He subsequently merged the company into newly created X Corp. and rebranded the service as X the following year. In March 2023, he founded xAI, an artificial intelligence company.\n", 600 | "Musk has expressed views that have made him a polarizing figure. He has been criticized for making unscientific and misleading statements, including COVID-19 misinformation and antisemitic conspiracy theories. His ownership of Twitter has been similarly controversial, being marked by the laying off of a large number of employees, an increase in hate speech and misinformation and disinformation on the website, as well as changes to Twitter Blue verification. In 2018, the U.S. Securities and Exchange Commission (SEC) sued him, alleging that he had falsely announced that he had secured funding for a private takeover of Tesla. 
To settle the case, Musk stepped down as the chairman of Tesla and paid a $20 million fine.»\n", 601 | "\n", 602 | "Question: List out the companies that Elon Musk co-founded with other founders.\n", 603 | "\n", 604 | "Reasoning: Let's think step by step in order to\u001b[32m identify the companies that Elon Musk co-founded with other founders. We will look for companies that he founded after 2002, when he acquired PayPal.\n", 605 | "\n", 606 | "Answer: SpaceX, Tesla, SolarCity, OpenAI, Neuralink, Boring Company, xAI\u001b[0m\n", 607 | "\n", 608 | "\n", 609 | "\n" 610 | ] 611 | } 612 | ], 613 | "source": [ 614 | "vanilla_dspy_rag_lm.inspect_history(15)" 615 | ] 616 | }, 617 | { 618 | "cell_type": "markdown", 619 | "id": "d01a8d37-03be-4e58-a21c-9cd2e26e44db", 620 | "metadata": {}, 621 | "source": [ 622 | "# Did we just spot hallucination in RAG?" 623 | ] 624 | }, 625 | { 626 | "cell_type": "markdown", 627 | "id": "14095fee-7c65-47ff-ace9-12735b243d96", 628 | "metadata": {}, 629 | "source": [ 630 | "This is wrong: \"Elon co-founded SpaceX, Tesla, the Boring Company, xAI\" " 631 | ] 632 | }, 633 | { 634 | "cell_type": "markdown", 635 | "id": "536079c6-7062-424c-9915-c54c82a00e6b", 636 | "metadata": {}, 637 | "source": [ 638 | "# B. 
DSPy RAG with Knowledge Graph" 639 | ] 640 | }, 641 | { 642 | "cell_type": "code", 643 | "execution_count": 42, 644 | "id": "a2fbcb1c-4bb9-4b85-859e-2c63825aba23", 645 | "metadata": {}, 646 | "outputs": [ 647 | { 648 | "name": "stdout", 649 | "output_type": "stream", 650 | "text": [ 651 | "\n", 652 | "\n", 653 | "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n", 654 | "Generated Cypher:\n", 655 | "\u001b[32;1m\u001b[1;3mcypher\n", 656 | "MATCH (p:Person {name: \"Elon Musk\"})-[:FOUNDED_BY]-(o:Organization)-[:FOUNDED_BY]-(coFounder:Person)\n", 657 | "WHERE p <> coFounder\n", 658 | "RETURN o.name AS CompanyName, collect(coFounder.name) AS CoFounders\n", 659 | "\u001b[0m\n", 660 | "Full Context:\n", 661 | "\u001b[32;1m\u001b[1;3m[{'CompanyName': 'SolarCity', 'CoFounders': ['Lyndon Rive']}, {'CompanyName': 'Zip2', 'CoFounders': ['Kimbal Musk']}, {'CompanyName': 'OpenAI', 'CoFounders': ['Wojciech Zaremba', 'Sam Altman', 'Andrej Karpathy', 'Trevor Blackwell', 'Ilya Sutskever']}, {'CompanyName': 'Excession LLC', 'CoFounders': ['Jared Birchall']}]\u001b[0m\n", 662 | "\n", 663 | "\u001b[1m> Finished chain.\u001b[0m\n" 664 | ] 665 | } 666 | ], 667 | "source": [ 668 | "# Reminder that our question is: question_org_elon_cofounded = \"List out the companies that Elon Musk co-founded with other founders.\"\n", 669 | "kg_context = initial_context_from_knowledge_graph(question_org_elon_cofounded)" 670 | ] 671 | }, 672 | { 673 | "cell_type": "code", 674 | "execution_count": 43, 675 | "id": "f84216eb-f307-449a-a773-132dfef59e5f", 676 | "metadata": {}, 677 | "outputs": [ 678 | { 679 | "data": { 680 | "text/plain": [ 681 | "{'query': 'List out the companies that Elon Musk co-founded with other founders.',\n", 682 | " 'result': \"I don't know the answer.\",\n", 683 | " 'intermediate_steps': [{'query': 'cypher\\nMATCH (p:Person {name: \"Elon Musk\"})-[:FOUNDED_BY]-(o:Organization)-[:FOUNDED_BY]-(coFounder:Person)\\nWHERE p <> coFounder\\nRETURN o.name AS CompanyName, 
collect(coFounder.name) AS CoFounders\\n'},\n", 684 | " {'context': [{'CompanyName': 'SolarCity', 'CoFounders': ['Lyndon Rive']},\n", 685 | " {'CompanyName': 'Zip2', 'CoFounders': ['Kimbal Musk']},\n", 686 | " {'CompanyName': 'OpenAI',\n", 687 | " 'CoFounders': ['Wojciech Zaremba',\n", 688 | " 'Sam Altman',\n", 689 | " 'Andrej Karpathy',\n", 690 | " 'Trevor Blackwell',\n", 691 | " 'Ilya Sutskever']},\n", 692 | " {'CompanyName': 'Excession LLC', 'CoFounders': ['Jared Birchall']}]}]}" 693 | ] 694 | }, 695 | "execution_count": 43, 696 | "metadata": {}, 697 | "output_type": "execute_result" 698 | } 699 | ], 700 | "source": [ 701 | "kg_context" 702 | ] 703 | }, 704 | { 705 | "cell_type": "code", 706 | "execution_count": 96, 707 | "id": "3ba778de-5d33-432b-afe7-027e3ab6e5bd", 708 | "metadata": {}, 709 | "outputs": [ 710 | { 711 | "data": { 712 | "text/plain": [ 713 | "\"These are the companies that Elon has co-founded with other people:[{'CompanyName': 'SolarCity', 'CoFounders': ['Lyndon Rive']}, {'CompanyName': 'Zip2', 'CoFounders': ['Kimbal Musk']}, {'CompanyName': 'OpenAI', 'CoFounders': ['Wojciech Zaremba', 'Sam Altman', 'Andrej Karpathy', 'Trevor Blackwell', 'Ilya Sutskever']}, {'CompanyName': 'Excession LLC', 'CoFounders': ['Jared Birchall']}]\"" 714 | ] 715 | }, 716 | "execution_count": 96, 717 | "metadata": {}, 718 | "output_type": "execute_result" 719 | } 720 | ], 721 | "source": [ 722 | "# Convert the dictionary-type KG result (held in kg_context from the cell above) into a string so it fits in the DSPy pipeline\n", 723 | "kg_context = \"These are the companies that Elon has co-founded with other people:\" + str(kg_context['intermediate_steps'][1]['context'])\n", 724 | "kg_context" 725 | ] 726 | }, 727 | { 728 | "cell_type": "code", 729 | "execution_count": 97, 730 | "id": "a94f2a2f-e9ad-42b3-a03f-a5ad288ac32d", 731 | "metadata": {}, 732 | "outputs": [ 733 | { 734 | "name": "stdout", 735 | "output_type": "stream", 736 | "text": [ 737 | "According to KG data, Elon Musk is listed as a co-founder of several 
companies, including SpaceX, Tesla, X Corp., the Boring Company, xAI, Neuralink, OpenAI, and the\n" 738 | ] 739 | } 740 | ], 741 | "source": [ 742 | "dspy_w_kg = KGEnrichedQueryModule(temperature=0)\n", 743 | "answer = dspy_w_kg.forward(question=question_org_elon_cofounded, kg_context = kg_context)\n", 744 | "print(answer)" 745 | ] 746 | }, 747 | { 748 | "cell_type": "code", 749 | "execution_count": 103, 750 | "id": "9bee5167-6723-4f6f-a8b6-f215cb56fa31", 751 | "metadata": {}, 752 | "outputs": [ 753 | { 754 | "name": "stdout", 755 | "output_type": "stream", 756 | "text": [ 757 | "\n", 758 | "\n", 759 | "\n", 760 | "\n", 761 | "Refines and enriches questions using knowledge graph context.\n", 762 | "\n", 763 | "---\n", 764 | "\n", 765 | "Follow the following format.\n", 766 | "\n", 767 | "Question: ${question}\n", 768 | "\n", 769 | "Kg Context: Knowledge graph metadata\n", 770 | "\n", 771 | "Reasoning: Let's think step by step in order to ${produce the enriched_query}. We ...\n", 772 | "\n", 773 | "Enriched Query: Query enriched with KG context\n", 774 | "\n", 775 | "---\n", 776 | "\n", 777 | "Question: List out the companies that Elon Musk co-founded with other founders.\n", 778 | "\n", 779 | "Kg Context: These are the companies that Elon has co-founded with other people:[{'CompanyName': 'SolarCity', 'CoFounders': ['Lyndon Rive']}, {'CompanyName': 'Zip2', 'CoFounders': ['Kimbal Musk']}, {'CompanyName': 'OpenAI', 'CoFounders': ['Wojciech Zaremba', 'Sam Altman', 'Andrej Karpathy', 'Trevor Blackwell', 'Ilya Sutskever']}, {'CompanyName': 'Excession LLC', 'CoFounders': ['Jared Birchall']}]\n", 780 | "\n", 781 | "Reasoning: Let's think step by step in order to\u001b[32m find out the companies that Elon Musk co-founded with other founders. First, we need to identify the companies that Elon has co-founded. Then, we need to look at the list of co-founders for each company and check if Elon is one of them. 
Finally, we can compile a list of companies where Elon is listed as a co-founder along with other people.\n", 782 | "\n", 783 | "Enriched Query: List out the companies where Elon Musk is listed as a co-founder along with other people.\u001b[0m\n", 784 | "\n", 785 | "\n", 786 | "\n", 787 | "\n", 788 | "\n", 789 | "\n", 790 | "\n", 791 | "Consolidates and summarizes information from enriched queries and passages.\n", 792 | "\n", 793 | "---\n", 794 | "\n", 795 | "Follow the following format.\n", 796 | "\n", 797 | "Query: ${query}\n", 798 | "\n", 799 | "Passages: Passages retrieved based on the enriched query\n", 800 | "\n", 801 | "Reasoning: Let's think step by step in order to ${produce the enhanced_context}. We ...\n", 802 | "\n", 803 | "Enhanced Context: Context enhanced with KG data\n", 804 | "\n", 805 | "---\n", 806 | "\n", 807 | "Query: List out the companies where Elon Musk is listed as a co-founder along with other people.\n", 808 | "\n", 809 | "Passages:\n", 810 | "[1] «{'long_text': \"Elon Reeve Musk ( EE-lon; born June 28, 1971) is a businessman and investor. He is the founder, chairman, CEO, and CTO of SpaceX; angel investor, CEO, product architect, and former chairman of Tesla, Inc.; owner, executive chairman, and CTO of X Corp.; founder of the Boring Company and xAI; co-founder of Neuralink and OpenAI; and president of the Musk Foundation. He is one of the wealthiest people in the world, with an estimated net worth of US$190 billion as of March 2024, according to the Bloomberg Billionaires Index, and $195 billion according to Forbes, primarily from his ownership stakes in Tesla and SpaceX.A member of the wealthy South African Musk family, Elon was born in Pretoria and briefly attended the University of Pretoria before immigrating to Canada at age 18, acquiring citizenship through his Canadian-born mother. Two years later, he matriculated at Queen's University at Kingston in Canada. 
Musk later transferred to the University of Pennsylvania, and received bachelor's degrees in economics and physics. He moved to California in 1995 to attend Stanford University, but dropped out after two days and, with his brother Kimbal, co-founded online city guide software company Zip2. The startup was acquired by Compaq for $307 million in 1999, and that same year Musk co-founded X.com, a direct bank. X.com merged with Confinity in 2000 to form PayPal.\"}»\n", 811 | "[2] «{'long_text': \"Elon Reeve Musk ( EE-lon; born June 28, 1971) is a businessman and investor. He is the founder, chairman, CEO, and CTO of SpaceX; angel investor, CEO, product architect, and former chairman of Tesla, Inc.; owner, executive chairman, and CTO of X Corp.; founder of the Boring Company and xAI; co-founder of Neuralink and OpenAI; and president of the Musk Foundation. He is one of the wealthiest people in the world, with an estimated net worth of US$190 billion as of March 2024, according to the Bloomberg Billionaires Index, and $195 billion according to Forbes, primarily from his ownership stakes in Tesla and SpaceX.A member of the wealthy South African Musk family, Elon was born in Pretoria and briefly attended the University of Pretoria before immigrating to Canada at age 18, acquiring citizenship through his Canadian-born mother. Two years later, he matriculated at Queen's University at Kingston in Canada. Musk later transferred to the University of Pennsylvania, and received bachelor's degrees in economics and physics. He moved to California in 1995 to attend Stanford University, but dropped out after two days and, with his brother Kimbal, co-founded online city guide software company Zip2. The startup was acquired by Compaq for $307 million in 1999, and that same year Musk co-founded X.com, a direct bank. X.com merged with Confinity in 2000 to form PayPal.\"}»\n", 812 | "[3] «{'long_text': \"Elon Reeve Musk ( EE-lon; born June 28, 1971) is a businessman and investor. 
He is the founder, chairman, CEO, and CTO of SpaceX; angel investor, CEO, product architect, and former chairman of Tesla, Inc.; owner, executive chairman, and CTO of X Corp.; founder of the Boring Company and xAI; co-founder of Neuralink and OpenAI; and president of the Musk Foundation. He is one of the wealthiest people in the world, with an estimated net worth of US$190 billion as of March 2024, according to the Bloomberg Billionaires Index, and $195 billion according to Forbes, primarily from his ownership stakes in Tesla and SpaceX.A member of the wealthy South African Musk family, Elon was born in Pretoria and briefly attended the University of Pretoria before immigrating to Canada at age 18, acquiring citizenship through his Canadian-born mother. Two years later, he matriculated at Queen's University at Kingston in Canada. Musk later transferred to the University of Pennsylvania, and received bachelor's degrees in economics and physics. He moved to California in 1995 to attend Stanford University, but dropped out after two days and, with his brother Kimbal, co-founded online city guide software company Zip2. The startup was acquired by Compaq for $307 million in 1999, and that same year Musk co-founded X.com, a direct bank. X.com merged with Confinity in 2000 to form PayPal.\"}»\n", 813 | "\n", 814 | "Reasoning: Let's think step by step in order to\u001b[32m produce the enhanced_context. We first need to identify the companies where Elon Musk is listed as a co-founder. To do this, we can use KG data to find information about Elon Musk's involvement in various companies. We can then cross-reference this information with a list of companies where he is listed as a co-founder. Once we have this list, we can use it to filter out any irrelevant information and focus on the companies that we are interested in. 
This will help us to produce a more accurate and relevant enhanced context.\n", 815 | "\n", 816 | "Enhanced Context: According to KG data, Elon Musk is listed as a co-founder of several companies, including SpaceX, Tesla, X Corp., the Boring Company, xAI, Neuralink, OpenAI, and the\u001b[0m\n", 817 | "\n", 818 | "\n", 819 | "\n" 820 | ] 821 | } 822 | ], 823 | "source": [ 824 | "lm.inspect_history(7)" 825 | ] 826 | } 827 | ], 828 | "metadata": { 829 | "kernelspec": { 830 | "display_name": "Python 3 (ipykernel)", 831 | "language": "python", 832 | "name": "python3" 833 | }, 834 | "language_info": { 835 | "codemirror_mode": { 836 | "name": "ipython", 837 | "version": 3 838 | }, 839 | "file_extension": ".py", 840 | "mimetype": "text/x-python", 841 | "name": "python", 842 | "nbconvert_exporter": "python", 843 | "pygments_lexer": "ipython3", 844 | "version": "3.10.12" 845 | } 846 | }, 847 | "nbformat": 4, 848 | "nbformat_minor": 5 849 | } 850 | --------------------------------------------------------------------------------
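The Cypher query in the notebook returned four companies (SolarCity, Zip2, OpenAI, Excession LLC), yet the final "Enhanced Context" attributes the list "SpaceX, Tesla, X Corp., the Boring Company, xAI, Neuralink, OpenAI" to "KG data". A minimal sketch of how such a mismatch could be flagged is shown below. It is not part of the notebook: the helper names `format_kg_context` and `unsupported_companies` are hypothetical, the KG rows are copied from the recorded output, and the case-insensitive exact-name comparison is an assumption that would miss aliases or spelling variants.

```python
# KG rows copied from the GraphCypherQAChain output recorded in the notebook.
kg_rows = [
    {"CompanyName": "SolarCity", "CoFounders": ["Lyndon Rive"]},
    {"CompanyName": "Zip2", "CoFounders": ["Kimbal Musk"]},
    {
        "CompanyName": "OpenAI",
        "CoFounders": [
            "Wojciech Zaremba",
            "Sam Altman",
            "Andrej Karpathy",
            "Trevor Blackwell",
            "Ilya Sutskever",
        ],
    },
    {"CompanyName": "Excession LLC", "CoFounders": ["Jared Birchall"]},
]


def format_kg_context(rows):
    """Render KG rows as one readable line per company (hypothetical helper)."""
    lines = [
        f"- {row['CompanyName']} (co-founded with {', '.join(row['CoFounders'])})"
        for row in rows
    ]
    header = "Companies Elon Musk co-founded with other people:"
    return header + "\n" + "\n".join(lines)


def unsupported_companies(answer_companies, rows):
    """Return the companies named in an answer that are absent from the KG rows.

    Assumption: names are compared case-insensitively and must match exactly.
    """
    known = {row["CompanyName"].casefold() for row in rows}
    return [name for name in answer_companies if name.casefold() not in known]


# Companies named in the (truncated) "Enhanced Context" output of the notebook.
answer_companies = [
    "SpaceX",
    "Tesla",
    "X Corp.",
    "the Boring Company",
    "xAI",
    "Neuralink",
    "OpenAI",
]

print(format_kg_context(kg_rows))
print("Not supported by the KG rows:", unsupported_companies(answer_companies, kg_rows))
```

Of the seven names in the enhanced context, only OpenAI appears in the KG rows, which suggests the model drew on the retrieved Wikipedia passage rather than on the graph data it cites.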