├── .env.template ├── .gitignore ├── README.md ├── end-to-end-lupus.ipynb ├── truncated-pdfs ├── GAP-between-patients-and-clinicians_2023_Best-Practice-trunc.pdf ├── biomolecules-11-00928-v2-trunc.pdf ├── nihms-362971-trunc.pdf └── pgpm-13-39-trunc.pdf └── webinar_demo ├── Demo_Hybrid.ipynb ├── Demo_KGB.ipynb └── Demo_QA.ipynb /.env.template: -------------------------------------------------------------------------------- 1 | NEO4J_URI=neo4j+s://.databases.neo4j.io 2 | NEO4J_USERNAME=neo4j 3 | NEO4J_PASSWORD= 4 | 5 | OPENAI_API_KEY=sk-... -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | .env 3 | .idea/ 4 | .ipynb_checkpoints/ 5 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # GraphRAG Python package Examples 2 | 3 | This repository contains a worked example using the [GraphRAG Python package](https://neo4j.com/docs/neo4j-graphrag-python/current/index.html) for Neo4j. The example demonstrates the end-to-end workflow, starting from unstructured documents (in this case pdfs), to knowledge graph construction, knowledge graph retriever design, and a working GraphRAG pipeline. Research papers on Lupus are used as the data source. 4 | 5 | 1. The [end-to-end-lupus](end-to-end-lupus.ipynb) notebook contains the worked example. 6 | 2. The [corresponding blog post](https://neo4j.com/blog/graphrag-python-package/) has a full write-up walking through the example with more details, explanations, and resources. 7 | 3. The [truncated-pdfs](truncated-pdfs) directory contains the pdf source files. They were obtained from [NIH PubMed](https://pubmed.ncbi.nlm.nih.gov/). Some pages at the end containing references have been truncated to better focus the knowledge graph on medical information rather than citations and other publications. 8 | 9 | 10 | -------------------------------------------------------------------------------- /end-to-end-lupus.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "metadata": {}, 5 | "cell_type": "markdown", 6 | "source": [ 7 | "# GraphRAG Python package End-to-End Example\n", 8 | "\n", 9 | "This notebook contains an end-to-end worked example using the [GraphRAG Python package](https://neo4j.com/docs/neo4j-graphrag-python/current/index.html) for Neo4j. It starts with unstructured documents (in this case pdfs), and progresses through knowledge graph construction, knowledge graph retriever design, and complete GraphRAG pipelines. \n", 10 | "\n", 11 | "Research papers on Lupus are used as the data source. We design a couple of different retrievers based on different knowledge graph retrieval patterns. \n", 12 | "\n", 13 | "For more details and explanations around each of the below steps, see the [corresponding blog post](https://neo4j.com/blog/graphrag-python-package/) which contains a full write-up, in-depth comparison of the retrieval patterns, and additional learning resources. \n", 14 | "\n", 15 | "## Pre-Requisites\n", 16 | "\n", 17 | "1. __Create a Neo4j Database__: To work through this RAG example, you need a database for storing and retrieving data. There are many options for this. You can quickly start a free Neo4j Graph Database using [Neo4j AuraDB](https://neo4j.com/product/auradb/?ref=neo4j-home-hero). 
You can use __AuraDB Free__ or start an __AuraDB Professional (Pro) free trial__ for higher ingestion and retrieval performance. The Pro instances have a bit more RAM; we recommend them for the best user experience.\n", 18 | "2. __Obtain an OpenAI Key__: This example requires an OpenAI API key to use language models and embedders. The cost should be very minimal. If you do not yet have an OpenAI API key you can [create an OpenAI account](https://platform.openai.com/signup) or [sign in](https://platform.openai.com/login). Next, navigate to the [API key page](https://platform.openai.com/account/api-keys) and click \"Create new secret key\". Optionally naming the key. \n", 19 | "3. __Fill in Credentials__: Either by copying the [`.env.template`](.env.template) file, naming it `.env`, and filling in the appropriate credentials, or by manually putting the credentials into the second code cell below. You will need:\n", 20 | " 1. The Neo4j URI, username, and password variables from when you created the database. If you created your database on AuraDB, they are in the file you downloaded.\n", 21 | " 2. Your OpenAI API key.\n", 22 | "\n" 23 | ], 24 | "id": "cf0be1ddc2bd2434" 25 | }, 26 | { 27 | "metadata": {}, 28 | "cell_type": "markdown", 29 | "source": "## Setup", 30 | "id": "799a7b09475ed2a4" 31 | }, 32 | { 33 | "metadata": {}, 34 | "cell_type": "code", 35 | "source": [ 36 | "%%capture\n", 37 | "%pip install fsspec langchain-text-splitters tiktoken openai python-dotenv numpy torch neo4j-graphrag" 38 | ], 39 | "id": "9820f541adf30bfd", 40 | "outputs": [], 41 | "execution_count": null 42 | }, 43 | { 44 | "metadata": {}, 45 | "cell_type": "code", 46 | "source": [ 47 | "from dotenv import load_dotenv\n", 48 | "import os\n", 49 | "\n", 50 | "# load neo4j credentials (and openai api key in background).\n", 51 | "load_dotenv('.env', override=True)\n", 52 | "NEO4J_URI = os.getenv('NEO4J_URI')\n", 53 | "NEO4J_USERNAME = os.getenv('NEO4J_USERNAME')\n", 54 | "NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')\n", 55 | "\n", 56 | "#uncomment this line if you aren't using a .env file\n", 57 | "# os.environ['OPENAI_API_KEY'] = 'copy_paste_the_openai_key_here'" 58 | ], 59 | "id": "a023c71324bf6e7f", 60 | "outputs": [], 61 | "execution_count": null 62 | }, 63 | { 64 | "metadata": {}, 65 | "cell_type": "markdown", 66 | "source": [ 67 | "## Knowledge Graph Building\n", 68 | "\n", 69 | "The `SimpleKGPipeline` class allows you to automatically build a knowledge graph with a few key inputs, including\n", 70 | "- a driver to connect to Neo4j,\n", 71 | "- an LLM for entity extraction, and\n", 72 | "- an embedding model to create vectors on text chunks for similarity search.\n", 73 | "\n", 74 | "There are also some optional inputs, such as node labels, relationship types, and a custom prompt template, which we will use to improve the quality of the knowledge graph. 
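Because the ingestion run further below can take a little while, it can be worth checking up front that the Neo4j credentials loaded in the setup cell actually reach the database. A minimal sketch, assuming the `NEO4J_*` variables from above and using the driver's built-in connectivity check:

```python
import neo4j

# Pre-flight check: raises immediately if the URI or credentials are wrong,
# instead of failing partway through the longer ingestion step below.
with neo4j.GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD)) as check_driver:
    check_driver.verify_connectivity()
    print(f"Connected to {NEO4J_URI}")
```

With connectivity confirmed, the pipeline inputs described above can be created in the next cells.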
For full details on this, see [the blog](https://neo4j.com/blog/graphrag-python-package/).\n" 75 | ], 76 | "id": "418bd212bae7f492" 77 | }, 78 | { 79 | "metadata": {}, 80 | "cell_type": "code", 81 | "source": [ 82 | "import neo4j\n", 83 | "from neo4j_graphrag.llm import OpenAILLM\n", 84 | "from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings\n", 85 | "\n", 86 | "driver = neo4j.GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))\n", 87 | "\n", 88 | "ex_llm=OpenAILLM(\n", 89 | " model_name=\"gpt-4o-mini\",\n", 90 | " model_params={\n", 91 | " \"response_format\": {\"type\": \"json_object\"}, # use json_object formatting for best results\n", 92 | " \"temperature\": 0 # turning temperature down for more deterministic results\n", 93 | " }\n", 94 | ")\n", 95 | "\n", 96 | "#create text embedder\n", 97 | "embedder = OpenAIEmbeddings()" 98 | ], 99 | "id": "fd60a2f512b9701d", 100 | "outputs": [], 101 | "execution_count": null 102 | }, 103 | { 104 | "metadata": {}, 105 | "cell_type": "code", 106 | "source": [ 107 | "#define node labels\n", 108 | "basic_node_labels = [\"Object\", \"Entity\", \"Group\", \"Person\", \"Organization\", \"Place\"]\n", 109 | "\n", 110 | "academic_node_labels = [\"ArticleOrPaper\", \"PublicationOrJournal\"]\n", 111 | "\n", 112 | "medical_node_labels = [\"Anatomy\", \"BiologicalProcess\", \"Cell\", \"CellularComponent\", \n", 113 | " \"CellType\", \"Condition\", \"Disease\", \"Drug\",\n", 114 | " \"EffectOrPhenotype\", \"Exposure\", \"GeneOrProtein\", \"Molecule\",\n", 115 | " \"MolecularFunction\", \"Pathway\"]\n", 116 | "\n", 117 | "node_labels = basic_node_labels + academic_node_labels + medical_node_labels\n", 118 | "\n", 119 | "# define relationship types\n", 120 | "rel_types = [\"ACTIVATES\", \"AFFECTS\", \"ASSESSES\", \"ASSOCIATED_WITH\", \"AUTHORED\",\n", 121 | " \"BIOMARKER_FOR\", \"CAUSES\", \"CITES\", \"CONTRIBUTES_TO\", \"DESCRIBES\", \"EXPRESSES\",\n", 122 | " \"HAS_REACTION\", \"HAS_SYMPTOM\", \"INCLUDES\", \"INTERACTS_WITH\", \"PRESCRIBED\",\n", 123 | " \"PRODUCES\", \"RECEIVED\", \"RESULTS_IN\", \"TREATS\", \"USED_FOR\"]\n" 124 | ], 125 | "id": "6cba83fa4638e21d", 126 | "outputs": [], 127 | "execution_count": null 128 | }, 129 | { 130 | "metadata": {}, 131 | "cell_type": "code", 132 | "source": [ 133 | "prompt_template = '''\n", 134 | "You are a medical researcher tasks with extracting information from papers \n", 135 | "and structuring it in a property graph to inform further medical and research Q&A.\n", 136 | "\n", 137 | "Extract the entities (nodes) and specify their type from the following Input text.\n", 138 | "Also extract the relationships between these nodes. the relationship direction goes from the start node to the end node. \n", 139 | "\n", 140 | "\n", 141 | "Return result as JSON using the following format:\n", 142 | "{{\"nodes\": [ {{\"id\": \"0\", \"label\": \"the type of entity\", \"properties\": {{\"name\": \"name of entity\" }} }}],\n", 143 | " \"relationships\": [{{\"type\": \"TYPE_OF_RELATIONSHIP\", \"start_node_id\": \"0\", \"end_node_id\": \"1\", \"properties\": {{\"details\": \"Description of the relationship\"}} }}] }}\n", 144 | "\n", 145 | "- Use only the information from the Input text. Do not add any additional information. \n", 146 | "- If the input text is empty, return empty Json. 
\n", 147 | "- Make sure to create as many nodes and relationships as needed to offer rich medical context for further research.\n", 148 | "- An AI knowledge assistant must be able to read this graph and immediately understand the context to inform detailed research questions. \n", 149 | "- Multiple documents will be ingested from different sources and we are using this property graph to connect information, so make sure entity types are fairly general. \n", 150 | "\n", 151 | "Use only fhe following nodes and relationships (if provided):\n", 152 | "{schema}\n", 153 | "\n", 154 | "Assign a unique ID (string) to each node, and reuse it to define relationships.\n", 155 | "Do respect the source and target node types for relationship and\n", 156 | "the relationship direction.\n", 157 | "\n", 158 | "Do not return any additional information other than the JSON in it.\n", 159 | "\n", 160 | "Examples:\n", 161 | "{examples}\n", 162 | "\n", 163 | "Input text:\n", 164 | "\n", 165 | "{text}\n", 166 | "'''" 167 | ], 168 | "id": "8cbcdedcd1757b1d", 169 | "outputs": [], 170 | "execution_count": null 171 | }, 172 | { 173 | "metadata": {}, 174 | "cell_type": "code", 175 | "source": [ 176 | "from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import FixedSizeSplitter\n", 177 | "from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline\n", 178 | "\n", 179 | "kg_builder_pdf = SimpleKGPipeline(\n", 180 | " llm=ex_llm,\n", 181 | " driver=driver,\n", 182 | " text_splitter=FixedSizeSplitter(chunk_size=500, chunk_overlap=100),\n", 183 | " embedder=embedder,\n", 184 | " entities=node_labels,\n", 185 | " relations=rel_types,\n", 186 | " prompt_template=prompt_template,\n", 187 | " from_pdf=True\n", 188 | ")" 189 | ], 190 | "id": "94e1dfac980e6527", 191 | "outputs": [], 192 | "execution_count": null 193 | }, 194 | { 195 | "metadata": {}, 196 | "cell_type": "markdown", 197 | "source": "Below, we run the `SimpleKGPipeline` to construct our knowledge graph from 3 pdf documents and store in Neo4j.", 198 | "id": "4d8ef81d0b5ac70c" 199 | }, 200 | { 201 | "metadata": {}, 202 | "cell_type": "code", 203 | "source": [ 204 | "pdf_file_paths = ['truncated-pdfs/biomolecules-11-00928-v2-trunc.pdf', \n", 205 | " 'truncated-pdfs/GAP-between-patients-and-clinicians_2023_Best-Practice-trunc.pdf', \n", 206 | " 'truncated-pdfs/pgpm-13-39-trunc.pdf']\n", 207 | "\n", 208 | "for path in pdf_file_paths:\n", 209 | " print(f\"Processing : {path}\")\n", 210 | " pdf_result = await kg_builder_pdf.run_async(file_path=path)\n", 211 | " print(f\"Result: {pdf_result}\")" 212 | ], 213 | "id": "edeee98826a970a8", 214 | "outputs": [], 215 | "execution_count": null 216 | }, 217 | { 218 | "metadata": {}, 219 | "cell_type": "markdown", 220 | "source": "## Knowledge Graph Retrieval", 221 | "id": "a9c0fd965b15b143" 222 | }, 223 | { 224 | "metadata": {}, 225 | "cell_type": "markdown", 226 | "source": "We will leverage Neo4j's vector search capabilities here. 
To do this, we need to begin by creating a vector index on the text chunks from the PDFs, which are stored on `Chunk` nodes in our knowledge graph.", 227 | "id": "4d7b98e310104246" 228 | }, 229 | { 230 | "metadata": {}, 231 | "cell_type": "code", 232 | "source": [ 233 | "from neo4j_graphrag.indexes import create_vector_index\n", 234 | "\n", 235 | "create_vector_index(driver, name=\"text_embeddings\", label=\"Chunk\",\n", 236 | " embedding_property=\"embedding\", dimensions=1536, similarity_fn=\"cosine\")" 237 | ], 238 | "id": "940b051107b89204", 239 | "outputs": [], 240 | "execution_count": null 241 | }, 242 | { 243 | "metadata": {}, 244 | "cell_type": "markdown", 245 | "source": "Now that the index is set up, we will start simple with a __VectorRetriever__. The __VectorRetriever__ just queries `Chunk` nodes via vector search, bringing back the text and some metadata.", 246 | "id": "ec95391c989ee694" 247 | }, 248 | { 249 | "metadata": {}, 250 | "cell_type": "code", 251 | "source": [ 252 | "from neo4j_graphrag.retrievers import VectorRetriever\n", 253 | "\n", 254 | "vector_retriever = VectorRetriever(\n", 255 | " driver,\n", 256 | " index_name=\"text_embeddings\",\n", 257 | " embedder=embedder,\n", 258 | " return_properties=[\"text\"],\n", 259 | ")" 260 | ], 261 | "id": "eeda7e519c60a02b", 262 | "outputs": [], 263 | "execution_count": null 264 | }, 265 | { 266 | "metadata": {}, 267 | "cell_type": "markdown", 268 | "source": "Below we visualize the context we get back when submitting a search prompt. ", 269 | "id": "982112cc273a472e" 270 | }, 271 | { 272 | "metadata": {}, 273 | "cell_type": "code", 274 | "source": [ 275 | "import json\n", 276 | "\n", 277 | "vector_res = vector_retriever.get_search_results(query_text = \"How is precision medicine applied to Lupus?\", \n", 278 | " top_k=3)\n", 279 | "for i in vector_res.records: print(\"====\\n\" + json.dumps(i.data(), indent=4))" 280 | ], 281 | "id": "ca27731a1f99d71d", 282 | "outputs": [], 283 | "execution_count": null 284 | }, 285 | { 286 | "metadata": {}, 287 | "cell_type": "markdown", 288 | "source": [ 289 | "The GraphRAG Python Package offers [a wide range of useful retrievers](https://neo4j.com/docs/neo4j-graphrag-python/current/user_guide_rag.html#retriever-configuration), each covering different knowledge graph retrieval patterns.\n", 290 | "\n", 291 | "Below we will use the __`VectorCypherRetriever`__, which allows you to run a graph traversal after finding nodes with vector search. This uses Cypher, Neo4j's graph query language, to define the logic for traversing the graph. 
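To get a feel for what such a traversal can surface before wiring it into a retriever, one option is a small exploratory query (a hedged sketch, not part of the retriever itself) listing the most common relationship types attached to the extracted entities:

```python
# Exploratory query: which extracted relationship types occur most often?
# FROM_CHUNK is the structural link back to the source chunk, so it is excluded.
with driver.session() as session:
    result = session.run("""
        MATCH (:Chunk)<-[:FROM_CHUNK]-()-[r]-()
        WHERE type(r) <> 'FROM_CHUNK'
        RETURN type(r) AS rel_type, count(*) AS freq
        ORDER BY freq DESC
        LIMIT 10
    """)
    for record in result:
        print(record["rel_type"], record["freq"])
```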
\n", 292 | "\n", 293 | "As a simple starting point, we'll traverse up to 3 hops out from each Chunk, capture the relationships encountered, and include them in the response alongside our text chunks.\n" 294 | ], 295 | "id": "2b2ee3da2339365f" 296 | }, 297 | { 298 | "metadata": {}, 299 | "cell_type": "code", 300 | "source": [ 301 | "from neo4j_graphrag.retrievers import VectorCypherRetriever\n", 302 | "\n", 303 | "vc_retriever = VectorCypherRetriever(\n", 304 | " driver,\n", 305 | " index_name=\"text_embeddings\",\n", 306 | " embedder=embedder,\n", 307 | " retrieval_query=\"\"\"\n", 308 | "//1) Go out 2-3 hops in the entity graph and get relationships\n", 309 | "WITH node AS chunk\n", 310 | "MATCH (chunk)<-[:FROM_CHUNK]-()-[relList:!FROM_CHUNK]-{1,2}()\n", 311 | "UNWIND relList AS rel\n", 312 | "\n", 313 | "//2) collect relationships and text chunks\n", 314 | "WITH collect(DISTINCT chunk) AS chunks, \n", 315 | " collect(DISTINCT rel) AS rels\n", 316 | "\n", 317 | "//3) format and return context\n", 318 | "RETURN '=== text ===\\n' + apoc.text.join([c in chunks | c.text], '\\n---\\n') + '\\n\\n=== kg_rels ===\\n' +\n", 319 | " apoc.text.join([r in rels | startNode(r).name + ' - ' + type(r) + '(' + coalesce(r.details, '') + ')' + ' -> ' + endNode(r).name ], '\\n---\\n') AS info\n", 320 | "\"\"\"\n", 321 | ")" 322 | ], 323 | "id": "2c81ad815a7b1bf9", 324 | "outputs": [], 325 | "execution_count": null 326 | }, 327 | { 328 | "metadata": {}, 329 | "cell_type": "markdown", 330 | "source": "Below we visualize the context we get back when submitting a search prompt. ", 331 | "id": "ef5183123679ca2d" 332 | }, 333 | { 334 | "metadata": {}, 335 | "cell_type": "code", 336 | "source": [ 337 | "vc_res = vc_retriever.get_search_results(query_text = \"How is precision medicine applied to Lupus?\", top_k=3)\n", 338 | "\n", 339 | "# print output\n", 340 | "kg_rel_pos = vc_res.records[0]['info'].find('\\n\\n=== kg_rels ===\\n')\n", 341 | "print(\"# Text Chunk Context:\")\n", 342 | "print(vc_res.records[0]['info'][:kg_rel_pos])\n", 343 | "print(\"# KG Context From Relationships:\")\n", 344 | "print(vc_res.records[0]['info'][kg_rel_pos:])" 345 | ], 346 | "id": "4a1df583f2450f35", 347 | "outputs": [], 348 | "execution_count": null 349 | }, 350 | { 351 | "metadata": {}, 352 | "cell_type": "markdown", 353 | "source": [ 354 | "## GraphRAG\n", 355 | " \n", 356 | " You can construct GraphRAG pipelines with the `GraphRAG` class. At a minimum, you will need to pass the constructor an LLM and a retriever. You can optionally pass a custom prompt template. We will do so here just to provide a bit more guidance for the LLM to stick to information from our data source.\n", 357 | " \n", 358 | "Below we create `GraphRAG` objects for both the vector and vector-cypher retrievers. " 359 | ], 360 | "id": "f2e55b8b3511cf1" 361 | }, 362 | { 363 | "metadata": {}, 364 | "cell_type": "code", 365 | "source": [ 366 | "from neo4j_graphrag.llm import OpenAILLM as LLM\n", 367 | "from neo4j_graphrag.generation import RagTemplate\n", 368 | "from neo4j_graphrag.generation.graphrag import GraphRAG\n", 369 | "\n", 370 | "llm = LLM(model_name=\"gpt-4o\", model_params={\"temperature\": 0.0})\n", 371 | "\n", 372 | "rag_template = RagTemplate(template='''Answer the Question using the following Context. Only respond with information mentioned in the Context. Do not inject any speculative information not mentioned. 
\n", 373 | "\n", 374 | "# Question:\n", 375 | "{query_text}\n", 376 | " \n", 377 | "# Context:\n", 378 | "{context}\n", 379 | "\n", 380 | "# Answer:\n", 381 | "''', expected_inputs=['query_text', 'context'])\n", 382 | "\n", 383 | "v_rag = GraphRAG(llm=llm, retriever=vector_retriever, prompt_template=rag_template)\n", 384 | "vc_rag = GraphRAG(llm=llm, retriever=vc_retriever, prompt_template=rag_template)" 385 | ], 386 | "id": "8e2cf317a83d59ae", 387 | "outputs": [], 388 | "execution_count": null 389 | }, 390 | { 391 | "metadata": {}, 392 | "cell_type": "markdown", 393 | "source": "Now we can run GraphRAG and examine the outputs. ", 394 | "id": "fc8e13302fdeadd3" 395 | }, 396 | { 397 | "metadata": {}, 398 | "cell_type": "code", 399 | "source": [ 400 | "q = \"How is precision medicine applied to Lupus? provide in list format.\"\n", 401 | "print(f\"Vector Response: \\n{v_rag.search(q, retriever_config={'top_k':5}).answer}\")\n", 402 | "print(\"\\n===========================\\n\")\n", 403 | "print(f\"Vector + Cypher Response: \\n{vc_rag.search(q, retriever_config={'top_k':5}).answer}\")" 404 | ], 405 | "id": "4d0aff643e689611", 406 | "outputs": [], 407 | "execution_count": null 408 | }, 409 | { 410 | "metadata": {}, 411 | "cell_type": "code", 412 | "source": [ 413 | "q = \"Can you summarize systemic lupus erythematosus (SLE)? including common effects, biomarkers, and treatments? Provide in detailed list format.\"\n", 414 | "\n", 415 | "v_rag_result = v_rag.search(q, retriever_config={'top_k': 5}, return_context=True)\n", 416 | "vc_rag_result = vc_rag.search(q, retriever_config={'top_k': 5}, return_context=True)\n", 417 | "\n", 418 | "print(f\"Vector Response: \\n{v_rag_result.answer}\")\n", 419 | "print(\"\\n===========================\\n\")\n", 420 | "print(f\"Vector + Cypher Response: \\n{vc_rag_result.answer}\")" 421 | ], 422 | "id": "a2c44f834bdb13ad", 423 | "outputs": [], 424 | "execution_count": null 425 | }, 426 | { 427 | "metadata": {}, 428 | "cell_type": "code", 429 | "source": "for i in v_rag_result.retriever_result.items: print(json.dumps(eval(i.content), indent=1))", 430 | "id": "1088766f7109b6b3", 431 | "outputs": [], 432 | "execution_count": null 433 | }, 434 | { 435 | "metadata": {}, 436 | "cell_type": "code", 437 | "source": [ 438 | "vc_ls = vc_rag_result.retriever_result.items[0].content.split('\\\\n---\\\\n')\n", 439 | "for i in vc_ls:\n", 440 | " if \"biomarker\" in i: print(i)" 441 | ], 442 | "id": "8b10fdab707a0d4a", 443 | "outputs": [], 444 | "execution_count": null 445 | }, 446 | { 447 | "metadata": {}, 448 | "cell_type": "code", 449 | "source": [ 450 | "vc_ls = vc_rag_result.retriever_result.items[0].content.split('\\\\n---\\\\n')\n", 451 | "for i in vc_ls:\n", 452 | " if \"treat\" in i: print(i)" 453 | ], 454 | "id": "7f1510ef53a9d253", 455 | "outputs": [], 456 | "execution_count": null 457 | }, 458 | { 459 | "metadata": {}, 460 | "cell_type": "code", 461 | "source": [ 462 | "q = \"Can you summarize systemic lupus erythematosus (SLE)? including common effects, biomarkers, treatments, and current challenges faced by Physicians and patients? 
provide in list format with details for each item.\"\n", 463 | "print(f\"Vector Response: \\n{v_rag.search(q, retriever_config={'top_k': 5}).answer}\")\n", 464 | "print(\"\\n===========================\\n\")\n", 465 | "print(f\"Vector + Cypher Response: \\n{vc_rag.search(q, retriever_config={'top_k': 5}).answer}\")" 466 | ], 467 | "id": "d03ec43f6f02f608", 468 | "outputs": [], 469 | "execution_count": null 470 | }, 471 | { 472 | "metadata": {}, 473 | "cell_type": "code", 474 | "source": "", 475 | "id": "c7d66300af9eac12", 476 | "outputs": [], 477 | "execution_count": null 478 | } 479 | ], 480 | "metadata": { 481 | "kernelspec": { 482 | "display_name": "Python 3", 483 | "language": "python", 484 | "name": "python3" 485 | }, 486 | "language_info": { 487 | "codemirror_mode": { 488 | "name": "ipython", 489 | "version": 2 490 | }, 491 | "file_extension": ".py", 492 | "mimetype": "text/x-python", 493 | "name": "python", 494 | "nbconvert_exporter": "python", 495 | "pygments_lexer": "ipython2", 496 | "version": "2.7.6" 497 | } 498 | }, 499 | "nbformat": 4, 500 | "nbformat_minor": 5 501 | } 502 | -------------------------------------------------------------------------------- /truncated-pdfs/GAP-between-patients-and-clinicians_2023_Best-Practice-trunc.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-product-examples/graphrag-python-examples/2c2a4b6c23fae2b6317c688252175c23fc728a07/truncated-pdfs/GAP-between-patients-and-clinicians_2023_Best-Practice-trunc.pdf -------------------------------------------------------------------------------- /truncated-pdfs/biomolecules-11-00928-v2-trunc.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-product-examples/graphrag-python-examples/2c2a4b6c23fae2b6317c688252175c23fc728a07/truncated-pdfs/biomolecules-11-00928-v2-trunc.pdf -------------------------------------------------------------------------------- /truncated-pdfs/nihms-362971-trunc.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-product-examples/graphrag-python-examples/2c2a4b6c23fae2b6317c688252175c23fc728a07/truncated-pdfs/nihms-362971-trunc.pdf -------------------------------------------------------------------------------- /truncated-pdfs/pgpm-13-39-trunc.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/neo4j-product-examples/graphrag-python-examples/2c2a4b6c23fae2b6317c688252175c23fc728a07/truncated-pdfs/pgpm-13-39-trunc.pdf -------------------------------------------------------------------------------- /webinar_demo/Demo_Hybrid.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "metadata": {}, 5 | "cell_type": "markdown", 6 | "source": "# Demo: HybridRetriever\n", 7 | "id": "96ef8ad2442c6215" 8 | }, 9 | { 10 | "metadata": {}, 11 | "cell_type": "code", 12 | "outputs": [], 13 | "execution_count": 2, 14 | "source": [ 15 | "import os\n", 16 | "from dotenv import load_dotenv\n", 17 | "\n", 18 | "# load neo4j credentials (and openai api key in background)\n", 19 | "load_dotenv('.env', override=True)\n", 20 | "NEO4J_URI = os.getenv('NEO4J_URI', 'bolt://localhost:7687')\n", 21 | "NEO4J_USERNAME = os.getenv('NEO4J_USERNAME', 'neo4j')\n", 22 | "NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')\n", 23 | "NEO4J_DATABASE = os.getenv('NEO4J_DATABASE', 
None)" 24 | ], 25 | "id": "6dafc17f-9b7a-42d9-b85b-50b3d86747b4" 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 7, 30 | "id": "5a48cba7-f135-45bd-89c4-d4d32462c75d", 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "import logging\n", 35 | "logging.getLogger(\"neo4j.notifications\").setLevel(logging.ERROR)" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 8, 41 | "id": "d9bf50ad-ad12-4d65-b808-c5af93c147be", 42 | "metadata": {}, 43 | "outputs": [], 44 | "source": [ 45 | "import neo4j\n", 46 | "\n", 47 | "driver = neo4j.GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD), database=NEO4J_DATABASE)" 48 | ] 49 | }, 50 | { 51 | "cell_type": "raw", 52 | "id": "f00673c8-9397-493e-b1ec-cbc439f937bb", 53 | "metadata": {}, 54 | "source": [ 55 | "from neo4j_graphrag.indexes import create_fulltext_index\n", 56 | "\n", 57 | "create_fulltext_index(\n", 58 | " driver,\n", 59 | " name=\"chunk_text\", \n", 60 | " label=\"Chunk\",\n", 61 | " node_properties=[\"text\"],\n", 62 | ")\n" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "id": "9466d87a-d5cd-4996-aeb3-a3be94f6d38f", 69 | "metadata": {}, 70 | "outputs": [], 71 | "source": [ 72 | "from neo4j_graphrag.retrievers import HybridRetriever\n", 73 | "from neo4j_graphrag.embeddings import OpenAIEmbeddings\n", 74 | "\n", 75 | "hybrid_retriever = HybridRetriever(\n", 76 | " driver,\n", 77 | " vector_index_name=\"text_embeddings\",\n", 78 | " fulltext_index_name=\"chunk_text\",\n", 79 | " embedder=OpenAIEmbeddings(),\n", 80 | " return_properties=[\"text\"],\n", 81 | ")" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 6, 87 | "id": "da12efd6-e12d-478d-b888-226eb8fc89b7", 88 | "metadata": {}, 89 | "outputs": [ 90 | { 91 | "name": "stderr", 92 | "output_type": "stream", 93 | "text": [ 94 | "Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: CALL subquery without a variable scope clause is now deprecated. Use CALL () { ... }} {position: line: 1, column: 1, offset: 0} for query: 'CALL { CALL db.index.vector.queryNodes($vector_index_name, $top_k, $query_vector) YIELD node, score WITH collect({node:node, score:score}) AS nodes, max(score) AS vector_index_max_score UNWIND nodes AS n RETURN n.node AS node, (n.score / vector_index_max_score) AS score UNION CALL db.index.fulltext.queryNodes($fulltext_index_name, $query_text, {limit: $top_k}) YIELD node, score WITH collect({node:node, score:score}) AS nodes, max(score) AS ft_index_max_score UNWIND nodes AS n RETURN n.node AS node, (n.score / ft_index_max_score) AS score } WITH node, max(score) AS score ORDER BY score DESC LIMIT $top_k RETURN node {.text} as node, score'\n" 95 | ] 96 | } 97 | ], 98 | "source": [ 99 | "hybrid_res = hybrid_retriever.search(\n", 100 | " query_text=\"How is precision medicine applied to Lupus? provide in list format\"\n", 101 | ")" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": 23, 107 | "id": "7de1ac91-dfa9-406a-85bf-05811ce9234b", 108 | "metadata": {}, 109 | "outputs": [ 110 | { 111 | "data": { 112 | "text/plain": [ 113 | "\"{'text': 'precise and systematic fashion as suggested here.\\\\nFuture care will involve molecular diagnostics throughout\\\\nthe patient timecourse to drive the least toxic combination\\\\nof therapies. 
Recent evidence suggests a paradigm shift is\\\\non the way but it is hard to predict how fast it will come.\\\\nDisclosure\\\\nThe authors report no con flicts of interest in this work.\\\\nReferences\\\\n1. Lisnevskaia L, Murphy G, Isenberg DA. Systemic lupus\\\\nerythematosus. Lancet .2014 ;384:1878 –1888. doi:10.1016/S0140-\\\\n6736(14)60128'}\"" 114 | ] 115 | }, 116 | "execution_count": 23, 117 | "metadata": {}, 118 | "output_type": "execute_result" 119 | } 120 | ], 121 | "source": [ 122 | "hybrid_res.items[0].content" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": null, 128 | "id": "1c0538d4-8c44-4843-bca7-0d8f992f8175", 129 | "metadata": {}, 130 | "outputs": [], 131 | "source": [] 132 | } 133 | ], 134 | "metadata": { 135 | "kernelspec": { 136 | "display_name": "Python 3 (ipykernel)", 137 | "language": "python", 138 | "name": "python3" 139 | }, 140 | "language_info": { 141 | "codemirror_mode": { 142 | "name": "ipython", 143 | "version": 3 144 | }, 145 | "file_extension": ".py", 146 | "mimetype": "text/x-python", 147 | "name": "python", 148 | "nbconvert_exporter": "python", 149 | "pygments_lexer": "ipython3", 150 | "version": "3.12.3" 151 | } 152 | }, 153 | "nbformat": 4, 154 | "nbformat_minor": 5 155 | } 156 | -------------------------------------------------------------------------------- /webinar_demo/Demo_KGB.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "f65863a3-47e2-4c55-899d-8902d21d614e", 6 | "metadata": {}, 7 | "source": [ 8 | "# GraphRAG Python package - From PDF to Q&A with LUPUS example\n", 9 | "\n", 10 | "In this notebook we will:\n", 11 | "\n", 12 | "- Ingest PDFs into neo4j:\n", 13 | " - chunk PDFs into smaller pieces\n", 14 | " - save each of this chunk in neo4j\n", 15 | " - perform entity and relation extraction for each chunk and save them in the graph\n" 16 | ] 17 | }, 18 | { 19 | "cell_type": "raw", 20 | "id": "55f89455-b55e-46af-95bc-ed8987eebbe2", 21 | "metadata": {}, 22 | "source": [ 23 | "!pip install python-dotenv neo4j-graphrag openai" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "id": "c2e0e26a-a334-4337-a6b6-c7dc3e8f286e", 29 | "metadata": {}, 30 | "source": [ 31 | "## Setup\n", 32 | "\n", 33 | "Define our variables:\n", 34 | "- Neo4j credentials\n", 35 | "- List of files to be processed\n", 36 | "- List of entities and relationships we are interested in and we will ask the LLM to find for us\n", 37 | "- The LLM and embedder we want to use: OpenAI for this demo, but others are supported (VertexAI, MistralAI, Anthropic...)\n", 38 | " (note: OPENAI_API_KEY must be defined in the env vars)\n", 39 | "- We also decide to use a custom prompt for entity and relation extraction (instead of the default one), so it is also defined below." 
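One practical note on the OpenAI key: the embedder and LLM constructed in the cells below raise an `OpenAIError` if `OPENAI_API_KEY` is not set (the traceback further down in this notebook shows exactly that failure). A small guard, placed after loading the `.env` file, fails earlier with a clearer message:

```python
import os

# Fail fast if the OpenAI key is missing, instead of erroring later
# when OpenAIEmbeddings() or OpenAILLM(...) is constructed.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file before running the next cells.")
```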
40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "id": "b827af0b-3ded-4667-88db-e6449a9444b3", 45 | "metadata": { 46 | "ExecuteTime": { 47 | "end_time": "2024-10-18T09:48:26.018450Z", 48 | "start_time": "2024-10-18T09:48:26.004809Z" 49 | } 50 | }, 51 | "source": [ 52 | "import os\n", 53 | "from dotenv import load_dotenv\n", 54 | "\n", 55 | "# load neo4j credentials (and openai api key in background)\n", 56 | "load_dotenv('.env', override=True)\n", 57 | "NEO4J_URI = os.getenv('NEO4J_URI', 'bolt://localhost:7687')\n", 58 | "NEO4J_USERNAME = os.getenv('NEO4J_USERNAME', 'neo4j')\n", 59 | "NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')\n", 60 | "NEO4J_DATABASE = os.getenv('NEO4J_DATABASE', 'neo4j')" 61 | ], 62 | "outputs": [], 63 | "execution_count": 1 64 | }, 65 | { 66 | "cell_type": "code", 67 | "id": "099bcd3f-9cf9-4b50-98eb-38c0f0ddb20c", 68 | "metadata": { 69 | "ExecuteTime": { 70 | "end_time": "2024-10-18T09:48:26.051117Z", 71 | "start_time": "2024-10-18T09:48:26.047937Z" 72 | } 73 | }, 74 | "source": [ 75 | "FILE_PATHS = [\n", 76 | " # \"Clinical and Immunological Biomarkers for Systemic Lupus Erythematosus\"\n", 77 | " 'truncated-pdfs/biomolecules-11-00928-v2-trunc.pdf', \n", 78 | " ## \"The communication GAP between patients and clinicians and the importance of patient reported outcomes \n", 79 | " ## in Systemic Lupus Erythematosus\"\n", 80 | " # 'truncated-pdfs/GAP-between-patients-and-clinicians_2023_Best-Practice-trunc.pdf', \n", 81 | " ## \"Towards Precision Medicine in Systemic Lupus Erythematosus\"\n", 82 | " # 'truncated-pdfs/pgpm-13-39-trunc.pdf'\n", 83 | "]" 84 | ], 85 | "outputs": [], 86 | "execution_count": 2 87 | }, 88 | { 89 | "cell_type": "code", 90 | "id": "3bcf12d1-8f12-4aaf-8534-3d44ba7f9e46", 91 | "metadata": { 92 | "ExecuteTime": { 93 | "end_time": "2024-10-18T09:48:26.147795Z", 94 | "start_time": "2024-10-18T09:48:26.144419Z" 95 | } 96 | }, 97 | "source": [ 98 | "#define node labels\n", 99 | "basic_node_labels = [\"Object\", \"Entity\", \"Group\", \"Person\", \"Organization\", \"Place\"]\n", 100 | "\n", 101 | "academic_node_labels = [\"ArticleOrPaper\", \"PublicationOrJournal\"]\n", 102 | "\n", 103 | "medical_node_labels = [\"Anatomy\", \"BiologicalProcess\", \"Cell\", \"CellularComponent\", \n", 104 | " \"CellType\", \"Condition\", \"Disease\", \"Drug\",\n", 105 | " \"EffectOrPhenotype\", \"Exposure\", \"GeneOrProtein\", \"Molecule\",\n", 106 | " \"MolecularFunction\", \"Pathway\"]\n", 107 | "\n", 108 | "node_labels = basic_node_labels + academic_node_labels + medical_node_labels\n", 109 | "\n", 110 | "# define relationship types\n", 111 | "rel_types = [\"ACTIVATES\", \"AFFECTS\", \"ASSESSES\", \"ASSOCIATED_WITH\", \"AUTHORED\",\n", 112 | " \"BIOMARKER_FOR\", \"CAUSES\", \"CITES\", \"CONTRIBUTES_TO\", \"DESCRIBES\", \"EXPRESSES\",\n", 113 | " \"HAS_REACTION\", \"HAS_SYMPTOM\", \"INCLUDES\", \"INTERACTS_WITH\", \"PRESCRIBED\",\n", 114 | " \"PRODUCES\", \"RECEIVED\", \"RESULTS_IN\", \"TREATS\", \"USED_FOR\"]" 115 | ], 116 | "outputs": [], 117 | "execution_count": 3 118 | }, 119 | { 120 | "cell_type": "code", 121 | "id": "acba4084-67bf-4706-b3ef-580d21ac1320", 122 | "metadata": { 123 | "ExecuteTime": { 124 | "end_time": "2024-10-18T09:48:26.615992Z", 125 | "start_time": "2024-10-18T09:48:26.189731Z" 126 | } 127 | }, 128 | "source": [ 129 | "from neo4j_graphrag.llm import OpenAILLM\n", 130 | "from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings\n", 131 | "\n", 132 | "# create text embedder (for chunk text)\n", 133 | "embedder = OpenAIEmbeddings()\n", 134 | 
"\n", 135 | "# create a llm object (for entity and relation extraction)\n", 136 | "llm = OpenAILLM(\n", 137 | " model_name=\"gpt-4o-mini\",\n", 138 | " model_params={\n", 139 | " \"response_format\": {\"type\": \"json_object\"}, # use json_object formatting for best results\n", 140 | " \"temperature\": 0 # turning temperature down for more deterministic results\n", 141 | " }\n", 142 | ")\n" 143 | ], 144 | "outputs": [ 145 | { 146 | "ename": "OpenAIError", 147 | "evalue": "The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable", 148 | "output_type": "error", 149 | "traceback": [ 150 | "\u001B[0;31m---------------------------------------------------------------------------\u001B[0m", 151 | "\u001B[0;31mOpenAIError\u001B[0m Traceback (most recent call last)", 152 | "Cell \u001B[0;32mIn[4], line 5\u001B[0m\n\u001B[1;32m 2\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01mneo4j_graphrag\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01membeddings\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mopenai\u001B[39;00m \u001B[38;5;28;01mimport\u001B[39;00m OpenAIEmbeddings\n\u001B[1;32m 4\u001B[0m \u001B[38;5;66;03m# create text embedder (for chunk text)\u001B[39;00m\n\u001B[0;32m----> 5\u001B[0m embedder \u001B[38;5;241m=\u001B[39m \u001B[43mOpenAIEmbeddings\u001B[49m\u001B[43m(\u001B[49m\u001B[43m)\u001B[49m\n\u001B[1;32m 7\u001B[0m \u001B[38;5;66;03m# create an llm object (for entity and relation extraction)\u001B[39;00m\n\u001B[1;32m 8\u001B[0m llm \u001B[38;5;241m=\u001B[39m OpenAILLM(\n\u001B[1;32m 9\u001B[0m model_name\u001B[38;5;241m=\u001B[39m\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mgpt-4o-mini\u001B[39m\u001B[38;5;124m\"\u001B[39m,\n\u001B[1;32m 10\u001B[0m model_params\u001B[38;5;241m=\u001B[39m{\n\u001B[0;32m (...)\u001B[0m\n\u001B[1;32m 13\u001B[0m }\n\u001B[1;32m 14\u001B[0m )\n", 153 | "File \u001B[0;32m~/PycharmProjects/graphrag-python-examples/.venv/lib/python3.12/site-packages/neo4j_graphrag/embeddings/openai.py:51\u001B[0m, in \u001B[0;36mOpenAIEmbeddings.__init__\u001B[0;34m(self, model, **kwargs)\u001B[0m\n\u001B[1;32m 49\u001B[0m \u001B[38;5;28;01mdef\u001B[39;00m \u001B[38;5;21m__init__\u001B[39m(\u001B[38;5;28mself\u001B[39m, model: \u001B[38;5;28mstr\u001B[39m \u001B[38;5;241m=\u001B[39m \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mtext-embedding-ada-002\u001B[39m\u001B[38;5;124m\"\u001B[39m, \u001B[38;5;241m*\u001B[39m\u001B[38;5;241m*\u001B[39mkwargs: Any) \u001B[38;5;241m-\u001B[39m\u001B[38;5;241m>\u001B[39m \u001B[38;5;28;01mNone\u001B[39;00m:\n\u001B[1;32m 50\u001B[0m \u001B[38;5;28msuper\u001B[39m()\u001B[38;5;241m.\u001B[39m\u001B[38;5;21m__init__\u001B[39m(model, \u001B[38;5;241m*\u001B[39m\u001B[38;5;241m*\u001B[39mkwargs)\n\u001B[0;32m---> 51\u001B[0m \u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39mopenai_client \u001B[38;5;241m=\u001B[39m \u001B[43mopenai\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mOpenAI\u001B[49m\u001B[43m(\u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mkwargs\u001B[49m\u001B[43m)\u001B[49m\n", 154 | "File \u001B[0;32m~/PycharmProjects/graphrag-python-examples/.venv/lib/python3.12/site-packages/openai/_client.py:105\u001B[0m, in \u001B[0;36mOpenAI.__init__\u001B[0;34m(self, api_key, organization, project, base_url, timeout, max_retries, default_headers, default_query, http_client, _strict_response_validation)\u001B[0m\n\u001B[1;32m 103\u001B[0m api_key 
\u001B[38;5;241m=\u001B[39m os\u001B[38;5;241m.\u001B[39menviron\u001B[38;5;241m.\u001B[39mget(\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mOPENAI_API_KEY\u001B[39m\u001B[38;5;124m\"\u001B[39m)\n\u001B[1;32m 104\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m api_key \u001B[38;5;129;01mis\u001B[39;00m \u001B[38;5;28;01mNone\u001B[39;00m:\n\u001B[0;32m--> 105\u001B[0m \u001B[38;5;28;01mraise\u001B[39;00m OpenAIError(\n\u001B[1;32m 106\u001B[0m \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mThe api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable\u001B[39m\u001B[38;5;124m\"\u001B[39m\n\u001B[1;32m 107\u001B[0m )\n\u001B[1;32m 108\u001B[0m \u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39mapi_key \u001B[38;5;241m=\u001B[39m api_key\n\u001B[1;32m 110\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m organization \u001B[38;5;129;01mis\u001B[39;00m \u001B[38;5;28;01mNone\u001B[39;00m:\n", 155 | "\u001B[0;31mOpenAIError\u001B[0m: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable" 156 | ] 157 | } 158 | ], 159 | "execution_count": 4 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 5, 164 | "id": "1689f679-367c-490d-a9fa-f8da66ac974a", 165 | "metadata": {}, 166 | "outputs": [], 167 | "source": [ 168 | "# optional: define your own prompt template for entity/relation extraction\n", 169 | "# it must have 'text' placeholder and can use the 'schema' key\n", 170 | "\n", 171 | "prompt_template = '''\n", 172 | "You are a medical researcher tasks with extracting information from papers \n", 173 | "and structuring it in a property graph to inform further medical and research Q&A.\n", 174 | "\n", 175 | "Extract the entities (nodes) and specify their type from the following Input text.\n", 176 | "Also extract the relationships between these nodes. the relationship direction goes from the start node to the end node. \n", 177 | "\n", 178 | "\n", 179 | "Return result as JSON using the following format:\n", 180 | "{{\"nodes\": [ {{\"id\": \"0\", \"label\": \"the type of entity\", \"properties\": {{\"name\": \"name of entity\" }} }}],\n", 181 | " \"relationships\": [{{\"type\": \"TYPE_OF_RELATIONSHIP\", \"start_node_id\": \"0\", \"end_node_id\": \"1\", \"properties\": {{\"details\": \"Description of the relationship\"}} }}] }}\n", 182 | "\n", 183 | "- Use only the information from the Input text. Do not add any additional information. \n", 184 | "- If the input text is empty, return empty Json. \n", 185 | "- Make sure to create as many nodes and relationships as needed to offer rich medical context for further research.\n", 186 | "- An AI knowledge assistant must be able to read this graph and immediately understand the context to inform detailed research questions. \n", 187 | "- Multiple documents will be ingested from different sources and we are using this property graph to connect information, so make sure entity types are fairly general. 
\n", 188 | "\n", 189 | "Use only fhe following nodes and relationships (if provided):\n", 190 | "{schema}\n", 191 | "\n", 192 | "Assign a unique ID (string) to each node, and reuse it to define relationships.\n", 193 | "Do respect the source and target node types for relationship and\n", 194 | "the relationship direction.\n", 195 | "\n", 196 | "Do not return any additional information other than the JSON in it.\n", 197 | "\n", 198 | "Input text:\n", 199 | "\n", 200 | "{text}\n", 201 | "'''" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "id": "e7b3b091-ef2c-4ad1-a75f-33aeccf04a3f", 207 | "metadata": {}, 208 | "source": [ 209 | "## Knowledge Graph Building\n", 210 | "\n", 211 | "We can finally create our Neo4j driver and `SimpleKGPipeline` and run the pipeline on the list of documents:" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 6, 217 | "id": "8b53738e-cf44-470a-95a2-5a66ee63db2a", 218 | "metadata": {}, 219 | "outputs": [], 220 | "source": [ 221 | "import neo4j\n", 222 | "\n", 223 | "driver = neo4j.GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD), database=NEO4J_DATABASE)" 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": 7, 229 | "id": "0f1b9d59-3b8c-44b8-ba97-270eca07a7d7", 230 | "metadata": {}, 231 | "outputs": [], 232 | "source": [ 233 | "# from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import FixedSizeSplitter\n", 234 | "from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline\n", 235 | "\n", 236 | "kg_builder_pdf = SimpleKGPipeline(\n", 237 | " driver=driver,\n", 238 | " llm=llm,\n", 239 | " # text_splitter=FixedSizeSplitter(chunk_size=500, chunk_overlap=100),\n", 240 | " embedder=embedder,\n", 241 | " entities=node_labels,\n", 242 | " relations=rel_types,\n", 243 | " prompt_template=prompt_template,\n", 244 | " from_pdf=True\n", 245 | ")" 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "id": "edfe7434-00eb-4926-a094-a8f80fcaff47", 251 | "metadata": { 252 | "ExecuteTime": { 253 | "end_time": "2024-10-18T09:48:26.619385901Z", 254 | "start_time": "2024-10-18T09:47:41.302479Z" 255 | } 256 | }, 257 | "source": [ 258 | "for path in FILE_PATHS:\n", 259 | " print(f\"Processing : {path}\")\n", 260 | " pdf_result = await kg_builder_pdf.run_async(file_path=path)\n", 261 | " print(f\"PDF Processing Result: {pdf_result}\")" 262 | ], 263 | "outputs": [ 264 | { 265 | "name": "stdout", 266 | "output_type": "stream", 267 | "text": [ 268 | "Processing : truncated-pdfs/biomolecules-11-00928-v2-trunc.pdf\n" 269 | ] 270 | }, 271 | { 272 | "ename": "NameError", 273 | "evalue": "name 'kg_builder_pdf' is not defined", 274 | "output_type": "error", 275 | "traceback": [ 276 | "\u001B[0;31m---------------------------------------------------------------------------\u001B[0m", 277 | "\u001B[0;31mNameError\u001B[0m Traceback (most recent call last)", 278 | "Cell \u001B[0;32mIn[5], line 3\u001B[0m\n\u001B[1;32m 1\u001B[0m \u001B[38;5;28;01mfor\u001B[39;00m path \u001B[38;5;129;01min\u001B[39;00m FILE_PATHS:\n\u001B[1;32m 2\u001B[0m \u001B[38;5;28mprint\u001B[39m(\u001B[38;5;124mf\u001B[39m\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mProcessing : \u001B[39m\u001B[38;5;132;01m{\u001B[39;00mpath\u001B[38;5;132;01m}\u001B[39;00m\u001B[38;5;124m\"\u001B[39m)\n\u001B[0;32m----> 3\u001B[0m pdf_result \u001B[38;5;241m=\u001B[39m \u001B[38;5;28;01mawait\u001B[39;00m 
\u001B[43mkg_builder_pdf\u001B[49m\u001B[38;5;241m.\u001B[39mrun_async(file_path\u001B[38;5;241m=\u001B[39mpath)\n\u001B[1;32m 4\u001B[0m \u001B[38;5;28mprint\u001B[39m(\u001B[38;5;124mf\u001B[39m\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mPDF Processing Result: \u001B[39m\u001B[38;5;132;01m{\u001B[39;00mpdf_result\u001B[38;5;132;01m}\u001B[39;00m\u001B[38;5;124m\"\u001B[39m)\n", 279 | "\u001B[0;31mNameError\u001B[0m: name 'kg_builder_pdf' is not defined" 280 | ] 281 | } 282 | ], 283 | "execution_count": 5 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": null, 288 | "id": "def87ea9-6569-4f12-8879-189b33d2a534", 289 | "metadata": {}, 290 | "outputs": [], 291 | "source": [] 292 | } 293 | ], 294 | "metadata": { 295 | "kernelspec": { 296 | "display_name": "Python 3 (ipykernel)", 297 | "language": "python", 298 | "name": "python3" 299 | }, 300 | "language_info": { 301 | "codemirror_mode": { 302 | "name": "ipython", 303 | "version": 3 304 | }, 305 | "file_extension": ".py", 306 | "mimetype": "text/x-python", 307 | "name": "python", 308 | "nbconvert_exporter": "python", 309 | "pygments_lexer": "ipython3", 310 | "version": "3.12.3" 311 | } 312 | }, 313 | "nbformat": 4, 314 | "nbformat_minor": 5 315 | } 316 | -------------------------------------------------------------------------------- /webinar_demo/Demo_QA.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "f65863a3-47e2-4c55-899d-8902d21d614e", 6 | "metadata": {}, 7 | "source": [ 8 | "# GraphRAG Python package - From PDF to Q&A with LUPUS example\n", 9 | "\n", 10 | "In this notebook we will:\n", 11 | "\n", 12 | "- Implement GraphRAG with vector and vector cypher retrievers\n" 13 | ] 14 | }, 15 | { 16 | "cell_type": "raw", 17 | "id": "55f89455-b55e-46af-95bc-ed8987eebbe2", 18 | "metadata": {}, 19 | "source": [ 20 | "!pip install python-dotenv neo4j-graphrag openai" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "id": "c2e0e26a-a334-4337-a6b6-c7dc3e8f286e", 26 | "metadata": {}, 27 | "source": [ 28 | "## Setup\n", 29 | "\n", 30 | "Define our variables:\n", 31 | "- Neo4j credentials" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 23, 37 | "id": "b827af0b-3ded-4667-88db-e6449a9444b3", 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "import os\n", 42 | "from dotenv import load_dotenv\n", 43 | "\n", 44 | "# load neo4j credentials (and openai api key in background)\n", 45 | "load_dotenv('.env', override=True)\n", 46 | "NEO4J_URI = os.getenv('NEO4J_URI', 'bolt://localhost:7687')\n", 47 | "NEO4J_USERNAME = os.getenv('NEO4J_USERNAME', 'neo4j')\n", 48 | "NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')\n", 49 | "NEO4J_DATABASE = os.getenv('NEO4J_DATABASE', None)" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 24, 55 | "id": "6adb4306-9d15-46e1-8d29-665272786bfe", 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "import neo4j\n", 60 | "\n", 61 | "driver = neo4j.GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD), database=NEO4J_DATABASE)" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "id": "08102bd4-a9a7-4455-bfd9-b4c0c4152ee2", 67 | "metadata": {}, 68 | "source": [ 69 | "## Knowledge Graph Retrieval\n", 70 | "\n", 71 | "In this section, we investigate several supported retrieval methods, starting with the VectorRetriever which is a simple vector search. 
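Vector search depends on the `embedding` property that the knowledge graph building step stored on each `Chunk` node (1536-dimensional OpenAI embeddings in these examples). A quick hedged check that those embeddings are actually present before creating or reusing the index:

```python
# Check that chunk embeddings are present and have the expected dimension
# (1536 for the OpenAI embedder used in these examples).
with driver.session() as session:
    record = session.run(
        "MATCH (c:Chunk) WHERE c.embedding IS NOT NULL "
        "RETURN count(c) AS chunks_with_embedding, size(head(collect(c.embedding))) AS dim"
    ).single()
print(record.data())
```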
For this, we need to add a vector index on Chunks' embeddings property that was created by the SimpleKGBuilder pipeline:" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 66, 77 | "id": "84032d14-e791-4ad2-bf0e-fc0eb79d0aad", 78 | "metadata": {}, 79 | "outputs": [], 80 | "source": [ 81 | "VECTOR_INDEX_NAME = \"text_embeddings\"" 82 | ] 83 | }, 84 | { 85 | "cell_type": "raw", 86 | "id": "65674ee2-2e9e-4967-8da5-96a876cd0262", 87 | "metadata": {}, 88 | "source": [ 89 | "from neo4j_graphrag.indexes import create_vector_index\n", 90 | "\n", 91 | "\n", 92 | "create_vector_index(\n", 93 | " driver, \n", 94 | " name=\"text_embeddings\", \n", 95 | " label=\"Chunk\",\n", 96 | " embedding_property=\"embedding\", \n", 97 | " dimensions=1536,\n", 98 | " similarity_fn=\"cosine\",\n", 99 | ")" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 25, 105 | "id": "1457d723-12dd-4f15-ae74-ec23f79d2379", 106 | "metadata": {}, 107 | "outputs": [], 108 | "source": [ 109 | "from neo4j_graphrag.embeddings import OpenAIEmbeddings\n", 110 | "embedder = OpenAIEmbeddings()" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 26, 116 | "id": "11cc975d-5fee-463f-84bb-74f067688b64", 117 | "metadata": {}, 118 | "outputs": [], 119 | "source": [ 120 | "from neo4j_graphrag.retrievers import VectorRetriever\n", 121 | "\n", 122 | "vector_retriever = VectorRetriever(\n", 123 | " driver,\n", 124 | " index_name=VECTOR_INDEX_NAME,\n", 125 | " embedder=embedder,\n", 126 | " return_properties=[\"text\"],\n", 127 | ")\n", 128 | "\n", 129 | "vector_res = vector_retriever.search(\n", 130 | " query_text=\"How is precision medicine applied to Lupus?\", \n", 131 | " top_k=3,\n", 132 | ")" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 68, 138 | "id": "d67d3a10-461c-466b-9c99-3846f1902c5d", 139 | "metadata": {}, 140 | "outputs": [ 141 | { 142 | "data": { 143 | "text/plain": [ 144 | "3" 145 | ] 146 | }, 147 | "execution_count": 68, 148 | "metadata": {}, 149 | "output_type": "execute_result" 150 | } 151 | ], 152 | "source": [ 153 | "len(vector_res.items)" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": 28, 159 | "id": "ef0efb62-fa70-442d-a354-3ee0fdc5417a", 160 | "metadata": {}, 161 | "outputs": [ 162 | { 163 | "name": "stdout", 164 | "output_type": "stream", 165 | "text": [ 166 | "====\n", 167 | "{'text': 'precise and systematic fashion as suggested here.\\nFuture care will involve molecular diagnostics throughout\\nthe patient timecourse to drive the least toxic combination\\nof therapies. Recent evidence suggests a paradigm shift is\\non the way but it is hard to predict how fast it will come.\\nDisclosure\\nThe authors report no con flicts of interest in this work.\\nReferences\\n1. Lisnevskaia L, Murphy G, Isenberg DA. Systemic lupus\\nerythematosus. Lancet .2014 ;384:1878 –1888. doi:10.1016/S0140-\\n6736(14)60128'}\n", 168 | "====\n", 169 | "{'text': 'd IS agents.\\nPrecision medicine consists of a tailored approach to\\neach patient, based on genetic and epigenetic singularities,\\nwhich in fluence disease pathophysiology and drug\\nresponse. Precision medicine in SLE is trying to address\\nthe need to assess SLE patients optimally, predict disease\\ncourse and treatment response at diagnosis. 
Ideally every\\npatient would undergo an initial evaluation that would\\nprofile his/her disease, assessing the main pathophysiolo-\\ngic pathway through biomarkers, ther'}\n", 170 | "====\n", 171 | "{'text': 'REVIEW\\nT owards Precision Medicine in Systemic Lupus\\nErythematosus\\nThis article was published in the following Dove Press journal:\\nPharmacogenomics and Personalized Medicine\\nElliott Lever1\\nMarta R Alves2\\nDavid A Isenberg1\\n1Centre for Rheumatology, Division of\\nMedicine, University College Hospital\\nLondon, London, UK;2Internal Medicine,\\nDepartment of Medicine, Centro\\nHospitalar do Porto, Porto, PortugalAbstract: Systemic lupus erythematosus (SLE) is a remarkable condition characterised by\\ndiversit'}\n" 172 | ] 173 | } 174 | ], 175 | "source": [ 176 | "for i in vector_res.items: \n", 177 | " print(\"====\\n\" + i.content)" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "id": "fb424faf-6af7-480e-973f-2819743643c7", 183 | "metadata": {}, 184 | "source": [ 185 | "The GraphRAG Python Package offers a whole host of other useful retrieval covering different patterns.\n", 186 | "\n", 187 | "Below we will use the VectorCypherRetriever which allows you to run a graph traversal after finding text chunks. We will use the Cypher Query language to define the logic to traverse the graph.\n", 188 | "\n", 189 | "As a simple starting point, lets traverse up to 2 hops out from each chunk and textualize the different relationships we pick up. We will use something called a quantified path pattern to accomplish in this.\n" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": 29, 195 | "id": "f40e3595-94c9-47bc-a55b-b3a51b925609", 196 | "metadata": {}, 197 | "outputs": [], 198 | "source": [ 199 | "from neo4j_graphrag.retrievers import VectorCypherRetriever\n", 200 | "\n", 201 | "vc_retriever = VectorCypherRetriever(\n", 202 | " driver,\n", 203 | " index_name=\"text_embeddings\",\n", 204 | " embedder=embedder,\n", 205 | " retrieval_query=\"\"\"\n", 206 | "//1) Go out 2-3 hops in the entity graph and get relationships\n", 207 | "WITH node AS chunk\n", 208 | "MATCH (chunk)<-[:FROM_CHUNK]-()-[relList:!FROM_CHUNK]-{1,2}(:__Entity__)\n", 209 | "UNWIND relList AS rel\n", 210 | "\n", 211 | "//2) collect relationships and text chunks\n", 212 | "WITH collect(DISTINCT chunk) AS chunks, \n", 213 | " collect(DISTINCT rel) AS rels\n", 214 | "\n", 215 | "//3) format and return context\n", 216 | "RETURN '=== text ===\\n' + apoc.text.join([c in chunks | c.text], '\\n---\\n') + '\\n\\n=== kg_rels ===\\n' +\n", 217 | " apoc.text.join([r in rels | startNode(r).name + ' - ' + type(r) + '(' + coalesce(r.details, '') + ')' + ' -> ' + endNode(r).name ], '\\n---\\n') AS info\n", 218 | "\"\"\"\n", 219 | ")" 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": 65, 225 | "id": "32875324-17f9-45a7-96dc-7111715ed831", 226 | "metadata": {}, 227 | "outputs": [ 228 | { 229 | "name": "stdout", 230 | "output_type": "stream", 231 | "text": [ 232 | "# Text Chunk Context:\n", 233 | " N. Engl. J. Med.\\n---\\nLisnevskaia L - AUTHORED() -> Systemic lupus erythematosus\\n---\\nMurphy G - AUTHORED() -> Systemic lupus erythematosus\\n---\\nIsenberg DA - AUTHORED() -> Systemic lupus erythematosus\\n---\\nSystemic lupus erythematosus - CITES(Published in) -> Lancet\\n---\\nSystemic lupus erythematosus - CITES(Systemic lupus erythematosus is discussed in the Lancet publication.) 
-> Lancet\\n---\\nSystemic lupus erythe\n" 237 | ] 238 | } 239 | ], 240 | "source": [ 241 | "vc_res = vc_retriever.search(query_text = \"How is precision medicine applied to Lupus?\", top_k=3)\n", 242 | "\n", 243 | "# print output\n", 244 | "context = vc_res.items[0].content\n", 245 | "kg_rel_pos = context.find('\\\\n\\\\n=== kg_rels ===\\\\n')\n", 246 | "print(\"# Text Chunk Context:\")\n", 247 | "print(context[:kg_rel_pos][:500])\n", 248 | "print()\n", 249 | "print(\"# KG Context From Relationships:\")\n", 250 | "print(context[kg_rel_pos:][:500])" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "id": "cd158191-903b-4ad4-9db6-120f1c1c9f55", 256 | "metadata": {}, 257 | "source": [ 258 | "## Q&A with GraphRAG\n", 259 | "\n", 260 | "You can construct GraphRAG pipelines with the GraphRAG class. At minimum, you will need to pass the constructor an LLM and a retriever. Optionally, you can also pass a custom prompt template." 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 35, 266 | "id": "71c647ba-ac61-441c-9100-75dc825084ab", 267 | "metadata": {}, 268 | "outputs": [], 269 | "source": [ 270 | "from neo4j_graphrag.llm import OpenAILLM\n", 271 | "from neo4j_graphrag.generation import RagTemplate\n", 272 | "from neo4j_graphrag.generation.graphrag import GraphRAG\n", 273 | "\n", 274 | "llm = OpenAILLM(model_name=\"gpt-4o\", model_params={\"temperature\": 0.0, \"seed\": 100})\n", 275 | "\n", 276 | "rag_template = RagTemplate(template='''Answer the Question using the following Context. Only respond with information mentioned in the Context. Do not inject any speculative information not mentioned. \n", 277 | "\n", 278 | "# Question:\n", 279 | "{query_text}\n", 280 | " \n", 281 | "# Context:\n", 282 | "{context}\n", 283 | "\n", 284 | "# Answer:\n", 285 | "''', expected_inputs=['query_text', 'context'])\n", 286 | "\n", 287 | "v_rag = GraphRAG(llm=llm, retriever=vector_retriever, prompt_template=rag_template)\n", 288 | "vc_rag = GraphRAG(llm=llm, retriever=vc_retriever, prompt_template=rag_template)" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": 70, 294 | "id": "b240a9e9-4cda-42da-b0fd-c2ed4a14e748", 295 | "metadata": {}, 296 | "outputs": [ 297 | { 298 | "name": "stdout", 299 | "output_type": "stream", 300 | "text": [ 301 | "Vector Response: \n", 302 | "- Precision medicine in lupus involves a tailored approach based on genetic and epigenetic singularities.\n", 303 | "- It aims to assess lupus patients optimally and predict disease course and treatment response at diagnosis.\n", 304 | "- Ideally, each patient would undergo an initial evaluation to profile their disease, assessing the main pathophysiologic pathway through biomarkers.\n", 305 | "\n", 306 | "===========================\n", 307 | "\n", 308 | "Vector + Cypher Response: \n", 309 | "- Precision medicine in lupus involves a tailored approach to each patient based on genetic and epigenetic singularities.\n", 310 | "- It aims to assess lupus patients optimally, predict disease course, and treatment response at diagnosis.\n", 311 | "- Ideally, every patient would undergo an initial evaluation that profiles their disease, assessing the main pathophysiologic pathway through biomarkers.\n" 312 | ] 313 | } 314 | ], 315 | "source": [ 316 | "q = \"How is precision medicine applied to Lupus? 
provide in list format.\"\n", 317 | "print(f\"Vector Response: \\n{v_rag.search(q, retriever_config={'top_k':5}).answer}\")\n", 318 | "print(\"\\n===========================\\n\")\n", 319 | "print(f\"Vector + Cypher Response: \\n{vc_rag.search(q, retriever_config={'top_k':5}).answer}\")" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": null, 325 | "id": "19aa69d2-ecf9-44ee-b51f-21b9f9f4adb2", 326 | "metadata": {}, 327 | "outputs": [], 328 | "source": [] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": 63, 333 | "id": "2e3998d5-ed7b-494e-9531-7c1fe3a6dc4e", 334 | "metadata": {}, 335 | "outputs": [ 336 | { 337 | "name": "stdout", 338 | "output_type": "stream", 339 | "text": [ 340 | "Vector Response: \n", 341 | "- Most frequent symptoms of lupus:\n", 342 | " - Generalized pain\n", 343 | " - Fatigue\n", 344 | " - Depression\n", 345 | "\n", 346 | "- Difficulty in diagnosing lupus:\n", 347 | " - Symptoms like generalized pain, fatigue, and depression are often considered unrelated to SLE by physicians and may not be well addressed during clinical evaluations.\n", 348 | " - Lupus has many different expressions, making it more complex to discuss and diagnose compared to simpler conditions like a cold.\n", 349 | "\n", 350 | "===========================\n", 351 | "\n", 352 | "Vector + Cypher Response: \n", 353 | "- Most frequent symptoms of lupus:\n", 354 | " - Skin lesions\n", 355 | " - Renal symptoms\n", 356 | " - Dermatological symptoms\n", 357 | " - Neuropsychiatric symptoms\n", 358 | " - Cardiovascular symptoms\n", 359 | " - Thrombocytopenia\n", 360 | " - Haemolytic anaemia\n", 361 | " - Accelerated atherosclerosis\n", 362 | " - Active disease\n", 363 | " - Previous damage\n", 364 | " - Complications of therapy\n", 365 | "\n", 366 | "- Reasons for difficulty in diagnosing lupus:\n", 367 | " - Symptoms like generalized pain, fatigue, and depression are often considered unrelated to SLE and not well addressed during clinical evaluation.\n", 368 | " - The presence of ANA-negative cases, which can complicate diagnosis.\n", 369 | " - Low specificity of ANA tests.\n", 370 | " - The complexity of symptoms and their overlap with other conditions.\n" 371 | ] 372 | } 373 | ], 374 | "source": [ 375 | "q = \"What are the most frequent symptoms of lupus and why is it difficult to diagnose? 
Show result in a list\"\n", 376 | "\n", 377 | "v_rag_result = v_rag.search(q, retriever_config={'top_k': 2}, return_context=True)\n", 378 | "vc_rag_result = vc_rag.search(q, retriever_config={'top_k': 2}, return_context=True)\n", 379 | "\n", 380 | "print(f\"Vector Response: \\n{v_rag_result.answer}\")\n", 381 | "print(\"\\n===========================\\n\")\n", 382 | "print(f\"Vector + Cypher Response: \\n{vc_rag_result.answer}\")" 383 | ] 384 | }, 385 | { 386 | "cell_type": "code", 387 | "execution_count": null, 388 | "id": "6c11b31b-caa7-419a-bd5f-e4e2bd41b532", 389 | "metadata": {}, 390 | "outputs": [], 391 | "source": [] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": 37, 396 | "id": "1b15d15b-fdd2-4275-9c06-764e1ece4ff5", 397 | "metadata": {}, 398 | "outputs": [ 399 | { 400 | "name": "stdout", 401 | "output_type": "stream", 402 | "text": [ 403 | "Vector Response: \n", 404 | "- **Systemic Lupus Erythematosus (SLE) Overview:**\n", 405 | " - SLE is a systemic autoimmune disease characterized by aberrant activity of the immune system.\n", 406 | " - It presents with a wide range of clinical manifestations and can cause damage to various organs.\n", 407 | "\n", 408 | "- **Common Effects:**\n", 409 | " - SLE imposes a significant burden on patients' lives.\n", 410 | " - It affects health-related quality of life (HRQoL) due to its symptoms and disease activity.\n", 411 | "\n", 412 | "- **Biomarkers:**\n", 413 | " - SLE is diagnosed and classified based on clinical symptoms, signs, and laboratory biomarkers.\n", 414 | " - Biomarkers reflect immune reactivity and inflammation in various organs.\n", 415 | " - Novel biomarkers have been discovered through \"omics\" research.\n", 416 | " - One particular biomarker may only reflect a specific aspect of SLE and not the overall state of the disease.\n", 417 | "\n", 418 | "- **Treatments:**\n", 419 | " - Physicians focus on controlling disease activity to prevent damage accrual.\n", 420 | " - There is a gap between physicians' focus on disease control and patients' focus on symptoms affecting HRQoL.\n", 421 | " - The role of Patient Reported Outcomes (PROs) is explored to bridge this gap.\n", 422 | "\n", 423 | "===========================\n", 424 | "\n", 425 | "Vector + Cypher Response: \n", 426 | "**Systemic Lupus Erythematosus (SLE) Summary:**\n", 427 | "\n", 428 | "**Common Effects:**\n", 429 | "- Multi-organ involvement\n", 430 | "- Complex clinical picture with a wide range of manifestations\n", 431 | "- Varying severity and unpredictable relapsing and remitting course\n", 432 | "- Chronic systemic inflammation\n", 433 | "- Organ damage\n", 434 | "- Symptoms include nephritis, arthritis, vasculitis, fatigue, widespread body pain, depression, anxiety, cognitive dysfunction, sleep disturbance, malar rash, diffuse alopecia, myalgia, fever, rash, cutaneous vasculitis, renal issues, pleurisy, pericarditis, thrombocytopenia, and haemolytic anaemia\n", 435 | "\n", 436 | "**Biomarkers:**\n", 437 | "- Antibodies against paraoxonase 1 and high-density lipoprotein for endothelial damage\n", 438 | "- Anti-Nucleosome Antibodies (ANuA) related to disease activity and glomerulonephritis\n", 439 | "- Anti-C1q antibodies as a non-invasive biomarker for predicting renal flares\n", 440 | "- ANA (Antinuclear Antibody) test\n", 441 | "- Anti-dsDNA antibodies\n", 442 | "- Anti-Sm antibodies\n", 443 | "- Complement proteins (C3 and C4 levels)\n", 444 | "- Various cytokines (INF type 1, TNF, IL1, IL4, IL17, INF γ)\n", 445 | "- Proteinuria, urinary 
casts, hemolytic anemia with reticulocytosis, white blood cell count\n", 446 | "\n", 447 | "**Treatments:**\n", 448 | "- Treat-to-target strategy aiming for remission or low disease activity\n", 449 | "- Antimalarials\n", 450 | "- Glucocorticoid therapy\n", 451 | "- Immunosuppressants (IS)\n", 452 | "- Biologics like Belimumab and Rituximab\n", 453 | "- Multidisciplinary approach including surgery, physiotherapy, non-prescription drugs, sports, family and peer support, diet, psychological aspects, and exercise\n", 454 | "\n", 455 | "**Additional Information:**\n", 456 | "- SLE is more prevalent in women and non-white populations\n", 457 | "- Risk factors include ultraviolet light, toxins, and infections\n", 458 | "- Genetic and epigenetic factors contribute to susceptibility\n", 459 | "- SLE affects health-related quality of life (HRQoL) and imposes a significant disease burden\n", 460 | "- Physicians focus on controlling disease activity and preventing damage accrual, while patients focus on symptoms impacting HRQoL\n" 461 | ] 462 | } 463 | ], 464 | "source": [ 465 | "q = \"Can you summarize systemic lupus erythematosus (SLE)? including common effects, biomarkers, and treatments? Provide in detailed list format.\"\n", 466 | "\n", 467 | "v_rag_result = v_rag.search(q, retriever_config={'top_k': 5}, return_context=True)\n", 468 | "vc_rag_result = vc_rag.search(q, retriever_config={'top_k': 5}, return_context=True)\n", 469 | "\n", 470 | "print(f\"Vector Response: \\n{v_rag_result.answer}\")\n", 471 | "print(\"\\n===========================\\n\")\n", 472 | "print(f\"Vector + Cypher Response: \\n{vc_rag_result.answer}\")" 473 | ] 474 | }, 475 | { 476 | "cell_type": "code", 477 | "execution_count": null, 478 | "id": "a1af214e-cdea-4889-8309-706e856a0215", 479 | "metadata": {}, 480 | "outputs": [], 481 | "source": [] 482 | } 483 | ], 484 | "metadata": { 485 | "kernelspec": { 486 | "display_name": "Python 3 (ipykernel)", 487 | "language": "python", 488 | "name": "python3" 489 | }, 490 | "language_info": { 491 | "codemirror_mode": { 492 | "name": "ipython", 493 | "version": 3 494 | }, 495 | "file_extension": ".py", 496 | "mimetype": "text/x-python", 497 | "name": "python", 498 | "nbconvert_exporter": "python", 499 | "pygments_lexer": "ipython3", 500 | "version": "3.12.3" 501 | } 502 | }, 503 | "nbformat": 4, 504 | "nbformat_minor": 5 505 | } 506 | --------------------------------------------------------------------------------
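
For readers who prefer a plain Python script over the notebook, the following is a minimal sketch of the same retrieval and Q&A wiring. It assumes the knowledge graph and the `text_embeddings` vector index built in the notebook already exist, reads credentials from the `.env` file described in `.env.template`, and uses `OpenAIEmbeddings` from `neo4j_graphrag.embeddings` as the embedder (the embedder import path and its default model are assumptions here; the notebook constructs its embedder in an earlier cell, and the embedding model must match the one used to populate the index). The Cypher retrieval query is a simplified version of the one used in the notebook.

```python
import os

import neo4j
from dotenv import load_dotenv
from neo4j_graphrag.embeddings import OpenAIEmbeddings  # assumed import path for the embedder
from neo4j_graphrag.generation import RagTemplate
from neo4j_graphrag.generation.graphrag import GraphRAG
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.retrievers import VectorCypherRetriever

# Load NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD, and OPENAI_API_KEY from .env
load_dotenv(".env", override=True)

driver = neo4j.GraphDatabase.driver(
    os.getenv("NEO4J_URI"),
    auth=(os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD")),
)

embedder = OpenAIEmbeddings()  # assumption: same OpenAI embedding model that was used to build the index

# Vector search over chunk embeddings, then a 1-2 hop traversal over extracted entities
# (simplified from the notebook's retrieval query).
vc_retriever = VectorCypherRetriever(
    driver,
    index_name="text_embeddings",
    embedder=embedder,
    retrieval_query="""
WITH node AS chunk
MATCH (chunk)<-[:FROM_CHUNK]-()-[relList:!FROM_CHUNK]-{1,2}(:__Entity__)
UNWIND relList AS rel
WITH collect(DISTINCT chunk) AS chunks, collect(DISTINCT rel) AS rels
RETURN apoc.text.join([c IN chunks | c.text], '\n---\n') + '\n' +
       apoc.text.join([r IN rels | startNode(r).name + ' - ' + type(r) + ' -> ' + endNode(r).name], '\n') AS info
""",
)

llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0.0})

rag_template = RagTemplate(
    template="""Answer the Question using the following Context. Only respond with information mentioned in the Context.

# Question:
{query_text}

# Context:
{context}

# Answer:
""",
    expected_inputs=["query_text", "context"],
)

rag = GraphRAG(llm=llm, retriever=vc_retriever, prompt_template=rag_template)

print(rag.search("How is precision medicine applied to Lupus?", retriever_config={"top_k": 5}).answer)

driver.close()
```

As in the notebook's comparison of the two retrievers, answer quality depends on how much of the knowledge graph the retrieval query exposes to the LLM; the `top_k` value and the hop bound in the quantified path pattern are the two easiest knobs to adjust.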