├── README.md
├── data
│   ├── cake.json
│   └── schema.json
└── jsonQueryRAG.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # jsonQueryRAG
2 | An LLM query engine that retrieves augmented responses from JSON files.
3 |
4 | ```markdown
5 | # JSON Query RAG (jsonQueryRAG)
6 |
7 | This repository contains a Colab notebook that shows how to query JSON data in natural language and get natural-language answers. It uses the `jsonpath-ng`, `llama-index`, `openai`, `transformers`, and `accelerate` libraries.
8 |
9 | ## Getting Started
10 |
11 | 1. Open the notebook in Google Colab by clicking the badge below:
12 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mickymult/jsonQueryRAG/blob/main/jsonQueryRAG.ipynb)
13 |
14 | 2. Install the necessary libraries by running the first few cells in the notebook.
15 |
16 | ## Usage
17 |
18 | - The notebook contains cells for installing the required libraries, setting up logging, setting the OpenAI API key, and importing the required modules.
19 | - It also includes cells for specifying the paths to your JSON and schema files, reading these files, and setting up the JSON query engines.
20 | - To query the JSON data, modify the query text in the cell with `nl_query_engine.query()` and run the cell.
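
The file-loading steps described above can be sketched outside the notebook as follows. This is a minimal sketch under stated assumptions: `load_json_pair` is a hypothetical helper name (not part of the notebook), and the demo writes throwaway files to a temporary folder so that it runs anywhere; in the notebook the folder is `data` and the files are `cake.json` and `schema.json`.

```python
import json
import os
import tempfile


def load_json_pair(folder, json_filename, schema_filename):
    """Read a JSON document and its schema from the same folder.

    Hypothetical helper: it mirrors the notebook's path-building and
    json.load cells, but is not taken from the notebook itself.
    """
    json_path = os.path.join(folder, json_filename)
    schema_path = os.path.join(folder, schema_filename)
    with open(json_path, "r", encoding="utf-8") as f:
        json_value = json.load(f)
    with open(schema_path, "r", encoding="utf-8") as f:
        json_schema = json.load(f)
    return json_value, json_schema


# Demo with placeholder files so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "cake.json"), "w", encoding="utf-8") as f:
        json.dump({"items": {"item": []}}, f)
    with open(os.path.join(tmp, "schema.json"), "w", encoding="utf-8") as f:
        json.dump({"type": "object", "required": ["items"]}, f)
    demo_value, demo_schema = load_json_pair(tmp, "cake.json", "schema.json")

print(sorted(demo_value))  # top-level keys of the loaded document
```

The two returned objects correspond to the `json_value` and `json_schema` arguments that the notebook passes to `JSONQueryEngine`.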
21 |
22 | ## Example Query
23 |
24 | The notebook includes an example query that asks for the types of toppings available for a donut cake, and displays the natural language response.
25 |
26 | ```python
27 | nl_response = nl_query_engine.query(
28 | "what type of toppings are available for donut cake?",
29 | )
30 | display(Markdown(f"<h1>Natural language Response</h1><br><b>{nl_response}</b>"))
31 | ```
32 |
33 | The expected output for this query is:
34 |
35 | ```
36 | The available toppings for the donut cake are None, Glazed, Sugar, Powdered Sugar, Chocolate with Sprinkles, Chocolate, and Maple.
37 | ```
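
To see where that answer comes from, the lookup can be reproduced by hand without an LLM. As far as I understand it, the query engine has the model write a JSONPath expression against the schema and then runs it on the data; the sketch below does not do that and only performs an equivalent filter in plain Python. `cake_data` is a trimmed copy of `data/cake.json`, and `topping_types` is a hypothetical helper introduced for illustration.

```python
# Trimmed copy of data/cake.json: two items, toppings only.
cake_data = {
    "items": {
        "item": [
            {
                "id": "0001",
                "type": "donut",
                "name": "Cake",
                "topping": [
                    {"id": "5001", "type": "None"},
                    {"id": "5002", "type": "Glazed"},
                    {"id": "5005", "type": "Sugar"},
                    {"id": "5007", "type": "Powdered Sugar"},
                    {"id": "5006", "type": "Chocolate with Sprinkles"},
                    {"id": "5003", "type": "Chocolate"},
                    {"id": "5004", "type": "Maple"},
                ],
            },
            {
                "id": "0004",
                "type": "bar",
                "name": "Bar",
                "topping": [
                    {"id": "5003", "type": "Chocolate"},
                    {"id": "5004", "type": "Maple"},
                ],
            },
        ]
    }
}


def topping_types(data, item_type, item_name):
    """Return the topping names of every item matching type and name."""
    return [
        topping["type"]
        for item in data["items"]["item"]
        if item["type"] == item_type and item["name"] == item_name
        for topping in item["topping"]
    ]


donut_cake_toppings = topping_types(cake_data, "donut", "Cake")
print(", ".join(donut_cake_toppings))
```

The printed list matches the toppings named in the expected response, which is a quick way to check an LLM-generated answer against the raw data.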
38 |
39 | ## Dependencies
40 |
41 | - jsonpath-ng
42 | - llama-index
43 | - openai
44 | - transformers
45 | - accelerate
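
Before running the notebook locally, it may help to check which of these packages are already installed. The sketch below uses only the standard library; `installed_versions` is a hypothetical helper name. Note also that the notebook's import paths (for example `llama_index.llms`) appear to correspond to an older `llama-index` release, so a pinned version may be needed; that is an assumption and is not stated in this repository.

```python
from importlib import metadata

# Package names as listed in this README.
REQUIRED = ["jsonpath-ng", "llama-index", "openai", "transformers", "accelerate"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


versions = installed_versions(REQUIRED)
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```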
46 |
47 | ## License
48 |
49 | MIT
50 |
51 |
52 | ```
53 |
--------------------------------------------------------------------------------
/data/cake.json:
--------------------------------------------------------------------------------
1 | {
2 | "items":
3 | {
4 | "item":
5 | [
6 | {
7 | "id": "0001",
8 | "type": "donut",
9 | "name": "Cake",
10 | "ppu": 0.55,
11 | "batters":
12 | {
13 | "batter":
14 | [
15 | { "id": "1001", "type": "Regular" },
16 | { "id": "1002", "type": "Chocolate" },
17 | { "id": "1003", "type": "Blueberry" },
18 | { "id": "1004", "type": "Devil's Food" }
19 | ]
20 | },
21 | "topping":
22 | [
23 | { "id": "5001", "type": "None" },
24 | { "id": "5002", "type": "Glazed" },
25 | { "id": "5005", "type": "Sugar" },
26 | { "id": "5007", "type": "Powdered Sugar" },
27 | { "id": "5006", "type": "Chocolate with Sprinkles" },
28 | { "id": "5003", "type": "Chocolate" },
29 | { "id": "5004", "type": "Maple" }
30 | ]
31 | },
32 | {
33 | "id": "0002",
34 | "type": "donut",
35 | "name": "Raised",
36 | "ppu": 0.55,
37 | "batters":
38 | {
39 | "batter":
40 | [
41 | { "id": "1001", "type": "Regular" }
42 | ]
43 | },
44 | "topping":
45 | [
46 | { "id": "5001", "type": "None" },
47 | { "id": "5002", "type": "Glazed" },
48 | { "id": "5005", "type": "Sugar" },
49 | { "id": "5003", "type": "Chocolate" },
50 | { "id": "5004", "type": "Maple" }
51 | ]
52 | },
53 |
54 | {
55 | "id": "0003",
56 | "type": "donut",
57 | "name": "Old Fashioned",
58 | "ppu": 0.55,
59 | "batters":
60 | {
61 | "batter":
62 | [
63 | { "id": "1001", "type": "Regular" },
64 | { "id": "1002", "type": "Chocolate" }
65 | ]
66 | },
67 | "topping":
68 | [
69 | { "id": "5001", "type": "None" },
70 | { "id": "5002", "type": "Glazed" },
71 | { "id": "5003", "type": "Chocolate" },
72 | { "id": "5004", "type": "Maple" }
73 | ]
74 | },
75 | {
76 | "id": "0004",
77 | "type": "bar",
78 | "name": "Bar",
79 | "ppu": 0.75,
80 | "batters":
81 | {
82 | "batter":
83 | [
84 | { "id": "1001", "type": "Regular" }
85 | ]
86 | },
87 | "topping":
88 | [
89 | { "id": "5003", "type": "Chocolate" },
90 | { "id": "5004", "type": "Maple" }
91 | ],
92 | "fillings":
93 | {
94 | "filling":
95 | [
96 | { "id": "7001", "name": "None", "addcost": 0 },
97 | { "id": "7002", "name": "Custard", "addcost": 0.25 },
98 | { "id": "7003", "name": "Whipped Cream", "addcost": 0.25 }
99 | ]
100 | }
101 | },
102 |
103 | {
104 | "id": "0005",
105 | "type": "twist",
106 | "name": "Twist",
107 | "ppu": 0.65,
108 | "batters":
109 | {
110 | "batter":
111 | [
112 | { "id": "1001", "type": "Regular" }
113 | ]
114 | },
115 | "topping":
116 | [
117 | { "id": "5002", "type": "Glazed" },
118 | { "id": "5005", "type": "Sugar" }
119 | ]
120 | },
121 |
122 | {
123 | "id": "0006",
124 | "type": "filled",
125 | "name": "Filled",
126 | "ppu": 0.75,
127 | "batters":
128 | {
129 | "batter":
130 | [
131 | { "id": "1001", "type": "Regular" }
132 | ]
133 | },
134 | "topping":
135 | [
136 | { "id": "5002", "type": "Glazed" },
137 | { "id": "5007", "type": "Powdered Sugar" },
138 | { "id": "5003", "type": "Chocolate" },
139 | { "id": "5004", "type": "Maple" }
140 | ],
141 | "fillings":
142 | {
143 | "filling":
144 | [
145 | { "id": "7002", "name": "Custard", "addcost": 0 },
146 | { "id": "7003", "name": "Whipped Cream", "addcost": 0 },
147 | { "id": "7004", "name": "Strawberry Jelly", "addcost": 0 },
148 | { "id": "7005", "name": "Raspberry Jelly", "addcost": 0 }
149 | ]
150 | }
151 | }
152 | ]
153 | }
154 | }
155 |
--------------------------------------------------------------------------------
/data/schema.json:
--------------------------------------------------------------------------------
1 | {
2 | "$schema": "http://json-schema.org/draft-07/schema#",
3 | "type": "object",
4 | "properties": {
5 | "items": {
6 | "type": "object",
7 | "properties": {
8 | "item": {
9 | "type": "array",
10 | "items": {
11 | "type": "object",
12 | "properties": {
13 | "id": { "type": "string" },
14 | "type": { "type": "string" },
15 | "name": { "type": "string" },
16 | "ppu": { "type": "number" },
17 | "batters": {
18 | "type": "object",
19 | "properties": {
20 | "batter": {
21 | "type": "array",
22 | "items": {
23 | "type": "object",
24 | "properties": {
25 | "id": { "type": "string" },
26 | "type": { "type": "string" }
27 | },
28 | "required": ["id", "type"]
29 | }
30 | }
31 | },
32 | "required": ["batter"]
33 | },
34 | "topping": {
35 | "type": "array",
36 | "items": {
37 | "type": "object",
38 | "properties": {
39 | "id": { "type": "string" },
40 | "type": { "type": "string" }
41 | },
42 | "required": ["id", "type"]
43 | }
44 | },
45 | "fillings": {
46 | "type": "object",
47 | "properties": {
48 | "filling": {
49 | "type": "array",
50 | "items": {
51 | "type": "object",
52 | "properties": {
53 | "id": { "type": "string" },
54 | "name": { "type": "string" },
55 | "addcost": { "type": "number" }
56 | },
57 | "required": ["id", "name", "addcost"]
58 | }
59 | }
60 | },
61 | "required": ["filling"]
62 | }
63 | },
64 | "required": ["id", "type", "name", "ppu", "batters", "topping"],
65 | "dependencies": {
66 | "fillings": ["type"]
67 | }
68 | }
69 | }
70 | },
71 | "required": ["item"]
72 | }
73 | },
74 | "required": ["items"]
75 | }
76 |
--------------------------------------------------------------------------------
/jsonQueryRAG.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "provenance": [],
7 | "authorship_tag": "ABX9TyObK/A4SR5nUAdNmM5Vzppe",
8 | "include_colab_link": true
9 | },
10 | "kernelspec": {
11 | "name": "python3",
12 | "display_name": "Python 3"
13 | },
14 | "language_info": {
15 | "name": "python"
16 | }
17 | },
18 | "cells": [
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {
22 | "id": "view-in-github",
23 | "colab_type": "text"
24 | },
25 | "source": [
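26 | "[Open In Colab](https://colab.research.google.com/github/mickymult/jsonQueryRAG/blob/main/jsonQueryRAG.ipynb)"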
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 1,
32 | "metadata": {
33 | "colab": {
34 | "base_uri": "https://localhost:8080/"
35 | },
36 | "id": "eiU7Mfixy3YM",
37 | "outputId": "8e08ebd3-a23b-4ebb-8515-a58fc26ca14f"
38 | },
39 | "outputs": [
40 | {
41 | "output_type": "stream",
42 | "name": "stdout",
43 | "text": [
44 | "Collecting jsonpath-ng\n",
45 | " Downloading jsonpath_ng-1.6.0-py3-none-any.whl (29 kB)\n",
46 | "Collecting ply (from jsonpath-ng)\n",
47 | " Downloading ply-3.11-py2.py3-none-any.whl (49 kB)\n",
49 | "Installing collected packages: ply, jsonpath-ng\n",
50 | "Successfully installed jsonpath-ng-1.6.0 ply-3.11\n"
51 | ]
52 | }
53 | ],
54 | "source": [
55 | "# First, install the jsonpath-ng package which is used by default to parse & execute the JSONPath queries.\n",
56 | "!pip install jsonpath-ng"
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "source": [
62 | "!pip install -q llama-index\n",
63 | "!pip install -q openai\n",
64 | "!pip install -q transformers\n",
65 | "!pip install -q accelerate"
66 | ],
67 | "metadata": {
68 | "colab": {
69 | "base_uri": "https://localhost:8080/"
70 | },
71 | "id": "JM6FUfFCznad",
72 | "outputId": "67cfb59c-71e3-4cdf-9f67-ade2ac70d404"
73 | },
74 | "execution_count": 2,
75 | "outputs": [
76 | {
77 | "output_type": "stream",
78 | "name": "stdout",
79 | "text": [
94 | ]
95 | }
96 | ]
97 | },
98 | {
99 | "cell_type": "code",
100 | "source": [
101 | "import logging\n",
102 | "import sys\n",
103 | "\n",
104 | "logging.basicConfig(stream=sys.stdout, level=logging.INFO)\n",
105 | "logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))"
106 | ],
107 | "metadata": {
108 | "id": "uzZ0RuBmzBtv"
109 | },
110 | "execution_count": 3,
111 | "outputs": []
112 | },
113 | {
114 | "cell_type": "code",
115 | "source": [
116 | "import os\n",
117 | "import openai\n",
118 | "\n",
119 | "os.environ[\"OPENAI_API_KEY\"] = \"YOUR_OPENAI_KEY_HERE\"\n",
120 | "openai.api_key = os.environ[\"OPENAI_API_KEY\"]"
121 | ],
122 | "metadata": {
123 | "id": "O4QWDOUCzIvY"
124 | },
125 | "execution_count": 4,
126 | "outputs": []
127 | },
128 | {
129 | "cell_type": "code",
130 | "source": [
131 | "from IPython.display import Markdown, display"
132 | ],
133 | "metadata": {
134 | "id": "Q79X4l05z1bb"
135 | },
136 | "execution_count": 5,
137 | "outputs": []
138 | },
139 | {
140 | "cell_type": "code",
141 | "source": [
142 | "import json\n",
143 | "\n",
144 | "# Specify the folders containing the JSON and schema files\n",
145 | "json_folder = 'data'\n",
146 | "schema_folder = 'data'"
147 | ],
148 | "metadata": {
149 | "id": "QE4kB3380Gte"
150 | },
151 | "execution_count": 19,
152 | "outputs": []
153 | },
154 | {
155 | "cell_type": "code",
156 | "source": [
157 | "# Specify the filenames of the JSON and schema files\n",
158 | "json_filename = 'cake.json'\n",
159 | "schema_filename = 'schema.json'"
160 | ],
161 | "metadata": {
162 | "id": "ciWksToR0as8"
163 | },
164 | "execution_count": 20,
165 | "outputs": []
166 | },
167 | {
168 | "cell_type": "code",
169 | "source": [
170 | "# Construct the paths to the JSON and schema files\n",
171 | "json_filepath = os.path.join(json_folder, json_filename)\n",
172 | "schema_filepath = os.path.join(schema_folder, schema_filename)"
173 | ],
174 | "metadata": {
175 | "id": "NGlR4WRk0iHb"
176 | },
177 | "execution_count": 21,
178 | "outputs": []
179 | },
180 | {
181 | "cell_type": "code",
182 | "source": [
183 | "# Read the JSON file\n",
184 | "with open(json_filepath, 'r') as json_file:\n",
185 | " json_value = json.load(json_file)"
186 | ],
187 | "metadata": {
188 | "id": "Ctz7hB4d0mNw"
189 | },
190 | "execution_count": 22,
191 | "outputs": []
192 | },
193 | {
194 | "cell_type": "code",
195 | "source": [
196 | "# Read the schema file\n",
197 | "with open(schema_filepath, 'r') as schema_file:\n",
198 | " json_schema = json.load(schema_file)"
199 | ],
200 | "metadata": {
201 | "id": "VwsjqQFg0pKL"
202 | },
203 | "execution_count": 23,
204 | "outputs": []
205 | },
206 | {
207 | "cell_type": "code",
208 | "source": [
209 | "from llama_index.indices.service_context import ServiceContext\n",
210 | "from llama_index.llms import OpenAI\n",
211 | "from llama_index.indices.struct_store import JSONQueryEngine\n",
212 | "\n",
213 | "llm = OpenAI(model=\"gpt-4\")\n",
214 | "service_context = ServiceContext.from_defaults(llm=llm)\n",
215 | "nl_query_engine = JSONQueryEngine(\n",
216 | " json_value=json_value, json_schema=json_schema, service_context=service_context\n",
217 | ")\n",
218 | "raw_query_engine = JSONQueryEngine(\n",
219 | " json_value=json_value,\n",
220 | " json_schema=json_schema,\n",
221 | " service_context=service_context,\n",
222 | " synthesize_response=False,\n",
223 | ")"
224 | ],
225 | "metadata": {
226 | "colab": {
227 | "base_uri": "https://localhost:8080/"
228 | },
229 | "id": "zw6X6Y3M0xgI",
230 | "outputId": "ce72282a-92ad-4b79-e9a6-208efbb0d655"
231 | },
232 | "execution_count": 24,
233 | "outputs": [
234 | {
235 | "output_type": "stream",
236 | "name": "stderr",
237 | "text": [
238 | "[nltk_data] Downloading package punkt to /tmp/llama_index...\n",
239 | "[nltk_data] Unzipping tokenizers/punkt.zip.\n"
240 | ]
241 | }
242 | ]
243 | },
244 | {
245 | "cell_type": "code",
246 | "source": [
247 | "nl_response = nl_query_engine.query(\n",
248 | " \"what type of toppings are available for donut cake?\",\n",
249 | ")\n"
250 | ],
251 | "metadata": {
252 | "id": "2kCk5rae1ABk"
253 | },
254 | "execution_count": 25,
255 | "outputs": []
256 | },
257 | {
258 | "cell_type": "code",
259 | "source": [
260 | "display(Markdown(f\"<h1>Natural language Response</h1><br><b>{nl_response}</b>\"))"
261 | ],
262 | "metadata": {
263 | "colab": {
264 | "base_uri": "https://localhost:8080/",
265 | "height": 109
266 | },
267 | "id": "ji5joIz71PBZ",
268 | "outputId": "f33799c0-ae07-45f3-ed0d-375aee7d484c"
269 | },
270 | "execution_count": 26,
271 | "outputs": [
272 | {
273 | "output_type": "display_data",
274 | "data": {
275 | "text/plain": [
276 | "<IPython.core.display.Markdown object>"
277 | ],
278 | "text/markdown": "<h1>Natural language Response</h1><br><b>The available toppings for the donut cake are None, Glazed, Sugar, Powdered Sugar, Chocolate with Sprinkles, Chocolate, and Maple.</b>"
279 | },
280 | "metadata": {}
281 | }
282 | ]
283 | }
284 | ]
285 | }
--------------------------------------------------------------------------------