├── .gitattributes ├── LICENSE ├── README.md ├── index.html ├── search.php ├── search_basic.php └── style.css /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Josh Clemm 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ai-search 2 | A basic open-source AI search engine, modeled after [Perplexity.ai](https://www.perplexity.ai).
If you’re not familiar with AI-powered question-answering platforms, they use a large language model like ChatGPT to answer your questions, but improve on ChatGPT by pulling in accurate, real-time search results to supplement the answer (so no “knowledge cutoff”). They also list citations within the answer itself, which builds confidence that the model isn’t hallucinating and allows you to research topics further. 3 | 4 | ## How to Run (summary) 5 | 1. Clone / download the repo 6 | 2. Go get your API keys and add them to `search.php` (look for "[Fill me in]") 7 | 3. Run locally using PHP's built-in server (`php -S localhost:8000`) 8 | 9 | ## Step-by-step details 10 | 11 | ### Step 1: Get Search Results for a user query 12 | 13 | The main challenge with LLMs like ChatGPT is that they have knowledge cutoffs (and they occasionally tend to [hallucinate](https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence))). That’s because they’re trained on data up to a specific date (e.g., Sep 2021). So if you want an answer to an up-to-date question or you simply want to research a topic in detail, you’ll need to _augment_ the answer with relevant sources. This technique is known as [RAG](https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview) (retrieval-augmented generation). In our case, we can simply supply the LLM with up-to-date information from search engines like Google or Bing. 14 | 15 | To build this yourself, you’ll want to first sign up for an API key from [Bing](https://www.microsoft.com/en-us/bing/apis/bing-web-search-api), Google (via [Serper](https://serper.dev/)), [Brave](https://brave.com/search/api/), or others. Bing, Brave, and Serper all offer free usage to get started. 16 | 17 | In `search.php`, put your API key where appropriate (look for "[Fill me in]"). For this example, I have code for both Brave and Google via Serper.
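Whichever search provider you pick, its response needs to be normalized into a small, uniform list of snippets before it goes to the LLM. Here's a minimal sketch of that step for a Serper-style response (the `organic`/`title`/`link`/`snippet` field names follow Serper's response shape; Bing and Brave use different field names, as `search.php` shows):

```php
<?php
// Minimal sketch of Step 1's post-processing: turning a decoded
// Serper-style response into a uniform list of snippets for the prompt.
// Field names here ("organic", "title", "link", "snippet") assume
// Serper's response shape; adjust them for Bing or Brave.
function extract_snippets(array $response, int $num_sources = 9): array
{
    $snippets = [];
    foreach ($response['organic'] ?? [] as $result) {
        $snippets[] = [
            'name'    => $result['title'] ?? '',
            'url'     => $result['link'] ?? '',
            'snippet' => $result['snippet'] ?? '',
        ];
    }
    // Cap the list so the prompt stays small (and cheap)
    return array_slice($snippets, 0, $num_sources);
}
```

Keeping every provider's output in this one shape means the prompt-building and UI code never needs to know which search engine answered.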
18 | 19 | 20 | ### Step 2: Decide the LLM you want to use 21 | 22 | Here, you’ll need to sign up for an API key from an LLM provider. There are a lot of providers to choose from right now. For example, there’s [OpenAI](https://platform.openai.com/docs/overview), [Anthropic](https://www.anthropic.com/api), [Anyscale](https://www.anyscale.com/), [Groq](https://groq.com/), [Cloudflare](https://ai.cloudflare.com/), [Perplexity](https://docs.perplexity.ai/docs/getting-started), [Lepton](https://www.lepton.ai/), or the big players like AWS, Azure, or Google Cloud. I’ve used many of these with success, and they each offer a subset of current and popular closed and open-source models. Each model has unique strengths, different costs, and different speeds. For example, gpt-4 is very accurate but expensive and slow. When in doubt, I’d recommend using gpt-3.5-turbo from OpenAI. It’s good enough, cheap enough, and fast enough to test this out. 23 | 24 | Fortunately, most of these LLM serving providers are compatible with OpenAI’s API format, so switching to another provider / model is only minimal work (or just ask a [chatbot](https://yaddleai.com/search/?q=Show+the+code+to+call+openAI%27s+API) to write the code!). 25 | 26 | In `search.php`, put your API keys where appropriate (look for "[Fill me in]"). For this example, I'm using OpenAI (for gpt-3.5-turbo / gpt-4) and Groq (for Mixtral-8x7B). So to keep your work minimal, just go get keys for one or both of those. 27 | 28 | ### Step 3: Craft a prompt to pass along the search results in the context window 29 | 30 | When you want to ask an LLM a question, you can provide a lot of additional context. Each model has its own unique limit, and some of them are very large.
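Since most providers accept OpenAI's chat-completions payload, the request shape and a rough size check can be sketched like this (a minimal sketch; the division by 4 is the common "~4 characters per token" rule of thumb for English text, only an approximation, not an exact tokenizer):

```php
<?php
// Sketch of the OpenAI-style chat payload used throughout this project.
// Because most providers accept this same JSON shape, switching providers
// usually means changing only the endpoint URL, model name, and API key.
function build_chat_request(string $query, string $context, string $model = 'gpt-3.5-turbo'): array
{
    return [
        'model'    => $model,
        'messages' => [
            ['role' => 'system', 'content' => $context], // prompt + search snippets
            ['role' => 'user',   'content' => $query],   // the user's question
        ],
        'temperature' => 0.9,
        'max_tokens'  => 2048,
    ];
}

// Very rough token estimate: ~4 characters per token for English text.
// An approximation only — use a real tokenizer if you need exact counts.
function estimate_tokens(array $request): int
{
    $chars = 0;
    foreach ($request['messages'] as $m) {
        $chars += strlen($m['content']);
    }
    return (int) ceil($chars / 4);
}
```

A quick estimate like this is enough to sanity-check that your prompt plus snippets will fit comfortably inside a model's context window.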
For [gpt-4-turbo](https://platform.openai.com/docs/models/continuous-model-upgrades), you could pass along the entirety of the [1st Harry Potter book](https://towardsdatascience.com/de-coded-understanding-context-windows-for-transformer-models-cd1baca6427e) with your question. Google’s super powerful [Gemini 1.5](https://medium.com/google-cloud/googles-gemini-1-5-pro-revolutionizing-ai-with-a-1m-token-context-window-bfea5adfd35f) can support a context size of over a million tokens. That’s enough to pass along the entirety of the 7-book Harry Potter series! 31 | 32 | Fortunately, passing along the snippets of 8-10 search results is far smaller, allowing you to use many of the faster (and much cheaper) models like gpt-3.5-turbo or mistral-7b. 33 | 34 | In my experience, passing along the user question, custom prompt message, and search result snippets is usually under 1K tokens. This is well under even the most basic model’s limits, so this should be no problem. 35 | 36 | `search.php` has the sample prompt I’ve been playing around with. Hat-tip to the folks at [Lepton AI](https://www.lepton.ai/) who [open-sourced a similar project](https://github.com/leptonai/search_with_lepton), which helped me refine this prompt. 37 | 38 | ### Step 4: Add Related or Follow-Up Questions 39 | 40 | One of the nice features of Perplexity is how they suggest follow-up questions. Fortunately, this is easy to replicate. 41 | 42 | To do this, you can make a second call to your LLM (in parallel) asking for related questions. And don’t forget to pass along those citations in the context again. 43 | 44 | Or, you can attempt to construct a prompt so that the LLM answers the question AND comes up with related questions. This saves an API call and some tokens, but it’s a bit challenging getting these LLMs to always answer in a consistent and repeatable format. 45 | 46 | ### Step 5: Make it look a lot better! 47 | 48 | To make this a complete example, we need a usable UI.
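One UI detail worth sketching before we get there: the model's answer arrives with raw `[citation:x]` markers, which need to be swapped for clickable links against the ordered source list. The repo does this client-side in `index.html`'s JavaScript; here's the same idea expressed in PHP (the function name and markup are illustrative, not what the repo ships):

```php
<?php
// Illustrative sketch (not the repo's actual UI code): replace each
// [citation:x] marker in the answer with a numbered superscript link,
// using the same ordered source list that was sent to the LLM.
function link_citations(string $answer, array $sources): string
{
    return preg_replace_callback('/\[citation:(\d+)\]/', function ($m) use ($sources) {
        $i = (int) $m[1];
        $url = $sources[$i - 1]['url'] ?? '';
        if ($url === '') {
            return ''; // drop markers that point at no known source
        }
        return '<a href="' . htmlspecialchars($url) . '" target="_blank"><sup>' . $i . '</sup></a>';
    }, $answer);
}
```

Because the snippet list sent in the prompt and the list shown in the UI are the same array, citation number `x` always resolves to the same source in both places.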
I kept the UI as simple as possible and everything is in `index.html`. I’m using Bootstrap, jQuery, some basic CSS/JavaScript, a Markdown renderer, and a JS syntax highlighter to make this happen. 49 | 50 | To improve the experience, the UI does the following: 51 | * The answer **streams** back to the user (improving perception of speed) 52 | * The **citations** are replaced by a nicer in-line UI with a clickable popup for the user to learn more 53 | * The **sources** considered are included after the answer in case the user wants to explore further 54 | * Markdown and code syntax **highlighting** are used if necessary 55 | 56 | ### Working Example 57 | 58 | To explore a working example, check out [https://yaddleai.com](https://yaddleai.com/). It's mostly the same code, though I added a second search call in parallel to fetch images, wrote a separate page to fetch the latest news, and made a few other minor improvements. 59 | -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Answers 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 158 | 159 | 160 | 161 | 162 |
163 | 168 | 169 | 208 | 209 |
210 |

211 | Answer 212 | 213 |

214 |
215 | 216 |
217 | 218 | 219 | 220 | 221 | Followup Questions 222 |
223 |
224 | 225 |
226 | 227 | 228 | 229 | Web Sources 230 |
231 |
232 | 233 |
234 |
235 |

236 | This answer uses a large language model () to summarize search results and can make mistakes. Consider checking important information. 237 | Answer generated in . 238 |

239 |

240 |
241 |
242 |
243 | 244 | 247 |
248 | 249 | 250 | 251 | 252 | 253 | 254 | 255 | 494 | 495 | 496 | -------------------------------------------------------------------------------- /search.php: -------------------------------------------------------------------------------- 1 | $query 29 | ); 30 | $ENDPOINT = "https://api.search.brave.com/res/v1/web/search"; 31 | $url = $ENDPOINT . '?' . http_build_query($params); 32 | 33 | $headers = array( 34 | 'X-Subscription-Token: ' . $BRAVE_KEY, 35 | 'Accept: application/json' 36 | ); 37 | curl_setopt($curl, CURLOPT_HTTPHEADER, $headers); 38 | curl_setopt($curl, CURLOPT_URL, $url); 39 | curl_setopt($curl, CURLOPT_ENCODING, 'gzip'); 40 | curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); 41 | 42 | $response = curl_exec($curl); 43 | curl_close($curl); 44 | 45 | $jsonContent = json_decode($response, true); 46 | 47 | $snippets = []; 48 | 49 | if (isset($jsonContent['web']['results'])) { 50 | foreach ($jsonContent['web']['results'] as $c) { 51 | 52 | $extra_snippets = ""; 53 | if (isset($c['extra_snippets'])) { 54 | foreach ($c['extra_snippets'] as $s) { 55 | $extra_snippets .= $s . ' '; 56 | } 57 | } 58 | $snippets[] = [ 59 | 'name' => $c['title'], 60 | 'url' => $c['url'], 61 | 'snippet' => $c['description'], 62 | 'extra_snippets' => $extra_snippets ?? '', 63 | 'favicon' => $c['meta_url']['favicon'] ?? '', 64 | ]; 65 | } 66 | } 67 | 68 | return array_slice($snippets, 0, $num_sources); 69 | } 70 | function search_with_serper($query, $num_sources = 9) 71 | { 72 | // Put your google search serper key here (https://serper.dev/) 73 | $SERPER_KEY = "[fill me in]"; 74 | 75 | $curl = curl_init(); 76 | 77 | $request = array( 78 | "q" => $query 79 | ); 80 | $data = json_encode($request, JSON_PRETTY_PRINT); 81 | 82 | $headers = array( 83 | 'X-API-KEY: ' . 
$SERPER_KEY, 84 | 'Content-Type: application/json' 85 | ); 86 | curl_setopt($curl, CURLOPT_HTTPHEADER, $headers); 87 | curl_setopt($curl, CURLOPT_POST, 1); 88 | curl_setopt($curl, CURLOPT_POSTFIELDS, $data); 89 | curl_setopt($curl, CURLOPT_URL, "https://google.serper.dev/search"); 90 | curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); 91 | 92 | $response = curl_exec($curl); 93 | curl_close($curl); 94 | 95 | $jsonContent = json_decode($response, true); 96 | 97 | $snippets = []; 98 | 99 | if (isset($jsonContent['knowledgeGraph'])) { 100 | $url = $jsonContent['knowledgeGraph']['descriptionUrl'] ?? $jsonContent['knowledgeGraph']['website'] ?? null; 101 | $snippet = $jsonContent['knowledgeGraph']['description'] ?? null; 102 | if ($url && $snippet) { 103 | $snippets[] = [ 104 | 'name' => $jsonContent['knowledgeGraph']['title'] ?? '', 105 | 'url' => $url, 106 | 'snippet' => $snippet, 107 | ]; 108 | } 109 | } 110 | 111 | if (isset($jsonContent['answerBox'])) { 112 | $url = $jsonContent['answerBox']['link'] ?? $jsonContent['answerBox']['url'] ?? null; 113 | $snippet = $jsonContent['answerBox']['snippet'] ?? $jsonContent['answerBox']['answer'] ?? null; 114 | if ($url && $snippet) { 115 | $snippets[] = [ 116 | 'name' => $jsonContent['answerBox']['title'] ?? '', 117 | 'url' => $url, 118 | 'snippet' => $snippet, 119 | ]; 120 | } 121 | } 122 | 123 | if (isset($jsonContent['organic'])) { 124 | foreach ($jsonContent['organic'] as $c) { 125 | $snippets[] = [ 126 | 'name' => $c['title'], 127 | 'url' => $c['link'], 128 | 'snippet' => $c['snippet'] ?? 
'', 129 | ]; 130 | } 131 | } 132 | 133 | return array_slice($snippets, 0, $num_sources); 134 | } 135 | 136 | function setup_curl_to_llm($query, $context, $max_tokens, $stream = false, $model = "gpt-3.5-turbo", $temperature = 1) 137 | { 138 | // Put your OpenAI API key here (https://platform.openai.com/overview) 139 | // if you want to use other LLMs, most use the exact same API as OpenAI, 140 | // so really only the url, model, and KEY need to change 141 | $OPENAI_KEY = "[fill me in]"; 142 | 143 | // For Groq's API, get your key here (https://wow.groq.com/) 144 | $GROQ_KEY = "[fill me in]"; 145 | 146 | if (in_array($model, array('gpt-3.5-turbo', 'gpt-4'), true)) { 147 | $LLM_ENDPOINT = "https://api.openai.com/v1/chat/completions"; 148 | $LLM_KEY = $OPENAI_KEY; 149 | } 150 | else { 151 | $LLM_ENDPOINT = "https://api.groq.com/openai/v1/chat/completions"; 152 | $LLM_KEY = $GROQ_KEY; 153 | } 154 | 155 | $system = (object) [ 156 | "role" => "system", 157 | "content" => $context 158 | ]; 159 | 160 | $user = (object) [ 161 | "role" => "user", 162 | "content" => $query 163 | ]; 164 | 165 | $request = array( 166 | "model" => $model, 167 | "messages" => array( 168 | $system, 169 | $user 170 | ), 171 | "temperature" => $temperature, 172 | "stream" => $stream, 173 | "max_tokens" => $max_tokens 174 | ); 175 | $data = json_encode($request, JSON_PRETTY_PRINT); 176 | 177 | $curl = curl_init(); 178 | $headers = array( 179 | "Content-Type: application/json", 180 | "Authorization: Bearer " . $LLM_KEY, 181 | ); 182 | 183 | curl_setopt($curl, CURLOPT_HTTPHEADER, $headers); 184 | curl_setopt($curl, CURLOPT_POST, 1); 185 | curl_setopt($curl, CURLOPT_POSTFIELDS, $data); 186 | 187 | curl_setopt($curl, CURLOPT_URL, $LLM_ENDPOINT); 188 | curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); 189 | 190 | if ($stream) { 191 | //stream back curl chunks, this is messy I know... 192 | $callback = function ($ch, $str) { 193 | //$str has the chunks of data streamed back. 
194 | $chunks = explode("data: ", $str); 195 | foreach ($chunks as $i => $chunk) { 196 | if (!empty($chunk) && $chunk !== "[DONE]") { 197 | $json = json_decode($chunk); 198 | if (isset($json->choices)) { 199 | $choice = $json->choices[0]; 200 | if (isset($choice->delta)) { 201 | $delta = $choice->delta; 202 | if (isset($delta->content)) { 203 | echo $delta->content; 204 | flush(); 205 | ob_flush(); 206 | } else { 207 | return -1; // no content in this delta means the answer is done, so abort the transfer 208 | } 209 | } 210 | } 211 | } 212 | } 213 | return strlen($str); //signals curl to keep going 214 | }; 215 | 216 | curl_setopt($curl, CURLOPT_WRITEFUNCTION, $callback); 217 | } 218 | 219 | return $curl; 220 | } 221 | 222 | function execute_curl($curl) 223 | { 224 | $result = curl_exec($curl); 225 | curl_close($curl); 226 | $jsonResult = json_decode($result); 227 | return nl2br($jsonResult->choices[0]->message->content); 228 | } 229 | 230 | function get_snippets_for_prompt($snippets) 231 | { 232 | $snippets_context = ""; 233 | foreach ($snippets as $i => $s) { 234 | $snippets_context .= "[citation:" . ($i + 1) . "] " . $s['snippet']; 235 | 236 | if(isset($s['extra_snippets'])) { 237 | $snippets_context .= $s['extra_snippets']; 238 | } 239 | 240 | if ($i < count($snippets) - 1) { 241 | $snippets_context .= "\n\n"; 242 | } 243 | } 244 | 245 | return $snippets_context; 246 | } 247 | 248 | function setup_get_answer_prompt($snippets) 249 | { 250 | // My prompt is to provide accurate, high-quality, and expertly written responses to your questions in a positive, interesting, and engaging manner. I aim to offer informative, logical, and actionable information in the same language as your queries. 251 | $starting_context = <<<'EOD' 252 | You are an assistant written by Josh Clemm. You will be given a question. And you will respond with two things. 253 | 254 | First, respond with an answer to the question. It must be accurate, high-quality, and expertly written in a positive, interesting, and engaging manner.
It must be informative and in the same language as the user question. 255 | 256 | Second, respond with 3 related follow-up questions. First, please repeat the following phrase: ==== RELATED ====. Then write the 3 follow-up questions in a JSON array format, so it's clear you've started to answer the second part. Each related question should be no longer than 15 words. They should be based on the user's original question and the citations given in the context. Do not repeat the original question. Make sure to determine the main subject from the user's original question. That subject needs to be in any related question, so the user can ask it standalone. 257 | 258 | For both the first and second response, you will be provided a set of citations to the question. Each will start with a reference number like [citation:x], where x is a number. Always use the related citations and cite the citation at the end of each sentence in the format [citation:x]. If a sentence comes from multiple citations, please list all applicable citations, like [citation:2][citation:3]. 259 | 260 | Here are the provided citations: 261 | 262 | EOD; 263 | 264 | // $final_context = "Finally, don't repeat the provided contexts verbatim. And don't mention you were passed contexts in the response."; 265 | $final_context = ""; 266 | 267 | $full_context = $starting_context . "\n\n" . get_snippets_for_prompt($snippets) . "\n\n" .
$final_context; 268 | return $full_context; 269 | } 270 | 271 | // Use the multi cURL capabilities to run one or more curl commands in parallel 272 | function execute_multi_curl(...$curlArray) 273 | { 274 | $mh = curl_multi_init(); 275 | foreach ($curlArray as $curl) { 276 | curl_multi_add_handle($mh, $curl); 277 | } 278 | // Execute all queries simultaneously, and continue when all are complete 279 | $running = null; 280 | do { 281 | curl_multi_exec($mh, $running); 282 | // usleep(50000); 283 | curl_multi_select($mh); // This is a blocking call, only proceeding when there's activity 284 | } while ($running); 285 | 286 | // Collect the responses and remove the handles 287 | $responses = []; 288 | foreach ($curlArray as $curl) { 289 | $responses[] = curl_multi_getcontent($curl); 290 | curl_multi_remove_handle($mh, $curl); 291 | } 292 | curl_multi_close($mh); 293 | return $responses; 294 | } 295 | 296 | $snippets = array(); 297 | // $snippets = search_with_brave($query); 298 | $snippets = search_with_serper($query); 299 | 300 | echo "==== SOURCES ====\n"; 301 | echo json_encode($snippets, JSON_PRETTY_PRINT); 302 | 303 | $search_end = microtime(true); 304 | 305 | $answer_prompt_context = setup_get_answer_prompt($snippets); 306 | 307 | $answer_curl = setup_curl_to_llm($query, $answer_prompt_context, 2048, true, $model, 0.9); 308 | 309 | echo "\n==== ANSWER ====\n"; 310 | $responses = execute_multi_curl($answer_curl); 311 | 312 | $end = microtime(true); 313 | 314 | echo "\n==== METADATA ====\n"; 315 | $metadata = array( 316 | "query" => $query, 317 | "model" => $model, 318 | "duration" => array( 319 | "search" => number_format(($search_end - $start), 2) . 's', 320 | "llm" => number_format(($end - $search_end), 2) . 's', 321 | "total" => number_format(($end - $start), 2) . 
's' 322 | ) 323 | ); 324 | echo json_encode($metadata, JSON_PRETTY_PRINT); 325 | -------------------------------------------------------------------------------- /search_basic.php: -------------------------------------------------------------------------------- 1 | $query, 'text_decorations' => false); 9 | $ENDPOINT = "https://api.search.brave.com/res/v1/web/search"; 10 | $url = $ENDPOINT . '?' . http_build_query($params); 11 | $headers = array( 12 | 'X-Subscription-Token: ' . $BRAVE_KEY, 13 | 'Accept: application/json' 14 | ); 15 | $curl = curl_init(); 16 | curl_setopt($curl, CURLOPT_HTTPHEADER, $headers); 17 | curl_setopt($curl, CURLOPT_URL, $url); 18 | curl_setopt($curl, CURLOPT_ENCODING, 'gzip'); 19 | curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); 20 | $response = curl_exec($curl); 21 | curl_close($curl); 22 | 23 | $jsonContent = json_decode(strip_tags($response), true); 24 | $snippets = []; 25 | if (isset($jsonContent['web']['results'])) { 26 | foreach ($jsonContent['web']['results'] as $c) { 27 | $snippets[] = ['name' => $c['title'], 'url' => $c['url'], 'snippet' => $c['description']]; 28 | } 29 | } 30 | return array_slice($snippets, 0, $num_sources); 31 | } 32 | 33 | function setup_curl_to_llm($query, $context, $max_tokens, $model = "gpt-3.5-turbo", $temperature = 1) { 34 | // Put your OpenAI API key here (https://platform.openai.com/overview) 35 | // if you want to use other LLMs, most use the exact same API as OpenAI, 36 | // so really only the url, model, and KEY need to change 37 | $OPENAI_KEY = "[fill me in]"; 38 | $LLM_ENDPOINT = "https://api.openai.com/v1/chat/completions"; 39 | 40 | $system = (object) ["role" => "system", "content" => $context]; 41 | $user = (object) ["role" => "user", "content" => $query]; 42 | $request = array( 43 | "model" => $model, 44 | "messages" => array( 45 | $system, 46 | $user 47 | ), 48 | "temperature" => $temperature, 49 | "max_tokens" => $max_tokens 50 | ); 51 | $headers = array( 52 | "Content-Type: application/json", 53 
| "Authorization: Bearer " . $OPENAI_KEY, 54 | ); 55 | 56 | $curl = curl_init(); 57 | curl_setopt($curl, CURLOPT_HTTPHEADER, $headers); 58 | curl_setopt($curl, CURLOPT_POST, 1); 59 | curl_setopt($curl, CURLOPT_POSTFIELDS, json_encode($request, JSON_PRETTY_PRINT)); 60 | curl_setopt($curl, CURLOPT_URL, $LLM_ENDPOINT); 61 | curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); 62 | return $curl; 63 | } 64 | 65 | function execute_curl($curl) { 66 | $result = curl_exec($curl); 67 | curl_close($curl); 68 | $jsonResult = json_decode($result); 69 | return $jsonResult->choices[0]->message->content; 70 | } 71 | 72 | function get_snippets_for_prompt($snippets) { 73 | $snippets_context = ""; 74 | foreach ($snippets as $i => $s) { 75 | $snippets_context .= "[citation:" . ($i + 1) . "] " . $s['snippet'] . "\n\n"; 76 | } 77 | return $snippets_context; 78 | } 79 | 80 | function setup_get_answer_prompt($snippets) { 81 | $starting_context = <<<'EOD' 82 | You are an assistant written by Josh Clemm. You will be given a question. And you will respond with two things. 83 | First, respond with an answer to the question. It must be accurate, high-quality, and expertly written in a positive, interesting, and engaging manner. It must be informative and in the same language as the user question. 84 | Second, respond with 3 related follow-up questions. First print "==== RELATED ====" verbatim. Then, write the 3 follow-up questions in a JSON array format, so it's clear you've started to answer the second part. Do not use markdown. Each related question should be no longer than 15 words. They should be based on the user's original question and the citations given in the context. Do not repeat the original question. Make sure to determine the main subject from the user's original question. That subject needs to be in any related question, so the user can ask it standalone. 85 | For both the first and second response, you will be provided a set of citations for the question.
Each will start with a reference number like [citation:x], where x is a number. Always use the related citations and cite the citation at the end of each sentence in the format [citation:x]. If a sentence comes from multiple citations, please list all applicable citations, like [citation:2][citation:3]. 86 | Here are the provided citations: 87 | EOD; 88 | return $starting_context . "\n\n" . get_snippets_for_prompt($snippets); 89 | } 90 | 91 | // 0. Extract query and model from request parameters 92 | $query = $_REQUEST["q"] ?? "how did Uber scale over the years?"; 93 | $model = $_REQUEST["model"] ?? "gpt-3.5-turbo"; 94 | 95 | // 1. Call search to get sources and their snippets 96 | $snippets = search_with_brave($query); 97 | echo "==== SOURCES ====\n" . json_encode($snippets, JSON_PRETTY_PRINT); 98 | 99 | // 2. Create a prompt passing along the sources and call the language model of your choice 100 | $answer_prompt_context = setup_get_answer_prompt($snippets); 101 | echo $answer_prompt_context; 102 | $answer_curl = setup_curl_to_llm($query, $answer_prompt_context, 2048, $model, 0.9); 103 | echo "\n==== ANSWER ====\n" . 
execute_curl($answer_curl); 104 | ?> -------------------------------------------------------------------------------- /style.css: -------------------------------------------------------------------------------- 1 | pre { 2 | box-shadow: var(--bs-box-shadow) !important; 3 | border: var(--bs-border-width) var(--bs-border-style) var(--bs-border-color) !important; 4 | border-radius: var(--bs-border-radius) !important; 5 | } 6 | 7 | .hljs { 8 | display: block; 9 | overflow-x: auto; 10 | padding: 1em; 11 | background: #032453; 12 | } 13 | 14 | .hljs-built_in, 15 | .hljs-selector-tag, 16 | .hljs-section, 17 | .hljs-link { 18 | color: #8be9fd; 19 | } 20 | 21 | .hljs-keyword { 22 | color: #ff79c6; 23 | } 24 | 25 | .hljs, 26 | .hljs-subst { 27 | color: #f8f8f2; 28 | } 29 | 30 | .hljs-title, 31 | .hljs-attr, 32 | .hljs-meta-keyword { 33 | font-style: italic; 34 | color: #50fa7b; 35 | } 36 | 37 | .hljs-string, 38 | .hljs-meta, 39 | .hljs-name, 40 | .hljs-type, 41 | .hljs-symbol, 42 | .hljs-bullet, 43 | .hljs-addition, 44 | .hljs-variable, 45 | .hljs-template-tag, 46 | .hljs-template-variable { 47 | color: #f1fa8c; 48 | } 49 | 50 | .hljs-comment, 51 | .hljs-quote, 52 | .hljs-deletion { 53 | color: #6272a4; 54 | } 55 | 56 | .hljs-keyword, 57 | .hljs-selector-tag, 58 | .hljs-literal, 59 | .hljs-title, 60 | .hljs-section, 61 | .hljs-doctag, 62 | .hljs-type, 63 | .hljs-name, 64 | .hljs-strong { 65 | font-weight: bold; 66 | } 67 | 68 | .hljs-literal, 69 | .hljs-number { 70 | color: #bd93f9; 71 | } 72 | 73 | .hljs-emphasis { 74 | font-style: italic; 75 | } --------------------------------------------------------------------------------