├── README.md └── rag-elixir.livemd /README.md: -------------------------------------------------------------------------------- 1 | # rag-elixir-doc 2 | 3 |

Building LLM-enhanced search with the help of LLMs.

4 | 5 | We want to improve the search for the Elixir/Phoenix/Plug/LiveView documentation when using an LLM and experiment with a `RAG` pipeline. 6 | 7 | All the tools used here are "free", meaning everything runs locally. 8 | 9 | [![Run in Livebook](https://livebook.dev/badge/v1/blue.svg)](https://livebook.dev/run?url=https%3A%2F%2Fgithub.com%2Fdwyl%2Frag-elixir-doc%2Fblob%2Fmain%2Frag-elixir.livemd) 10 | 11 | ## What is `RAG`? 12 | 13 | It is a "chat with your documents" process, meaning you ask an LLM to respond based on additional resources. 14 | 15 | These sources may or may not already be incorporated in the training data used for the LLM. 16 | 17 | Using RAG is _not about fine-tuning_ the model, which means changing the coefficients or structure of the model based on additional sources. 18 | 19 | RAG is about giving additional context - the "context window" - to enhance or constrain the response from the LLM. 20 | 21 | > Note that the LLM accepts a limited amount of tokens, thus the context window is limited. 22 | 23 | 24 | ## Scope of this POC: 25 | 26 | We want to improve the LLM's response when we ask questions related to the Elixir/Phoenix/Plug/LiveView documentation. We will build a "context" to add information that helps the LLM build a response. 27 | 28 | Running such a helper locally means that we need to have the extra resources available locally. Our database will be local and our LLM will run locally, using only local resources. 29 | 30 | We will extract some markdown files from the Phoenix_LiveView GitHub repo. 31 | 32 | We will use a database to store chunks extracted from these files. 33 | 34 | 35 | - One way is **SQL Full-Text-Search**. If we use `Postgres`, we have a [built-in functionality](https://www.postgresql.org/docs/current/textsearch-intro.html#TEXTSEARCH-DOCUMENT). This works by matching **keywords**, so given that we may have various ways to express the same question, we may want a more semantic search. [A good introduction to SQL FTS](https://leandronsp.com/a-powerful-full-text-search-in-postgresql-in-less-than-20-lines) or [here](https://peterullrich.com/complete-guide-to-full-text-search-with-postgres-and-ecto). 36 | 37 | - We can also use the open-source version of Elasticsearch maintained by Amazon, named `OpenSearch`, via a Docker image. We may also use the [local image of Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/run-elasticsearch-locally.html). There exist [text chunking tools](https://opensearch.org/docs/latest/search-plugins/text-chunking/). The matching is based on the **BM25** algorithm. Using this instead of the **bi-encoding** step might be an interesting extension, as a second version. Indeed, Elixir has an OpenSearch client: [snap](https://github.com/breakroom/snap) 38 | 39 | - We can build a RAG pipeline with an LLM. We follow this route here. We save chunks with their embedding into a database, run a similarity search against our query, then rerank our top-k responses, and finally use these last results to build our prompt and submit it as context to the LLM. This will help to constrain responses on questions about Elixir/Plug/Phoenix_Liveview. 40 | 41 | 42 | ## Using LLMs to find which models to use 43 | 44 | We are using pre-trained models. We will also use an LLM to elaborate a response. 45 | 46 | The first question is: what does a RAG pipeline bring? 47 | 48 | We believe the next two paragraphs give some confidence that this technique works, or at least improves some responses.
49 | 50 | ### Testing Codellama 51 | 52 | ```text 53 | Question: "how to handle forms server-side?" 54 | ``` 55 | 56 |
The "raw" response 57 | 58 | Handling forms on the server-side involves using the `Phoenix.HTML` module's `form_for` function to create a form struct with the necessary information for submitting the form data to the server. 59 | Here's an example: 60 | 61 | ```elixir 62 | import Phoenix.HTML 63 | 64 | # Create a form struct with the necessary information 65 | form = form_for(conn, :action, :controller) 66 | 67 | # Add fields to the form using the `add_field` function 68 | form = form |> add_field(:name, :text) |> add_field(:age, :number) 69 | 70 | # Use the `handle_event` function to handle form submissions 71 | handle_event(form, :submit, conn) do 72 | # Do something with the submitted data here, e.g. update a database 73 | end 74 | ``` 75 | 76 | The `form_for` function takes three arguments: 77 | 78 | * `conn`: The current connection struct 79 | * `action`: The action to perform when submitting the form(e.g. `:create`, `:update`, etc.) 80 | * `controller`: The controller that handles the form submission 81 | 82 | Once you have created the form struct, you can add fields to it using the `add_field` function. 83 | This function takes two arguments: 84 | 85 | * `name`: The name of the field (e.g. `:username`,`:password`, etc.) 86 | * `type`: The type of field (e.g. `:text`, `:number`, etc.) 87 | 88 | You can then use the `handle_event` function to handle form submissions. This function takes three arguments: 89 | 90 | * `form`: The form struct that was created using the `form_for` function 91 | * `event`: The event that triggered the submission (e.g. `:submit`, etc.) 92 | * `conn`: The current connection struct 93 | 94 | Inside the `handle_event` function, you can perform any necessary actions when the form is submitted, such as updating a database or sending an email. 95 | 96 | To handle form validation and error handling, you can use the `Phoenix.LiveView` module's `validate` and `error` functions. 97 | 98 | For example: 99 | 100 | ```elixir 101 | import Phoenix.LiveView 102 | 103 | # Add validation rules to the form using the `validate` function 104 | form = form |> validate(:name, :required) |> validate(:age, :number) 105 | 106 | # Handle errors in the form submission using the `error` function 107 | handle_event(form, :submit, conn) do 108 | if error?(conn) do 109 | # Do something with the validation errors here, e.g. display an error message 110 | else 111 | # Do something with the submitted data here, e.g. update a database 112 | end 113 | end 114 | ``` 115 | 116 | In this example, we use the `validate` function to add validation rules to the form. The `:required` and `:number` arguments specify that the fields should be required and contain only numbers, respectively. 117 | If there are any errors in the submitted data, the `error?` function will return true and we can handle them using the `error` function. 118 |
119 |
120 | 121 |
The "RAGed" response 122 | 123 | To handle forms with LiveView, you can use the `Phoenix.Component` module and its functions to define a form component that will submit and validate the form data using LiveView's built-in validation mechanism. Here's an example of how to define a form component in your LiveView: 124 | 125 | ```elixir 126 | defmodule MyAppWeb.Components.Form do 127 | use Phoenix.Component 128 | def form(assigns) do 129 | ~H""" 130 | <.form for={@form} phx-change="validate" phx-submit="save">\ 131 | ... 132 | 133 | """ 134 | end 135 | end 136 | ... 137 | ``` 138 | 139 | In the above example, we define a `Form` component that renders a form tag with a `phx-change` and `phx-submit` event to handle validation and submission of the form data. The `for` attribute is used to bind the form to a changeset or user parameters, which can be created from a form struct using `Phoenix.Component.to_form/1`. 140 | 141 | 142 | Additionally, we strongly recommend including a unique HTML "id" attribute on the form. When DOM siblings change, elements without an ID will be replaced rather than moved, which can cause issues such as form fields losing focus. 143 | 144 | You may wish for an individual input to use its own change event or to target a different component. This can be accomplished by annotating the input itself with `phx-change`, for example: 145 | 146 | ```elixir 147 | <.form for={@form} phx-change="validate" phx-submit="save"> 148 | ... 149 | <.input field={@form[:email]} phx-change="email_changed" phx-target={@myself} /> 150 | 151 | ``` 152 | 153 | The `phx-change` event will trigger the `email_changed` function on the `@myself` component, which can be used to update the form data or perform other actions based on changes to the input field. 154 | 155 | You may also wish for an individual input to use its own change event or to target a different component. This can be accomplished by annotating the input itself with `phx-change`, for example: 156 | 157 | ```elixir 158 | <.form for={@form} phx-change="validate" phx-submit="save"> 159 | ... 160 | <.input field={@form[:email]} phx-change="email_changed" phx-target={@myself} /> 161 | 162 | ``` 163 | 164 | The `phx-change` event will trigger the `email_changed` function on the `@myself` component, which can be used to update the form data or perform other actions based on changes to the input field.\n\nIt's also important to note that you may want to use a unique HTML \"id\" attribute on the form. When DOM siblings change, elements without an ID will be replaced rather than moved, which can cause issues such as form fields losing focus.\n\nYou may also wish for an individual input to use its own change event or to target a different component. This can be accomplished by annotating the input itself with `phx-change`, for example: 165 | 166 | ```elixir 167 | <.form for={@form} phx-change="validate" phx-submit="save"> 168 | ... 169 | <.input field={@form[:email]} phx-change="email_changed" phx-target={@myself} /> 170 | 171 | ``` 172 | 173 | The `phx-change` event will trigger the `email_changed` function on the `@myself` component, which can be used to update the form data or perform other actions based on changes to the input field. 174 |
175 |
176 | 177 | The first conclusion is that the RAG pipeline improves the response a lot. 178 | This is also most probably because the current version of the LLM (July 2024) is not up-to-date with the state of the art at the time of writing. 179 | 180 | We continue by asking the LLM to help us with the technical task below. We encounter the main limitation: hallucination. 181 | 182 |
Which embedding models can we use? Some examples of how LLMs respond to this question 183 | 184 | We know that most questions submitted to an LLM will produce so-called "hallucinated" - in other words, invented - responses. 185 | 186 | Let's test this with Llama3.1, chosen because you can run it locally via `ollama run llama3.1`. 187 | 188 | ```txt 189 | question: "I want to use an Elixir portage of llama. Can you suggest a library? 190 | 191 | response: "**Elixir-T5**: This is an Elixir port of a popular LLM architecture (T5). It's designed specifically for use with Elixir and might be worth considering. You may also consider **LLaMA-erl**: This is an Elixir wrapper for the LLaMA model, which is similar to Ollama. 192 | ``` 193 | 194 | Neither `Elixir-T5` nor `LLaMA-erl` exists! 195 | 196 | I tried CodeLlama: `ollama run codellama`: 197 | 198 | ```txt 199 | question: which Elixir library would you suggest for text chunking to compute embeddings efficiently? 200 | ``` 201 | 202 |
codellama response 203 | 204 | ```console 205 | response: 206 | There are several Elixir libraries available for text chunking and computing embeddings, each with its own strengths and weaknesses. Here are a few popular options: 207 | 208 | 1. `elixir_bert`: This is an Elixir port of the BERT (Bidirectional Encoder Representations from Transformers) model, which is a powerful language model that can be 209 | used for text chunking. It provides a simple and efficient interface for computing embeddings, and it supports both CPU and GPU acceleration. 210 | 2. `elixir_bert_embedding`: This is an Elixir wrapper for the BERT embedding layer, which allows you to compute embeddings for input sequences using a pre-trained 211 | BERT model. It provides a simple interface for text chunking and supports both CPU and GPU acceleration. 212 | 3. `elixir_word2vec`: This is an Elixir port of the Word2Vec model, which is another popular language model that can be used for text chunking. It provides a simple 213 | and efficient interface for computing embeddings, and it supports both CPU and GPU acceleration. 214 | 4. `elixir_glove`: This is an Elixir port of the GLOVE (Global Vectors for Word Representation) model, which is a language model that can be used for text chunking. 215 | It provides a simple and efficient interface for computing embeddings, and it supports both CPU and GPU acceleration. 216 | 217 | All of these libraries should work well for your purposes, but you may want to consider which one best fits your specific needs. If you have any specific requirements 218 | or preferences (e.g., support for a particular file format or feature), you may want to choose the library that is most suitable for your use case. 219 | ``` 220 |
221 |
222 | 223 | We check that `elixir_bert` does not exist, nor do `elixir_bert_embedding`, `elixir_glove` or `elixir_word2vec`. 224 | 225 | However, after researching a bit, the library `ExFinalFusion` does exist, a NIF wrapper of `FinalFusion`. 226 | [![Screenshot 2024-08-24 at 19 33 08](https://github.com/user-attachments/assets/11d7daeb-8f3f-49ea-a2af-38c2c654c42d)](https://hexdocs.pm/ex_final_fusion/ExFinalFusion.html#summary) 227 | 228 | ### Let the LLMs choose the models 229 | 230 | The question: 231 | 232 | ```console 233 | I want to code a RAG pipeline in the Elixir langugage using its ML libraries (Nx, Axon, Bumble 234 | bee). I want to use a bi-encoder to compute embeddings from chunks of text, and then run a 235 | cross-encoding process. Can you recommend models firstly for the bi-encoding and then for the 236 | cross-encoding step, having in mind that I need a model for tokenizing the input. 237 | ``` 238 | 239 | Some responses from different LLMs. 240 | 241 |
codellama response 242 | 243 | ```console 244 | Bi-encoding (Tokenization): 245 | 246 | * BERT (Bidirectional Encoder Representations from Transformers) 247 | + This is a popular pre-trained language model that can be fine-tuned for various NLP tasks, 248 | including tokenization. 249 | + BERT uses a multi-layer bidirectional transformer encoder to generate contextualized 250 | representations of words in a sentence. 251 | * RoBERTa (Robustly Optimized BERT Pretraining Approach) 252 | + This is a variant of BERT that was specifically designed for text classification tasks and 253 | has been shown to achieve state-of-the-art results on several benchmark datasets. 254 | + RoBERTa uses a combination of tokenization and language modeling techniques to generate 255 | 256 | contextualized representations of words in a sentence. 257 | * DistilBERT (Distilled BERT) 258 | + This is a smaller and more efficient variant of BERT that has been trained to match the 259 | 260 | performance of the full BERT model while requiring fewer computational resources. 261 | + DistilBERT uses a combination of tokenization and attention techniques to generate 262 | contextualized representations of words in a sentence. 263 | 264 | Cross-encoding: 265 | 266 | * MLP (Multi-layer perceptron) 267 | + This is a simple feedforward neural network that can be used for classification tasks such 268 | as text classification, sentiment analysis, and topic modeling. 269 | + An MLP consists of multiple layers of artificial neurons that learn to represent the input 270 | data in a higher dimensional space. 271 | * CNN (Convolutional neural network) 272 | + This is a deep learning architecture that is particularly well-suited for text 273 | classification tasks where the input data has a spatial or temporal structure. 274 | + A CNN consists of multiple convolutional layers that learn to extract features from the 275 | 276 | input data, followed by pooling and fully connected layers to make predictions. 277 | 278 | It's important to note that the specific models you choose will depend on your specific use 279 | case and requirements. You may also want to consider using a combination of different models 280 | to achieve better results. 281 | ``` 282 |
283 |
284 | 285 |
Llama3 response 286 | 287 | ```console 288 | **Bi-Encoding Step: Tokenization and Embedding** 289 | 290 | For tokenization, you can use a simple word tokenizer like `Nx.Tokenizer` from the Elixir 291 | `axon` library. However, if you need more advanced tokenization capabilities (e.g., handling 292 | out-of-vocabulary words), consider using Bumblebee's `Bee.Tokenizer`. For embedding the 293 | tokens into vectors, I recommend using: 294 | 295 | 1. **Universal Sentence Encoder (USE)**: This is a pre-trained language model that can be 296 | used for token embedding. You can use the Elixir wrapper provided by Axon to load and use 297 | the USE model. 298 | 299 | **Cross-Encoding Step** 300 | 301 | For cross-encoding, you'll need to take the output of the bi-encoding step and generate a 302 | new representation based on it. I recommend using: 303 | 304 | 1. **Reformer**: This is a state-of-the-art transformer architecture that's well-suited for 305 | the RAG pipeline. You can use Bumblebee to load and use the Reformer model. 306 | 307 | Here are some additional recommendations: 308 | 309 | * For tokenizing input text, you might want to consider using more advanced techniques like 310 | subwording (e.g., WordPiece) or character-level encoding. 311 | * If you're dealing with long documents, you may need to split them into smaller chunks 312 | before applying bi-encoding and cross-encoding. Consider using a technique like document 313 | chunking or summarization. 314 | 315 | To give you a better understanding of the pipeline flow, here's a high-level outline: 316 | 317 | 1. **Tokenize input text**: Use `Nx.Tokenizer` or Bumblebee's `Bee.Tokenizer` to break down 318 | the text into individual tokens. 319 | 2. **Bi-encode tokens**: Apply USE to embed each token into a vector space (e.g., 320 | 128-dimensional). 321 | 3. **Cross-encode output**: Use Reformer to generate a new representation based on the 322 | bi-encoded output. 323 | 324 | Remember to preprocess your data (e.g., normalize text, remove stop words) before feeding it 325 | into these models. 326 | 327 | For more information and implementation details, I recommend checking out the following 328 | resources: 329 | 330 | * [Bumblebee documentation](https://hexdocs.pm/bumblebee/) 331 | * [Axon documentation](https://hexdocs.pm/axon/) 332 | * [Elixir ML 333 | libraries](https://elixir-lang.org/docs/master/libraries.html#machine-learning-libraries) 334 | * [RAG paper and implementation](https://arxiv.org/abs/2006.16262) 335 | ``` 336 |
337 |
338 | 339 |
Claude3.5 Sonnet response 340 | 341 | ```console 342 | Bi-encoding step: 343 | 344 | For the bi-encoding step, where you'll encode both the chunks from the documentation and the query, a good choice would be a sentence transformer model. A popular option that works well for semantic search is: 345 | 346 | Model: "sentence-transformers/all-MiniLM-L6-v2" 347 | 348 | This model provides a good balance between performance and speed. It's relatively small (80MB) but still gives good results for semantic similarity tasks. 349 | 350 | Cross-encoding step: 351 | 352 | For the cross-encoding step, where you'll want to get a more precise relevance score between the query and the retrieved documents, you could use: 353 | 354 | Model: "cross-encoder/ms-marco-MiniLM-L-6-v2" 355 | 356 | This model is specifically trained for relevance ranking and should work well for your cross-encoding needs. 357 | 358 | Tokenization: 359 | 360 | For tokenization, you'll want to use a tokenizer that matches the models you're using. Since both recommended models are based on MiniLM, you can use the MiniLM tokenizer: 361 | 362 | Tokenizer: "microsoft/MiniLM-L6-H384-uncased" 363 | ``` 364 |
365 |
366 | 367 |
ChatGPT response 368 | 369 | ```console 370 | To create a Retrieval-Augmented Generation (RAG) pipeline in Elixir using the Nx, Axon, and Bumblebee libraries, you'll need to carefully choose models that are suitable for both the bi-encoding and cross-encoding processes. Let's break down the components: 371 | 372 | 1. Bi-Encoder Step: 373 | Purpose: The bi-encoder step involves encoding chunks of text independently to generate embeddings that can be used for retrieval. 374 | Recommended Models: 375 | MiniLM (Microsoft): A small, efficient model designed for sentence embeddings. It's often used for bi-encoder tasks due to its balance between speed and accuracy. 376 | DistilBERT: A smaller and faster version of BERT, ideal for generating embeddings from text while maintaining a good balance of performance. 377 | Sentence-BERT (SBERT): This is a modification of BERT that is specifically designed for creating dense vector representations of sentences, making it a solid choice for bi-encoding. 378 | 2. Cross-Encoder Step: 379 | Purpose: The cross-encoder step involves comparing the query and retrieved passages together, typically used to re-rank the retrieved results. 380 | Recommended Models: 381 | BERT: The standard BERT model is highly effective for cross-encoding tasks because it considers the interaction between the query and the passage. 382 | RoBERTa: A variant of BERT with more robust training, which tends to perform better in NLP tasks. 383 | DeBERTa: Known for its improved handling of word order and dependency information, making it a strong candidate for cross-encoding. 384 | 3. Tokenization: 385 | Purpose: Tokenizing the input is crucial for both the bi-encoder and cross-encoder steps. The tokenizer needs to be compatible with the chosen models. 386 | Recommended Tokenizers: 387 | BERT Tokenizer: Compatible with BERT-based models (BERT, RoBERTa, DistilBERT, etc.). 388 | SentencePiece Tokenizer: Often used with models like MiniLM and other transformer-based models. 389 | Elixir Implementation: 390 | Libraries: 391 | Nx: For numerical computation, handling tensors, and integrating with the underlying ML operations. 392 | Axon: For building neural networks, including the bi-encoder and cross-encoder models. 393 | Bumblebee: For pre-trained models, including tokenizers and transformers, which will help with the embedding and cross-encoding tasks. 394 | Example Models in Bumblebee: 395 | For bi-encoding: Use models like "MiniLM-L6-v2" or "distilbert-base-uncased" from Bumblebee. 396 | For cross-encoding: Use models like "bert-base-uncased" or "roberta-base". 397 | Steps to Implement: 398 | Load the Model: Use Bumblebee to load the bi-encoder and cross-encoder models. 399 | Tokenize the Input: Tokenize the text using the appropriate tokenizer. 400 | Compute Embeddings: Generate embeddings using the bi-encoder model. 401 | Retrieve and Rank: Use the cross-encoder model to compare and rank the retrieved documents. 402 | ``` 403 |
404 | 405 | 406 |
407 | 408 | ### First conclusion for the LLM 409 | 410 | It seems that Claude3.5 Sonnet and ChatGPT give much "better" responses than Llama. However, we need an API key to reach their REST APIs, whilst we can run a local version of the Llama LLM. 411 | 412 | We will therefore choose Llama, running it at no cost, knowing that it can only get better by choosing a "paid" version from OpenAI or, better, Anthropic. 413 | 414 | ### Conclusion for the choice of our models 415 | 416 | We will use SBERT-based models: 417 | - the bi-encoder "sentence-transformers/all-MiniLM-L6-v2", which also provides a tokenizer, 418 | - the cross-encoder "cross-encoder/ms-marco-MiniLM-L-6-v2", along with the tokenizer "bert-base-uncased". 419 | 420 | 421 | We check that the models are correctly implemented in Elixir by running the equivalent code in Python. This is done in [#8](https://github.com/dwyl/rag-elixir-doc/discussions/8) and [#9](https://github.com/dwyl/rag-elixir-doc/discussions/9). 422 | 423 | ## Source of knowledge 424 | 425 | We first seed the vector database with some GitHub markdown pages from the Elixir documentation. 426 | 427 | The sources will be extracted from the files that the GitHub API returns when querying some directories: 428 | - 429 | - 430 | - 431 | 432 | - We can also add some ".ex" modules when they provide documentation in a moduledoc. 433 | 434 | 435 | ## Overview of the RAG process: 436 | * installed tools: the database `Postgres` with the `pgvector` extension, and the platform `ollama` to run the LLM locally. 437 | 438 | * Build the external sources. 439 | - download the "external sources" as a string 440 | - chunk the sources 441 | - produce an embedding based on a "sentence-transformer" model for each chunk 442 | - insert chunk + embedding into a vector database using an HNSW index 443 | 444 | * Build a RAG pipeline 445 | - produce an embedding (a vector representation) from the question 446 | - perform a first vector similarity search (HNSW) against the database 447 | - rerank the top-k results with "cross-encoding" 448 | - build a prompt by injecting the latter results with the query as a context 449 | - submit the prompt to the LLM for completion 450 | 451 | ### Pseudo-code pipeline 452 | 453 | The pipeline will use three SBERT-based models: "sentence-transformers/all-MiniLM-L6-v2" for the embedding, "cross-encoder/ms-marco-MiniLM-L-6-v2" for the reranking, and "bert-base-uncased" for tokenizing. 454 | 455 | In pseudo-code, we have: 456 | 457 | ```elixir 458 | # Data collection and chunking 459 | defmodule DataCollector do 460 | def fetch_and_chunk_docs do 461 | ... 462 | end 463 | end 464 | 465 | # Embedding generation: "sentence-transformers/all-MiniLM-L6-v2" 466 | defmodule Embedder do 467 | def generate_embeddings(text) do 468 | ... 469 | end 470 | end 471 | 472 | # Semantic search 473 | defmodule SemanticSearch do 474 | def search(query, top_k) do 475 | ... 476 | end 477 | end 478 | 479 | # Cross-encoder reranking: "cross-encoder/ms-marco-MiniLM-L-6-v2" 480 | defmodule CrossEncoder do 481 | def rerank(query, documents) do 482 | ... 483 | end 484 | end 485 | 486 | # Prompt construction 487 | defmodule PromptBuilder do 488 | def build_prompt(query, context) do 489 | ... 490 | end 491 | end 492 | 493 | # LLM integration 494 | defmodule LLM do 495 | def generate_response(prompt) do 496 | ...
497 | end 498 | end 499 | 500 | # Main RAG pipeline 501 | defmodule RAG do 502 | def process_query(query) do 503 | query 504 | |> SemanticSearch.search(10) 505 | |> CrossEncoder.rerank(query) 506 | |> PromptBuilder.build_prompt(query) 507 | |> LLM.generate_response() 508 | end 509 | end 510 | ``` 511 | 512 | ## What is **bi-encoding** and **cross-encoding**? 513 | 514 | - Bi-encoders: Encode the query and document separately, then compare their vector representations. This is the "standard" similarity search. 515 | 516 | Bi-encoding does consider the relationship between the query and each document, but it does so independently for each document. The main problem is that bi-encoding might not capture nuanced differences between documents or complex query-document relationships. `HNSW` indexes or `BM25` can be used for this. 517 | 518 | - Cross-encoders: Take both the query and the document as input simultaneously, allowing for more complex interactions between them. A cross-encoder processes the query and document together through a neural network (typically a transformer model like BERT) to produce a single relevance score. This allows the model to capture complex interactions between the query and document at all levels of representation. 519 | 520 | Cross-encoders typically perform better than bi-encoders in terms of accuracy, but are computationally more expensive and slower at inference time. 521 | They are not suitable for large-scale retrieval because they require comparing the query with every document from scratch, which doesn't scale well. 522 | Therefore, cross-encoding is often used in a two-stage retrieval process. 523 | 524 | - How do cross-encoders work in reranking? 525 | - After the initial retrieval (e.g., using vector similarity), you pass each query-document pair through the cross-encoder. 526 | - The cross-encoder outputs a relevance score for each pair. 527 | - Results are then sorted based on these scores, potentially significantly changing the order from the initial retrieval. 528 | 529 | 530 | ## How to **chunk**? 531 | 532 | We need to define how to ingest these documents to produce _embeddings_ saved into a _vector database_. 533 | 534 | Do we run a naive chunking, or [use this package](https://github.com/revelrylabs/text_chunker_ex), or [structured chunks](https://docs.llamaindex.ai/en/stable/examples/retrievers/auto_vs_recursive_retriever/), or [Chunk + Document Hybrid Retrieval](https://docs.llamaindex.ai/en/stable/examples/retrievers/multi_doc_together_hybrid/), or use [BM25](https://docs.llamaindex.ai/en/stable/examples/retrievers/bm25_retriever/), with an Elixir implementation [BM25](https://github.com/elliotekj/bm25)? 535 | 536 | ## Which embedding? 537 | 538 | - [SBert](https://www.sbert.net/): check: . 539 | 540 | 541 | ## Vector database or index? 542 | 543 | - An index: [HNSW](https://github.com/elixir-nx/hnswlib), the Elixir port of `hnswlib`, a KNN search, 544 | - or a vector database? 545 | - Postgres with [pgvector](https://github.com/pgvector/pgvector), with the Elixir port [pgvector-elixir](https://github.com/pgvector/pgvector-elixir), 546 | - SQLite with [sqlite-vec](https://github.com/asg017/sqlite-vec). 
The extension has to be installed manually from the repo and loaded (with `exqlite`), 547 | - or [Supabase](https://github.com/supabase/supabase), with an [Elixir client](https://github.com/zoedsoupe/supabase-ex) 548 | - or [ChromaDB](https://github.com/3zcurdia/chroma), with an [Elixir client](https://github.com/3zcurdia/chroma) 549 | 550 | We will use Postgres with the extension `pgvector` and the `HNSW` algorithm. See the discussion on the Postgres + pgvector setup. 551 | 552 | ## How to **prompt**? 553 | 554 | This is where we define the scope of the response we want from the LLM, based on the context retrieved by the database nearest-neighbour search. 555 | 556 | The LLM should be able to generate an "accurate" response constrained by this context. 557 | 558 | ## A word on **LLMs** 559 | 560 | A Dockyard post on this: . 561 | 562 | A comparison of different LLMs (source: Anthropic) 563 | Screenshot 2024-08-14 at 17 56 40 564 | 565 | ### Pricing 566 | 567 | [![Screenshot 2024-08-28 at 21 19 41](https://github.com/user-attachments/assets/a66e8689-a7c2-46e6-a597-fb141426b9cf)](https://openai.com/api/pricing/) 568 | 569 | [![Screenshot 2024-08-28 at 21 21 51](https://github.com/user-attachments/assets/081a4b1c-579d-4801-b441-d7f14be9c76a)](https://www.anthropic.com/pricing#anthropic-api) 570 | 571 | ## Going further? 572 | 573 | - accept new documents "on the fly" (download a given link), and maybe run the database ingestion in a background job. 574 | 575 | 576 | - use `OpenSearch` instead of the bi-encoding step: 577 | - install locally: 578 | - 579 | - 580 | - ingest data: and 581 | 582 | - clusterise data? 583 | 584 | ## Sources of inspiration 585 | 586 | Which repos, blog posts? 587 | - 588 | - using the cross-encoder: 589 | - Bumblebee, RAG: 590 | - Supabase: 591 | - Langchain: 592 | - 593 | - 594 | - 595 | - 596 | - A Fly.io post on using `llama.cpp` with `Rustler`: 597 | - ExLLama: LlammaCpp.rs NIF wrapper for Elixir/Erlang: and 598 | - ollama-ex to run LLMs locally: 599 | 600 | 601 | -------------------------------------------------------------------------------- /rag-elixir.livemd: -------------------------------------------------------------------------------- 1 | # RAG Elixir Phoenix Liveview documentation 2 | 3 | ```elixir 4 | Mix.install( 5 | [ 6 | {:req, "~> 0.5.6"}, 7 | {:bumblebee, "~> 0.5.3"}, 8 | {:ollama, "~> 0.7.1"}, 9 | {:text_chunker, "~> 0.3.1"}, 10 | {:postgrex, "~> 0.19.1"}, 11 | {:pgvector, "~> 0.3.0"}, 12 | {:ecto_sql, "~> 3.12"}, 13 | {:exla, "~> 0.7.3"}, 14 | {:kino_bumblebee, "~> 0.5.0"}, 15 | {:scholar, "~> 0.3.1"}, 16 | {:explorer, "~> 0.9.2"}, 17 | {:tucan, "~> 0.3.1"} 18 | ], 19 | config: [nx: [default_backend: EXLA.Backend]] 20 | ) 21 | 22 | Nx.Defn.global_default_options(compiler: EXLA, client: :host) 23 | ``` 24 | 25 | ## Vector extension to Postgres with Docker 26 | 27 | ### 1) pgvector 28 | 29 | To add the [pgvector](https://github.com/pgvector/pgvector) extension to your `PostgreSQL` container, you'll need to use a `PostgreSQL` image that includes this extension. 30 | 31 | The official PostgreSQL image doesn't include `pgvector` by default, so we'll extend the Postgres image and build and use a custom image that has the pgvector extension pre-installed. 32 | 33 | Create a Dockerfile with the following content: 34 | 35 | 36 | 37 |
38 | Dockerfile 39 | 40 | ```dockerfile 41 | FROM postgres:16 42 | 43 | RUN apt-get update && apt-get install -y \ 44 | git \ 45 | build-essential \ 46 | postgresql-server-dev-16 47 | 48 | RUN git clone https://github.com/pgvector/pgvector.git && \ 49 | cd pgvector && \ 50 | make && \ 51 | make install 52 | 53 | CMD ["postgres"] 54 | ``` 55 |
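Note: as an alternative to building the image yourself, the `pgvector` project also publishes pre-built images on Docker Hub (`pgvector/pgvector`); if a tag matching your Postgres version is available, it saves the build step:

```console
> docker pull pgvector/pgvector:pg16
```

The rest of this notebook works the same with either image.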
56 | 57 | 58 | 59 | Build the custom image named "postgres-with-vector": we have a **1.5Gb** image. 60 | 61 | 62 | 63 | ```bash 64 | > docker build -t postgres-with-vector . 65 | ``` 66 | 67 | 68 | 69 | Run a container in detached mode named "postgres-rag" from this custom "postgres-with-vector" image, create the database "rag_example", and open the port 5432 for the Elixir backend to be able to connect to: 70 | 71 | 72 | 73 | ```console 74 | > docker run \ 75 | -d --rm \ 76 | --name postgres-rag \ 77 | -e POSTGRES_PASSWORD=secret \ 78 | -e POSTGRES_DB=rag_example \ 79 | -p 5432:5432 \ 80 | postgres-with-vector 81 | ``` 82 | 83 | 84 | 85 | Check the logs: 86 | 87 | 88 | 89 | ```console 90 | > docker logs postgres-rag 91 | 92 | LOG: database system is ready to accept connections 93 | ``` 94 | 95 | 96 | 97 | In another terminal, connect to the running "postgres-rag" container and execute `psql` on the "rag_example" database: 98 | 99 | ```console 100 | > docker exec -it postgres-rag psql -U postgres -d rag_example 101 | ``` 102 | 103 | 104 | 105 | We execute the `psql` CLI in the container (with the default username "postgres" and password as above) to connect to the database "rag_example": 106 | 107 | 108 | 109 | ```bash 110 | rag_example=# 111 | ``` 112 | 113 | 114 | 115 | ### 2) Use an Ecto.Repo 116 | 117 | 118 | 119 | The extension will define a custom type: 120 | 121 | ```elixir 122 | Postgrex.Types.define( 123 | RAG.PostgrexTypes, 124 | Pgvector.extensions() ++ Ecto.Adapters.Postgres.extensions(), 125 | [] 126 | ) 127 | ``` 128 | 129 | Note that you can also use the Postgres adaptor [Postgrex](https://github.com/elixir-ecto/postgrex) directly with raw SQL commands. 130 | 131 | 132 | 133 |
134 | Postgrex code without Ecto 135 | 136 | ```elixir 137 | {:ok, pg} = Postgrex.start_link( 138 | username: "postgres", 139 | password: "secret", 140 | database: "rag_example", 141 | types: RAG.PostgrexTypes 142 | ) 143 | 144 | Postgrex.query!(pg, "create extension if not exists vector;", []) 145 | Postgrex.query!(pg, "drop table if exists documents;", []) 146 | Postgrex.query!(pg, "create table documents ....", []) 147 | ``` 148 |
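To complete the raw-SQL path, here is a sketch (not used in the rest of the notebook) of how inserting and querying vectors could look with `Postgrex` alone, matching the `documents` table and the 384-dimension embeddings defined below. `Pgvector.new/1` wraps a list of floats so that the driver can encode it through the custom types module; the sample embedding is a placeholder.

```elixir
# Sketch only: raw-SQL insert and nearest-neighbour query with Postgrex + pgvector.
# Assumes `pg` is the connection started above with `types: RAG.PostgrexTypes`.
Postgrex.query!(
  pg,
  "CREATE TABLE IF NOT EXISTS documents (id SERIAL PRIMARY KEY, content TEXT, embedding vector(384))",
  []
)

# A placeholder 384-dimension embedding, just to illustrate the encoding.
embedding = List.duplicate(0.0, 384)

Postgrex.query!(
  pg,
  "INSERT INTO documents (content, embedding) VALUES ($1, $2)",
  ["phoenix liveview", Pgvector.new(embedding)]
)

# Nearest neighbours by L2 distance (`<->`).
Postgrex.query!(
  pg,
  "SELECT content FROM documents ORDER BY embedding <-> $1 LIMIT 5",
  [Pgvector.new(embedding)]
)
```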
149 | 150 | 151 | 152 | We use [Ecto.Repo](https://hexdocs.pm/ecto/Ecto.Repo.html) behaviour. We can use a more friendly DSL than raw SQL commands. 153 | 154 | ```elixir 155 | defmodule RAG.Repo do 156 | use Ecto.Repo, 157 | otp_app: :rag, 158 | adapter: Ecto.Adapters.Postgres 159 | end 160 | 161 | defmodule RAG.Document do 162 | use Ecto.Schema 163 | 164 | schema "documents" do 165 | field :content, :string 166 | field :embedding, Pgvector.Ecto.Vector 167 | end 168 | end 169 | 170 | {:ok, pg} = 171 | RAG.Repo.start_link( 172 | hostname: "localhost", 173 | username: "postgres", 174 | password: "secret", 175 | database: "rag_example", 176 | types: RAG.PostgrexTypes 177 | ) 178 | ``` 179 | 180 | We create the extension: 181 | 182 | ```elixir 183 | RAG.Repo.query!("create extension if not exists vector;") 184 | ``` 185 | 186 | We check in the terminal that the index `HNSW` method is available: 187 | 188 | ``` 189 | rag_example=# select * from pg_am where amname='hnsw'; 190 | 191 | 16450 | hnsw | hnswhandler | i 192 | ``` 193 | 194 | 195 | 196 | We create a table with two columns, "content" and "embedding" where the datatypes are respectively "text" and "vector(384)". The later is because we will be using an embedding model with 384 dimensions (see further). 197 | 198 | We create an `hnsw` index on the "embedding" column using the "cosine" distance. 199 | 200 | cf [documentation](https://github.com/pgvector/pgvector#hnsw): an HNSW index creates a multilayer graph. It has better query performance than IVFFlat (in terms of speed-recall tradeoff), but has slower build times and uses more memory. Also, an index can be created without any data in the table 201 | 202 | ```elixir 203 | # reset the table 204 | RAG.Repo.query!("drop table if exists documents;") 205 | 206 | RAG.Repo.query!(""" 207 | CREATE TABLE IF NOT EXISTS documents ( 208 | id SERIAL PRIMARY KEY, 209 | content TEXT, 210 | embedding vector(384) 211 | ) 212 | """) 213 | 214 | RAG.Repo.query!( 215 | "create index if not exists embedding_idx on documents using hnsw (embedding vector_l2_ops);" 216 | ) 217 | ``` 218 | 219 |
220 | Check in the terminal (that runs `psql` in the container) the details of the created table "documents" and the indexes we created 221 | 222 | ```bash 223 | rag_example=# \d documents 224 | 225 | id | integer | | not null | nextval('documents_id_seq'::regclass) 226 | content | text | | | 227 | embedding | vector(384) | | | 228 | ``` 229 | 230 | ```bash 231 | rag_example=# select * from pg_indexes where tablename='documents'; 232 | 233 | public | documents | documents_pkey | | CREATE UNIQUE INDEX documents_pkey ON public.documents USING btree (id) 234 | public | documents | documents_embedding_idx | | CREATE INDEX documents_embedding_idx ON public.documents USING hnsw (embedding vector_cosine_ops) 235 | ``` 236 |
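One detail worth keeping consistent: `pgvector` only uses an HNSW index when the query operator matches the operator class the index was built with. The code above creates the index with `vector_l2_ops` (L2 distance, queried with `<->`, which is what the semantic search below uses), while the prose and the `psql` output mention the cosine distance (`vector_cosine_ops`, queried with `<=>`). A sketch of the cosine variant, reusing the table and column defined above, in case cosine is the distance you want:

```elixir
# Cosine flavour: the operator class and the query operator must match
# for Postgres to use the HNSW index.
RAG.Repo.query!(
  "create index if not exists embedding_cosine_idx on documents using hnsw (embedding vector_cosine_ops);"
)

# A cosine nearest-neighbour query then orders by `<=>` instead of `<->`:
# SELECT content FROM documents ORDER BY embedding <=> $1 LIMIT 5;
```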
237 | 238 | ## Fetching and chunking documents 239 | 240 | We implement the logic to fetch documents from the `Phoenix LiveView` GitHub repo and chunk them with `TextChunker`. 241 | 242 | ```elixir 243 | defmodule RAG.DataCollector do 244 | def process_directory(url, extractor) do 245 | Req.get!(url).body 246 | #|> Enum.flat_map(&extract_chunks/1) 247 | |> Enum.flat_map(fn file -> extractor.(file) end) 248 | end 249 | end 250 | ``` 251 | 252 | ## Generate & insert embeddings from the sources 253 | 254 | We use `Bumblebee` to load a sentence transformer model, and then compute the embeddings and insert into the database 255 | 256 | ```elixir 257 | defmodule RAG.Embedder do 258 | def load_model do 259 | repo = {:hf, "sentence-transformers/all-MiniLM-L6-v2"} 260 | {:ok, model_info} = Bumblebee.load_model(repo) 261 | {:ok, tokenizer} = Bumblebee.load_tokenizer(repo) 262 | 263 | embedding_serving = 264 | Bumblebee.Text.text_embedding( 265 | model_info, 266 | tokenizer, 267 | output_pool: :mean_pooling, 268 | output_attribute: :hidden_state, 269 | embedding_processor: :l2_norm, 270 | compile: [batch_size: 1, sequence_length: [2000]], 271 | defn_options: [compiler: EXLA] 272 | ) 273 | 274 | Kino.start_child({Nx.Serving, serving: embedding_serving, name: ChunkEmbedder}) 275 | end 276 | 277 | def generate_embedding(text) do 278 | %{embedding: vector} = Nx.Serving.batched_run(ChunkEmbedder, String.trim(text)) 279 | Nx.to_flat_list(vector) 280 | end 281 | end 282 | ``` 283 | 284 | ### Test the embedding against Python 285 | 286 |
287 | Python check 288 | 289 | Lets firstly test that our embedding works correctly. 290 | 291 | We use the `Python` results running this model as our source of truth. 292 | 293 | We use the `Python` library [llm](https://github.com/simonw/llm?ref=samrat.me) to compute an embedding of a given chunk. 294 | 295 | We install a plugin to bring in an embedding model "sentence-transformers": 296 | 297 | ```console 298 | > llm install llm-sentence-transformers 299 | ``` 300 | 301 | We check the installation: 302 | 303 | ```console 304 | > llm plugins 305 | 306 | [ 307 | { 308 | "name": "llm-sentence-transformers", 309 | "hooks": [ 310 | "register_commands", 311 | "register_embedding_models" 312 | ], 313 | "version": "0.2" 314 | } 315 | ] 316 | ``` 317 | 318 | We load the model and use the `llm` CLI to test the output of the chunk "phoenix liveview": 319 | 320 | ```console 321 | > llm embed -c 'phoenix liveview' -m sentence-transformers/all-MiniLM-L6-v2 322 | ``` 323 | 324 | We obtain a vector of length 384 (as expected when we craeted the row "embedding" in our "documents" table) 325 | 326 | ```console 327 | [-0.009706685319542885, -0.052094198763370514, -0.09055887907743454, -0.020933324471116066, -0.009688383899629116, 0.013350575231015682, 0.025953974574804306, -0.16938750445842743, -0.010423310101032257, -0.011145276017487049, 0.027349309995770454, -0.001918078283779323, -0.021567553281784058, -0.003199926810339093, -0.0008285145158879459, -0.015139210037887096, 0.06255557388067245, -0.06932919472455978, 0.013888751156628132, -0.004555793013423681, -0.07562420517206192, -0.009811706840991974, -0.012136539444327354, 0.04693487659096718,...] 328 | ``` 329 |
330 | 331 | 332 | 333 | We now test our `Bumblebee` settings. 334 | 335 | We load the model: 336 | 337 | ```elixir 338 | RAG.Embedder.load_model() 339 | ``` 340 | 341 | and we check that we obtain the same (first!) values as above when we run our `Bumblebee` based embedder against the same chunk: 342 | 343 | ```elixir 344 | RAG.Embedder.generate_embedding("phoenix liveview") 345 | ``` 346 | 347 | ### Build the RAG source 348 | 349 | We setup the foundations of our RAG by chunking and inserting our documents as strngs and embeddings, their numerical representation, into our vector database. 350 | 351 | We read each Github folder and download the markdown file, chunk it into a list of strings, and then compute an embedding for each chunk and save it into the vector database. 352 | 353 | ```elixir 354 | defmodule RAG.ExternalSources do 355 | 356 | def extract_chunks(file) do 357 | case file do 358 | %{"type" => "file", "name" => name, "download_url" => download_url} -> 359 | if String.ends_with?(name, ".md") do 360 | Req.get!(download_url).body 361 | |> TextChunker.split(format: :markdown, chunk_size: 800, chunk_overlap: 200) 362 | |> Enum.map(&Map.get(&1, :text)) 363 | else 364 | [] 365 | end 366 | _ -> [] 367 | end 368 | end 369 | 370 | def build(guides) do 371 | guides 372 | |> Task.async_stream(fn guide -> 373 | chunks = RAG.DataCollector.process_directory(guide, &extract_chunks/1) 374 | IO.puts("chunks length: #{length(chunks)}") 375 | Enum.each(chunks, fn chunk -> 376 | Task.start(fn -> 377 | embedding = RAG.Embedder.generate_embedding(chunk) 378 | RAG.Repo.insert!(%RAG.Document{content: chunk, embedding: embedding}) 379 | end) 380 | end) 381 | end, 382 | ordered: false, 383 | timeout: :infinity 384 | ) 385 | |> Stream.run() 386 | end 387 | end 388 | ``` 389 | 390 | ```elixir 391 | guides = [ 392 | "https://api.github.com/repos/phoenixframework/phoenix_live_view/contents/guides/server", 393 | "https://api.github.com/repos/phoenixframework/phoenix_live_view/contents/guides/client", 394 | "https://api.github.com/repos/phoenixframework/phoenix_live_view/contents/guides/introduction" 395 | ] 396 | 397 | RAG.ExternalSources.build(guides) 398 | ``` 399 | 400 | We check the number of insertions. We should have 422. 401 | 402 | ```elixir 403 | RAG.Repo.aggregate(RAG.Document, :count, :id) 404 | ``` 405 | 406 | ## Semantic search 407 | 408 | We implement the l2 similarity search on the embeddings 409 | 410 | ```elixir 411 | top_k = 20 412 | ``` 413 | 414 | ```elixir 415 | defmodule RAG.SemanticSearch do 416 | import Ecto.Query 417 | 418 | def search(query, top_k) do 419 | query_embedding = RAG.Embedder.generate_embedding(query) 420 | 421 | from(d in RAG.Document, 422 | order_by: fragment("embedding <-> ?", ^query_embedding), 423 | limit: ^top_k 424 | ) 425 | |> RAG.Repo.all() 426 | end 427 | end 428 | ``` 429 | 430 | ```elixir 431 | # Usage 432 | query = "how to handle forms server-side?" 433 | 434 | # a list of %RAG.Document{content: content, embedding: embedding} 435 | top_results = RAG.SemanticSearch.search(query, top_k) 436 | ``` 437 | 438 | We inspect the first reranking: 439 | 440 | ```elixir 441 | List.first(top_results).content 442 | ``` 443 | 444 | ## Re-ranking with cross-encoder 445 | 446 | For this step, we'll load another model from Huggingface compatible with Bumblebee to rerank the results. 
447 | 448 | We use the pre-trained model "cross-encoder/ms-marco-MiniLM-L-6-v2", as shown in the SBERT [documentation on cross-encoders](https://www.sbert.net/docs/cross_encoder/pretrained_models.html). 449 | 450 | ```elixir 451 | defmodule RAG.CrossEncoder do 452 | @first 5 453 | 454 | def load_model do 455 | repo = {:hf, "cross-encoder/ms-marco-MiniLM-L-6-v2"} 456 | tokenizer = {:hf, "bert-base-uncased"} 457 | {:ok, model_info} = Bumblebee.load_model(repo) 458 | {:ok, tokenizer} = Bumblebee.load_tokenizer(tokenizer) 459 | 460 | {model_info, tokenizer} 461 | end 462 | 463 | def rerank(documents, query) do 464 | # Prepare input pairs for the cross-encoder (the model is reloaded on each call for simplicity) 465 | {model_info, tokenizer} = load_model() 466 | input_pairs = 467 | Bumblebee.apply_tokenizer(tokenizer, 468 | Enum.map(documents, fn doc -> 469 | {query, doc.content} 470 | end) 471 | ) 472 | 473 | # Run the cross-encoder 474 | outputs = Axon.predict(model_info.model, model_info.params, input_pairs) 475 | 476 | 477 | # Combine scores with original documents and sort 478 | Enum.zip(documents, outputs.logits |> Nx.to_flat_list()) 479 | |> Enum.sort_by(fn {_, score} -> score end, :desc) 480 | |> Enum.map(fn {doc, _} -> doc.content end) 481 | |> Enum.take(@first) 482 | end 483 | end 484 | ``` 485 | 486 | #### Check reranking against Python 487 | 488 | [TODO] 489 | 490 | 491 | 492 | This model uses the architecture `:for_sequence_classification`; there is no such function yet coded in Bumblebee at the time of writing. 493 | 494 | 495 | 496 | ### Build the context by re-ranking 497 | 498 | ```elixir 499 | # Load the model 500 | RAG.CrossEncoder.load_model() 501 | 502 | # Rerank the results 503 | #query = "how to handle forms server-side?" 504 | context = RAG.CrossEncoder.rerank(top_results, query) 505 | ``` 506 | 507 | ## Build the prompt 508 | 509 | We define the prompt with a context and a question. 510 | 511 | ```elixir 512 | defmodule RAG.PromptBuilder do 513 | def build_prompt(context, query) do 514 | context_text = Enum.join(context, "\n\n") 515 | """ 516 | You are a proficient Elixir developer, with full knowledge of the framework Phoenix LiveView. 517 | You are given context information below that is relevant to the query submitted to you. 518 | ----------------------- 519 | #{context_text} 520 | ----------------------- 521 | Answer the query using primarily the context information given above, and cite it. 522 | The response should be in markdown format. 523 | 524 | Query: #{query} 525 | Answer: 526 | """ 527 | end 528 | end 529 | ``` 530 | 531 | ## LLM integration 532 | 533 | Most LLMs are paid solutions accessible via an endpoint. Very few models can be run locally, as LLMs tend to be large. 534 | 535 | We run the "codellama" model via the `ollama` platform. 536 | 537 | 538 | 539 | ### Llama CLI 540 | 541 |
542 | Install and start ollama server 543 | 544 | We install `ollama` (see [the repo](https://github.com/ollama/ollama/tree/main)) to install de "codellama" LLM. 545 | 546 | We pull a model from the registry: 547 | 548 | ```console 549 | > ollama pull codellama 550 | ``` 551 | 552 | We start an LLM server: 553 | 554 | ```console 555 | > ollama serve 556 | ``` 557 | 558 | This gives us an interactive CLI and a [REST API](https://github.com/ollama/ollama/tree/main#rest-api). 559 | 560 | 561 | We can test this and send a **POST** request to generate a completion where we pass a json `{"model": "codellama", "prompt": "...."}`. 562 | 563 | ```console 564 | > curl http://localhost:11434/api/generate -d \ 565 | '{"model": "codellama", "prompt": "how to handle forms with Phoenix Liveview?", "stream": false}' 566 | ``` 567 | 568 | We get a response back: 569 | 570 | ```json 571 | { 572 | "model":"codellama", 573 | "created_at":"2024-08-29T07:25:31.941263Z", 574 | "response":"\nTo handle forms in Phoenix LiveView, you can use the `Phoenix.LiveView.Form` module. This module provides a set of functions for creating and manipulating HTML form elements, as well as handling form data on the server.\n\nHere's an example of how to create a simple form using Phoenix LiveView:\n```\nimport Ecto.Changeset\n\n# Create a changeset for the form\nchangeset = Ecto.Changeset.change(%YourModel{}, %{})\n\n# Render the form in your template\n\u003cform phx-submit=\"save\"\u003e\n \u003cdiv\u003e\n \u003clabel for=\"name\"\u003eName:\u003c/label\u003e\n \u003cinput type=\"text\" id=\"name\" name=\"name\" value={changeset.data[\"name\"]} /\u003e\n \u003c/div\u003e\n\n \u003cdiv\u003e\n \u003clabel for=\"age\"\u003eAge:\u003c/label\u003e\n \u003cinput type=\"number\" id=\"age\" name=\"age\" value={changeset.data[\"age\"]} /\u003e\n \u003c/div\u003e\n\n \u003cbutton type=\"submit\"\u003eSave\u003c/button\u003e\n\u003c/form\u003e\n```\nIn this example, we're creating a changeset for the form, which is used to validate and update the form data on the server. We then render the form in our template using the `phx-submit` attribute, which tells Phoenix to send the form data to the server when the form is submitted.\n\nWhen the form is submitted, Phoenix will automatically handle the form data and update the changeset with any validation errors or updates. You can then use the updated changeset to persist the data in your database.\n\nTo handle the form submission on the server, you can define a `save` function in your LiveView module that will be called when the form is submitted. This function will receive the updated changeset as an argument, and you can use it to update the data in your database or perform any other necessary actions.\n```\ndef save(changeset) do\n # Validate the changeset and return an error if there are any validation errors\n case Ecto.Changeset.apply_action(changeset, :update) do\n {:ok, _model} -\u003e\n # Update the data in your database or perform any other necessary actions\n :ok\n\n {:error, _changeset} -\u003e\n # Render an error page if there were validation errors\n render(:index, changeset: changeset)\n end\nend\n```\nIn this example, we're using the `Ecto.Changeset` module to validate the form data and update the changeset with any validation errors or updates. If there are no validation errors, we can use the updated changeset to persist the data in our database or perform any other necessary actions. 
If there are validation errors, we render an error page with the updated changeset.\n\nOverall, using Phoenix LiveView forms provides a convenient and efficient way to handle form data on the server, while also providing a seamless user experience for your users.", 575 | "done":true, 576 | ... 577 | } 578 | 579 | ``` 580 |
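Side note: the `Mix.install` at the top also pulls in the `:ollama` package (ollama-ex), which wraps this REST API, so the raw HTTP calls could be replaced by the client. A sketch, assuming the client functions documented by the package (`Ollama.init/1` and `Ollama.completion/2`; check https://hexdocs.pm/ollama for the exact API):

```elixir
# Assumed ollama-ex API: the function names and options below are assumptions
# taken from the package documentation and should be verified against the installed version.
client = Ollama.init()

{:ok, %{"response" => response}} =
  Ollama.completion(client,
    model: "codellama",
    prompt: "how to handle forms with Phoenix Liveview?"
  )
```

In this notebook we keep calling the REST API directly with `Req`, as shown below.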
581 | 582 | 583 | 584 | We check that `ollama` is running: 585 | 586 | ```console 587 | lsof -i -P | grep LISTEN | grep 11434 588 | ``` 589 | 590 | ## Generate a response via the ollama REST API and Elixir 591 | 592 | The Livebook runs a **POST** request with `Req` and passes a JSON payload via the `:json` option. 593 | 594 | As per the [documentation](https://hexdocs.pm/req/Req.Steps.html#encode_body/1-request-options), it runs `Jason.encode_to_iodata(%{model: "codellama", prompt: "..."})` and sets the appropriate headers. 595 | 596 | Note that we need to increase the receive timeout (`:receive_timeout`) well above its default. 597 | 598 | ```elixir 599 | defmodule LLM do 600 | def generate_response(prompt) do 601 | json = %{stream: false, model: "codellama", prompt: prompt} 602 | 603 | res = 604 | Req.post!( 605 | "http://localhost:11434/api/generate", 606 | json: json, 607 | receive_timeout: 120_000 608 | ) 609 | 610 | case res do 611 | %{status: 200, body: body} -> 612 | body["response"] 613 | _ -> 614 | IO.puts "error" 615 | end 616 | end 617 | end 618 | ``` 619 | 620 | ```elixir 621 | query 622 | ``` 623 | 624 | ```elixir 625 | RAG.PromptBuilder.build_prompt(context, query) 626 | |> LLM.generate_response() 627 | ``` 628 | 629 | ## Wrap up 630 | 631 | Seed the database with the external sources: 632 | 633 | 634 | 635 | ```elixir 636 | guides = [ 637 | "https://api.github.com/repos/phoenixframework/phoenix_live_view/contents/guides/server", 638 | "https://api.github.com/repos/phoenixframework/phoenix_live_view/contents/guides/client", 639 | "https://api.github.com/repos/phoenixframework/phoenix_live_view/contents/guides/introduction" 640 | ] 641 | 642 | RAG.ExternalSources.build(guides) 643 | ``` 644 | 645 | ```elixir 646 | defmodule RAG do 647 | def process_query(query) do 648 | top_k = 10 649 | 650 | query 651 | |> RAG.SemanticSearch.search(top_k) 652 | # top_results 653 | |> RAG.CrossEncoder.rerank(query) 654 | # context 655 | |> tap(&IO.puts/1) 656 | |> RAG.PromptBuilder.build_prompt(query) 657 | # prompt 658 | |> LLM.generate_response() 659 | end 660 | end 661 | 662 | query = "explain Javascript interoperability on the server-side" 663 | 664 | RAG.process_query(query) 665 | ``` 666 | 667 | ## Dimension reduction & visualization 668 | 669 | We will use the `scholar` library. 670 | 671 | ```elixir 672 | require Explorer.DataFrame, as: DF 673 | ``` 674 |
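The notebook stops here, so below is only a sketch of how this last section could continue: project the 384-dimensional embeddings down to two principal components with `Scholar` and plot them with `Tucan`. The exact function names used (`Scholar.Decomposition.PCA.fit/2`, `Scholar.Decomposition.PCA.transform/2`, `Pgvector.to_list/1`, `Tucan.scatter/3`) are assumptions to be checked against the installed versions.

```elixir
# Sketch only: reduce the stored embeddings to 2D and scatter-plot them.
# The Scholar/Tucan calls below are assumed APIs; check the package docs for your versions.
docs = RAG.Repo.all(RAG.Document)

embeddings =
  docs
  |> Enum.map(fn doc -> Pgvector.to_list(doc.embedding) end)
  |> Nx.tensor()

# Fit a 2-component PCA on the {n, 384} matrix and project the embeddings.
pca = Scholar.Decomposition.PCA.fit(embeddings, num_components: 2)
coords = Scholar.Decomposition.PCA.transform(pca, embeddings)

# Put the two principal components into a dataframe and plot a scatter.
df =
  DF.new(
    x: coords[[.., 0]] |> Nx.to_flat_list(),
    y: coords[[.., 1]] |> Nx.to_flat_list()
  )

Tucan.scatter(df, "x", "y")
```

--------------------------------------------------------------------------------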