├── .gitattributes ├── .gitignore ├── LICENSE ├── README.md ├── config.yml ├── data └── invoice_1.pdf ├── docker-compose.yml ├── ingest.py ├── llm ├── __init__.py ├── llm.py ├── prompts.py └── wrapper.py ├── llmlayer.py ├── main.py ├── models └── model_download.txt ├── prompts-structured.txt ├── prompts.txt └── requirements.txt /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | share/python-wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | MANIFEST 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .nox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | *.py,cover 50 | .hypothesis/ 51 | .pytest_cache/ 52 | cover/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | .pybuilder/ 76 | target/ 77 | 78 | # Jupyter Notebook 79 | .ipynb_checkpoints 80 | 81 | # IPython 82 | profile_default/ 83 | ipython_config.py 84 | 85 | # pyenv 86 | # For a library or package, you might want to ignore these files since the code is 87 | # intended to run in multiple environments; otherwise, check them in: 88 | # .python-version 89 | 90 | # pipenv 91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 94 | # install all needed dependencies. 95 | #Pipfile.lock 96 | 97 | # poetry 98 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 99 | # This is especially recommended for binary packages to ensure reproducibility, and is more 100 | # commonly ignored for libraries. 101 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 102 | #poetry.lock 103 | 104 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 105 | __pypackages__/ 106 | 107 | # Celery stuff 108 | celerybeat-schedule 109 | celerybeat.pid 110 | 111 | # SageMath parsed files 112 | *.sage.py 113 | 114 | # Environments 115 | .env 116 | .venv 117 | env/ 118 | venv/ 119 | ENV/ 120 | env.bak/ 121 | venv.bak/ 122 | 123 | # Spyder project settings 124 | .spyderproject 125 | .spyproject 126 | 127 | # Rope project settings 128 | .ropeproject 129 | 130 | # mkdocs documentation 131 | /site 132 | 133 | # mypy 134 | .mypy_cache/ 135 | .dmypy.json 136 | dmypy.json 137 | 138 | # Pyre type checker 139 | .pyre/ 140 | 141 | # pytype static type analyzer 142 | .pytype/ 143 | 144 | # Cython debug symbols 145 | cython_debug/ 146 | 147 | # PyCharm 148 | # JetBrains specific template is maintainted in a separate JetBrains.gitignore that can 149 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 150 | # and can be added to the global gitignore or merged into this file. For a more nuclear 151 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 152 | .idea/ 153 | 154 | models/llama-2-13b-chat.Q5_K_M.gguf 155 | models/.DS_Store 156 | llm/.DS_Store 157 | .DS_Store 158 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. 
For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Invoice data processing with Llama2 13B LLM RAG on Local CPU 2 | 3 | 4 | **Youtube**: Invoice Data Processing with Llama2 13B LLM RAG on Local CPU 5 | 6 | ___ 7 | 8 | ## Quickstart 9 | 10 | ### RAG runs on: LlamaCPP, Haystack, Weaviate 11 | 12 | 1. Download the Llama2 13B model, check models/model_download.txt for the download link. 13 | 2. 
Start the local Weaviate vector DB with Docker:
14 | 
15 | `docker compose up -d`
16 | 
17 | 3. Install the requirements:
18 | 
19 | `pip install -r requirements.txt`
20 | 
21 | 4. Copy text-based PDF files into the `data` folder.
22 | 5. Run the ingestion script to convert the PDF text into vector embeddings and save them in the Weaviate vector store:
23 | 
24 | `python ingest.py`
25 | 
26 | 6. Run the main script to query the data with the Llama2 13B LLM RAG pipeline and return the answer:
27 | 
28 | `python main.py "What is the invoice number value?"`
29 | 
-------------------------------------------------------------------------------- /config.yml: --------------------------------------------------------------------------------
1 | DATA_PATH: 'data/'
2 | EMBEDDINGS: 'sentence-transformers/all-MiniLM-L6-v2'
3 | WEAVIATE_HOST: 'http://localhost'
4 | WEAVIATE_PORT: 8080
5 | WEAVIATE_EMBEDDING_DIM: 384
6 | MODEL_BIN_PATH: 'models/llama-2-13b-chat.Q5_K_M.gguf'
7 | USE_GPU: False
8 | PRE_PROCESSOR_SPLIT_LENGTH: 1000
9 | PRE_PROCESSOR_SPLIT_OVERLAP: 0
10 | PROMPT_ANSWER_MAX_LENGTH_TOKENS: 1000
11 | MODEL_MAX_TOKEN_LIMIT: 1048
-------------------------------------------------------------------------------- /data/invoice_1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/katanaml/llm-rag-invoice-cpu/6b586baf34cb48694dcba97763cdd5634aa16433/data/invoice_1.pdf
-------------------------------------------------------------------------------- /docker-compose.yml: --------------------------------------------------------------------------------
1 | ---
2 | version: '3.4'
3 | services:
4 |   weaviate:
5 |     command:
6 |     - --host
7 |     - 0.0.0.0
8 |     - --port
9 |     - '8080'
10 |     - --scheme
11 |     - http
12 |     image: semitechnologies/weaviate:1.21.6
13 |     ports:
14 |     - 8080:8080
15 |     volumes:
16 |     - weaviate_data:/var/lib/weaviate
17 |     restart: on-failure:0
18 |     environment:
19 |       QUERY_DEFAULTS_LIMIT: 25
20 |       AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
21 |       PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
22 |       DEFAULT_VECTORIZER_MODULE: 'none'
23 |       ENABLE_MODULES: 'text2vec-huggingface'
24 |       CLUSTER_HOSTNAME: 'node1'
25 | volumes:
26 |   weaviate_data:
27 | ...
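Before running the ingestion, it can help to confirm that the Weaviate container defined above is actually up; `ingest.py` and `main.py` assume it is reachable on `localhost:8080` (see `WEAVIATE_HOST` and `WEAVIATE_PORT` in config.yml). The following is a minimal sketch, not part of the repository, that probes Weaviate's standard readiness endpoint; the file name `check_weaviate.py` is only a suggestion.

```python
# check_weaviate.py -- hypothetical helper, not part of this repo.
# Probes the readiness endpoint of the Weaviate container started with
# `docker compose up -d` (host/port as in docker-compose.yml and config.yml).
import urllib.error
import urllib.request

WEAVIATE_READY_URL = "http://localhost:8080/v1/.well-known/ready"


def weaviate_is_ready(url: str = WEAVIATE_READY_URL) -> bool:
    """Return True if Weaviate answers its readiness probe with HTTP 2xx."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    print("Weaviate ready:", weaviate_is_ready())
```

If this prints `False`, check `docker compose ps` and the container logs before running `python ingest.py`.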
-------------------------------------------------------------------------------- /ingest.py: -------------------------------------------------------------------------------- 1 | from haystack.nodes import EmbeddingRetriever, PreProcessor 2 | from haystack.document_stores import WeaviateDocumentStore 3 | from haystack.preview.components.file_converters.pypdf import PyPDFToDocument 4 | import box 5 | import yaml 6 | import timeit 7 | import os 8 | 9 | 10 | # Import config vars 11 | with open('config.yml', 'r', encoding='utf8') as ymlfile: 12 | cfg = box.Box(yaml.safe_load(ymlfile)) 13 | 14 | 15 | def run_ingest(): 16 | file_list = [os.path.join(cfg.DATA_PATH, f) for f in os.listdir(cfg.DATA_PATH) if 17 | os.path.isfile(os.path.join(cfg.DATA_PATH, f)) and not f.startswith('.')] 18 | 19 | start = timeit.default_timer() 20 | 21 | vector_store = WeaviateDocumentStore(host=cfg.WEAVIATE_HOST, 22 | port=cfg.WEAVIATE_PORT, 23 | embedding_dim=cfg.WEAVIATE_EMBEDDING_DIM) 24 | 25 | converter = PyPDFToDocument() 26 | output = converter.run(paths=file_list) 27 | docs = output["documents"] 28 | 29 | final_doc = [] 30 | for doc in docs: 31 | new_doc = { 32 | 'content': doc.text, 33 | 'meta': doc.metadata 34 | } 35 | final_doc.append(new_doc) 36 | 37 | preprocessor = PreProcessor( 38 | clean_empty_lines=True, 39 | clean_whitespace=False, 40 | clean_header_footer=False, 41 | split_by="word", 42 | language="en", 43 | split_length=cfg.PRE_PROCESSOR_SPLIT_LENGTH, 44 | split_overlap=cfg.PRE_PROCESSOR_SPLIT_OVERLAP, 45 | split_respect_sentence_boundary=True, 46 | ) 47 | 48 | preprocessed_docs = preprocessor.process(final_doc) 49 | vector_store.write_documents(preprocessed_docs) 50 | 51 | retriever = EmbeddingRetriever( 52 | document_store=vector_store, 53 | embedding_model=cfg.EMBEDDINGS 54 | ) 55 | vector_store.update_embeddings(retriever) 56 | 57 | end = timeit.default_timer() 58 | print(f"Time to prepare embeddings: {end - start}") 59 | 60 | 61 | if __name__ == "__main__": 62 | run_ingest() -------------------------------------------------------------------------------- /llm/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/katanaml/llm-rag-invoice-cpu/6b586baf34cb48694dcba97763cdd5634aa16433/llm/__init__.py -------------------------------------------------------------------------------- /llm/llm.py: -------------------------------------------------------------------------------- 1 | from haystack.nodes import PromptModel 2 | from llmlayer import LlamaCPPInvocationLayer 3 | 4 | import box 5 | import yaml 6 | 7 | 8 | # Import config vars 9 | with open('config.yml', 'r', encoding='utf8') as ymlfile: 10 | cfg = box.Box(yaml.safe_load(ymlfile)) 11 | 12 | 13 | def setup_llm(): 14 | # max_length: The maximum number of tokens the output text generated by the model can have. 15 | return PromptModel( 16 | model_name_or_path=cfg.MODEL_BIN_PATH, 17 | invocation_layer_class=LlamaCPPInvocationLayer, 18 | use_gpu=cfg.USE_GPU, 19 | max_length=cfg.MODEL_MAX_TOKEN_LIMIT 20 | ) -------------------------------------------------------------------------------- /llm/prompts.py: -------------------------------------------------------------------------------- 1 | # Note: Precise formatting of spacing and indentation of the prompt template is important, 2 | # as it is highly sensitive to whitespace changes. 
For example, it could have problems generating
3 | # a summary from the pieces of context if the spacing is not done correctly.
4 | 
5 | prompt_template = """Given the provided Documents, answer the Query. Make your answer short and concise.
6 | Query: {query}
7 | Documents: {join(documents)}
8 | Answer:
9 | """
-------------------------------------------------------------------------------- /llm/wrapper.py: --------------------------------------------------------------------------------
1 | from haystack.document_stores import WeaviateDocumentStore
2 | from haystack.nodes import (AnswerParser,
3 |                             PromptTemplate,
4 |                             EmbeddingRetriever,
5 |                             PromptNode)
6 | from llm.prompts import prompt_template
7 | from llm.llm import setup_llm
8 | from haystack import Pipeline
9 | 
10 | import box
11 | import yaml
12 | 
13 | 
14 | # Import config vars
15 | with open('config.yml', 'r', encoding='utf8') as ymlfile:
16 |     cfg = box.Box(yaml.safe_load(ymlfile))
17 | 
18 | 
19 | def setup_prompt():
20 |     return PromptTemplate(prompt=prompt_template,
21 |                           output_parser=AnswerParser())
22 | 
23 | 
24 | def setup_retriever(model, prompt, document_store):
25 |     # max_length: The maximum number of tokens the generated text output can have.
26 |     prompt_node = PromptNode(model_name_or_path=model,
27 |                              max_length=cfg.PROMPT_ANSWER_MAX_LENGTH_TOKENS,
28 |                              use_gpu=cfg.USE_GPU,
29 |                              default_prompt_template=prompt)
30 | 
31 |     retriever = EmbeddingRetriever(
32 |         document_store=document_store,
33 |         embedding_model=cfg.EMBEDDINGS
34 |     )
35 | 
36 |     return prompt_node, retriever
37 | 
38 | 
39 | def setup_rag_pipeline():
40 |     document_store = WeaviateDocumentStore(
41 |         host=cfg.WEAVIATE_HOST,
42 |         port=cfg.WEAVIATE_PORT,
43 |         embedding_dim=cfg.WEAVIATE_EMBEDDING_DIM
44 |     )
45 | 
46 |     prompt = setup_prompt()
47 |     model = setup_llm()
48 |     prompt_node, retriever = setup_retriever(model, prompt, document_store)
49 | 
50 |     rag_pipeline = Pipeline()
51 |     rag_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
52 |     rag_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
53 | 
54 |     return rag_pipeline
55 | 
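The pipeline in wrapper.py is just two nodes: the EmbeddingRetriever pulls the top-k chunks from Weaviate, and the PromptNode injects them into the prompt via `{join(documents)}` before calling the Llama2 model. When answers look wrong, the retrieval step is often the culprit rather than the LLM. The sketch below is a hypothetical helper (not part of the repo) that reuses the store and embedding model from config.yml to print what the retriever would hand to the PromptNode.

```python
# retrieval_debug.py -- hypothetical helper, not part of this repo.
# Prints the chunks the EmbeddingRetriever would pass to the PromptNode,
# so retrieval quality can be inspected without waiting for CPU generation.
import box
import yaml
from haystack.document_stores import WeaviateDocumentStore
from haystack.nodes import EmbeddingRetriever

with open('config.yml', 'r', encoding='utf8') as ymlfile:
    cfg = box.Box(yaml.safe_load(ymlfile))

document_store = WeaviateDocumentStore(host=cfg.WEAVIATE_HOST,
                                       port=cfg.WEAVIATE_PORT,
                                       embedding_dim=cfg.WEAVIATE_EMBEDDING_DIM)

retriever = EmbeddingRetriever(document_store=document_store,
                               embedding_model=cfg.EMBEDDINGS)

query = "What is the invoice number value?"
for doc in retriever.retrieve(query=query, top_k=5):
    # doc.score is the similarity assigned by the retriever, doc.content the chunk text
    print(doc.score, repr(doc.content[:120]))
```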
-------------------------------------------------------------------------------- /llmlayer.py: --------------------------------------------------------------------------------
1 | # Code from: https://medium.com/@fvanlitsenburg/building-a-private-gpt-with-haystack-part-3-using-llama-2-with-ggml-c2d994da40da
2 | 
3 | from haystack.nodes import PromptModelInvocationLayer
4 | from llama_cpp import Llama
5 | import os
6 | from typing import Dict, List, Union, Optional
7 | 
8 | import logging
9 | 
10 | logger = logging.getLogger(__name__)
11 | 
12 | 
13 | class LlamaCPPInvocationLayer(PromptModelInvocationLayer):
14 |     def __init__(self, model_name_or_path: Union[str, os.PathLike],
15 |                  max_length: Optional[int] = 128,
16 |                  max_context: Optional[int] = 2048,
17 |                  n_parts: Optional[int] = -1,
18 |                  seed: Optional[int] = 1337,
19 |                  f16_kv: Optional[bool] = True,
20 |                  logits_all: Optional[bool] = False,
21 |                  vocab_only: Optional[bool] = False,
22 |                  use_mmap: Optional[bool] = True,
23 |                  use_mlock: Optional[bool] = False,
24 |                  embedding: Optional[bool] = False,
25 |                  n_threads: Optional[int] = None,
26 |                  n_batch: Optional[int] = 512,
27 |                  last_n_tokens_size: Optional[int] = 64,
28 |                  lora_base: Optional[str] = None,
29 |                  lora_path: Optional[str] = None,
30 |                  verbose: Optional[bool] = True,
31 |                  **kwargs):
32 | 
33 |         """
34 |         Creates a new Llama CPP InvocationLayer instance.
35 | 
36 |         :param model_name_or_path: The name or path of the underlying model.
37 |         :param kwargs: See `https://abetlen.github.io/llama-cpp-python/#llama_cpp.llama.Llama.__init__`. For max_length, we use the 128 'max_tokens' setting.
38 |         """
39 |         if model_name_or_path is None or len(model_name_or_path) == 0:
40 |             raise ValueError("model_name_or_path cannot be None or empty string")
41 | 
42 |         self.model_name_or_path = model_name_or_path
43 |         self.max_context = max_context
44 |         self.max_length = max_length
45 |         self.n_parts = n_parts
46 |         self.seed = seed
47 |         self.f16_kv = f16_kv
48 |         self.logits_all = logits_all
49 |         self.vocab_only = vocab_only
50 |         self.use_mmap = use_mmap
51 |         self.use_mlock = use_mlock
52 |         self.embedding = embedding
53 |         self.n_threads = n_threads
54 |         self.n_batch = n_batch
55 |         self.last_n_tokens_size = last_n_tokens_size
56 |         self.lora_base = lora_base
57 |         self.lora_path = lora_path
58 |         self.verbose = verbose
59 |         self.model: Llama = Llama(model_path=model_name_or_path,
60 |                                   n_ctx=max_context,
61 |                                   n_parts=n_parts,
62 |                                   seed=seed,
63 |                                   f16_kv=f16_kv,
64 |                                   logits_all=logits_all,
65 |                                   vocab_only=vocab_only,
66 |                                   use_mmap=use_mmap,
67 |                                   use_mlock=use_mlock,
68 |                                   embedding=embedding,
69 |                                   n_threads=n_threads,
70 |                                   n_batch=n_batch,
71 |                                   last_n_tokens_size=last_n_tokens_size,
72 |                                   lora_base=lora_base,
73 |                                   lora_path=lora_path,
74 |                                   verbose=verbose)
75 | 
76 |     def _ensure_token_limit(self, prompt: Union[str, List[Dict[str, str]]]) -> Union[str, List[Dict[str, str]]]:
77 |         """Ensure that length of the prompt and answer is within the maximum token length of the PromptModel.
78 | 
79 |         :param prompt: Prompt text to be sent to the generative model.
80 |         """
81 |         if not isinstance(prompt, str):
82 |             raise ValueError(f"Prompt must be of type str but got {type(prompt)}")
83 | 
84 |         context_length = self.model.n_ctx()
85 |         tokenized_prompt = self.model.tokenize(bytes(prompt, 'utf-8'))
86 |         if len(tokenized_prompt) + self.max_length > context_length:
87 |             logger.warning(
88 |                 "The prompt has been truncated from %s tokens to %s tokens so that the prompt length and "
89 |                 "answer length (%s tokens) fit within the max token limit (%s tokens). "
90 |                 "Shorten the prompt to prevent it from being cut off",
91 |                 len(tokenized_prompt),
92 |                 max(0, context_length - self.max_length),
93 |                 self.max_length,
94 |                 context_length,
95 |             )
96 |             return bytes.decode(self.model.detokenize(tokenized_prompt[:max(0, context_length - self.max_length)]),
97 |                                 'utf-8')
98 | 
99 |         return prompt
100 | 
101 |     def invoke(self, *args, **kwargs):
102 |         """
103 |         It takes a prompt and returns a list of generated text using the underlying model.
104 |         :return: A list of generated text.
105 |         """
106 |         output: List[Dict[str, str]] = []
107 |         stream = kwargs.pop("stream", False)
108 | 
109 |         generated_texts = []
110 | 
111 |         if kwargs and "prompt" in kwargs:
112 |             prompt = kwargs.pop("prompt")
113 | 
114 |             # For more details refer to call documentation for Llama CPP https://abetlen.github.io/llama-cpp-python/#llama_cpp.llama.Llama.__call__
115 |             model_input_kwargs = {
116 |                 key: kwargs[key]
117 |                 for key in [
118 |                     "suffix",
119 |                     "max_tokens",
120 |                     "temperature",
121 |                     "top_p",
122 |                     "logprobs",
123 |                     "echo",
124 |                     "repeat_penalty",
125 |                     "top_k",
126 |                     "stop"
127 |                 ]
128 |                 if key in kwargs
129 |             }
130 | 
131 |             if stream:
132 |                 for token in self.model(prompt, stream=True, **model_input_kwargs):
133 |                     generated_texts.append(token['choices'][0]['text'])
134 |             else:
135 |                 output = self.model(prompt, **model_input_kwargs)
136 |                 generated_texts = [o['text'] for o in output['choices']]
137 |         return generated_texts
138 | 
139 |     @classmethod
140 |     def supports(cls, model_name_or_path: str, **kwargs) -> bool:
141 |         """
142 |         Checks if the given model is supported by this invocation layer.
143 | 
144 |         :param model_name_or_path: The name or path of the model.
145 |         :param kwargs: Additional keyword arguments passed to the underlying model which might be used to determine
146 |                        if the model is supported.
147 |         :return: True if this invocation layer supports the model, False otherwise.
148 |         """
149 |         # I guess there is not much to validate here ¯\_(ツ)_/¯
150 |         return model_name_or_path is not None and len(model_name_or_path) > 0
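`LlamaCPPInvocationLayer.invoke()` ultimately forwards the prompt, plus a whitelist of generation kwargs, straight to `llama_cpp.Llama`. If the pipeline fails to start, a quick way to check that the downloaded GGUF file itself loads and generates on CPU is to call llama-cpp-python directly, outside Haystack. A minimal sketch; the script name is hypothetical and the model path matches `MODEL_BIN_PATH` in config.yml.

```python
# llama_smoke_test.py -- hypothetical sanity check, not part of this repo.
# Loads the quantized model directly and generates a short completion,
# roughly what LlamaCPPInvocationLayer.invoke() does under the hood.
from llama_cpp import Llama

llm = Llama(model_path="models/llama-2-13b-chat.Q5_K_M.gguf",
            n_ctx=2048,  # same context size as the max_context default above
            verbose=False)

result = llm("Q: What is an invoice? A:",
             max_tokens=64,
             temperature=0.1,
             stop=["\n"])

# llama-cpp-python returns an OpenAI-style completion dict
print(result["choices"][0]["text"].strip())
```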
105 | """ 106 | output: List[Dict[str, str]] = [] 107 | stream = kwargs.pop("stream", False) 108 | 109 | generated_texts = [] 110 | 111 | if kwargs and "prompt" in kwargs: 112 | prompt = kwargs.pop("prompt") 113 | 114 | # For more details refer to call documentation for Llama CPP https://abetlen.github.io/llama-cpp-python/#llama_cpp.llama.Llama.__call__ 115 | model_input_kwargs = { 116 | key: kwargs[key] 117 | for key in [ 118 | "suffix", 119 | "max_tokens", 120 | "temperature", 121 | "top_p", 122 | "logprobs", 123 | "echo", 124 | "repeat_penalty", 125 | "top_k", 126 | "stop" 127 | ] 128 | if key in kwargs 129 | } 130 | 131 | if stream: 132 | for token in self.model(prompt, stream=True, **model_input_kwargs): 133 | generated_texts.append(token['choices'][0]['text']) 134 | else: 135 | output = self.model(prompt, **model_input_kwargs) 136 | generated_texts = [o['text'] for o in output['choices']] 137 | return generated_texts 138 | 139 | def supports(cls, model_name_or_path: str, **kwargs) -> bool: 140 | """ 141 | Checks if the given model is supported by this invocation layer. 142 | 143 | :param model_name_or_path: The name or path of the model. 144 | :param kwargs: Additional keyword arguments passed to the underlying model which might be used to determine 145 | if the model is supported. 146 | :return: True if this invocation layer supports the model, False otherwise. 147 | """ 148 | # I guess there is not much to validate here ¯\_(ツ)_/¯ 149 | return model_name_or_path is not None and len(model_name_or_path) > 0 -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | import timeit 2 | import argparse 3 | from llm.wrapper import setup_rag_pipeline 4 | 5 | 6 | if __name__ == "__main__": 7 | parser = argparse.ArgumentParser() 8 | parser.add_argument('input', 9 | type=str, 10 | default='What is the invoice number value?', 11 | help='Enter the query to pass into the LLM') 12 | args = parser.parse_args() 13 | 14 | start = timeit.default_timer() 15 | rag_pipeline = setup_rag_pipeline() 16 | 17 | json_response = rag_pipeline.run(query=args.input, params={"Retriever": {"top_k": 5}}) 18 | 19 | answers = json_response['answers'] 20 | answer = 'No answer found' 21 | for ans in answers: 22 | answer = ans.answer 23 | break 24 | 25 | end = timeit.default_timer() 26 | 27 | print(f'\nAnswer:\n {answer}') 28 | print('='*50) 29 | 30 | print(f"Time to retrieve answer: {end - start}") -------------------------------------------------------------------------------- /models/model_download.txt: -------------------------------------------------------------------------------- 1 | Download the quantized llama-2-13b-chat.Q5_K_M.gguf model from: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/tree/main -------------------------------------------------------------------------------- /prompts-structured.txt: -------------------------------------------------------------------------------- 1 | python main.py "retrieve invoice number in the format {\"invoice_number\": {}}" 2 | 3 | {"invoice_number": "61356291"} 4 | ================================================== 5 | Time to retrieve answer: 93.37903362209909 6 | 7 | 8 | python main.py "What is the invoice date value? 
use this format for the answer {\"invoice_date\": {}}" 9 | 10 | {"invoice_date": {"date": "09/06/2012"}} 11 | ================================================== 12 | Time to retrieve answer: 95.65435898804571 13 | 14 | 15 | python main.py "What is the invoice client name, address and tax ID? use this format for the answer {\"client_name\": {},\"address\": {},\"tax_id\": {}}" 16 | 17 | {"client_name": "Rodriguez-Stevens", "address": "2280 Angela Plain, Hortonshire, MS 93248", "tax_id": "939-98-8477"} 18 | ================================================== 19 | Time to retrieve answer: 125.74140086490661 20 | 21 | 22 | python main.py "What is the invoice seller name, address and tax ID? use this format for the answer {\"seller_name\": {},\"address\": {},\"tax_id\": {}}" 23 | 24 | {"seller_name": "Chapman, Kim and Green", "address": "64731 James Branch Smithmouth, NC 26872", "tax_id": "949-84-9105"} 25 | ================================================== 26 | Time to retrieve answer: 116.23752232501283 27 | 28 | 29 | python main.py "retrieve invoice IBAN in the format {\"invoice_iban\": {}}" 30 | 31 | {"invoice_iban": {"GB50ACIE59715038217063"}} 32 | ================================================== 33 | Time to retrieve answer: 98.67039507010486 34 | 35 | 36 | python main.py "retrieve two values: net price and gross worth for the second invoice item in this format: {\"net_price\": {},\"gross_worth\": {}}" 37 | 38 | No answer 39 | 40 | 41 | python main.py "retrieve gross worth value for each invoice item available in the table, in the format {\"gross_worth\": []}" 42 | 43 | No answer 44 | 45 | 46 | python main.py "What are the names of invoice items included into invoice? use this format for the answer {\"item_name\": []}" 47 | 48 | Answer: 49 | {"item_name": [ 50 | "Wine Glasses Goblets Pair Clear", 51 | "With Hooks Stemware Storage Multiple Uses Iron Wine Rack Hanging Glass", 52 | "Replacement Corkscrew Parts Spiral Worm Wine Opener Bottle Houdini", 53 | "HOME ESSENTIALS GRADIENT STEMLESS WINE GLASSES SET OF 4 20 FL OZ (591 ml) NEW"]} 54 | ================================================== 55 | Time to retrieve answer: 160.49235485703684 56 | 57 | 58 | python main.py "retrieve invoice total info. use this format for the answer {\"invoice_total\": {}}" 59 | 60 | Answer: 61 | {"invoice_total": {"net_worth": 192.81, "vat": 19.28, "gross_worth": 212.09}} 62 | ================================================== 63 | Time to retrieve answer: 108.48460242396686 64 | 65 | 66 | python main.py "retrieve three values: total gross worth, invoice number and invoice date. use this format for the response {\"total_gross_worth\": {}, \"invoice_number\": {}, \"invoice_date\": {}}" 67 | 68 | Answer: 69 | {"total_gross_worth": 212.09, "invoice_number": 61356291, "invoice_date": "09/06/2012"} 70 | ================================================== 71 | Time to retrieve answer: 111.77387699997053 -------------------------------------------------------------------------------- /prompts.txt: -------------------------------------------------------------------------------- 1 | python main.py "What is the invoice number value?" 2 | 3 | Answer: The invoice number value is 61356291. 4 | ================================================== 5 | Time to retrieve answer: 89.62815634801518 6 | 7 | 8 | python main.py "What is the invoice date value?" 9 | 10 | Answer: The invoice date value is 09/06/2012. 
11 | ================================================== 12 | Time to retrieve answer: 96.43752192996908 13 | 14 | 15 | python main.py "What is the invoice client name, address and tax ID?" 16 | 17 | Answer: Invoice client name: Rodriguez-Stevens 18 | Address: 2280 Angela Plain, Hortonshire, MS 93248 19 | Tax ID: 939-98-8477 20 | ================================================== 21 | Time to retrieve answer: 124.1885514879832 22 | 23 | 24 | python main.py "What is the invoice seller name, address and tax ID?" 25 | 26 | Answer: Invoice seller name: Chapman, Kim and Green. 27 | Address: 64731 James Branch, Smithmouth, NC 26872. 28 | Tax ID: 949-84-9105. 29 | ================================================== 30 | Time to retrieve answer: 128.90529456199147 31 | 32 | 33 | python main.py "What is the invoice IBAN value?" 34 | 35 | Answer: The invoice IBAN value is GB50ACIE59715038217063. 36 | ================================================== 37 | Time to retrieve answer: 104.36387340398505 38 | 39 | 40 | python main.py "retrieve two values: net price and gross worth for the second invoice item" 41 | 42 | Answer: The net price of the second invoice item is $28.08 and the gross worth is $123.55. 43 | ================================================== 44 | Time to retrieve answer: 94.72781465109438 45 | 46 | 47 | python main.py "retrieve gross worth value for each invoice item available in the table" 48 | 49 | Answer: Gross worth value for each invoice item is as follows: 50 | Wine Glasses Goblets Pair Clear - $66.00 51 | With Hooks Stemware Storage - $123.55 52 | Replacement Corkscrew Parts - $8.25 53 | HOME ESSENTIALS GRADIENT STEMLESS WINE GLASSES SET OF 4 - $14.29 54 | ================================================== 55 | Time to retrieve answer: 155.1849697009893 56 | 57 | 58 | python main.py "What are the names of invoice items included into invoice?" 59 | 60 | Answer: The names of invoice items included into the invoice are: 61 | 62 | 1. Wine Glasses Goblets Pair Clear 63 | 2. With Hooks Stemware Storage Multiple Uses Iron Wine Rack Hanging Glass 64 | 3. Replacement Corkscrew Parts Spiral Worm Wine Opener Bottle Houdini 65 | 4. HOME ESSENTIALS GRADIENT STEMLESS WINE GLASSES SET OF 4 20 FL OZ (591 ml) NEW 66 | ================================================== 67 | Time to retrieve answer: 189.54536735790316 68 | 69 | 70 | python main.py "retrieve invoice gross worth total amount" 71 | 72 | Answer: The gross worth of the invoice is $212.09. 73 | ================================================== 74 | Time to retrieve answer: 99.25704580405727 75 | 76 | 77 | python main.py "retrieve three values: total gross worth, invoice number and invoice date" 78 | 79 | Answer: Total gross worth: $212,09 80 | Invoice number: 61356291 81 | Invoice date: 09/06/2012 82 | ================================================== 83 | Time to retrieve answer: 114.37888687697705 84 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | llama-cpp-python 2 | farm-haystack[weaviate] 3 | haystack-ai 4 | sentence_transformers 5 | pypdf 6 | python-box --------------------------------------------------------------------------------
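The transcripts in prompts.txt and prompts-structured.txt invoke `python main.py` once per question, and the reported "Time to retrieve answer" includes building the pipeline and loading the 13B model on every run. A small batch driver can build the pipeline once and reuse it for several queries; a minimal sketch, assuming `python ingest.py` has already been run (the file name batch_queries.py is only a suggestion).

```python
# batch_queries.py -- hypothetical convenience script, not part of this repo.
# Builds the RAG pipeline once and reuses it for several questions, so the
# Llama2 13B model is loaded a single time instead of once per query.
from llm.wrapper import setup_rag_pipeline

queries = [
    "What is the invoice number value?",
    "What is the invoice date value?",
    "retrieve invoice total info. use this format for the answer {\"invoice_total\": {}}",
]

rag_pipeline = setup_rag_pipeline()

for query in queries:
    result = rag_pipeline.run(query=query, params={"Retriever": {"top_k": 5}})
    answers = result.get("answers", [])
    answer = answers[0].answer if answers else "No answer found"
    print(f"Q: {query}\nA: {answer}")
    print("=" * 50)
```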