├── delete_pinecone_index.py
├── query_only.py
├── README.zh.md
├── main.py
└── README.md

--------------------------------------------------------------------------------
/delete_pinecone_index.py:
--------------------------------------------------------------------------------
import pinecone
import os
import sys

# usage: python3 delete_pinecone_index.py NAME_OF_INDEX
if len(sys.argv) < 2:
    sys.exit("usage: python3 delete_pinecone_index.py NAME_OF_INDEX")

pinecone.init(
    api_key=os.environ["PINECONE_API_KEY"],
    environment="us-east1-gcp"
)

index_name = sys.argv[1]

if index_name in pinecone.list_indexes():
    pinecone.delete_index(index_name)
    print(f"index '{index_name}' successfully deleted")
else:
    print(f"index '{index_name}' not found in pinecone")
--------------------------------------------------------------------------------
/query_only.py:
--------------------------------------------------------------------------------
import pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain
import os, sys, json

# This file is a trimmed and slightly-altered version of main.py

def pinecone_init(index_name: str = 'notion-database'):
    '''initialize the connection to pinecone (get an API key at app.pinecone.io)'''
    pinecone.init(
        api_key=os.environ["PINECONE_API_KEY"],
        environment="us-east1-gcp"
    )

    # the index should already exist, created by a previous run of main.py
    if index_name in pinecone.list_indexes():
        index = pinecone.Index(index_name)
        return index
    else:
        sys.exit(f"index {index_name} not found")

def get_docs(path: str = 'docs.json'):
    '''load the indexed docs from docs.json, the memory file written by main.py'''
    with open(path, 'r') as f:
        docs = json.load(f)
    return docs


def pinecone_query(query: str = "who are you", docs=None, index=None):
    '''embed the query and retrieve the three closest chunks from Pinecone'''
    # resolve the defaults lazily; calling get_docs()/pinecone_init() as default
    # values would run them once at import time, which is not what we want
    if docs is None:
        docs = get_docs()
    if index is None:
        index = pinecone_init()

    query_coord = OpenAIEmbeddings().embed_query(query)
    # retrieve from Pinecone
    query_res = index.query(query_coord, top_k=3, include_metadata=True)

    content_ids = [
        int(x['id']) for x in query_res['matches']
    ]
    contents = [docs[i] for i in content_ids]
    contents_str = "\n\n".join(contents)

    return contents_str


def ask_gpt3(query: str = "who are you", contents_str=None):
    '''feed the question and the retrieved contents to GPT-3 and return its answer'''
    if contents_str is None:
        contents_str = pinecone_query(query)

    prompt = PromptTemplate(
        input_variables=["question", "contents"],
        template=''' Answer this question: "{question}" using the contents below
    Contents:
    {contents}
    Answer:
    ''',
    )

    chain = LLMChain(
        llm=OpenAI(temperature=0),
        prompt=prompt,
        # verbose=True,
    )
    # print(prompt.format(question=query, contents=contents_str))  # for debugging purposes
    answer = chain.run(
        question=query,
        contents=contents_str,
    )
    return answer

def ans_cont_to_file(answer, contents_str):
    '''write the answer and the retrieved contents to text files'''
    with open("answer.txt", "w") as f:
        f.write(answer)
    with open("contents.txt", "w") as h:
        h.write(contents_str)

def main():
    try:
        query = sys.argv[1]
    except IndexError:
        query = input("ask a question: ")

    print("connecting to pinecone index...")
    index = pinecone_init("notion-database")
    print("getting docs")
    docs = get_docs()

    print("querying pinecone...")
    contents_str = pinecone_query(query, docs, index)
    print("querying gpt...")
    answer = ask_gpt3(query=query, contents_str=contents_str)

    # optional: write the answer and contents to text files
    ans_cont_to_file(answer, contents_str)

    print(f"done! the answer to '{query}' is: '{answer}'")


if __name__ == "__main__":
    main()
--------------------------------------------------------------------------------
/README.zh.md:
--------------------------------------------------------------------------------
[![en](https://img.shields.io/badge/lang-en-red.svg)](https://github.com/madeyexz/markdown-file-query/blob/main/README.md)
[![zh](https://img.shields.io/badge/lang-zh-blue.svg)](https://github.com/madeyexz/markdown-file-query/blob/main/README.zh.md)

## Overview
This project
- uses the [Pinecone](https://www.pinecone.io/) vector database and OpenAI's embedding model to turn text into vectors.
- works with any `.md` file, so it pairs perfectly with Notion and Obsidian (if you use Notion, you have to export your pages to `.md` manually).
- is a case study of the author applying the [Feynman technique](https://en.wikipedia.org/wiki/Learning_by_teaching).
- is probably a weaker clone of [llama_index](https://github.com/jerryjliu/llama_index#-dependencies); if you want a more polished document-query program, use llama_index instead.

### How It Works
1. Each `.md` file is split into many small chunks by `langchain.textsplitter`.
2. Each chunk is converted into a vector by OpenAI's embedding model (`langchain.embeddings.OpenAIEmbeddings`).
3. The vectors are then uploaded to the `Pinecone` vector database.
4. The question is also converted into a vector and sent to Pinecone.
5. The question vector is compared against the vectors in the database (by cosine similarity) to retrieve results.
6. The three most similar results are fed into GPT-3, which generates a natural-language answer.
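Below is a condensed sketch of steps 1-3, adapted from `main.py`. The file name `note.md` is made up for illustration, and the batched `embed_documents` call is a simplification (the real script embeds one chunk at a time and creates the index if it is missing):
``` python
import os
import pinecone
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings

pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-east1-gcp")
index = pinecone.Index("notion-database")  # assumes the index was already created

# step 1: split one markdown file into chunks of at most 1000 characters
splitter = CharacterTextSplitter(chunk_size=1000, separator="\n")
with open("note.md") as f:  # illustrative file name
    chunks = splitter.split_text(f.read())

# step 2: embed every chunk (OpenAI's embedding model returns 1536-dimensional vectors)
vectors = OpenAIEmbeddings().embed_documents(chunks)

# step 3: upsert (id, vector) pairs; the string ids double as positions into the chunk list
index.upsert(list(zip([str(i) for i in range(len(chunks))], vectors)))
```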
### TODO
- [ ] add a `--help` option
- [ ] deploy to Streamlit
## Getting Started

### Prerequisites
1. Prepare your Pinecone and OpenAI API keys
   - the Pinecone API key can be obtained [here](https://app.pinecone.io/).
   - the OpenAI API key can be obtained [here](https://platform.openai.com/account/api-keys).
2. Export the Pinecone and OpenAI API keys to your environment
``` bash
export PINECONE_API_KEY="your_pinecone_api_key"
export OPENAI_API_KEY="your_openai_api_key"
```
Then, in Python, use
``` python
import os
os.environ["PINECONE_API_KEY"]
os.environ["OPENAI_API_KEY"]
```
to check that they are exported; if you get a `KeyError`, restart your terminal (and your IDE, if you are using one).

### Installation
1. Clone this repository to your local machine
```bash
git clone https://github.com/madeyexz/markdown-file-query.git
```
2. Install the dependencies
``` bash
pip install pinecone-client langchain openai tqdm
```

### Usage
1. Put your `.md` files in a folder; you will pass its path to `main.py` as the first argument. Note that the folder should be in the same directory as `main.py`.
2. If this is the first time you query a given set of documents, run `main.py`
``` bash
python3 main.py "PATH_OF_FOLDER" "QUESTION"
```
3. The answer and the reference contents GPT used to generate it are saved to `answer.txt` and `contents.txt` respectively.
4. To query the same batch of documents again, run `query_only.py` to avoid re-embedding them.
``` bash
python3 query_only.py "QUESTION"
```

### Example
1. I have a folder called `markdown_database` that contains a bunch of `.md` files, and I want to query this database with the question "what's the strange situation".
``` bash
❯ python3 main.py "markdown_database" "what's the strange situation"
```
```text
initiating pinecone index...
digesting docs...
uploading data to pinecone...
 92%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████          | 60/65 [00:29<00:02,  1.87it/s]
let's wait for 60 seconds to avoid RateLimitError... (since I'm not a paid user)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [01:00<00:00,  1.00s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 65/65 [01:32<00:00,  1.42s/it]
querying pinecone...
querying gpt...
writing results to answer.txt and contents.txt
done! the answer to 'what's the strange situation' is: '
The Strange Situation is a standardized procedure devised by Mary Ainsworth in the 1970s to observe attachment security in children within the context of caregiver relationships. It applies to infants between the age of nine and 18 months and involves a series of eight episodes lasting approximately 3 minutes each, whereby a mother, child and stranger are introduced, separated and reunited. The procedure is used to observe the quality of a young child’s attachment to his or her mother, and can also be applied to other attachment figures, such as God, through the use of Emotionally Focused Therapy (EFT) and religious beliefs, such as the saying “there are no atheists in foxholes”.'
```
2. To query the same batch of documents again, I can use `query_only.py` to avoid re-embedding them.
``` bash
❯ python3 query_only.py "Who is Mary Ainsworth?"
```
``` text
connecting to pinecone index...
getting docs
querying pinecone...
querying gpt...
done! the answer to 'Who is Mary Ainsworth?' is: '
Mary Ainsworth was a developmental psychologist who devised the Strange Situation in the 1970s to observe attachment security in children within the context of caregiver relationships. The Strange Situation involves a series of eight episodes lasting approximately 3 minutes each, whereby a mother, child and stranger are introduced, separated and reunited. Ainsworth is also known for her observation that if you want to see the quality of a young child’s attachment to his or her mother, watch what the child does, not when Mother leaves, but when she returns. She is also known for her research on anxious babies and their inability to use their mothers as a secure base.'
```
## Known Issues
1. With Pinecone, whenever you want to query a new set of documents (i.e. create a new database), you should create a new Pinecone index (since you don't want answers drawn from the old documents) or delete the old index. This is because Pinecone does not support updating an index (yet).

To delete the old index:
``` bash
python3 delete_pinecone_index.py NAME_OF_INDEX
```
## Acknowledgements
Huge thanks to the open-source community for providing straightforward examples and comprehensive tutorials!
- [openai-cookbook: using vector database for embeddings search](https://github.com/openai/openai-cookbook/blob/main/examples/vector_databases/Using_vector_databases_for_embeddings_search.ipynb)
- [Build a Personal Search Engine Web App using Open AI Text Embeddings - Avra](https://medium.com/@avra42/build-a-personal-search-engine-web-app-using-open-ai-text-embeddings-d6541f32892d)
- this project is heavily inspired by [hwchase17/notion-qa](https://github.com/hwchase17/notion-qa)
- [Langchain](https://python.langchain.com/en/latest), a Python library for manipulating LLMs elegantly.
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import os  # to get system environment variables
import sys  # to parse system arguments
import json  # to dump and load data (lists) elegantly
import pinecone  # vector database service, core of the VDB query
from pathlib import Path  # used to manipulate file paths elegantly
from tqdm.auto import tqdm  # to show progress bars
import time  # to avoid RateLimitError
from langchain.text_splitter import CharacterTextSplitter  # to split texts
from langchain.prompts import PromptTemplate  # makes querying easier
from langchain.llms import OpenAI  # to query LLMs
from langchain.chains import LLMChain  # makes querying easier
from langchain.embeddings import OpenAIEmbeddings  # to turn texts into vectors

def pinecone_init(index_name: str = 'notion-database'):
    '''initialize the connection to pinecone (get an API key at app.pinecone.io)'''
    pinecone.init(
        api_key=os.environ["PINECONE_API_KEY"],
        environment="us-east1-gcp"
    )

    # check if the index already exists (it shouldn't if this is the first run)
    if index_name not in pinecone.list_indexes():
        # if it does not exist, create it
        pinecone.create_index(
            index_name,
            dimension=1536,  # the output dimension of OpenAI's embedding model
            metric='cosine',
            # metric='euclidean',
            metadata_config={'indexed': ['channel_id', 'published']}  # leftover config; not used by this project
        )
    # connect to the index
    index = pinecone.Index(index_name)
    # view the index status with index.describe_index_stats()
    return index

# an earlier error was solved by retrying and upgrading jupyter notebook with `pip install notebook --upgrade`

def md_digest(ps: list = None):
    '''This is the logic for ingesting Notion data into LangChain.'''
    # the default argument is resolved lazily so the glob doesn't run at import time
    if ps is None:
        ps = list(Path("Notion_DB/").glob("**/*.md"))

    # Here we load in the data in the format that Notion exports it in.
    data = []
    sources = []
    for p in ps:
        with open(p) as f:
            data.append(f.read())
        sources.append(p)

    # We split the texts due to the context limits of the LLMs:
    # each chunk will be at most 1000 characters long, split on newlines.
    text_splitter = CharacterTextSplitter(chunk_size=1000, separator="\n")
    docs = []
    metadatas = []
    for i, d in enumerate(data):
        # where i, d are the index and content of each .md file respectively
        splits = text_splitter.split_text(d)
        docs.extend(splits)
        metadatas.extend([{"source": sources[i]}] * len(splits))
    # note: metadatas is collected here but not currently uploaded to Pinecone

    # after digestion, save the docs to a local json file so later queries can avoid re-encoding
    with open('docs.json', 'w') as f:
        json.dump(docs, f)

    return docs
    # question: will the data be too big/unspecific for each chunk?
    # len(docs) is the number of vectors this is going to create

def pinecone_upload(docs: list = None, index=None):
    '''This is the logic for uploading the data into Pinecone.'''
    # resolve the defaults lazily; calling md_digest()/pinecone_init() as default
    # values would run a full ingestion at import time
    if docs is None:
        docs = md_digest()
    if index is None:
        index = pinecone_init()

    id_batch = [str(x) for x in range(0, len(docs))]
    coord_list = []

    for i in tqdm(range(0, len(docs))):
        # pause every 60 embeddings to avoid RateLimitError; 60 seconds is an
        # arbitrary but conservative number. a stupid approach by me :D
        rest = 60
        if i != 0 and i % 60 == 0:
            print(f"let's wait for {rest} seconds to avoid RateLimitError... (since I'm not a paid user)")
            for _ in tqdm(range(0, rest)):  # use a throwaway variable so the outer i isn't clobbered
                time.sleep(1)

        # get the text to encode
        texts = docs[i]
        coord = OpenAIEmbeddings().embed_query(texts)
        coord_list.append(coord)

    # prepare and upload the vectors to Pinecone
    vectors = list(zip(id_batch, coord_list))
    index.upsert(vectors)


def pinecone_query(query: str = "who are you", docs=None, index=None):
    '''embed the query and retrieve the three closest chunks from Pinecone'''
    if docs is None:
        docs = md_digest()
    if index is None:
        index = pinecone_init()

    query_coord = OpenAIEmbeddings().embed_query(query)
    # retrieve the relevant contexts from Pinecone
    query_res = index.query(query_coord, top_k=3, include_metadata=True)

    content_ids = [
        int(x['id']) for x in query_res['matches']
    ]
    contents = [docs[i] for i in content_ids]
    contents_str = "\n\n".join(contents)

    return contents_str


def ask_gpt3(query: str = "who are you", contents_str=None):
    '''feed the question and the retrieved contents to GPT-3 and return its answer'''
    if contents_str is None:
        contents_str = pinecone_query(query)

    prompt = PromptTemplate(
        input_variables=["question", "contents"],
        template=''' Answer this question: "{question}" using the contents below
    Contents:
    {contents}
    Answer:
    ''',
    )

    chain = LLMChain(
        llm=OpenAI(temperature=0),
        prompt=prompt,
        # verbose=True,
    )

    answer = chain.run(question=query, contents=contents_str)
    return answer

def ans_cont_to_file(answer, contents_str):
    '''write the answer and the retrieved contents to text files'''
    with open("answer.txt", "w") as f:
        f.write(answer)
    with open("contents.txt", "w") as h:
        h.write(contents_str)

def main():
    if len(sys.argv) < 3:
        sys.exit('usage: python3 main.py "PATH_OF_FOLDER" "QUESTION"')
    directory, query = sys.argv[1], sys.argv[2]

    print("initiating pinecone index...")
    index = pinecone_init("notion-database")

    print("digesting docs...")
    docs = md_digest(list(Path(directory).glob("**/*.md")))

    print("uploading data to pinecone...")
    pinecone_upload(docs, index)

    print("querying pinecone...")
    contents_str = pinecone_query(query, docs, index)

    print("querying gpt...")
    answer = ask_gpt3(query=query, contents_str=contents_str)

    # optional: write the answer and contents to text files
    print("writing results to answer.txt and contents.txt")
    ans_cont_to_file(answer, contents_str)

    print(f"done! the answer to '{query}' is: '{answer}'")

if __name__ == "__main__":
    main()
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
[![en](https://img.shields.io/badge/lang-en-red.svg)](https://github.com/madeyexz/markdown-file-query/blob/main/README.md)
[![zh](https://img.shields.io/badge/lang-zh-blue.svg)](https://github.com/madeyexz/markdown-file-query/blob/main/README.zh.md)

> *This project currently works best with English documents.*

## About This Project
This project
- utilizes the [Pinecone](https://www.pinecone.io/) vector database (VDB) and OpenAI's embedding model to turn text into vectors.
- works with any `.md` file, so it works perfectly with Notion & Obsidian (though for Notion you have to export your pages to `.md` manually first).
- is the author's practice of the [Feynman technique](https://en.wikipedia.org/wiki/Learning_by_teaching).
- is probably a weaker duplicate of [privateGPT](https://github.com/imartinez/privateGPT) and [llama_index](https://github.com/jerryjliu/llama_index#-dependencies); if you want a beautifully crafted document-query program, use llama_index instead of this toy.

### Walkthrough of this Program
1. Each markdown file in the target directory is cut into lots of small chunks using `langchain.textsplitter`.
2. Each chunk is turned into a vector via OpenAI's embedding model (`langchain.embeddings.OpenAIEmbeddings`).
3. The vectors are then uploaded to the `Pinecone` vector database.
4. Queries are also converted to vectors using the same embedding model and sent to Pinecone.
5. To retrieve search results, Pinecone compares the query vector with the vectors in the database (by cosine similarity).
6. The three closest results are retrieved and fed into GPT-3 along with the question, and GPT-3 generates an answer in natural language.
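Below is a condensed sketch of steps 4-6, adapted from `query_only.py`. It assumes a prior run of `main.py` has already populated the index and written `docs.json`:
``` python
import os, json
import pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-east1-gcp")
index = pinecone.Index("notion-database")

# step 4: embed the question with the same model used for the documents
question = "what's the strange situation"
query_vector = OpenAIEmbeddings().embed_query(question)

# step 5: cosine-similarity search, keeping the three closest chunks;
# the vector ids are positions into the chunk list saved in docs.json
matches = index.query(query_vector, top_k=3)["matches"]
with open("docs.json") as f:
    docs = json.load(f)
contents = "\n\n".join(docs[int(m["id"])] for m in matches)

# step 6: stitch the chunks into a prompt and let GPT-3 answer
prompt = PromptTemplate(
    input_variables=["question", "contents"],
    template='Answer this question: "{question}" using the contents below\n{contents}\nAnswer:',
)
print(LLMChain(llm=OpenAI(temperature=0), prompt=prompt).run(question=question, contents=contents))
```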
### TODO
- [ ] add a `--help` option
- [ ] deploy to Streamlit
## Getting Started

### Prerequisites
1. Prepare your Pinecone and OpenAI API keys:
   - the Pinecone API key can be obtained [here](https://app.pinecone.io/).
   - the OpenAI API key can be obtained [here](https://platform.openai.com/account/api-keys).
2. Export the Pinecone and OpenAI API keys to your environment
``` bash
export PINECONE_API_KEY="your_pinecone_api_key"
export OPENAI_API_KEY="your_openai_api_key"
```
Now, in Python, use
``` python
import os
os.environ["PINECONE_API_KEY"]
os.environ["OPENAI_API_KEY"]
```
to check that they are exported; if you get a `KeyError`, restart your terminal (and your IDE, if you are using one).
### Installation
1. Clone this repo to your local machine
```bash
git clone https://github.com/madeyexz/markdown-file-query.git
```
2. Install the dependencies
``` bash
pip install pinecone-client langchain openai tqdm
```
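Note that the code in this repo targets the pre-1.0 interfaces of these libraries (`pinecone.init`, `langchain.embeddings`, the pre-1.0 `openai` client), so a plain install may pull newer, incompatible releases. Pinning older versions should help; the exact ranges below are an assumption rather than tested bounds:
``` bash
# assumed version pins matching the old-style APIs used by main.py and query_only.py
pip install "pinecone-client<3" "langchain<0.1" "openai<1" tqdm
```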
### Usage
1. Prepare your markdown file(s) and put them in a folder; you will pass its path to `main.py` as the first argument. Note that the folder should be in the same directory as `main.py`.
2. If this is your first time querying a certain set of documents, run the `main.py` program
``` bash
python3 main.py "PATH_OF_FOLDER" "QUESTION"
```
3. The answer and the reference contents GPT used to generate it will be saved in `answer.txt` and `contents.txt` respectively.
4. If you want to query the same batch of documents again, run `query_only.py` to avoid re-embedding the documents.
``` bash
python3 query_only.py "QUESTION"
```

### Example
1. I have a folder called `markdown_database` which contains a bunch of `.md` files, and I want to query this database with the question "what's the strange situation".
``` bash
❯ python3 main.py "markdown_database" "what's the strange situation"
```
```text
initiating pinecone index...
digesting docs...
uploading data to pinecone...
 92%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████          | 60/65 [00:29<00:02,  1.87it/s]
let's wait for 60 seconds to avoid RateLimitError... (since I'm not a paid user)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [01:00<00:00,  1.00s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 65/65 [01:32<00:00,  1.42s/it]
querying pinecone...
querying gpt...
writing results to answer.txt and contents.txt
done! the answer to 'what's the strange situation' is: '
The Strange Situation is a standardized procedure devised by Mary Ainsworth in the 1970s to observe attachment security in children within the context of caregiver relationships. It applies to infants between the age of nine and 18 months and involves a series of eight episodes lasting approximately 3 minutes each, whereby a mother, child and stranger are introduced, separated and reunited. The procedure is used to observe the quality of a young child’s attachment to his or her mother, and can also be applied to other attachment figures, such as God, through the use of Emotionally Focused Therapy (EFT) and religious beliefs, such as the saying “there are no atheists in foxholes”.'
```
2. If I want to query the same database again, I can use `query_only.py` to avoid re-embedding the documents.
``` bash
❯ python3 query_only.py "Who is Mary Ainsworth?"
```
``` text
connecting to pinecone index...
getting docs
querying pinecone...
querying gpt...
done! the answer to 'Who is Mary Ainsworth?' is: '
Mary Ainsworth was a developmental psychologist who devised the Strange Situation in the 1970s to observe attachment security in children within the context of caregiver relationships. The Strange Situation involves a series of eight episodes lasting approximately 3 minutes each, whereby a mother, child and stranger are introduced, separated and reunited. Ainsworth is also known for her observation that if you want to see the quality of a young child’s attachment to his or her mother, watch what the child does, not when Mother leaves, but when she returns. She is also known for her research on anxious babies and their inability to use their mothers as a secure base.'
```
## Known Limitations
1. If you use Pinecone, then whenever you want to query a new set of documents (i.e. create a new database), you should create a new Pinecone index (since you don't want answers drawn from the old documents) or delete the old index. This is because Pinecone does not support updating the index (yet).

To delete the old index:
``` bash
python3 delete_pinecone_index.py NAME_OF_INDEX
```
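If you'd rather reset an index from Python, here is a minimal sketch using the same client calls as `main.py`; the helper name `reset_index` is made up for illustration:
``` python
import os
import pinecone

def reset_index(index_name: str = "notion-database"):
    """Delete the index if it exists, then recreate it empty."""
    pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-east1-gcp")
    if index_name in pinecone.list_indexes():
        pinecone.delete_index(index_name)
    # 1536 dimensions to match the OpenAI embedding vectors used by main.py
    pinecone.create_index(index_name, dimension=1536, metric="cosine")
```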
## Acknowledgements
Huge shout-out to the open-source community for providing straightforward examples and comprehensive tutorials!
- [openai-cookbook: using vector database for embeddings search](https://github.com/openai/openai-cookbook/blob/main/examples/vector_databases/Using_vector_databases_for_embeddings_search.ipynb)
- [Build a Personal Search Engine Web App using Open AI Text Embeddings - Avra](https://medium.com/@avra42/build-a-personal-search-engine-web-app-using-open-ai-text-embeddings-d6541f32892d)
- this project is heavily inspired by [hwchase17/notion-qa](https://github.com/hwchase17/notion-qa)
- [Langchain](https://python.langchain.com/en/latest), a Python library for manipulating LLMs elegantly.
--------------------------------------------------------------------------------