├── Export-d3adfe0f-3131-4bf3-8987-a52017fc1bae.zip ├── LICENSE ├── README.md ├── app.py ├── export_format.png ├── export_notion.png ├── ingest_data.py ├── query_data.py ├── requirements.txt └── vectorstore.pkl /Export-d3adfe0f-3131-4bf3-8987-a52017fc1bae.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hwchase17/chat-langchain-notion/33f9e63dd2c683beee47056b64cc2d98af8daf79/Export-d3adfe0f-3131-4bf3-8987-a52017fc1bae.zip -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Harrison Chase 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Chat-LangChain-Notion 2 | 3 | Create a ChatGPT-like experience over your Notion database using [LangChain](https://github.com/hwchase17/langchain). 4 | 5 | 6 | ## 📊 Example Data 7 | This repo uses the [Blendle Employee Handbook](https://www.notion.so/Blendle-s-Employee-Handbook-7692ffe24f07450785f093b94bbe1a09) as an example. 8 | It was downloaded on October 18th, so it may have changed slightly since then! 9 | 10 | ## 🧑 Instructions for ingesting your own dataset 11 | 12 | Export your dataset from Notion. You can do this by clicking on the three dots in the upper right-hand corner and then clicking `Export`. 13 | 14 | ![export](export_notion.png) 15 | 16 | When exporting, make sure to select the `Markdown & CSV` format option. 17 | 18 | ![export-format](export_format.png) 19 | 20 | This will produce a `.zip` file in your Downloads folder. Move the `.zip` file into this repository. 21 | 22 | Run the following command to unzip the file (replace `Export...` with your own file name as needed). 23 | 24 | ```shell 25 | unzip Export-d3adfe0f-3131-4bf3-8987-a52017fc1bae.zip -d Notion_DB 26 | ``` 27 | 28 | ## Ingest data 29 | 30 | Once the export is unzipped, all that is needed to ingest the data is to run `python ingest_data.py`. 31 | 32 | ## Query data 33 | Custom prompts are used to ground the answers in the Blendle Employee Handbook files. 34 | 35 | ## Running the Application 36 | 37 | Run `python app.py` from the command line to chat with your own data. 
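The chat loop in `app.py` accumulates history as a list of `(question, answer)` tuples that gets passed back into the chain on each turn. A minimal sketch of that pattern, with a hypothetical `fake_chain` standing in for the real LangChain chain (which would call the OpenAI API):

```python
def fake_chain(inputs):
    # Stand-in for qa_chain: echoes the question back as the "answer".
    # The real chain also receives inputs["chat_history"] for context.
    return {"answer": f"You asked: {inputs['question']}"}

chat_history = []
for question in ["What is the vacation policy?", "And sick leave?"]:
    result = fake_chain({"question": question, "chat_history": chat_history})
    # Each turn is stored as a (question, answer) tuple, exactly as app.py does.
    chat_history.append((question, result["answer"]))

print(chat_history[0][1])  # → You asked: What is the vacation policy?
```

The growing `chat_history` is what lets the condense-question prompt rewrite follow-ups like "And sick leave?" into standalone questions.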
38 | -------------------------------------------------------------------------------- /app.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | from query_data import get_chain 3 | 4 | 5 | if __name__ == "__main__": 6 | with open("vectorstore.pkl", "rb") as f: 7 | vectorstore = pickle.load(f) 8 | qa_chain = get_chain(vectorstore) 9 | chat_history = [] 10 | print("Chat with your docs!") 11 | while True: 12 | print("Human:") 13 | question = input() 14 | result = qa_chain({"question": question, "chat_history": chat_history}) 15 | chat_history.append((question, result["answer"])) 16 | print("AI:") 17 | print(result["answer"]) 18 | -------------------------------------------------------------------------------- /export_format.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hwchase17/chat-langchain-notion/33f9e63dd2c683beee47056b64cc2d98af8daf79/export_format.png -------------------------------------------------------------------------------- /export_notion.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hwchase17/chat-langchain-notion/33f9e63dd2c683beee47056b64cc2d98af8daf79/export_notion.png -------------------------------------------------------------------------------- /ingest_data.py: -------------------------------------------------------------------------------- 1 | from langchain.text_splitter import RecursiveCharacterTextSplitter 2 | from langchain.document_loaders import NotionDirectoryLoader 3 | from langchain.vectorstores.faiss import FAISS 4 | from langchain.embeddings import OpenAIEmbeddings 5 | import pickle 6 | 7 | # Load Data 8 | loader = NotionDirectoryLoader("Notion_DB") 9 | raw_documents = loader.load() 10 | 11 | # Split text 12 | text_splitter = RecursiveCharacterTextSplitter() 13 | documents = text_splitter.split_documents(raw_documents) 14 | 15 | 16 | # 
Load Data to vectorstore 17 | embeddings = OpenAIEmbeddings() 18 | vectorstore = FAISS.from_documents(documents, embeddings) 19 | 20 | 21 | # Save vectorstore 22 | with open("vectorstore.pkl", "wb") as f: 23 | pickle.dump(vectorstore, f) 24 | -------------------------------------------------------------------------------- /query_data.py: -------------------------------------------------------------------------------- 1 | from langchain.prompts.prompt import PromptTemplate 2 | from langchain.llms import OpenAI 3 | from langchain.chains import ChatVectorDBChain 4 | 5 | _template = """Given the following conversation and a follow-up question, rephrase the follow-up question to be a standalone question. 6 | You can assume the question is about the Blendle Employee Handbook. 7 | 8 | Chat History: 9 | {chat_history} 10 | Follow Up Input: {question} 11 | Standalone question:""" 12 | CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template) 13 | 14 | template = """You are an AI assistant for answering questions about the Blendle Employee Handbook. 15 | You are given the following extracted parts of a long document and a question. Provide a conversational answer. 16 | If you don't know the answer, just say "Hmm, I'm not sure." Don't try to make up an answer. 17 | If the question is not about the Blendle Employee Handbook, politely inform them that you are tuned to only answer questions about the Blendle Employee Handbook. 
18 | 19 | Question: {question} 20 | ========= 21 | {context} 22 | ========= 23 | Answer in Markdown:""" 24 | QA_PROMPT = PromptTemplate(template=template, input_variables=["question", "context"]) 25 | 26 | 27 | def get_chain(vectorstore): 28 | llm = OpenAI(temperature=0) 29 | qa_chain = ChatVectorDBChain.from_llm( 30 | llm, 31 | vectorstore, 32 | qa_prompt=QA_PROMPT, 33 | condense_question_prompt=CONDENSE_QUESTION_PROMPT, 34 | ) 35 | return qa_chain 36 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | langchain 2 | openai 3 | unstructured 4 | faiss-cpu 5 | -------------------------------------------------------------------------------- /vectorstore.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hwchase17/chat-langchain-notion/33f9e63dd2c683beee47056b64cc2d98af8daf79/vectorstore.pkl --------------------------------------------------------------------------------
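The handoff between `ingest_data.py` and `app.py` is a plain pickle round-trip: the vectorstore is dumped to `vectorstore.pkl` at ingest time and loaded back at chat time. A minimal sketch of that round-trip, using a dict as a stand-in for the FAISS vectorstore so it runs without any API keys:

```python
import pickle

# A dict stands in for the FAISS vectorstore that ingest_data.py builds.
store = {"docs": ["chunk one", "chunk two"]}

# ingest_data.py side: serialize the store to disk.
with open("vectorstore_demo.pkl", "wb") as f:
    pickle.dump(store, f)

# app.py side: load it back before constructing the chain.
with open("vectorstore_demo.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded == store)  # → True
```

Note that pickle files are Python-version- and library-version-sensitive, so the committed `vectorstore.pkl` may need regenerating with `python ingest_data.py` if your installed langchain or faiss version differs.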