├── Export-d3adfe0f-3131-4bf3-8987-a52017fc1bae.zip
├── LICENSE
├── README.md
├── app.py
├── export_format.png
├── export_notion.png
├── ingest_data.py
├── query_data.py
├── requirements.txt
└── vectorstore.pkl
/Export-d3adfe0f-3131-4bf3-8987-a52017fc1bae.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hwchase17/chat-langchain-notion/33f9e63dd2c683beee47056b64cc2d98af8daf79/Export-d3adfe0f-3131-4bf3-8987-a52017fc1bae.zip
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 Harrison Chase
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Chat-LangChain-Notion
2 |
3 | Create a ChatGPT like experience over your Notion database using [LangChain](https://github.com/hwchase17/langchain).
4 |
5 |
6 | ## 📊 Example Data
7 | This repo uses the [Blendle Employee Handbook](https://www.notion.so/Blendle-s-Employee-Handbook-7692ffe24f07450785f093b94bbe1a09) as an example.
8 | It was downloaded on October 18th, so the live version may have changed slightly since then!
9 |
10 | ## 🧑 Instructions for ingesting your own dataset
11 |
12 | Export your dataset from Notion. You can do this by clicking the three dots in the upper right-hand corner and then clicking `Export`.
13 |
14 | ![Export from Notion](export_notion.png)
15 |
16 | When exporting, make sure to select the `Markdown & CSV` format option.
17 |
18 | ![Export format](export_format.png)
19 |
20 | This will produce a `.zip` file in your Downloads folder. Move the `.zip` file into this repository.
21 |
22 | Run the following command to unzip it (replace `Export...` with your own file name as needed).
23 |
24 | ```shell
25 | unzip Export-d3adfe0f-3131-4bf3-8987-a52017fc1bae.zip -d Notion_DB
26 | ```
27 |
28 | ## Ingest data
29 |
30 | To ingest the data, run `python ingest_data.py`. This loads the exported Notion files, splits them into chunks, embeds them with OpenAI, and saves a FAISS vectorstore to `vectorstore.pkl`.
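
Note that both ingestion and querying call the OpenAI API, so an API key must be set in your environment first (the key value below is a placeholder):

```shell
# The key below is a placeholder -- substitute your own OpenAI API key.
export OPENAI_API_KEY="sk-..."
python ingest_data.py   # writes vectorstore.pkl to the repo root
```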
31 |
32 | ## Query data
33 | Custom prompts (defined in `query_data.py`) are used to ground the answers in the Blendle Employee Handbook files.
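
For reference, here is how the condense-question template from `query_data.py` gets filled in on each turn. This is a plain `str.format` illustration with a made-up chat history, not LangChain itself:

```python
# The condense-question template from query_data.py, reproduced as a plain
# string so the substitution can be shown without importing LangChain.
condense_template = (
    "Given the following conversation and a follow up question, "
    "rephrase the follow up question to be a standalone question.\n"
    "You can assume the question is about the Blendle Employee Handbook.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)

# A made-up prior turn, purely to illustrate the substitution.
history = "Human: What is the vacation policy?\nAI: Blendle has a flexible vacation policy."
prompt = condense_template.format(
    chat_history=history,
    question="How do I request it?",
)
print(prompt)
```

The LLM then rewrites the filled-in prompt into a standalone question (e.g. "How do I request vacation at Blendle?") before retrieval runs against the vectorstore.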
34 |
35 | ## Running the Application
36 |
37 | Run `python app.py` from the command line to chat, ChatGPT-style, over your own data.
38 |
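`app.py` appends every (question, answer) pair to `chat_history` indefinitely, so very long sessions can eventually exceed the model's context window. One simple mitigation (hypothetical, not implemented in this repo) is to keep only the most recent turns:

```python
def trim_history(chat_history, max_turns=5):
    """Keep only the most recent (question, answer) pairs."""
    return chat_history[-max_turns:]

# Example: eight turns of made-up history, trimmed to the last five.
chat_history = [(f"q{i}", f"a{i}") for i in range(8)]
chat_history = trim_history(chat_history)
print(len(chat_history))  # → 5
```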
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
1 | import pickle
2 | from query_data import get_chain
3 |
4 |
5 | if __name__ == "__main__":
6 | with open("vectorstore.pkl", "rb") as f:
7 | vectorstore = pickle.load(f)
8 | qa_chain = get_chain(vectorstore)
9 | chat_history = []
10 | print("Chat with your docs!")
11 | while True:
12 | print("Human:")
13 | question = input()
14 | result = qa_chain({"question": question, "chat_history": chat_history})
15 | chat_history.append((question, result["answer"]))
16 | print("AI:")
17 | print(result["answer"])
18 |
--------------------------------------------------------------------------------
/export_format.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hwchase17/chat-langchain-notion/33f9e63dd2c683beee47056b64cc2d98af8daf79/export_format.png
--------------------------------------------------------------------------------
/export_notion.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hwchase17/chat-langchain-notion/33f9e63dd2c683beee47056b64cc2d98af8daf79/export_notion.png
--------------------------------------------------------------------------------
/ingest_data.py:
--------------------------------------------------------------------------------
1 | from langchain.text_splitter import RecursiveCharacterTextSplitter
2 | from langchain.document_loaders import NotionDirectoryLoader
3 | from langchain.vectorstores.faiss import FAISS
4 | from langchain.embeddings import OpenAIEmbeddings
5 | import pickle
6 |
7 | # Load Data
8 | loader = NotionDirectoryLoader("Notion_DB")
9 | raw_documents = loader.load()
10 |
11 | # Split text
12 | text_splitter = RecursiveCharacterTextSplitter()
13 | documents = text_splitter.split_documents(raw_documents)
14 |
15 |
16 | # Load Data to vectorstore
17 | embeddings = OpenAIEmbeddings()
18 | vectorstore = FAISS.from_documents(documents, embeddings)
19 |
20 |
21 | # Save vectorstore
22 | with open("vectorstore.pkl", "wb") as f:
23 | pickle.dump(vectorstore, f)
24 |
--------------------------------------------------------------------------------
/query_data.py:
--------------------------------------------------------------------------------
1 | from langchain.prompts.prompt import PromptTemplate
2 | from langchain.llms import OpenAI
3 | from langchain.chains import ChatVectorDBChain
4 |
5 | _template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
6 | You can assume the question is about the Blendle Employee Handbook.
7 |
8 | Chat History:
9 | {chat_history}
10 | Follow Up Input: {question}
11 | Standalone question:"""
12 | CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)
13 |
14 | template = """You are an AI assistant for answering questions about the Blendle Employee Handbook.
15 | You are given the following extracted parts of a long document and a question. Provide a conversational answer.
16 | If you don't know the answer, just say "Hmm, I'm not sure." Don't try to make up an answer.
17 | If the question is not about the Blendle Employee Handbook, politely inform them that you are tuned to only answer questions about the Blendle Employee Handbook.
18 |
19 | Question: {question}
20 | =========
21 | {context}
22 | =========
23 | Answer in Markdown:"""
24 | QA_PROMPT = PromptTemplate(template=template, input_variables=["question", "context"])
25 |
26 |
27 | def get_chain(vectorstore):
28 | llm = OpenAI(temperature=0)
29 | qa_chain = ChatVectorDBChain.from_llm(
30 | llm,
31 | vectorstore,
32 | qa_prompt=QA_PROMPT,
33 | condense_question_prompt=CONDENSE_QUESTION_PROMPT,
34 | )
35 | return qa_chain
36 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | langchain
2 | openai
3 | unstructured
4 | faiss-cpu
5 |
--------------------------------------------------------------------------------
/vectorstore.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hwchase17/chat-langchain-notion/33f9e63dd2c683beee47056b64cc2d98af8daf79/vectorstore.pkl
--------------------------------------------------------------------------------