├── .env.sample
├── requirements.txt
├── README.md
├── templates
│   └── index.html
├── .gitignore
└── main.py

/.env.sample:
--------------------------------------------------------------------------------
1 | OPENAI_API_KEY=''
2 | PINECONE_API_KEY=''
3 | PINECONE_ENV=''
4 | EMBEDDING_ID=''
5 | SECRET_KEY=''
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | flask
2 | python-dotenv
3 | openai
4 | pinecone-client
5 | langchain
6 | git+https://github.com/embedstore/embedstore.git
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # langchain-pinecone-chat-bot
2 | 
3 | This repo is a fully functional Flask app that can be used to create a chatbot like BibleGPT or KrishnaGPT, or a chat app over any other data source.
4 | 
5 | It works as follows:
6 | 
7 | * Ready-made embeddings from [embedstore.ai](https://embedstore.ai) ([python package](https://github.com/embedstore/embedstore)) are used. The source text is chunked using LangChain's RecursiveCharacterTextSplitter with chunk_size set to 1000, chunk_overlap set to 100, and length_function set to len; OpenAI embeddings (dimension 1536) are then computed for each chunk.
8 | * The app loads these embeddings and indexes them into a Pinecone index (see the indexing sketch after this README).
9 | * Whenever a user query is received, the app first creates an embedding for it using OpenAI embeddings, then searches the Pinecone index for the 3 nearest neighbours using cosine similarity.
10 | * These documents are passed as context to the ChatGPT API with the prompt below, temperature set to 0, and max_tokens set to 800 (see the query-flow sketch after this README):
11 | 
12 | ```
13 | You are given a paragraph and a query. You need to answer the query on the basis of paragraph. If the answer is not contained within the text below, say "Sorry, I don't know. Please try again."
14 | 
15 | P: {documents}
16 | Q: {query}
17 | A:
18 | ```
19 | * The answer returned by the ChatGPT API is shown to the user.
20 | 
21 | # Setup Instructions
22 | 
23 | * First, make sure that you have python3, python-virtualenv, and setuptools installed on your system.
24 | * Next, clone the repo:
25 | 
26 | ```
27 | git clone https://github.com/embedstore/langchain-pinecone-chat-bot.git
28 | ```
29 | 
30 | * Now create a virtual environment and install all the dependencies:
31 | 
32 | ```
33 | cd langchain-pinecone-chat-bot
34 | virtualenv -p $(which python3) pyenv
35 | source pyenv/bin/activate
36 | 
37 | pip install -r requirements.txt
38 | ```
39 | 
40 | * Now copy the sample env file and fill in your variables:
41 | 
42 | ```
43 | cp .env.sample .env
44 | ```
45 | 
46 | * The `EMBEDDING_ID` variable is the ID of the embedding, which you can get from [embedstore.ai](https://embedstore.ai).
47 | 
48 | * Now run the app:
49 | 
50 | ```
51 | python main.py
52 | ```
53 | 
54 | * You can also create a Repl, import this repo into it, and run the app there.
55 | 
56 | # License
57 | 
58 | * MIT License
--------------------------------------------------------------------------------
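The indexing pipeline described in the README (chunk, embed, upsert into Pinecone) can be sketched roughly as follows. This is an illustration, not the repo's actual code: it assumes the pre-1.0 `openai` SDK and the v2 `pinecone-client` interface (the requirements file pins no versions), it reads raw text from a placeholder `source.txt` instead of pulling ready-made vectors through the `embedstore` package (whose API is not shown here), and the index name is made up.

```
# Sketch of the chunk -> embed -> index pipeline from the README; not the repo's main.py.
import os

import openai
import pinecone
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter

load_dotenv()
openai.api_key = os.environ["OPENAI_API_KEY"]
pinecone.init(api_key=os.environ["PINECONE_API_KEY"],
              environment=os.environ["PINECONE_ENV"])

# Chunk the source text with the parameters the README states.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100,
                                          length_function=len)
chunks = splitter.split_text(open("source.txt").read())  # placeholder input file

# Compute OpenAI embeddings (text-embedding-ada-002 returns 1536-dimensional vectors).
embeddings = openai.Embedding.create(model="text-embedding-ada-002", input=chunks)
vectors = [(f"chunk-{i}", item["embedding"], {"text": chunks[i]})
           for i, item in enumerate(embeddings["data"])]

# Upsert into a cosine-similarity Pinecone index (index name is illustrative).
index_name = "chat-bot-index"
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=1536, metric="cosine")
index = pinecone.Index(index_name)
index.upsert(vectors=vectors)
```

In the real app the pre-computed embedstore vectors would replace the embedding call, but the chunking parameters and the cosine-metric index match what the README describes.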
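The query-time flow from the last two bullets of the README (embed the query, fetch the 3 nearest chunks by cosine similarity, then call the ChatGPT API with temperature 0 and max_tokens 800) might look like the sketch below, under the same SDK assumptions. The chat model name is also an assumption; the README only says "ChatGPT API".

```
# Sketch of answering a user query, continuing the sketch above
# (`openai` is already configured and `index` is the Pinecone index created there).
def answer_query(query, index):
    # Embed the query with the same OpenAI embedding model used for the chunks.
    q_emb = openai.Embedding.create(
        model="text-embedding-ada-002", input=[query]
    )["data"][0]["embedding"]

    # Retrieve the 3 nearest neighbours from the cosine-similarity index.
    matches = index.query(vector=q_emb, top_k=3, include_metadata=True)["matches"]
    docs = [m["metadata"]["text"] for m in matches]
    context = "\n\n".join(docs)

    # Build the prompt quoted in the README and call the ChatGPT API.
    prompt = (
        "You are given a paragraph and a query. You need to answer the query on the "
        "basis of paragraph. If the answer is not contained within the text below, "
        "say \"Sorry, I don't know. Please try again.\"\n\n"
        f"P: {context}\nQ: {query}\nA:"
    )
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # assumed; the README just says "ChatGPT API"
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        max_tokens=800,
    )
    return completion["choices"][0]["message"]["content"], docs
```

A call such as `answer, docs = answer_query("your question", index)` would return the generated answer together with the supporting chunks.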
/templates/index.html:
--------------------------------------------------------------------------------
1 | <html>
2 | <head>
3 |     <title>langchain-pinecone-chat-bot</title>
4 | </head>
5 | <body>
6 |     {# Assumed markup: a simple query form plus a `result` object carrying a
7 |        `response` string and a `documents` list of the retrieved chunks. #}
8 |     <form method="post" action="/">
9 |         <input type="text" name="query" placeholder="Ask a question" />
10 |         <button type="submit">Ask</button>
11 |     </form>
12 |     {% if result %}
13 |     <p>{{ result.response }}</p>
14 |     <table>
15 |         <tr>
16 |             <th>#</th>
17 |             <th>Document</th>
18 |         </tr>
19 |         {% for document in result.documents %}
20 |         <tr>
21 |             <td>{{ loop.index }}</td>
22 |             <td>{{ document }}</td>
23 |         </tr>
24 |         {% endfor %}
25 |     </table>
26 |     {% endif %}
27 | </body>
28 | </html>
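For completeness, here is a hypothetical Flask view that could drive the template above. None of this is taken from the repo's `main.py`; the route, the `query` form field, and the `result` shape are assumptions chosen to match the reconstruction and the sketches after the README.

```
# Hypothetical Flask view matching the template above; not the repo's actual main.py.
# `answer_query` and `index` refer to the sketches shown after the README.
from types import SimpleNamespace

from flask import Flask, render_template, request

app = Flask(__name__)


@app.route("/", methods=["GET", "POST"])
def home():
    result = None
    if request.method == "POST":
        query = request.form["query"]                 # assumed form field name
        answer, docs = answer_query(query, index)     # helper from the sketch above
        result = SimpleNamespace(response=answer, documents=docs)
    return render_template("index.html", result=result)


if __name__ == "__main__":
    app.run(debug=True)
```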