├── src
│   ├── .env
│   ├── __pycache__
│   │   ├── llm_connection.cpython-310.pyc
│   │   ├── prompt_templet.cpython-310.pyc
│   │   └── vector_database_loader.cpython-310.pyc
│   ├── llm_connection.py
│   ├── prompt_templet.py
│   └── vector_database_loader.py
├── Dockerfile
├── data
│   └── question-answer.csv
├── screenshots
│   ├── screenshot1.png
│   └── screenshot2.png
├── backend
│   ├── vector_database_file
│   │   ├── index.pkl
│   │   └── index.faiss
│   ├── __pycache__
│   │   └── new_chain.cpython-310.pyc
│   └── new_chain.py
├── requirements.txt
├── app.py
└── README.md

/src/.env:
--------------------------------------------------------------------------------
GOOGLE_API_KEY = "<your-google-api-key>"
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
FROM python:3.10
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 8501
# Listen on $PORT when the hosting platform provides one, else on Streamlit's default 8501.
CMD streamlit run app.py --server.port=${PORT:-8501} --server.address=0.0.0.0
--------------------------------------------------------------------------------
/data/question-answer.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/abhi227070/Custom-Question-Answering-Chatbot-using-Langchain-and-Gemini-AI/HEAD/data/question-answer.csv
--------------------------------------------------------------------------------
/screenshots/screenshot1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/abhi227070/Custom-Question-Answering-Chatbot-using-Langchain-and-Gemini-AI/HEAD/screenshots/screenshot1.png
--------------------------------------------------------------------------------
/screenshots/screenshot2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/abhi227070/Custom-Question-Answering-Chatbot-using-Langchain-and-Gemini-AI/HEAD/screenshots/screenshot2.png
--------------------------------------------------------------------------------
/backend/vector_database_file/index.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/abhi227070/Custom-Question-Answering-Chatbot-using-Langchain-and-Gemini-AI/HEAD/backend/vector_database_file/index.pkl
--------------------------------------------------------------------------------
/backend/vector_database_file/index.faiss:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/abhi227070/Custom-Question-Answering-Chatbot-using-Langchain-and-Gemini-AI/HEAD/backend/vector_database_file/index.faiss
--------------------------------------------------------------------------------
/backend/__pycache__/new_chain.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/abhi227070/Custom-Question-Answering-Chatbot-using-Langchain-and-Gemini-AI/HEAD/backend/__pycache__/new_chain.cpython-310.pyc
--------------------------------------------------------------------------------
/src/__pycache__/llm_connection.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/abhi227070/Custom-Question-Answering-Chatbot-using-Langchain-and-Gemini-AI/HEAD/src/__pycache__/llm_connection.cpython-310.pyc
--------------------------------------------------------------------------------
/src/__pycache__/prompt_templet.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/abhi227070/Custom-Question-Answering-Chatbot-using-Langchain-and-Gemini-AI/HEAD/src/__pycache__/prompt_templet.cpython-310.pyc
--------------------------------------------------------------------------------
/src/__pycache__/vector_database_loader.cpython-310.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/abhi227070/Custom-Question-Answering-Chatbot-using-Langchain-and-Gemini-AI/HEAD/src/__pycache__/vector_database_loader.cpython-310.pyc
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
transformers
langchain
langchain-core
langchain_google_genai
sentence-transformers==2.2.2
InstructorEmbedding
faiss-cpu
langchain_community
streamlit
python-dotenv
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
import streamlit as st

from backend.new_chain import get_chain


# Build the chain once and reuse it across Streamlit reruns.
@st.cache_resource
def load_chain():
    return get_chain()


chain = load_chain()

st.title("Custom chatbot")

question = st.text_area("Enter your question here: ")

if st.button("Submit"):
    response = chain.invoke(question)
    st.write(response["result"])
--------------------------------------------------------------------------------
/src/llm_connection.py:
--------------------------------------------------------------------------------
import os

from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI


def get_llm():
    """
    Connect to the LLM using the API key. Here we use the
    Google Gemini Pro model.

    Args: None

    Returns: the LLM model
    """
    load_dotenv()
    llm = ChatGoogleGenerativeAI(
        model="gemini-pro",
        google_api_key=os.environ["GOOGLE_API_KEY"],
        temperature=0.1,
    )
    return llm
--------------------------------------------------------------------------------
/src/prompt_templet.py:
--------------------------------------------------------------------------------
from langchain.prompts import PromptTemplate


def get_prompt():

    prompt_template = """
    Given the following context and a question, generate an answer based on the context only.
    In the answer, try to provide as much text as possible from the "response" section of the source document.
    If the answer is not found in the context, kindly state "I don't know." Don't try to make up an answer.

    CONTEXT: {context}

    QUESTION: {question}
    """

    prompt = PromptTemplate(
        template=prompt_template,
        input_variables=["context", "question"],
    )

    return prompt
--------------------------------------------------------------------------------
/backend/new_chain.py:
--------------------------------------------------------------------------------
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA

from src.llm_connection import get_llm
from src.prompt_templet import get_prompt
from src.vector_database_loader import get_embedding

# FAISS index location, relative to the repository root (app.py is run from there).
file = "backend/vector_database_file"


def get_chain():

    llm = get_llm()
    embeddings = get_embedding()
    vectordb = FAISS.load_local(file, embeddings, allow_dangerous_deserialization=True)
    retriever = vectordb.as_retriever()
    prompt = get_prompt()

    chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        input_key="query",
        return_source_documents=True,
        chain_type_kwargs={"prompt": prompt},
    )

    return chain
--------------------------------------------------------------------------------
/src/vector_database_loader.py:
--------------------------------------------------------------------------------
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders.csv_loader import CSVLoader

# Paths are relative to the repository root.
file_location = "backend/vector_database_file"
data_location = "data/question-answer.csv"


def vector_database():

    loader = CSVLoader(file_path=data_location, source_column="prompt", encoding="cp1252")
    data = loader.load()
    embeddings = HuggingFaceInstructEmbeddings()
    vectordb = FAISS.from_documents(documents=data, embedding=embeddings)
    vectordb.save_local(file_location)

    return embeddings


def get_embedding():
    embeddings = HuggingFaceInstructEmbeddings()
    return embeddings


if __name__ == "__main__":
    vector_database()
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Custom Question Answering Chatbot using Langchain and Gemini LLM

- [Introduction](#introduction)
- [Project Overview](#project-overview)
- [Tools Used](#tools-used)
- [Getting Started](#getting-started)
- [Screenshot](#screenshot)
- [Use Case](#use-case)
- [Future Improvements](#future-improvements)
- [Contributors](#contributors)
- [License](#license)

## Introduction

This project implements a custom question answering chatbot using Langchain and the Google Gemini Language Model (LLM). The chatbot answers questions from industrial data collected from an online learning platform, consisting of questions and corresponding answers.

## Project Overview

The project workflow involves the following steps:

1. **Data Preparation**: The question-answer pairs from the industrial data are loaded from a CSV file, so that the model can answer questions grounded in the provided context.

2. **Embedding and Vector Database**: HuggingFace sentence embeddings are used to convert questions and answers into vectors, which are stored in a FAISS vector database.

3. **Retriever Implementation**: A retriever component retrieves the most similar vectors from the vector database for the user's query.

4. **Integration with Langchain RetrievalQA Chain**: The components are integrated using the Langchain RetrievalQA chain, which processes incoming queries and retrieves relevant answers.

5. **User Interface**: Streamlit provides a simple user interface, allowing users to input their questions and receive answers from the chatbot.

## Tools Used

- [Google Gemini LLM](https://ai.google.dev/): Language model that generates answers from the retrieved context.
- [HuggingFace](https://huggingface.co/): Library used for sentence embeddings.
- [Langchain](https://www.langchain.com/): Framework for building conversational AI systems.
- [Streamlit](https://streamlit.io/): Library for building web-based user interfaces.

## Getting Started

To run the project locally, follow these steps:

1. Clone the repository to your local machine.
2. Install the dependencies listed in the `requirements.txt` file.
3. Add your Google API key to `src/.env` as `GOOGLE_API_KEY`.
4. Run the Streamlit application by executing `streamlit run app.py` in your terminal.
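The retriever step of the workflow can be illustrated with a tiny, self-contained sketch. The hand-made toy vectors, the stored questions, and the `cosine`/`retrieve` helpers below are all illustrative stand-ins; in the real project, HuggingFace embeddings produce the vectors and FAISS performs the similarity search:

```python
import math

# Toy "embeddings" keyed by stored question (illustrative values only;
# real embeddings have hundreds of dimensions).
docs = {
    "How do I enroll?": [0.9, 0.1, 0.0],
    "What is the refund policy?": [0.1, 0.9, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, k=1):
    """Return the k stored questions most similar to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine(docs[d], query_vec), reverse=True)
    return ranked[:k]

# A query vector close to the first stored question retrieves it.
print(retrieve([0.8, 0.2, 0.0]))  # -> ['How do I enroll?']
```

The retrieved documents are then "stuffed" into the prompt's `CONTEXT` slot, which is exactly what the `chain_type="stuff"` setting in the RetrievalQA chain does.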

## Screenshot

![Screenshot1](screenshots/screenshot1.png)
![Screenshot2](screenshots/screenshot2.png)

## Use Case

The custom question answering chatbot serves various purposes, including:

- Providing quick and accurate responses to user queries related to the topics covered by the industrial data.
- Enhancing the user experience on online learning platforms by offering immediate assistance.
- Streamlining customer support processes by automating responses to frequently asked questions.

## Future Improvements

- Incorporate additional pre-processing techniques to handle a wider range of user queries.
- Implement more advanced language models for more accurate responses.
- Enhance the user interface with additional features for a better user experience.

## Contributors

- [Abhijit Maharana](https://www.linkedin.com/in/abhijitmaharana/)

## License

This project is licensed under the [MIT License](link-to-license-file).
--------------------------------------------------------------------------------