├── .streamlit
│   ├── config.toml
│   └── secrets.toml
├── README.md
├── app.py
└── requirements.txt
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
# Requires Python 3.9 (the interpreter itself cannot be installed via pip)
streamlit
replicate
PyPDF2
--------------------------------------------------------------------------------
/.streamlit/secrets.toml:
--------------------------------------------------------------------------------
REPLICATE_API_TOKEN = "INSERT_YOUR_REPLICATE_API_TOKEN_HERE"
--------------------------------------------------------------------------------
/.streamlit/config.toml:
--------------------------------------------------------------------------------
[theme]
primaryColor="#BB86FC"
backgroundColor="#121212"
secondaryBackgroundColor="#1F1B24"
textColor="#FFFFFF"
font="sans serif"
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# PDF Summarizer and Chatbot using Llama 2 in Streamlit
![Screenshot (481)](https://github.com/user-attachments/assets/f5c8a6d0-e9db-4113-a6d2-a44e478b0845)

## Project Overview
The PDF Summarizer Chatbot is a user-friendly application that lets you upload PDF documents and receive concise summaries generated by a Large Language Model (LLM). The project uses Natural Language Processing (NLP) to extract meaningful insights from documents, making analysis faster and more efficient.

## Project Purpose
This project is a starting point for beginners who want to learn about LLMs. It uses **Replicate**, which provides **free** cloud API access to open-source models such as **Llama 2**, and the open-source Python framework **Streamlit** to deploy the model as an interactive web app.
Overall, the project is kept as simple as possible to help you understand how an LLM project is implemented.

## Features
- **PDF Parsing**: Extract text from PDF files using PyPDF2.
- **AI-Powered Summarization**: Summaries are generated with the Llama 2 model, known for its strong performance on NLP tasks.
- **Interactive User Interface**: Built with Streamlit, providing an intuitive platform for uploading files and viewing outputs.
- **Themes**: Supports light and dark themes for user convenience.
- **API Integration**: Uses the Replicate API to communicate with the LLM backend.

## Getting Started
1. Clone the repository
```
git clone https://github.com/0xichikawa/PDF-summarizer-chatbot-using-LLaMa2
cd PDF-summarizer-chatbot-using-LLaMa2
```
2. Install dependencies
```
pip install -r requirements.txt
```
3. Set up your Replicate API key
Obtain your Replicate API key from [replicate.com](https://replicate.com) and add it to the `secrets.toml` file in the `.streamlit` folder:
```
REPLICATE_API_TOKEN = "INSERT_YOUR_REPLICATE_API_TOKEN_HERE"
```
4. Run the application
```
streamlit run app.py
```

## Future Enhancements
- Multi-language support for summarization.
- Enhanced text extraction with OCR for scanned PDFs.
- Options for customized summary lengths and formats.

## Acknowledgments
- Meta AI for Llama 2.
- Replicate for their API services.
- Streamlit and PyPDF2 for simplifying the development process.
- Data Professor (https://github.com/dataprofessor) for tutorials and project inspiration
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
import os

import PyPDF2
import replicate
import streamlit as st

st.set_page_config(page_title="🖊️PDF Summarizer Chatbot")

# Toggle between light and dark themes
ms = st.session_state
if "themes" not in ms:
    ms.themes = {
        "current_theme": "light",
        "refreshed": True,

        # Settings for the light theme; the button face is the icon shown
        # while this theme is active (a moon, to switch to dark mode)
        "light": {"theme.base": "light",
                  "theme.backgroundColor": "#FFFFFF",
                  "theme.primaryColor": "#6200EE",
                  "theme.secondaryBackgroundColor": "#F5F5F5",
                  "theme.textColor": "#000000",
                  "button_face": "🌜"},

        # Settings for the dark theme (a sun, to switch to light mode)
        "dark": {"theme.base": "dark",
                 "theme.backgroundColor": "#121212",
                 "theme.primaryColor": "#BB86FC",
                 "theme.secondaryBackgroundColor": "#1F1B24",
                 "theme.textColor": "#E0E0E0",
                 "button_face": "🌞"},
    }


def ChangeTheme():
    previous_theme = ms.themes["current_theme"]
    # Apply the settings of the theme we are switching TO, not the current one
    tdict = ms.themes["dark"] if previous_theme == "light" else ms.themes["light"]
    for vkey, vval in tdict.items():
        if vkey.startswith("theme"):
            # st._config is a private Streamlit API and may change between versions
            st._config.set_option(vkey, vval)

    ms.themes["refreshed"] = False
    ms.themes["current_theme"] = "light" if previous_theme == "dark" else "dark"


btn_face = ms.themes[ms.themes["current_theme"]]["button_face"]
st.button(btn_face, on_click=ChangeTheme)

if ms.themes["refreshed"] == False:
    ms.themes["refreshed"] = True
    st.rerun()


# Extract the text content of an uploaded PDF
def extract_text_from_pdf(pdf_file):
    """Extract text from an uploaded PDF file."""
    reader = PyPDF2.PdfReader(pdf_file)
    text = ""
    for page in reader.pages:
        # extract_text() can return None for pages with no extractable text
        text += page.extract_text() or ""
    return text


# Replicate credentials
with st.sidebar:
    st.title('🖊️PDF Summarizer Chatbot')
    if 'REPLICATE_API_TOKEN' in st.secrets:
        st.success('API key already provided!', icon='✅')
        replicate_api = st.secrets['REPLICATE_API_TOKEN']
    else:
        replicate_api = st.text_input('Enter Replicate API token:', type='password')
        if not (replicate_api.startswith('r8_') and len(replicate_api) == 40):
            st.warning('Please enter your credentials!', icon='⚠️')
        else:
            st.success('Proceed to entering your prompt message!', icon='👉')

    uploaded_file = st.file_uploader("Upload a PDF file", type="pdf")
    pdf_text = ""

    if uploaded_file:
        with st.spinner("Extracting text from PDF..."):
            pdf_text = extract_text_from_pdf(uploaded_file)
        st.success("PDF uploaded and text extracted!")

    st.markdown('''
    Developed by Ichikawa Hiroshi - 2024
    Visit my GitHub profile here
    ''', unsafe_allow_html=True)


os.environ['REPLICATE_API_TOKEN'] = replicate_api

# Store LLM-generated responses
if "messages" not in st.session_state:
    st.session_state.messages = [{"role": "assistant", "content": "Upload a PDF file from the sidebar to get started."}]

# Display chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])


def clear_chat_history():
    st.session_state.messages = [{"role": "assistant", "content": "Upload a PDF file from the sidebar to get started."}]


st.sidebar.button('Clear Chat History', on_click=clear_chat_history)


def generate_llama2_response(text, question):
    string_dialogue = "You are a helpful assistant. You do not respond as 'User' or pretend to be 'User'. You only respond once as 'Assistant'."
    for dict_message in st.session_state.messages:
        if dict_message["role"] == "user":
            string_dialogue += "User: " + dict_message["content"] + "\n\n"
        else:
            string_dialogue += "Assistant: " + dict_message["content"] + "\n\n"
    # Include the system instruction and chat history in the prompt;
    # the PDF text is truncated to stay within the model's input limit
    prompt = f"{string_dialogue}\n\nHere is the context:\n\n{text[:5000]}\n\nNow answer this question:\n{question}"
    output = replicate.run('a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5',
                           input={"prompt": prompt,
                                  "temperature": 0.1, "top_p": 0.9, "max_length": 2000, "repetition_penalty": 1})
    return output


# Generate a response once a PDF has been uploaded and a question entered
if pdf_text:
    question = st.text_input("Enter your question:")
    if question:
        st.session_state.messages.append({"role": "user", "content": question})
        with st.chat_message("assistant"):
            with st.spinner("Thinking..."):
                response = generate_llama2_response(pdf_text, question)
                placeholder = st.empty()
                full_response = ''
                # replicate.run returns an iterator of tokens; stream them as they arrive
                for item in response:
                    full_response += item
                    placeholder.markdown(full_response)
                placeholder.markdown(full_response)
        message = {"role": "assistant", "content": full_response}
        st.session_state.messages.append(message)

--------------------------------------------------------------------------------