├── .streamlit
│   ├── config.toml
│   └── secrets.toml
├── README.md
├── app.py
└── requirements.txt
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
# Requires Python 3.9 (the interpreter itself cannot be installed via pip)
streamlit
replicate
PyPDF2
--------------------------------------------------------------------------------
/.streamlit/secrets.toml:
--------------------------------------------------------------------------------
REPLICATE_API_TOKEN = "INSERT_YOUR_REPLICATE_API_TOKEN_HERE"
--------------------------------------------------------------------------------
/.streamlit/config.toml:
--------------------------------------------------------------------------------
[theme]
primaryColor="#BB86FC"
backgroundColor="#121212"
secondaryBackgroundColor="#1F1B24"
textColor="#FFFFFF"
font="sans serif"
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# PDF Summarizer and Chatbot using Llama 2 in Streamlit
![Screenshot (481)](https://github.com/user-attachments/assets/f5c8a6d0-e9db-4113-a6d2-a44e478b0845)

## Project Overview
The PDF Summarizer Chatbot is a user-friendly application that lets you upload PDF documents and receive concise summaries generated by a Large Language Model (LLM). The project uses Natural Language Processing (NLP) to extract meaningful insights from documents, making analysis faster and more efficient.

## Project Purpose
This project is a starting point for beginners who want to learn about LLMs. It uses **Replicate**, which provides **free** cloud API access to open-source models such as **Llama 2**, and the open-source Python framework **Streamlit** to deploy the model as an interactive web app.
Overall, the project is kept as simple as possible to help you understand how an LLM project is implemented.

## Features
- **PDF Parsing**: Extract text from PDF files using PyPDF2.
- **AI-Powered Summarization**: Summaries are generated with the Llama 2 model, known for its strong performance on NLP tasks.
- **Interactive User Interface**: Built with Streamlit, providing an intuitive platform for uploading files and viewing outputs.
- **Themes**: Supports light and dark themes for user convenience.
- **API Integration**: Uses the Replicate API to communicate with the LLM backend.

## Getting Started
1. Clone the repository
```
git clone https://github.com/0xichikawa/PDF-summarizer-chatbot-using-LLaMa2
cd PDF-summarizer-chatbot-using-LLaMa2
```
2. Install dependencies
```
pip install -r requirements.txt
```
3. Set up your Replicate API key
Obtain your Replicate API key from [replicate.com](https://replicate.com) and add it to the `secrets.toml` file in the `.streamlit` folder:
```
REPLICATE_API_TOKEN = "INSERT_YOUR_REPLICATE_API_TOKEN_HERE"
```
4. Run the application
```
streamlit run app.py
```

## Future Enhancements
- Multi-language support for summarization.
- Enhanced text extraction with OCR for scanned PDFs.
- Options for customized summary lengths and formats.

## Acknowledgments
- Meta AI for Llama 2.
- Replicate for their API services.
- Streamlit and PyPDF2 for simplifying the development process.
- Data Professor (https://github.com/dataprofessor) for tutorials and project inspiration
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
import os

import PyPDF2
import replicate
import streamlit as st

st.set_page_config(page_title="🖊️PDF Summarizer Chatbot")

# Toggle between light and dark themes
ms = st.session_state
if "themes" not in ms:
    ms.themes = {
        "current_theme": "light",
        "refreshed": True,

        # Settings for the light theme; the button face is the icon shown
        # while this theme is active (a moon, to switch to dark mode)
        "light": {"theme.base": "light",
                  "theme.backgroundColor": "#FFFFFF",
                  "theme.primaryColor": "#6200EE",
                  "theme.secondaryBackgroundColor": "#F5F5F5",
                  "theme.textColor": "#000000",
                  "button_face": "🌜"},

        # Settings for the dark theme (a sun, to switch to light mode)
        "dark": {"theme.base": "dark",
                 "theme.backgroundColor": "#121212",
                 "theme.primaryColor": "#BB86FC",
                 "theme.secondaryBackgroundColor": "#1F1B24",
                 "theme.textColor": "#E0E0E0",
                 "button_face": "🌞"},
    }


def ChangeTheme():
    previous_theme = ms.themes["current_theme"]
    # Apply the settings of the theme we are switching TO, not the current one
    tdict = ms.themes["dark"] if previous_theme == "light" else ms.themes["light"]
    for vkey, vval in tdict.items():
        if vkey.startswith("theme"):
            # st._config is a private Streamlit API and may change between versions
            st._config.set_option(vkey, vval)

    ms.themes["refreshed"] = False
    ms.themes["current_theme"] = "light" if previous_theme == "dark" else "dark"


btn_face = ms.themes[ms.themes["current_theme"]]["button_face"]
st.button(btn_face, on_click=ChangeTheme)

if ms.themes["refreshed"] == False:
    ms.themes["refreshed"] = True
    st.rerun()


# Extract the text content of an uploaded PDF
def extract_text_from_pdf(pdf_file):
    """Extract text from an uploaded PDF file."""
    reader = PyPDF2.PdfReader(pdf_file)
    text = ""
    for page in reader.pages:
        # extract_text() can return None for pages with no extractable text
        text += page.extract_text() or ""
    return text


# Replicate credentials
with st.sidebar:
    st.title('🖊️PDF Summarizer Chatbot')
    if 'REPLICATE_API_TOKEN' in st.secrets:
        st.success('API key already provided!', icon='✅')
        replicate_api = st.secrets['REPLICATE_API_TOKEN']
    else:
        replicate_api = st.text_input('Enter Replicate API token:', type='password')
        if not (replicate_api.startswith('r8_') and len(replicate_api) == 40):
            st.warning('Please enter your credentials!', icon='⚠️')
        else:
            st.success('Proceed to entering your prompt message!', icon='👉')

    uploaded_file = st.file_uploader("Upload a PDF file", type="pdf")
    pdf_text = ""

    if uploaded_file:
        with st.spinner("Extracting text from PDF..."):
            pdf_text = extract_text_from_pdf(uploaded_file)
        st.success("PDF uploaded and text extracted!")

    st.markdown('''
    Developed by Ichikawa Hiroshi - 2024
    Visit my GitHub profile here
    ''', unsafe_allow_html=True)


os.environ['REPLICATE_API_TOKEN'] = replicate_api

# Store LLM-generated responses
if "messages" not in st.session_state:
    st.session_state.messages = [{"role": "assistant", "content": "Upload a PDF file from the sidebar to get started."}]

# Display chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])


def clear_chat_history():
    st.session_state.messages = [{"role": "assistant", "content": "Upload a PDF file from the sidebar to get started."}]


st.sidebar.button('Clear Chat History', on_click=clear_chat_history)


def generate_llama2_response(text, question):
    string_dialogue = "You are a helpful assistant. You do not respond as 'User' or pretend to be 'User'. You only respond once as 'Assistant'."
    for dict_message in st.session_state.messages:
        if dict_message["role"] == "user":
            string_dialogue += "User: " + dict_message["content"] + "\n\n"
        else:
            string_dialogue += "Assistant: " + dict_message["content"] + "\n\n"
    # Include the system instruction and chat history in the prompt;
    # the PDF text is truncated to stay within the model's input limit
    prompt = f"{string_dialogue}\n\nHere is the context:\n\n{text[:5000]}\n\nNow answer this question:\n{question}"
    output = replicate.run('a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5',
                           input={"prompt": prompt,
                                  "temperature": 0.1, "top_p": 0.9, "max_length": 2000, "repetition_penalty": 1})
    return output


# Generate a response once a PDF has been uploaded and a question entered
if pdf_text:
    question = st.text_input("Enter your question:")
    if question:
        st.session_state.messages.append({"role": "user", "content": question})
        with st.chat_message("assistant"):
            with st.spinner("Thinking..."):
                response = generate_llama2_response(pdf_text, question)
                placeholder = st.empty()
                full_response = ''
                # replicate.run returns an iterator of tokens; stream them as they arrive
                for item in response:
                    full_response += item
                    placeholder.markdown(full_response)
                placeholder.markdown(full_response)
        message = {"role": "assistant", "content": full_response}
        st.session_state.messages.append(message)

--------------------------------------------------------------------------------