├── .gitignore
├── requirements.txt
├── README.md
├── CONTRIBUTION.md
└── app.py

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
.env
testapi.py

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
fastapi[all]
openai
python-dotenv
pydantic==1.*
langchain
bs4
tiktoken

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# AI Research Agent

## Project Overview

AI Research Agent is a versatile application that leverages multiple tools to conduct thorough research on any topic. The application integrates web search, website scraping, and summarization capabilities, all powered by OpenAI's language models. The agent is designed to gather, analyze, and summarize factual information, ensuring that the research output is accurate and well-supported by data.

## Features

- **Web Search:** Conduct searches using the Serper API to gather information on any topic.
- **Website Scraping:** Scrape websites for relevant content and summarize large texts to extract key information.
- **Summarization:** Use AI-driven summarization techniques to condense large amounts of data into concise reports.
- **Streamlit Web App:** A user-friendly interface to input research queries and receive detailed responses.
- **FastAPI Endpoint:** Expose the research capabilities as an API for programmatic access.

## Setup Instructions

### Prerequisites

- Python 3.x
- Poetry package manager
- API keys for:
  - Browserless (for web scraping)
  - Serper (for web search)
  - OpenAI (for the language model)
- Optional: A `.env` file to store environment variables

### Installation

1. **Clone the repository:**
   ```bash
   git clone https://github.com/dmotts/ai-research-agent.git
   cd ai-research-agent
   ```

2. **Install dependencies:**
   ```bash
   poetry install
   ```
   If you prefer pip, the dependencies are also listed in `requirements.txt`: `pip install -r requirements.txt`.

3. **Set up environment variables:**
   You can either export the required environment variables in your shell or create a `.env` file in the root directory with the following content:
   ```dotenv
   BROWSERLESS_API_KEY=your_browserless_api_key
   SERP_API_KEY=your_serper_api_key
   OPENAI_API_KEY=your_openai_api_key
   ```

4. **Run the application locally:**
   ```bash
   poetry run streamlit run app.py
   ```

5. **Access the Streamlit app:**
   Open your web browser and go to `http://localhost:8501` to interact with the AI Research Agent.

### Running the FastAPI Server

If you want to expose the research agent as an API:

1. **Start the FastAPI server:**
   ```bash
   poetry run uvicorn app:app --reload
   ```

2. **Access the FastAPI endpoint:**
   You can send POST requests to `http://localhost:8000/` with a JSON payload containing the research query.
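
   For example, a minimal request with `curl` (the example topic is just an illustration; the `query` field matches the `Query` model defined in `app.py`):
   ```bash
   curl -X POST http://localhost:8000/ \
     -H "Content-Type: application/json" \
     -d '{"query": "What are the latest developments in solid-state batteries?"}'
   ```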

## Usage

- **Streamlit Interface:** Enter your research goal in the text input field, and the agent will perform the research, presenting you with a detailed summary of the findings.
- **API Endpoint:** Use the FastAPI endpoint to integrate the research agent into other applications or services.

## Contributions

Contributions are welcome! It only takes five (5) steps!

To contribute:

1) Fork the repository.

2) Create a new branch: `git checkout -b my-feature-branch`.

3) Make your changes and commit them: `git commit -m 'Add some feature'`.

4) Push to the branch: `git push origin my-feature-branch`.

5) Open a pull request.

Please ensure your code follows the project's coding standards and includes tests where appropriate.

## Let's Connect 🤝

If you find this project useful, please consider connecting with me on GitHub:

[Daley Mottley (dmotts)](https://github.com/dmotts)

--------------------------------------------------------------------------------
/CONTRIBUTION.md:
--------------------------------------------------------------------------------
# AI Research Agent: Contributing Guidelines 📄

## Table of Contents
1. [Introduction](#introduction-)
2. [Setup Instructions](#setup-instructions)
3. [Installation](#installation-)
4. [Contributing](#contributing-)
   - [Development Workflow](#development-workflow)
   - [Issue Report Process](#issue-report-process)
   - [Pull Request Process](#pull-request-process-)
   - [Contributing Using GitHub Desktop](#contributing-using-github-desktop)
5. [Resources for Beginners](#resources-for-beginners-)
6. [Documentation](#documentation-)
7. [Code Reviews](#code-reviews-)
8. [Feature Requests](#feature-requests-)
9. [Spreading the Word](#spreading-the-word-)
10. [Code of Conduct](#code-of-conduct-)
11. [Thank You](#thank-you-)

## Introduction 🖥️

Welcome to **AI Research Agent**, an application that uses AI-powered search, scraping, and summarization tools to conduct research on any topic. We are excited to have you contribute to our project! No contribution is too small, and we appreciate your help in improving it.

## Setup Instructions

### Prerequisites

- Python 3.x
- Poetry package manager
- API keys for:
  - Browserless (for web scraping)
  - Serper (for web search)
  - OpenAI (for the language model)
- Optional: A `.env` file to store environment variables

## Installation ⚙️

1. **Clone the repository:**
   ```bash
   git clone https://github.com/dmotts/ai-research-agent.git
   cd ai-research-agent
   ```

2. **Install dependencies:**
   ```bash
   poetry install
   ```

3. **Set up environment variables:**
   You can either export the required environment variables in your shell or create a `.env` file in the root directory with the following content:
   ```dotenv
   BROWSERLESS_API_KEY=your_browserless_api_key
   SERP_API_KEY=your_serper_api_key
   OPENAI_API_KEY=your_openai_api_key
   ```

4. **Run the application locally:**
   ```bash
   poetry run streamlit run app.py
   ```

5. **Access the Streamlit app:**
   Open your web browser and go to `http://localhost:8501` to interact with the AI Research Agent.

### Running the FastAPI Server

If you want to expose the research agent as an API:

1. **Start the FastAPI server:**
   ```bash
   poetry run uvicorn app:app --reload
   ```

2. **Access the FastAPI endpoint:**
   You can send POST requests to `http://localhost:8000/` with a JSON payload containing the research query.

## Usage

- **Streamlit Interface:** Enter your research goal in the text input field, and the agent will perform the research, presenting you with a detailed summary of the findings.
- **API Endpoint:** Use the FastAPI endpoint to integrate the research agent into other applications or services.

## Contributing 📝

We welcome contributions to **AI Research Agent**! Please follow these guidelines to ensure a smooth contribution process.

### Development Workflow

- **Work on a New Branch:** Always create a new branch for each issue or feature you are working on.
- **Keep Your Branch Up to Date:** Regularly pull changes from the master branch to keep your branch up to date (see the example command sequence after the GitHub Desktop steps below).
- **Write Clear Commit Messages:** Use descriptive commit messages to explain what your changes do.
- **Test Thoroughly:** Ensure your changes work correctly and do not break existing functionality.
- **Self-Review:** Review your code before submitting to catch any errors or areas for improvement.

### Issue Report Process 📌

1. **Check Existing Issues:** Before creating a new issue, check if it has already been reported.
2. **Create a New Issue:** Go to the project's [issues section](https://github.com/dmotts/ai-research-agent/issues) and select the appropriate template.
3. **Provide Details:** Give a clear and detailed description of the issue.
4. **Wait for Assignment:** Wait for the issue to be assigned to you before starting work.

### Pull Request Process 🚀

1. **Ensure Self-Review:** Make sure you have thoroughly reviewed your code.
2. **Provide Descriptions:** Add a clear description of the functionality and changes in your pull request.
3. **Comment Your Code:** Comment on complex or hard-to-understand areas of your code.
4. **Add Screenshots:** Include screenshots if they help explain your changes.
5. **Submit PR:** Submit your pull request using the provided template, and wait for the maintainers to review it.

### Contributing Using GitHub Desktop

If you prefer using GitHub Desktop, follow these steps:

1. **Open GitHub Desktop:** Launch GitHub Desktop and log in to your GitHub account.
2. **Clone the Repository:** Click on "File" > "Clone Repository" and select the repository to clone.
3. **Create a Branch:** Click on "Current Branch" and select "New Branch" to create a new branch for your work.
4. **Make Changes:** Edit the code using your preferred code editor.
5. **Commit Changes:**
   - In GitHub Desktop, select the files you changed.
   - Enter a summary and description for your commit.
   - Click "Commit to [branch-name]".
6. **Push Changes:** Click "Push origin" to push your changes to GitHub.
7. **Create a Pull Request:**
   - On GitHub, navigate to your forked repository.
   - Click on "Compare & pull request".
   - Review your changes and submit the pull request.
8. **Wait for Review:** Wait for the maintainers to review your pull request.
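
From the command line, keeping a feature branch up to date typically looks like this (a sketch that assumes your fork is the `origin` remote and the main repository has been added as a remote named `upstream`):

```bash
# Fetch the latest changes from the main repository
git fetch upstream

# Merge the upstream master branch into your feature branch
git checkout my-feature-branch
git merge upstream/master

# Push the updated branch to your fork
git push origin my-feature-branch
```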

## Resources for Beginners 📚

If you're new to Git and GitHub, here are some resources to help you get started:

- [Forking a Repo](https://help.github.com/en/github/getting-started-with-github/fork-a-repo)
- [Cloning a Repo](https://help.github.com/en/desktop/contributing-to-projects/creating-an-issue-or-pull-request)
- [Creating a Pull Request](https://opensource.com/article/19/7/create-pull-request-github)
- [Getting Started with Git and GitHub](https://towardsdatascience.com/getting-started-with-git-and-github-6fcd0f2d4ac6)
- [Learn GitHub from Scratch](https://docs.github.com/en/get-started/start-your-journey/git-and-github-learning-resources)

## Documentation 📍

- **Update Documentation:** Document any significant changes or additions to the codebase.
- **Provide Clear Explanations:** Explain the functionality, usage, and any relevant considerations.
- **Use Comments:** Comment your code, especially in complex areas.

## Code Reviews 🔎

- **Be Open to Feedback:** Welcome feedback and constructive criticism from other contributors.
- **Participate in Reviews:** Help review others' code when possible.
- **Follow Guidelines:** Ensure your code meets the project's coding standards and guidelines.

## Feature Requests 🔥

- **Suggest Improvements:** Propose new features or enhancements that could benefit the project.
- **Provide Details:** Explain the rationale and potential impact of your suggestion.

## Spreading the Word 👐

- **Share Your Experience:** Share the project with others who might be interested.
- **Engage on Social Media:** Talk about the project on social media, developer forums, or relevant platforms.

## Code of Conduct 📜

Please note that we have a [Code of Conduct](CODE_OF_CONDUCT.md). By participating in this project, you agree to abide by its terms.

## Thank You 💗

Thank you for contributing to **AI Research Agent**! Together, we can make a significant impact. Happy coding! 🚀

Don't forget to ⭐ the repository!

--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
import os
from dotenv import load_dotenv

from langchain import PromptTemplate
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain.chat_models import ChatOpenAI
from langchain.prompts import MessagesPlaceholder
from langchain.memory import ConversationSummaryBufferMemory
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Type
from bs4 import BeautifulSoup
import requests
import json
from langchain.schema import SystemMessage
from fastapi import FastAPI
import streamlit as st

load_dotenv()
browserless_api_key = os.getenv("BROWSERLESS_API_KEY")
serper_api_key = os.getenv("SERP_API_KEY")


# 1. Tool for search
def search(query):
    """Run a Google search via the Serper API and return the raw JSON response."""
    url = "https://google.serper.dev/search"

    payload = json.dumps({
        "q": query
    })

    headers = {
        'X-API-KEY': serper_api_key,
        'Content-Type': 'application/json'
    }

    response = requests.post(url, headers=headers, data=payload)

    print(response.text)

    return response.text


# 2. Tool for scraping
def scrape_website(objective: str, url: str):
    # Scrape a website via Browserless, and summarize the content based on the
    # objective if it is too large. The objective is the original task that the
    # user gives to the agent; url is the URL of the website to be scraped.

    print("Scraping website...")
    # Define the headers for the request
    headers = {
        'Cache-Control': 'no-cache',
        'Content-Type': 'application/json',
    }

    # Define the data to be sent in the request
    data = {
        "url": url
    }

    # Convert Python object to JSON string
    data_json = json.dumps(data)

    # Send the POST request
    post_url = f"https://chrome.browserless.io/content?token={browserless_api_key}"
    response = requests.post(post_url, headers=headers, data=data_json)

    # Check the response status code
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")
        # Drop script and style tags so only visible text remains
        for script in soup(["script", "style"]):
            script.decompose()
        text = soup.get_text()
        print("CONTENT:", text)

        # Summarize very long pages instead of returning them verbatim
        if len(text) > 10000:
            return summary(objective, text)
        else:
            return text
    else:
        print(f"HTTP request failed with status code {response.status_code}")
        return f"HTTP request failed with status code {response.status_code}"


def summary(objective, content):
    llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k-0613")

    # Split the content into overlapping chunks so each fits in the model's context window
    text_splitter = RecursiveCharacterTextSplitter(
        separators=["\n\n", "\n"], chunk_size=10000, chunk_overlap=500)
    docs = text_splitter.create_documents([content])
    map_prompt = """
    Write a summary of the following text for {objective}. The text is scraped data from a website, so
    it will contain a lot of useless information that doesn't relate to this topic, such as links and other news stories.
    Only summarize the relevant information and try to keep as much factual information intact as possible:
    "{text}"
    SUMMARY:
    """
    map_prompt_template = PromptTemplate(
        template=map_prompt, input_variables=["text", "objective"])

    # Map-reduce summarization: summarize each chunk, then combine the partial summaries
    summary_chain = load_summarize_chain(
        llm=llm,
        chain_type='map_reduce',
        map_prompt=map_prompt_template,
        combine_prompt=map_prompt_template,
        verbose=True
    )

    output = summary_chain.run(input_documents=docs, objective=objective)

    return output


class ScrapeWebsiteInput(BaseModel):
    """Inputs for scrape_website"""
    objective: str = Field(
        description="The objective & task that users give to the agent")
    url: str = Field(description="The url of the website to be scraped")


class ScrapeWebsiteTool(BaseTool):
    name = "scrape_website"
    description = "useful when you need to get data from a website url, passing both url and objective to the function; DO NOT make up any url, the url should only be from the search results"
    args_schema: Type[BaseModel] = ScrapeWebsiteInput

    def _run(self, objective: str, url: str):
        return scrape_website(objective, url)

    def _arun(self, objective: str, url: str):
        raise NotImplementedError("scrape_website does not support async execution")


# 3. Create a LangChain agent with the tools above
tools = [
    Tool(
        name="Search",
        func=search,
        description="useful for when you need to answer questions about current events or data; you should ask targeted questions"
    ),
    ScrapeWebsiteTool(),
]

system_message = SystemMessage(
    content="""You are a world-class researcher who can do detailed research on any topic and produce fact-based results;
you do not make things up, and you will try as hard as possible to gather facts & data to back up the research.

Please make sure you complete the objective above with the following rules:
1/ You should do enough research to gather as much information as possible about the objective
2/ If there are URLs of relevant links & articles, you will scrape them to gather more information
3/ After scraping & searching, you should think: "Is there anything new I should search for or scrape based on the data I collected, to increase research quality?" If the answer is yes, continue; but don't do this more than 3 iterations
4/ You should not make things up; you should only write facts & data that you have gathered
5/ In the final output, you should include all reference data & links to back up your research"""
)

agent_kwargs = {
    "extra_prompt_messages": [MessagesPlaceholder(variable_name="memory")],
    "system_message": system_message,
}

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k-0613")
# Summarize older conversation turns once the buffer exceeds the token limit
memory = ConversationSummaryBufferMemory(
    memory_key="memory", return_messages=True, llm=llm, max_token_limit=1000)

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    agent_kwargs=agent_kwargs,
    memory=memory,
)


# 4. Use Streamlit to create a web app
def main():
    st.set_page_config(page_title="AI research agent", page_icon=":bird:")

    st.header("AI research agent :bird:")
    query = st.text_input("Research goal")

    if query:
        st.write("Doing research for ", query)

        result = agent({"input": query})

        st.info(result['output'])


if __name__ == '__main__':
    main()


# 5. Set this as an API endpoint via FastAPI
app = FastAPI()


class Query(BaseModel):
    query: str


@app.post("/")
def research_agent(query: Query):
    # Expects a JSON body like {"query": "..."} and returns the research output
    content = agent({"input": query.query})
    return content['output']

--------------------------------------------------------------------------------