├── .gitignore
├── requirements.txt
├── README.md
├── CONTRIBUTION.md
└── app.py

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
.env
testapi.py

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
fastapi[all]
openai
python-dotenv
pydantic==1.*
langchain
bs4
tiktoken

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# AI Research Agent

## Project Overview

AI Research Agent is a versatile application that leverages multiple tools to conduct thorough research on any topic. The application integrates web search, website scraping, and summarization capabilities, all powered by OpenAI's language models. The agent is designed to gather, analyze, and summarize factual information, ensuring that the research output is accurate and well-supported by data.

## Features

- **Web Search:** Conduct searches using the Serper API to gather information on any topic.
- **Website Scraping:** Scrape websites for relevant content and summarize large texts to extract key information.
- **Summarization:** Use AI-driven summarization techniques to condense large amounts of data into concise reports.
- **Streamlit Web App:** A user-friendly interface to input research queries and receive detailed responses.
- **FastAPI Endpoint:** Expose the research capabilities as an API for programmatic access.

## Setup Instructions

### Prerequisites

- Python 3.x
- Poetry package manager
- API keys for:
  - Browserless (for web scraping)
  - Serper (for web search)
  - OpenAI (for the language model)
- Optional: A `.env` file to store environment variables

### Installation

1. **Clone the repository:**
   ```bash
   git clone https://github.com/dmotts/ai-research-agent.git
   cd ai-research-agent
   ```

2. **Install dependencies:**
   ```bash
   poetry install
   ```
   If you prefer pip, the dependencies are also listed in `requirements.txt`: `pip install -r requirements.txt`.

3. **Set up environment variables:**
   You can either export the required environment variables in your shell or create a `.env` file in the root directory with the following content:
   ```dotenv
   BROWSERLESS_API_KEY=your_browserless_api_key
   SERP_API_KEY=your_serper_api_key
   OPENAI_API_KEY=your_openai_api_key
   ```

4. **Run the application locally:**
   ```bash
   poetry run streamlit run app.py
   ```

5. **Access the Streamlit app:**
   Open your web browser and go to `http://localhost:8501` to interact with the AI Research Agent.

### Running the FastAPI Server

If you want to expose the research agent as an API:

1. **Start the FastAPI server:**
   ```bash
   poetry run uvicorn app:app --reload
   ```

2. **Access the FastAPI endpoint:**
   You can send POST requests to `http://localhost:8000/` with a JSON payload containing the research query.
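
   For example, a minimal request with `curl` (the example topic is just an illustration; the `query` field matches the `Query` model defined in `app.py`):
   ```bash
   curl -X POST http://localhost:8000/ \
     -H "Content-Type: application/json" \
     -d '{"query": "What are the latest developments in solid-state batteries?"}'
   ```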

## Usage

- **Streamlit Interface:** Enter your research goal in the text input field, and the agent will perform the research, presenting you with a detailed summary of the findings.
- **API Endpoint:** Use the FastAPI endpoint to integrate the research agent into other applications or services.

## Contributions

Contributions are welcome! It only takes five (5) steps!

To contribute:

1) Fork the repository.

2) Create a new branch: `git checkout -b my-feature-branch`.

3) Make your changes and commit them: `git commit -m 'Add some feature'`.

4) Push to the branch: `git push origin my-feature-branch`.

5) Open a pull request.

Please ensure your code follows the project's coding standards and includes tests where appropriate.

## Let's Connect 🤝

If you find this project useful, please consider connecting with me on GitHub:

[Daley Mottley (dmotts)](https://github.com/dmotts)

--------------------------------------------------------------------------------
/CONTRIBUTION.md:
--------------------------------------------------------------------------------
# AI Research Agent: Contributing Guidelines 📄

## Table of Contents
1. [Introduction](#introduction-)
2. [Setup Instructions](#setup-instructions)
3. [Installation](#installation-)
4. [Contributing](#contributing-)
   - [Development Workflow](#development-workflow)
   - [Issue Report Process](#issue-report-process)
   - [Pull Request Process](#pull-request-process-)
   - [Contributing Using GitHub Desktop](#contributing-using-github-desktop)
5. [Resources for Beginners](#resources-for-beginners-)
6. [Documentation](#documentation-)
7. [Code Reviews](#code-reviews-)
8. [Feature Requests](#feature-requests-)
9. [Spreading the Word](#spreading-the-word-)
10. [Code of Conduct](#code-of-conduct-)
11. [Thank You](#thank-you-)

## Introduction 🖥️

Welcome to **AI Research Agent**, an application that uses AI-powered search, scraping, and summarization tools to conduct research on any topic. We are excited to have you contribute to our project! No contribution is too small, and we appreciate your help in improving it.

## Setup Instructions

### Prerequisites

- Python 3.x
- Poetry package manager
- API keys for:
  - Browserless (for web scraping)
  - Serper (for web search)
  - OpenAI (for the language model)
- Optional: A `.env` file to store environment variables

## Installation ⚙️

1. **Clone the repository:**
   ```bash
   git clone https://github.com/dmotts/ai-research-agent.git
   cd ai-research-agent
   ```

2. **Install dependencies:**
   ```bash
   poetry install
   ```

3. **Set up environment variables:**
   You can either export the required environment variables in your shell or create a `.env` file in the root directory with the following content:
   ```dotenv
   BROWSERLESS_API_KEY=your_browserless_api_key
   SERP_API_KEY=your_serper_api_key
   OPENAI_API_KEY=your_openai_api_key
   ```

4. **Run the application locally:**
   ```bash
   poetry run streamlit run app.py
   ```

5. **Access the Streamlit app:**
   Open your web browser and go to `http://localhost:8501` to interact with the AI Research Agent.

### Running the FastAPI Server

If you want to expose the research agent as an API:

1. **Start the FastAPI server:**
   ```bash
   poetry run uvicorn app:app --reload
   ```

2. **Access the FastAPI endpoint:**
   You can send POST requests to `http://localhost:8000/` with a JSON payload containing the research query.

## Usage

- **Streamlit Interface:** Enter your research goal in the text input field, and the agent will perform the research, presenting you with a detailed summary of the findings.
- **API Endpoint:** Use the FastAPI endpoint to integrate the research agent into other applications or services.

## Contributing 📝

We welcome contributions to **AI Research Agent**! Please follow these guidelines to ensure a smooth contribution process.

### Development Workflow

- **Work on a New Branch:** Always create a new branch for each issue or feature you are working on.
- **Keep Your Branch Up to Date:** Regularly pull changes from the master branch to keep your branch up to date (see the example command sequence after the GitHub Desktop steps below).
- **Write Clear Commit Messages:** Use descriptive commit messages to explain what your changes do.
- **Test Thoroughly:** Ensure your changes work correctly and do not break existing functionality.
- **Self-Review:** Review your code before submitting to catch any errors or areas for improvement.

### Issue Report Process 📌

1. **Check Existing Issues:** Before creating a new issue, check if it has already been reported.
2. **Create a New Issue:** Go to the project's [issues section](https://github.com/dmotts/ai-research-agent/issues) and select the appropriate template.
3. **Provide Details:** Give a clear and detailed description of the issue.
4. **Wait for Assignment:** Wait for the issue to be assigned to you before starting work.

### Pull Request Process 🚀

1. **Ensure Self-Review:** Make sure you have thoroughly reviewed your code.
2. **Provide Descriptions:** Add a clear description of the functionality and changes in your pull request.
3. **Comment Your Code:** Comment on complex or hard-to-understand areas of your code.
4. **Add Screenshots:** Include screenshots if they help explain your changes.
5. **Submit PR:** Submit your pull request using the provided template, and wait for the maintainers to review it.

### Contributing Using GitHub Desktop

If you prefer using GitHub Desktop, follow these steps:

1. **Open GitHub Desktop:** Launch GitHub Desktop and log in to your GitHub account.
2. **Clone the Repository:** Click on "File" > "Clone Repository" and select the repository to clone.
3. **Create a Branch:** Click on "Current Branch" and select "New Branch" to create a new branch for your work.
4. **Make Changes:** Edit the code using your preferred code editor.
5. **Commit Changes:**
   - In GitHub Desktop, select the files you changed.
   - Enter a summary and description for your commit.
   - Click "Commit to [branch-name]".
6. **Push Changes:** Click "Push origin" to push your changes to GitHub.
7. **Create a Pull Request:**
   - On GitHub, navigate to your forked repository.
   - Click on "Compare & pull request".
   - Review your changes and submit the pull request.
8. **Wait for Review:** Wait for the maintainers to review your pull request.
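
From the command line, keeping a feature branch up to date typically looks like this (a sketch that assumes your fork is the `origin` remote and the main repository has been added as a remote named `upstream`):

```bash
# Fetch the latest changes from the main repository
git fetch upstream

# Merge the upstream master branch into your feature branch
git checkout my-feature-branch
git merge upstream/master

# Push the updated branch to your fork
git push origin my-feature-branch
```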

## Resources for Beginners 📚

If you're new to Git and GitHub, here are some resources to help you get started:

- [Forking a Repo](https://help.github.com/en/github/getting-started-with-github/fork-a-repo)
- [Cloning a Repo](https://help.github.com/en/desktop/contributing-to-projects/creating-an-issue-or-pull-request)
- [Creating a Pull Request](https://opensource.com/article/19/7/create-pull-request-github)
- [Getting Started with Git and GitHub](https://towardsdatascience.com/getting-started-with-git-and-github-6fcd0f2d4ac6)
- [Learn GitHub from Scratch](https://docs.github.com/en/get-started/start-your-journey/git-and-github-learning-resources)

## Documentation 📍

- **Update Documentation:** Document any significant changes or additions to the codebase.
- **Provide Clear Explanations:** Explain the functionality, usage, and any relevant considerations.
- **Use Comments:** Comment your code, especially in complex areas.

## Code Reviews 🔎

- **Be Open to Feedback:** Welcome feedback and constructive criticism from other contributors.
- **Participate in Reviews:** Help review others' code when possible.
- **Follow Guidelines:** Ensure your code meets the project's coding standards and guidelines.

## Feature Requests 🔥

- **Suggest Improvements:** Propose new features or enhancements that could benefit the project.
- **Provide Details:** Explain the rationale and potential impact of your suggestion.

## Spreading the Word 👐

- **Share Your Experience:** Share the project with others who might be interested.
- **Engage on Social Media:** Talk about the project on social media, developer forums, or relevant platforms.

## Code of Conduct 📜

Please note that we have a [Code of Conduct](CODE_OF_CONDUCT.md). By participating in this project, you agree to abide by its terms.

## Thank You 💗

Thank you for contributing to **AI Research Agent**! Together, we can make a significant impact. Happy coding! 🚀

Don't forget to ⭐ the repository!

--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
import os
from dotenv import load_dotenv

from langchain import PromptTemplate
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain.chat_models import ChatOpenAI
from langchain.prompts import MessagesPlaceholder
from langchain.memory import ConversationSummaryBufferMemory
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Type
from bs4 import BeautifulSoup
import requests
import json
from langchain.schema import SystemMessage
from fastapi import FastAPI
import streamlit as st

load_dotenv()
browserless_api_key = os.getenv("BROWSERLESS_API_KEY")
serper_api_key = os.getenv("SERP_API_KEY")


# 1. Tool for search
def search(query):
    """Run a Google search via the Serper API and return the raw JSON response."""
    url = "https://google.serper.dev/search"

    payload = json.dumps({
        "q": query
    })

    headers = {
        'X-API-KEY': serper_api_key,
        'Content-Type': 'application/json'
    }

    response = requests.post(url, headers=headers, data=payload)

    print(response.text)

    return response.text


# 2. Tool for scraping
def scrape_website(objective: str, url: str):
    # Scrape a website via Browserless, and summarize the content based on the
    # objective if it is too large. The objective is the original task that the
    # user gives to the agent; url is the URL of the website to be scraped.

    print("Scraping website...")
    # Define the headers for the request
    headers = {
        'Cache-Control': 'no-cache',
        'Content-Type': 'application/json',
    }

    # Define the data to be sent in the request
    data = {
        "url": url
    }

    # Convert Python object to JSON string
    data_json = json.dumps(data)

    # Send the POST request
    post_url = f"https://chrome.browserless.io/content?token={browserless_api_key}"
    response = requests.post(post_url, headers=headers, data=data_json)

    # Check the response status code
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")
        # Drop script and style tags so only visible text remains
        for script in soup(["script", "style"]):
            script.decompose()
        text = soup.get_text()
        print("CONTENT:", text)

        # Summarize very long pages instead of returning them verbatim
        if len(text) > 10000:
            return summary(objective, text)
        else:
            return text
    else:
        print(f"HTTP request failed with status code {response.status_code}")
        return f"HTTP request failed with status code {response.status_code}"


def summary(objective, content):
    llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k-0613")

    # Split the content into overlapping chunks so each fits in the model's context window
    text_splitter = RecursiveCharacterTextSplitter(
        separators=["\n\n", "\n"], chunk_size=10000, chunk_overlap=500)
    docs = text_splitter.create_documents([content])
    map_prompt = """
    Write a summary of the following text for {objective}. The text is scraped data from a website, so
    it will contain a lot of useless information that doesn't relate to this topic, such as links and other news stories.
    Only summarize the relevant information and try to keep as much factual information intact as possible:
    "{text}"
    SUMMARY:
    """
    map_prompt_template = PromptTemplate(
        template=map_prompt, input_variables=["text", "objective"])

    # Map-reduce summarization: summarize each chunk, then combine the partial summaries
    summary_chain = load_summarize_chain(
        llm=llm,
        chain_type='map_reduce',
        map_prompt=map_prompt_template,
        combine_prompt=map_prompt_template,
        verbose=True
    )

    output = summary_chain.run(input_documents=docs, objective=objective)

    return output


class ScrapeWebsiteInput(BaseModel):
    """Inputs for scrape_website"""
    objective: str = Field(
        description="The objective & task that users give to the agent")
    url: str = Field(description="The url of the website to be scraped")


class ScrapeWebsiteTool(BaseTool):
    name = "scrape_website"
    description = "useful when you need to get data from a website url, passing both url and objective to the function; DO NOT make up any url, the url should only be from the search results"
    args_schema: Type[BaseModel] = ScrapeWebsiteInput

    def _run(self, objective: str, url: str):
        return scrape_website(objective, url)

    def _arun(self, objective: str, url: str):
        raise NotImplementedError("scrape_website does not support async execution")


# 3. Create a LangChain agent with the tools above
tools = [
    Tool(
        name="Search",
        func=search,
        description="useful for when you need to answer questions about current events or data; you should ask targeted questions"
    ),
    ScrapeWebsiteTool(),
]

system_message = SystemMessage(
    content="""You are a world-class researcher who can do detailed research on any topic and produce fact-based results;
you do not make things up, and you will try as hard as possible to gather facts & data to back up the research.

Please make sure you complete the objective above with the following rules:
1/ You should do enough research to gather as much information as possible about the objective
2/ If there are URLs of relevant links & articles, you will scrape them to gather more information
3/ After scraping & searching, you should think: "Is there anything new I should search for or scrape based on the data I collected, to increase research quality?" If the answer is yes, continue; but don't do this more than 3 iterations
4/ You should not make things up; you should only write facts & data that you have gathered
5/ In the final output, you should include all reference data & links to back up your research"""
)

agent_kwargs = {
    "extra_prompt_messages": [MessagesPlaceholder(variable_name="memory")],
    "system_message": system_message,
}

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k-0613")
# Summarize older conversation turns once the buffer exceeds the token limit
memory = ConversationSummaryBufferMemory(
    memory_key="memory", return_messages=True, llm=llm, max_token_limit=1000)

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    agent_kwargs=agent_kwargs,
    memory=memory,
)


# 4. Use Streamlit to create a web app
def main():
    st.set_page_config(page_title="AI research agent", page_icon=":bird:")

    st.header("AI research agent :bird:")
    query = st.text_input("Research goal")

    if query:
        st.write("Doing research for ", query)

        result = agent({"input": query})

        st.info(result['output'])


if __name__ == '__main__':
    main()


# 5. Set this as an API endpoint via FastAPI
app = FastAPI()


class Query(BaseModel):
    query: str


@app.post("/")
def research_agent(query: Query):
    # Expects a JSON body like {"query": "..."} and returns the research output
    content = agent({"input": query.query})
    return content['output']

--------------------------------------------------------------------------------