├── .gitignore
├── .vscode
│   └── settings.json
├── Home.py
├── README.md
├── main.py
├── movies.sqlite
├── notebook.ipynb
├── pages
│   ├── 01_DocumentGPT.py
│   ├── 02_PrivateGPT.py
│   ├── 03_QuizGPT.py
│   ├── 04_SiteGPT.py
│   ├── 05_MeetingGPT.py
│   └── 06_InvestorGPT.py
├── recipes.csv
└── requirements.txt

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
env/
.env
files/
.cache/
.streamlit/
.DS_Store
falcon.bin
__pycache__

--------------------------------------------------------------------------------
/.vscode/settings.json:
--------------------------------------------------------------------------------
{
    "python.autoComplete.extraPaths": ["./env/lib/python3.11/site-packages"],
    "python.analysis.extraPaths": ["./env/lib/python3.11/site-packages"],
    "python.analysis.autoImportCompletions": true
}

--------------------------------------------------------------------------------
/Home.py:
--------------------------------------------------------------------------------
import streamlit as st

st.set_page_config(
    page_title="FullstackGPT Home",
    page_icon="🤖",
)

st.markdown(
    """
# Hello!

Welcome to my FullstackGPT Portfolio!

Here are the apps I made:

- [x] [📃 DocumentGPT](/DocumentGPT)
- [x] [🔒 PrivateGPT](/PrivateGPT)
- [x] [❓ QuizGPT](/QuizGPT)
- [x] [🖥️ SiteGPT](/SiteGPT)
- [x] [💼 MeetingGPT](/MeetingGPT)
- [x] [📈 InvestorGPT](/InvestorGPT)
"""
)

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Fullstack GPT

Build 7 AI web services with LangChain

## What will you learn?

Learn, from A to Z, how to build AI web services with GPT-4 and Langchain.

- Fundamentals of Langchain and language models
- How to use GPT-4 on your own data
- How to build a custom autonomous agent... and much more!

It is no exaggeration to say that knowing how to use AI well is now one of a developer's core skills. Use the Fullstack GPT course to boost your productivity and broaden your range as a developer.

## How will you learn?

You learn by building 7 practical AI web services you can put to use right away.

- AI web services (6): DocumentGPT, PrivateGPT, QuizGPT, SiteGPT, MeetingGPT, InvestorGPT
- ChatGPT plugin (1): ChefGPT
- Packages used: Langchain, GPT-4, Whisper, FastAPI, Streamlit, Pinecone, Hugging Face... and more!

There is no faster or more effective way to learn than building things yourself. Walk away with hands-on experience and a portfolio!

### DocumentGPT

Legal, medical, and other documents packed with difficult jargon — want to make sense of them quickly with AI?

Use AI to quickly and accurately understand and organize a document, then pull out exactly the parts you need. With the DocumentGPT chatbot, the AI reads your documents (.txt, .pdf, .docx, etc.) carefully and answers questions about them.

### PrivateGPT

Worried about leaking company secrets? Build a private GPT that only you can see!

Similar to DocumentGPT, but it uses a local language model, which makes it well suited to confidential data. Your data stays on your machine, so it even works offline. Hand your data to PrivateGPT without worrying about leaks and boost your productivity at work.

### QuizGPT

Want to study material you need to memorize more efficiently?

Feed the AI the content you need to learn — documents, Wikipedia articles, and so on — and it generates quizzes based on it. It minimizes busywork and maximizes learning efficiency, which makes it especially useful for exams or short, intensive study sessions.

### SiteGPT

Hiring support staff just to answer frequently asked questions...? Let SiteGPT cut that cost in half.

A chatbot that scrapes a website, collects its content, and answers related questions while citing its sources. It dramatically cuts the time spent on the routine informational requests that make up most customer support, and customers get fast, accurate answers without being bound to support staff working hours.

### MeetingGPT

Leave the meeting minutes to MeetingGPT!

An app that extracts the audio from a meeting video, gathers the content, and writes up a summary of the meeting. It keeps you from missing the meeting itself because you were busy taking notes, and since you can also ask follow-up questions, it is far more effective than a plain record for managing and using your meeting minutes.
### InvestorGPT

AI that does the research for you.

An autonomous agent that can search the internet and use third-party APIs. It can research a company, its stock price, and its financial statements to provide financial insights. It also collects the data on its own, so you don't have to write SQL queries yourself, and you can ask it follow-up questions about what it finds.

### ChefGPT

The ChatGPT plugins everyone is talking about? Build one yourself!

A ChatGPT plugin that users can install from the ChatGPT plugin store. With this plugin, users can search for recipes and get cooking instructions right from the ChatGPT interface. You will also learn how to implement OAuth authentication in a ChatGPT plugin.

--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
from typing import Any, Dict
from fastapi import Body, FastAPI, Form, Request
from fastapi.responses import HTMLResponse
from pydantic import BaseModel, Field
from dotenv import load_dotenv
import pinecone
import os
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

load_dotenv()

pinecone.init(
    api_key=os.getenv("PINECONE_API_KEY"),
    environment="gcp-starter",
)

embeddings = OpenAIEmbeddings()

vector_store = Pinecone.from_existing_index(
    "recipes",
    embeddings,
)


app = FastAPI(
    title="ChefGPT. The best provider of Indian Recipes in the world.",
    description="Give ChefGPT the name of an ingredient and it will give you multiple recipes that use that ingredient in return.",
    servers=[
        {
            "url": "https://occupations-partition-governments-analyzed.trycloudflare.com",
        },
    ],
)


class Document(BaseModel):
    page_content: str


@app.get(
    "/recipes",
    summary="Returns a list of recipes.",
    description="Upon receiving an ingredient, this endpoint will return a list of recipes that contain that ingredient.",
    response_description="A Document object that contains the recipe and preparation instructions",
    response_model=list[Document],
    openapi_extra={
        "x-openai-isConsequential": False,
    },
)
def get_recipe(ingredient: str):
    docs = vector_store.similarity_search(ingredient)
    return docs


user_token_db = {"ABCDEF": "nico"}


@app.get(
    "/authorize",
    response_class=HTMLResponse,
    include_in_schema=False,
)
def handle_authorize(client_id: str, redirect_uri: str, state: str):
    # Serve a bare-bones login page whose link sends the hard-coded demo
    # authorization code back to the redirect URI supplied by ChatGPT.
    return f"""
    <html>
        <head>
            <title>Nicolacus Maximus Log In</title>
        </head>
        <body>
            <h1>Log Into Nicolacus Maximus</h1>
            <a href="{redirect_uri}?code=ABCDEF&state={state}">Authorize Nicolacus Maximus GPT</a>
        </body>
    </html>
    """


@app.post(
    "/token",
    include_in_schema=False,
)
def handle_token(code=Form(...)):
    return {
        "access_token": user_token_db[code],
    }
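
For reference, a minimal client sketch of the two endpoints above (not part of the repo). It assumes the app is served locally, e.g. with `uvicorn main:app` on port 8000, rather than through the Cloudflare tunnel URL listed in `servers`; the ingredient is an arbitrary example, and `ABCDEF` is the only demo code in `user_token_db`.

```python
import requests

BASE = "http://127.0.0.1:8000"  # assumption: local dev server, not the tunnel URL

# GET /recipes serializes the response through the Document model,
# so each item comes back as {"page_content": "..."}.
recipes = requests.get(f"{BASE}/recipes", params={"ingredient": "tofu"}).json()
print(recipes[0]["page_content"])

# POST /token exchanges the demo authorization code for {"access_token": "nico"}.
token = requests.post(f"{BASE}/token", data={"code": "ABCDEF"}).json()
print(token["access_token"])
```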
"prompt = ChatPromptTemplate.from_messages(\n", 28 | " [\n", 29 | " (\n", 30 | " \"system\",\n", 31 | " \"You are a translator bot. Translate sentences from {source_language} to {target_language}.\",\n", 32 | " ),\n", 33 | " (\n", 34 | " \"human\",\n", 35 | " \"Translate this: {sentence}\",\n", 36 | " ),\n", 37 | " ]\n", 38 | ")\n", 39 | "\n", 40 | "chat = ChatAnthropic(temperature=0.1, model_name=\"\")\n", 41 | "\n", 42 | "chain = prompt | chat | JsonOutputKeyToolsParser()\n", 43 | "\n", 44 | "chain.invoke(\n", 45 | " {\n", 46 | " \"source_language\": \"English\",\n", 47 | " \"target_language\": \"Italian\",\n", 48 | " \"sentence\": \"I love you, Langchain!\",\n", 49 | " }\n", 50 | ")" 51 | ] 52 | } 53 | ], 54 | "metadata": { 55 | "kernelspec": { 56 | "display_name": "env", 57 | "language": "python", 58 | "name": "python3" 59 | }, 60 | "language_info": { 61 | "codemirror_mode": { 62 | "name": "ipython", 63 | "version": 3 64 | }, 65 | "file_extension": ".py", 66 | "mimetype": "text/x-python", 67 | "name": "python", 68 | "nbconvert_exporter": "python", 69 | "pygments_lexer": "ipython3", 70 | "version": "3.11.6" 71 | } 72 | }, 73 | "nbformat": 4, 74 | "nbformat_minor": 2 75 | } 76 | -------------------------------------------------------------------------------- /pages/01_DocumentGPT.py: -------------------------------------------------------------------------------- 1 | from langchain.prompts import ChatPromptTemplate 2 | from langchain.document_loaders import UnstructuredFileLoader 3 | from langchain.embeddings import CacheBackedEmbeddings, OpenAIEmbeddings 4 | from langchain.schema.runnable import RunnableLambda, RunnablePassthrough 5 | from langchain.storage import LocalFileStore 6 | from langchain.text_splitter import CharacterTextSplitter 7 | from langchain.vectorstores.faiss import FAISS 8 | from langchain.chat_models import ChatOpenAI 9 | from langchain.callbacks.base import BaseCallbackHandler 10 | import streamlit as st 11 | 12 | st.set_page_config( 13 | page_title="DocumentGPT", 14 | page_icon="📃", 15 | ) 16 | 17 | 18 | class ChatCallbackHandler(BaseCallbackHandler): 19 | message = "" 20 | 21 | def on_llm_start(self, *args, **kwargs): 22 | self.message_box = st.empty() 23 | 24 | def on_llm_end(self, *args, **kwargs): 25 | save_message(self.message, "ai") 26 | 27 | def on_llm_new_token(self, token, *args, **kwargs): 28 | self.message += token 29 | self.message_box.markdown(self.message) 30 | 31 | 32 | llm = ChatOpenAI( 33 | temperature=0.1, 34 | streaming=True, 35 | callbacks=[ 36 | ChatCallbackHandler(), 37 | ], 38 | ) 39 | 40 | 41 | @st.cache_data(show_spinner="Embedding file...") 42 | def embed_file(file): 43 | file_content = file.read() 44 | file_path = f"./.cache/files/{file.name}" 45 | with open(file_path, "wb") as f: 46 | f.write(file_content) 47 | cache_dir = LocalFileStore(f"./.cache/embeddings/{file.name}") 48 | splitter = CharacterTextSplitter.from_tiktoken_encoder( 49 | separator="\n", 50 | chunk_size=600, 51 | chunk_overlap=100, 52 | ) 53 | loader = UnstructuredFileLoader(file_path) 54 | docs = loader.load_and_split(text_splitter=splitter) 55 | embeddings = OpenAIEmbeddings() 56 | cached_embeddings = CacheBackedEmbeddings.from_bytes_store(embeddings, cache_dir) 57 | vectorstore = FAISS.from_documents(docs, cached_embeddings) 58 | retriever = vectorstore.as_retriever() 59 | return retriever 60 | 61 | 62 | def save_message(message, role): 63 | st.session_state["messages"].append({"message": message, "role": role}) 64 | 65 | 66 | def send_message(message, role, 
--------------------------------------------------------------------------------
/pages/01_DocumentGPT.py:
--------------------------------------------------------------------------------
from langchain.prompts import ChatPromptTemplate
from langchain.document_loaders import UnstructuredFileLoader
from langchain.embeddings import CacheBackedEmbeddings, OpenAIEmbeddings
from langchain.schema.runnable import RunnableLambda, RunnablePassthrough
from langchain.storage import LocalFileStore
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.faiss import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.base import BaseCallbackHandler
import streamlit as st

st.set_page_config(
    page_title="DocumentGPT",
    page_icon="📃",
)


class ChatCallbackHandler(BaseCallbackHandler):
    # Accumulates streamed tokens and paints them into an empty Streamlit box.
    message = ""

    def on_llm_start(self, *args, **kwargs):
        self.message_box = st.empty()

    def on_llm_end(self, *args, **kwargs):
        save_message(self.message, "ai")

    def on_llm_new_token(self, token, *args, **kwargs):
        self.message += token
        self.message_box.markdown(self.message)


llm = ChatOpenAI(
    temperature=0.1,
    streaming=True,
    callbacks=[
        ChatCallbackHandler(),
    ],
)


@st.cache_data(show_spinner="Embedding file...")
def embed_file(file):
    file_content = file.read()
    file_path = f"./.cache/files/{file.name}"
    with open(file_path, "wb") as f:
        f.write(file_content)
    cache_dir = LocalFileStore(f"./.cache/embeddings/{file.name}")
    splitter = CharacterTextSplitter.from_tiktoken_encoder(
        separator="\n",
        chunk_size=600,
        chunk_overlap=100,
    )
    loader = UnstructuredFileLoader(file_path)
    docs = loader.load_and_split(text_splitter=splitter)
    embeddings = OpenAIEmbeddings()
    cached_embeddings = CacheBackedEmbeddings.from_bytes_store(embeddings, cache_dir)
    vectorstore = FAISS.from_documents(docs, cached_embeddings)
    retriever = vectorstore.as_retriever()
    return retriever


def save_message(message, role):
    st.session_state["messages"].append({"message": message, "role": role})


def send_message(message, role, save=True):
    with st.chat_message(role):
        st.markdown(message)
    if save:
        save_message(message, role)


def paint_history():
    for message in st.session_state["messages"]:
        send_message(
            message["message"],
            message["role"],
            save=False,
        )


def format_docs(docs):
    return "\n\n".join(document.page_content for document in docs)


prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """
            Answer the question using ONLY the following context. If you don't know the answer just say you don't know. DON'T make anything up.

            Context: {context}
            """,
        ),
        ("human", "{question}"),
    ]
)


st.title("DocumentGPT")

st.markdown(
    """
Welcome!

Use this chatbot to ask questions to an AI about your files!

Upload your files on the sidebar.
"""
)

with st.sidebar:
    file = st.file_uploader(
        "Upload a .txt .pdf or .docx file",
        type=["pdf", "txt", "docx"],
    )

if file:
    retriever = embed_file(file)
    send_message("I'm ready! Ask away!", "ai", save=False)
    paint_history()
    message = st.chat_input("Ask anything about your file...")
    if message:
        send_message(message, "human")
        chain = (
            {
                "context": retriever | RunnableLambda(format_docs),
                "question": RunnablePassthrough(),
            }
            | prompt
            | llm
        )
        with st.chat_message("ai"):
            chain.invoke(message)


else:
    st.session_state["messages"] = []
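
A sketch of what the pipe expression above does when `chain.invoke(message)` runs, spelled out step by step using the objects already defined in this file (illustrative only — in the app the same work happens inside the chain, with tokens streamed to the UI by the callback handler):

```python
docs = retriever.invoke(message)       # the question string goes to the FAISS retriever
context = format_docs(docs)            # retrieved chunks are joined into one context string
messages = prompt.format_messages(context=context, question=message)
answer = llm.invoke(messages)          # the dict of runnables fed both prompt variables in parallel
```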
--------------------------------------------------------------------------------
/pages/02_PrivateGPT.py:
--------------------------------------------------------------------------------
from langchain.prompts import ChatPromptTemplate
from langchain.document_loaders import UnstructuredFileLoader
from langchain.embeddings import CacheBackedEmbeddings, OllamaEmbeddings
from langchain.schema.runnable import RunnableLambda, RunnablePassthrough
from langchain.storage import LocalFileStore
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.faiss import FAISS
from langchain.chat_models import ChatOllama
from langchain.callbacks.base import BaseCallbackHandler
import streamlit as st

st.set_page_config(
    page_title="PrivateGPT",
    page_icon="🔒",
)


class ChatCallbackHandler(BaseCallbackHandler):
    message = ""

    def on_llm_start(self, *args, **kwargs):
        self.message_box = st.empty()

    def on_llm_end(self, *args, **kwargs):
        save_message(self.message, "ai")

    def on_llm_new_token(self, token, *args, **kwargs):
        self.message += token
        self.message_box.markdown(self.message)


llm = ChatOllama(
    model="mistral:latest",
    temperature=0.1,
    streaming=True,
    callbacks=[
        ChatCallbackHandler(),
    ],
)


@st.cache_data(show_spinner="Embedding file...")
def embed_file(file):
    file_content = file.read()
    file_path = f"./.cache/private_files/{file.name}"
    with open(file_path, "wb") as f:
        f.write(file_content)
    cache_dir = LocalFileStore(f"./.cache/private_embeddings/{file.name}")
    splitter = CharacterTextSplitter.from_tiktoken_encoder(
        separator="\n",
        chunk_size=600,
        chunk_overlap=100,
    )
    loader = UnstructuredFileLoader(file_path)
    docs = loader.load_and_split(text_splitter=splitter)
    embeddings = OllamaEmbeddings(model="mistral:latest")
    cached_embeddings = CacheBackedEmbeddings.from_bytes_store(embeddings, cache_dir)
    vectorstore = FAISS.from_documents(docs, cached_embeddings)
    retriever = vectorstore.as_retriever()
    return retriever


def save_message(message, role):
    st.session_state["messages"].append({"message": message, "role": role})


def send_message(message, role, save=True):
    with st.chat_message(role):
        st.markdown(message)
    if save:
        save_message(message, role)


def paint_history():
    for message in st.session_state["messages"]:
        send_message(
            message["message"],
            message["role"],
            save=False,
        )


def format_docs(docs):
    return "\n\n".join(document.page_content for document in docs)


prompt = ChatPromptTemplate.from_template(
    """Answer the question using ONLY the following context and not your training data. If you don't know the answer just say you don't know. DON'T make anything up.

Context: {context}
Question: {question}
"""
)


st.title("PrivateGPT")

st.markdown(
    """
Welcome!

Use this chatbot to ask questions to an AI about your files!

Upload your files on the sidebar.
"""
)

with st.sidebar:
    file = st.file_uploader(
        "Upload a .txt .pdf or .docx file",
        type=["pdf", "txt", "docx"],
    )

if file:
    retriever = embed_file(file)
    send_message("I'm ready! Ask away!", "ai", save=False)
    paint_history()
    message = st.chat_input("Ask anything about your file...")
    if message:
        send_message(message, "human")
        chain = (
            {
                "context": retriever | RunnableLambda(format_docs),
                "question": RunnablePassthrough(),
            }
            | prompt
            | llm
        )
        with st.chat_message("ai"):
            chain.invoke(message)


else:
    st.session_state["messages"] = []
--------------------------------------------------------------------------------
/pages/03_QuizGPT.py:
--------------------------------------------------------------------------------
import json
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.callbacks import StreamingStdOutCallbackHandler
import streamlit as st
from langchain.retrievers import WikipediaRetriever
from langchain.schema import BaseOutputParser


class JsonOutputParser(BaseOutputParser):
    def parse(self, text):
        # Strip the markdown fence (```json ... ```) the model wraps around its output.
        text = text.replace("```", "").replace("json", "")
        return json.loads(text)


output_parser = JsonOutputParser()

st.set_page_config(
    page_title="QuizGPT",
    page_icon="❓",
)

st.title("QuizGPT")

llm = ChatOpenAI(
    temperature=0.1,
    model="gpt-3.5-turbo-1106",
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)


def format_docs(docs):
    return "\n\n".join(document.page_content for document in docs)


questions_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """
    You are a helpful assistant that is role playing as a teacher.

    Based ONLY on the following context make 10 (TEN) questions to test the user's knowledge about the text.

    Each question should have 4 answers, three of them must be incorrect and one should be correct.

    Use (o) to signal the correct answer.

    Question examples:

    Question: What is the color of the ocean?
    Answers: Red|Yellow|Green|Blue(o)

    Question: What is the capital of Georgia?
    Answers: Baku|Tbilisi(o)|Manila|Beirut

    Question: When was Avatar released?
    Answers: 2007|2001|2009(o)|1998

    Question: Who was Julius Caesar?
    Answers: A Roman Emperor(o)|Painter|Actor|Model

    Your turn!

    Context: {context}
    """,
        )
    ]
)

questions_chain = {"context": format_docs} | questions_prompt | llm

formatting_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """
    You are a powerful formatting algorithm.

    You format exam questions into JSON format.
    Answers with (o) are the correct ones.

    Example Input:

    Question: What is the color of the ocean?
    Answers: Red|Yellow|Green|Blue(o)

    Question: What is the capital of Georgia?
    Answers: Baku|Tbilisi(o)|Manila|Beirut

    Question: When was Avatar released?
    Answers: 2007|2001|2009(o)|1998

    Question: Who was Julius Caesar?
    Answers: A Roman Emperor(o)|Painter|Actor|Model


    Example Output:

    ```json
    {{ "questions": [
            {{
                "question": "What is the color of the ocean?",
                "answers": [
                        {{
                            "answer": "Red",
                            "correct": false
                        }},
                        {{
                            "answer": "Yellow",
                            "correct": false
                        }},
                        {{
                            "answer": "Green",
                            "correct": false
                        }},
                        {{
                            "answer": "Blue",
                            "correct": true
                        }}
                ]
            }},
            {{
                "question": "What is the capital of Georgia?",
                "answers": [
                        {{
                            "answer": "Baku",
                            "correct": false
                        }},
                        {{
                            "answer": "Tbilisi",
                            "correct": true
                        }},
                        {{
                            "answer": "Manila",
                            "correct": false
                        }},
                        {{
                            "answer": "Beirut",
                            "correct": false
                        }}
                ]
            }},
            {{
                "question": "When was Avatar released?",
                "answers": [
                        {{
                            "answer": "2007",
                            "correct": false
                        }},
                        {{
                            "answer": "2001",
                            "correct": false
                        }},
                        {{
                            "answer": "2009",
                            "correct": true
                        }},
                        {{
                            "answer": "1998",
                            "correct": false
                        }}
                ]
            }},
            {{
                "question": "Who was Julius Caesar?",
                "answers": [
                        {{
                            "answer": "A Roman Emperor",
                            "correct": true
                        }},
                        {{
                            "answer": "Painter",
                            "correct": false
                        }},
                        {{
                            "answer": "Actor",
                            "correct": false
                        }},
                        {{
                            "answer": "Model",
                            "correct": false
                        }}
                ]
            }}
        ]
    }}
    ```
    Your turn!

    Questions: {context}

    """,
        )
    ]
)

formatting_chain = formatting_prompt | llm


@st.cache_data(show_spinner="Loading file...")
def split_file(file):
    file_content = file.read()
    file_path = f"./.cache/quiz_files/{file.name}"
    with open(file_path, "wb") as f:
        f.write(file_content)
    splitter = CharacterTextSplitter.from_tiktoken_encoder(
        separator="\n",
        chunk_size=600,
        chunk_overlap=100,
    )
    loader = UnstructuredFileLoader(file_path)
    docs = loader.load_and_split(text_splitter=splitter)
    return docs


@st.cache_data(show_spinner="Making quiz...")
def run_quiz_chain(_docs, topic):
    chain = {"context": questions_chain} | formatting_chain | output_parser
    return chain.invoke(_docs)


@st.cache_data(show_spinner="Searching Wikipedia...")
def wiki_search(term):
    retriever = WikipediaRetriever(top_k_results=5)
    docs = retriever.get_relevant_documents(term)
    return docs


with st.sidebar:
    docs = None
    topic = None
    choice = st.selectbox(
        "Choose what you want to use.",
        (
            "File",
            "Wikipedia Article",
        ),
    )
    if choice == "File":
        file = st.file_uploader(
            "Upload a .docx , .txt or .pdf file",
            type=["pdf", "txt", "docx"],
        )
        if file:
            docs = split_file(file)
    else:
        topic = st.text_input("Search Wikipedia...")
        if topic:
            docs = wiki_search(topic)


if not docs:
    st.markdown(
        """
    Welcome to QuizGPT.

    I will make a quiz from Wikipedia articles or files you upload to test your knowledge and help you study.

    Get started by uploading a file or searching on Wikipedia in the sidebar.
    """
    )
else:
    response = run_quiz_chain(docs, topic if topic else file.name)
    with st.form("questions_form"):
        st.write(response)
        for question in response["questions"]:
            st.write(question["question"])
            value = st.radio(
                "Select an option.",
                [answer["answer"] for answer in question["answers"]],
                index=None,
            )
            if {"answer": value, "correct": True} in question["answers"]:
                st.success("Correct!")
            elif value is not None:
                st.error("Wrong!")
        button = st.form_submit_button()
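
What `run_quiz_chain` composes, spelled out as individual calls (a sketch using the objects defined above; the piped version produces the same result, streamed and cached):

```python
questions_text = questions_chain.invoke(docs).content              # first LLM call drafts ten questions as plain text
formatted = formatting_chain.invoke({"context": questions_text})   # second call rewrites them as a ```json fenced block
quiz = output_parser.parse(formatted.content)                      # strip the fence markers, then json.loads
print(quiz["questions"][0]["question"])                            # each entry: {"question": ..., "answers": [...]}
```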
--------------------------------------------------------------------------------
/pages/04_SiteGPT.py:
--------------------------------------------------------------------------------
from langchain.document_loaders import SitemapLoader
from langchain.schema.runnable import RunnableLambda, RunnablePassthrough
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores.faiss import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
import streamlit as st

llm = ChatOpenAI(
    temperature=0.1,
)

answers_prompt = ChatPromptTemplate.from_template(
    """
    Using ONLY the following context answer the user's question. If you can't, just say you don't know; don't make anything up.

    Then, give a score to the answer between 0 and 5.

    If the answer answers the user question the score should be high, else it should be low.

    Make sure to always include the answer's score even if it's 0.

    Context: {context}

    Examples:

    Question: How far away is the moon?
    Answer: The moon is 384,400 km away.
    Score: 5

    Question: How far away is the sun?
    Answer: I don't know
    Score: 0

    Your turn!

    Question: {question}
    """
)


def get_answers(inputs):
    docs = inputs["docs"]
    question = inputs["question"]
    answers_chain = answers_prompt | llm
    # answers = []
    # for doc in docs:
    #     result = answers_chain.invoke(
    #         {"question": question, "context": doc.page_content}
    #     )
    #     answers.append(result.content)
    return {
        "question": question,
        "answers": [
            {
                "answer": answers_chain.invoke(
                    {"question": question, "context": doc.page_content}
                ).content,
                "source": doc.metadata["source"],
                "date": doc.metadata["lastmod"],
            }
            for doc in docs
        ],
    }


choose_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """
            Use ONLY the following pre-existing answers to answer the user's question.

            Use the answers that have the highest score (more helpful) and favor the most recent ones.

            Cite sources and return the sources of the answers as they are, do not change them.

            Answers: {answers}
            """,
        ),
        ("human", "{question}"),
    ]
)


def choose_answer(inputs):
    answers = inputs["answers"]
    question = inputs["question"]
    choose_chain = choose_prompt | llm
    condensed = "\n\n".join(
        f"{answer['answer']}\nSource:{answer['source']}\nDate:{answer['date']}\n"
        for answer in answers
    )
    return choose_chain.invoke(
        {
            "question": question,
            "answers": condensed,
        }
    )


def parse_page(soup):
    # Drop navigation chrome before extracting the page text.
    header = soup.find("header")
    footer = soup.find("footer")
    if header:
        header.decompose()
    if footer:
        footer.decompose()
    return (
        str(soup.get_text())
        .replace("\n", " ")
        .replace("\xa0", " ")
        .replace("CloseSearch Submit Blog", "")
    )


@st.cache_data(show_spinner="Loading website...")
def load_website(url):
    splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=1000,
        chunk_overlap=200,
    )
    loader = SitemapLoader(
        url,
        parsing_function=parse_page,
    )
    loader.requests_per_second = 2
    docs = loader.load_and_split(text_splitter=splitter)
    vector_store = FAISS.from_documents(docs, OpenAIEmbeddings())
    return vector_store.as_retriever()


st.set_page_config(
    page_title="SiteGPT",
    page_icon="🖥️",
)


st.markdown(
    """
# SiteGPT

Ask questions about the content of a website.

Start by writing the URL of the website on the sidebar.
"""
)


with st.sidebar:
    url = st.text_input(
        "Write down a URL",
        placeholder="https://example.com",
    )


if url:
    if ".xml" not in url:
        with st.sidebar:
            st.error("Please write down a Sitemap URL.")
    else:
        retriever = load_website(url)
        query = st.text_input("Ask a question to the website.")
        if query:
            chain = (
                {
                    "docs": retriever,
                    "question": RunnablePassthrough(),
                }
                | RunnableLambda(get_answers)
                | RunnableLambda(choose_answer)
            )
            result = chain.invoke(query)
            st.markdown(result.content.replace("$", r"\$"))
--------------------------------------------------------------------------------
/pages/05_MeetingGPT.py:
--------------------------------------------------------------------------------
from langchain.storage import LocalFileStore
import streamlit as st
import subprocess
import math
from pydub import AudioSegment
import glob
import openai
import os
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import StrOutputParser
from langchain.vectorstores.faiss import FAISS
from langchain.embeddings import CacheBackedEmbeddings, OpenAIEmbeddings

llm = ChatOpenAI(
    temperature=0.1,
)

has_transcript = os.path.exists("./.cache/podcast.txt")

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=800,
    chunk_overlap=100,
)


@st.cache_data()
def embed_file(file_path):
    file_name = os.path.basename(file_path)  # key the embedding cache by the transcript's file name
    cache_dir = LocalFileStore(f"./.cache/embeddings/{file_name}")
    splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=800,
        chunk_overlap=100,
    )
    loader = TextLoader(file_path)
    docs = loader.load_and_split(text_splitter=splitter)
    embeddings = OpenAIEmbeddings()
    cached_embeddings = CacheBackedEmbeddings.from_bytes_store(embeddings, cache_dir)
    vectorstore = FAISS.from_documents(docs, cached_embeddings)
    retriever = vectorstore.as_retriever()
    return retriever


@st.cache_data()
def transcribe_chunks(chunk_folder, destination):
    if has_transcript:
        return
    files = glob.glob(f"{chunk_folder}/*.mp3")
    files.sort()
    for file in files:
        with open(file, "rb") as audio_file, open(destination, "a") as text_file:
            transcript = openai.Audio.transcribe(
                "whisper-1",
                audio_file,
            )
            text_file.write(transcript["text"])


@st.cache_data()
def extract_audio_from_video(video_path):
    if has_transcript:
        return
    audio_path = video_path.replace("mp4", "mp3")
    command = [
        "ffmpeg",
        "-y",
        "-i",
        video_path,
        "-vn",
        audio_path,
    ]
    subprocess.run(command)


@st.cache_data()
def cut_audio_in_chunks(audio_path, chunk_size, chunks_folder):
    if has_transcript:
        return
    track = AudioSegment.from_mp3(audio_path)
    chunk_len = chunk_size * 60 * 1000  # chunk_size is in minutes; pydub works in milliseconds
    chunks = math.ceil(len(track) / chunk_len)
    for i in range(chunks):
        start_time = i * chunk_len
        end_time = (i + 1) * chunk_len
        chunk = track[start_time:end_time]
        chunk.export(
            f"./{chunks_folder}/chunk_{i}.mp3",
            format="mp3",
        )


st.set_page_config(
    page_title="MeetingGPT",
    page_icon="💼",
)

st.markdown(
    """
# MeetingGPT

Welcome to MeetingGPT, upload a video and I will give you a transcript, a summary and a chat bot to ask any questions about it.

Get started by uploading a video file in the sidebar.
"""
)

with st.sidebar:
    video = st.file_uploader(
        "Video",
        type=["mp4", "avi", "mkv", "mov"],
    )

if video:
    chunks_folder = "./.cache/chunks"
    with st.status("Loading video...") as status:
        video_content = video.read()
        video_path = f"./.cache/{video.name}"
        audio_path = video_path.replace("mp4", "mp3")
        transcript_path = video_path.replace("mp4", "txt")
        with open(video_path, "wb") as f:
            f.write(video_content)
        status.update(label="Extracting audio...")
        extract_audio_from_video(video_path)
        status.update(label="Cutting audio segments...")
        cut_audio_in_chunks(audio_path, 10, chunks_folder)
        status.update(label="Transcribing audio...")
        transcribe_chunks(chunks_folder, transcript_path)

    transcript_tab, summary_tab, qa_tab = st.tabs(
        [
            "Transcript",
            "Summary",
            "Q&A",
        ]
    )

    with transcript_tab:
        with open(transcript_path, "r") as file:
            st.write(file.read())

    with summary_tab:
        start = st.button("Generate summary")
        if start:
            loader = TextLoader(transcript_path)

            docs = loader.load_and_split(text_splitter=splitter)

            first_summary_prompt = ChatPromptTemplate.from_template(
                """
                Write a concise summary of the following:
                "{text}"
                CONCISE SUMMARY:
                """
            )

            first_summary_chain = first_summary_prompt | llm | StrOutputParser()

            summary = first_summary_chain.invoke(
                {"text": docs[0].page_content},
            )

            refine_prompt = ChatPromptTemplate.from_template(
                """
                Your job is to produce a final summary.
                We have provided an existing summary up to a certain point: {existing_summary}
                We have the opportunity to refine the existing summary (only if needed) with some more context below.
                ------------
                {context}
                ------------
                Given the new context, refine the original summary.
                If the context isn't useful, RETURN the original summary.
                """
            )

            refine_chain = refine_prompt | llm | StrOutputParser()

            with st.status("Summarizing...") as status:
                for i, doc in enumerate(docs[1:]):
                    status.update(label=f"Processing document {i+1}/{len(docs)-1} ")
                    summary = refine_chain.invoke(
                        {
                            "existing_summary": summary,
                            "context": doc.page_content,
                        }
                    )
                    st.write(summary)
            st.write(summary)

    with qa_tab:
        retriever = embed_file(transcript_path)

        docs = retriever.invoke("do they talk about marcus aurelius?")

        st.write(docs)
80 | """ 81 | args_schema: Type[CompanyOverviewArgsSchema] = CompanyOverviewArgsSchema 82 | 83 | def _run(self, symbol): 84 | r = requests.get( 85 | f"https://www.alphavantage.co/query?function=TIME_SERIES_WEEKLY&symbol={symbol}&apikey={alpha_vantage_api_key}" 86 | ) 87 | response = r.json() 88 | return list(response["Weekly Time Series"].items())[:200] 89 | 90 | 91 | agent = initialize_agent( 92 | llm=llm, 93 | verbose=True, 94 | agent=AgentType.OPENAI_FUNCTIONS, 95 | handle_parsing_errors=True, 96 | tools=[ 97 | CompanyIncomeStatementTool(), 98 | CompanyStockPerformanceTool(), 99 | StockMarketSymbolSearchTool(), 100 | CompanyOverviewTool(), 101 | ], 102 | agent_kwargs={ 103 | "system_message": SystemMessage( 104 | content=""" 105 | You are a hedge fund manager. 106 | 107 | You evaluate a company and provide your opinion and reasons why the stock is a buy or not. 108 | 109 | Consider the performance of a stock, the company overview and the income statement. 110 | 111 | Be assertive in your judgement and recommend the stock or advise the user against it. 112 | """ 113 | ) 114 | }, 115 | ) 116 | 117 | st.set_page_config( 118 | page_title="InvestorGPT", 119 | page_icon="💼", 120 | ) 121 | 122 | st.markdown( 123 | """ 124 | # InvestorGPT 125 | 126 | Welcome to InvestorGPT. 127 | 128 | Write down the name of a company and our Agent will do the research for you. 129 | """ 130 | ) 131 | 132 | company = st.text_input("Write the name of the company you are interested on.") 133 | 134 | if company: 135 | result = agent.invoke(company) 136 | st.write(result["output"].replace("$", "\$")) 137 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aiofiles==23.2.1 2 | aiohttp==3.8.5 3 | aiosignal==1.3.1 4 | altair==5.1.1 5 | annotated-types==0.5.0 6 | antlr4-python3-runtime==4.9.3 7 | anyio==3.7.1 8 | appnope==0.1.3 9 | asttokens==2.4.0 10 | async-timeout==4.0.3 11 | attrs==23.1.0 12 | backcall==0.2.0 13 | backoff==2.2.1 14 | bcrypt==4.0.1 15 | beautifulsoup4==4.12.2 16 | black==23.9.1 17 | blinker==1.6.2 18 | blis==0.7.10 19 | Brotli==1.1.0 20 | cachetools==5.3.1 21 | catalogue==2.0.9 22 | certifi==2023.7.22 23 | cffi==1.15.1 24 | chardet==5.2.0 25 | charset-normalizer==3.2.0 26 | chroma-hnswlib==0.7.3 27 | chromadb==0.4.11 28 | click==8.1.7 29 | coloredlogs==15.0.1 30 | comm==0.1.4 31 | confection==0.1.3 32 | contourpy==1.1.1 33 | cryptography==41.0.4 34 | cycler==0.11.0 35 | cymem==2.0.8 36 | dataclasses-json==0.5.14 37 | debugpy==1.8.0 38 | decorator==5.1.1 39 | dill==0.3.7 40 | dnspython==2.4.2 41 | duckduckgo-search==3.8.5 42 | EbookLib==0.18 43 | effdet==0.4.1 44 | elastic-transport==8.4.0 45 | elasticsearch==8.9.0 46 | email-validator==2.0.0.post2 47 | emoji==2.8.0 48 | et-xmlfile==1.1.0 49 | executing==1.2.0 50 | faiss-cpu==1.7.4 51 | fastapi==0.99.1 52 | ffmpeg==1.4 53 | ffmpeg-python==0.2.0 54 | filelock==3.12.4 55 | filetype==1.2.0 56 | flatbuffers==23.5.26 57 | fonttools==4.42.1 58 | frozenlist==1.4.0 59 | fsspec==2023.9.1 60 | future==0.18.3 61 | gitdb==4.0.10 62 | GitPython==3.1.35 63 | gpt4all==2.0.2 64 | greenlet==3.0.0 65 | h11==0.14.0 66 | h2==4.1.0 67 | hpack==4.0.0 68 | html2text==2020.1.16 69 | httpcore==0.18.0 70 | httptools==0.6.0 71 | httpx==0.25.0 72 | huggingface-hub==0.16.4 73 | humanfriendly==10.0 74 | hyperframe==6.0.1 75 | idna==3.4 76 | importlib-metadata==6.8.0 77 | importlib-resources==6.0.1 78 | iopath==0.1.10 79 | ipykernel==6.25.2 80 | 
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
aiofiles==23.2.1
aiohttp==3.8.5
aiosignal==1.3.1
altair==5.1.1
annotated-types==0.5.0
antlr4-python3-runtime==4.9.3
anyio==3.7.1
appnope==0.1.3
asttokens==2.4.0
async-timeout==4.0.3
attrs==23.1.0
backcall==0.2.0
backoff==2.2.1
bcrypt==4.0.1
beautifulsoup4==4.12.2
black==23.9.1
blinker==1.6.2
blis==0.7.10
Brotli==1.1.0
cachetools==5.3.1
catalogue==2.0.9
certifi==2023.7.22
cffi==1.15.1
chardet==5.2.0
charset-normalizer==3.2.0
chroma-hnswlib==0.7.3
chromadb==0.4.11
click==8.1.7
coloredlogs==15.0.1
comm==0.1.4
confection==0.1.3
contourpy==1.1.1
cryptography==41.0.4
cycler==0.11.0
cymem==2.0.8
dataclasses-json==0.5.14
debugpy==1.8.0
decorator==5.1.1
dill==0.3.7
dnspython==2.4.2
duckduckgo-search==3.8.5
EbookLib==0.18
effdet==0.4.1
elastic-transport==8.4.0
elasticsearch==8.9.0
email-validator==2.0.0.post2
emoji==2.8.0
et-xmlfile==1.1.0
executing==1.2.0
faiss-cpu==1.7.4
fastapi==0.99.1
ffmpeg==1.4
ffmpeg-python==0.2.0
filelock==3.12.4
filetype==1.2.0
flatbuffers==23.5.26
fonttools==4.42.1
frozenlist==1.4.0
fsspec==2023.9.1
future==0.18.3
gitdb==4.0.10
GitPython==3.1.35
gpt4all==2.0.2
greenlet==3.0.0
h11==0.14.0
h2==4.1.0
hpack==4.0.0
html2text==2020.1.16
httpcore==0.18.0
httptools==0.6.0
httpx==0.25.0
huggingface-hub==0.16.4
humanfriendly==10.0
hyperframe==6.0.1
idna==3.4
importlib-metadata==6.8.0
importlib-resources==6.0.1
iopath==0.1.10
ipykernel==6.25.2
ipython==8.15.0
itsdangerous==2.1.2
jedi==0.19.0
Jinja2==3.1.2
joblib==1.3.2
jsonpatch==1.33
jsonpointer==2.4
jsonschema==4.19.0
jsonschema-specifications==2023.7.1
jupyter_client==8.3.1
jupyter_core==5.3.1
kiwisolver==1.4.5
langchain==0.0.332
langcodes==3.3.0
langsmith==0.0.52
layoutparser==0.3.4
loguru==0.7.2
lxml==4.9.3
manifest-ml==0.0.1
Markdown==3.4.4
markdown-it-py==3.0.0
MarkupSafe==2.1.3
marshmallow==3.20.1
matplotlib==3.8.0
matplotlib-inline==0.1.6
mdurl==0.1.2
monotonic==1.6
mpmath==1.3.0
msg-parser==1.2.0
multidict==6.0.4
murmurhash==1.0.10
mypy-extensions==1.0.0
nest-asyncio==1.5.8
networkx==3.1
nltk==3.8.1
numexpr==2.8.5
numpy==1.25.2
olefile==0.46
omegaconf==2.3.0
onnx==1.14.1
onnxruntime==1.16.0
openai==0.28.0
opencv-python==4.8.0.76
openpyxl==3.1.2
orjson==3.9.9
overrides==7.4.0
packaging==23.1
pandas==2.1.0
parso==0.8.3
pathspec==0.11.2
pathy==0.10.2
pdf2image==1.16.3
pdfminer.six==20221105
pdfplumber==0.10.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.5.0
pinecone-client==2.2.4
platformdirs==3.10.0
playwright==1.39.0
portalocker==2.8.2
posthog==3.0.2
preshed==3.0.9
prompt-toolkit==3.0.39
protobuf==4.24.3
psutil==5.9.5
ptyprocess==0.7.0
pulsar-client==3.3.0
pure-eval==0.2.2
pyarrow==13.0.0
pycocotools==2.0.7
pycparser==2.21
pydantic==1.10.12
pydantic_core==2.6.3
pydeck==0.8.0
pydub==0.25.1
pyee==11.0.1
Pygments==2.16.1
Pympler==1.0.1
pypandoc==1.11
pyparsing==3.1.1
pypdf==3.16.2
pypdfium2==4.20.0
PyPika==0.48.9
pytesseract==0.3.10
python-dateutil==2.8.2
python-docx==0.8.11
python-dotenv==1.0.0
python-iso639==2023.6.15
python-magic==0.4.27
python-multipart==0.0.6
python-pptx==0.6.21
pytube==11.0.2
pytz==2023.3.post1
pytz-deprecation-shim==0.1.0.post0
PyYAML==6.0.1
pyzmq==25.1.1
rapidfuzz==3.3.1
redis==5.0.0
referencing==0.30.2
regex==2023.8.8
requests==2.31.0
rich==13.5.2
rpds-py==0.10.2
safetensors==0.3.3
scikit-learn==1.3.1
scipy==1.11.3
sentence-transformers==2.2.2
sentencepiece==0.1.99
six==1.16.0
smart-open==6.4.0
smmap==5.0.0
sniffio==1.3.0
socksio==1.0.0
soupsieve==2.5
spacy==3.6.1
spacy-legacy==3.0.12
spacy-loggers==1.0.5
SQLAlchemy==2.0.22
sqlitedict==2.1.0
srsly==2.4.7
stack-data==0.6.2
starlette==0.27.0
streamlit==1.27.2
sympy==1.12
tabulate==0.9.0
tenacity==8.2.3
thinc==8.1.12
threadpoolctl==3.2.0
tiktoken==0.5.1
timm==0.9.7
tokenizers==0.14.0
toml==0.10.2
toolz==0.12.0
torch==2.0.1
torchvision==0.15.2
tornado==6.3.3
tqdm==4.66.1
traitlets==5.10.0
transformers==4.34.0
typer==0.9.0
typing-inspect==0.9.0
typing_extensions==4.7.1
tzdata==2023.3
tzlocal==4.3.1
ujson==5.8.0
unstructured==0.10.16
unstructured-inference==0.6.6
unstructured.pytesseract==0.3.12
urllib3==1.26.16
uvicorn==0.23.2
validators==0.22.0
wasabi==1.1.2
watchdog==3.0.0
watchfiles==0.20.0
wcwidth==0.2.6
websockets==11.0.3
wikipedia==1.4.0
xlrd==2.0.1
XlsxWriter==3.1.5
yarl==1.9.2
zipp==3.16.2
--------------------------------------------------------------------------------