├── .DS_Store ├── .gitignore ├── .vscode └── settings.json ├── README.md ├── data ├── .DS_Store ├── meta │ └── meta.pdf ├── microsoft │ └── MSFT_FY23Q4_10K.pdf └── ticker_symbols │ ├── ticker_symbols.csv │ └── ticker_symbols.txt ├── docs ├── finsight.gif ├── main.md └── news.md ├── experiments ├── new_sections.txt ├── sem_qa.py └── semantic_qa_over_tables.ipynb ├── pdf └── final_report.pdf ├── prompts ├── insights.prompt ├── iv2.prompt ├── main.prompt └── report.prompt ├── requirements.txt ├── src ├── .DS_Store ├── balance_sheet.py ├── cash_flow.py ├── company_overview.py ├── fields.py ├── fields2.py ├── income_statement.py ├── news_sentiment.py ├── pages │ ├── 1_📊_Finance_Metrics_Review.py │ └── 2_🗂️_Annual_Report_Analyzer.py ├── pdf_gen.py ├── pydantic_models.py ├── ticker_symbol.py ├── utils.py └── 🏡_Home.py └── test_files ├── .DS_Store ├── Models.py ├── RAG ├── data1.pdf ├── tech_1.py └── test.py ├── __pycache__ ├── Models.cpython-311.pyc └── main.cpython-311.pyc ├── attribs.py ├── av-api-test.py ├── finchat.py ├── fmp-api.py ├── main.py ├── node_parsing.py ├── nodes.py ├── open_ai_api.py ├── parser.py ├── pdf1.py ├── pdf_gen.py ├── plotly_chart.py ├── plotly_pdf.py ├── pydant.py ├── remove_tags.py ├── sec_api_test.py ├── sec_download.py ├── summarize.py ├── table_pdf.py ├── tbl.py └── tools.py /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/.DS_Store -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | venv 2 | .env 3 | src/__pycache__ 4 | AAPL.pdf 5 | example.pdf 6 | .streamlit -------------------------------------------------------------------------------- /.vscode/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | 
"python.analysis.typeCheckingMode": "off", 3 | "python.analysis.extraPaths": [ 4 | "./venv/lib/python3.11/site-packages" 5 | ] 6 | } -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | # 💸 FinSight: 4 | **Financial Insights at Your Fingertips** 5 | 6 | Finsight is a cutting-edge finance AI assistant tailored to meet the needs of portfolio managers, investors, and finance enthusiasts. By leveraging `GPT-4` and financial data, Finsight provides deep insights and actionable summaries about a company, aiding in more informed investment decisions. 7 | 8 | ![demo](docs/demo.gif) 9 | 10 | If you'd like to learn more about the technical details of FinSight, check out the LlamaIndex blogpost below where I do a deep dive into the project: 11 | 12 | [How I built the Streamlit LLM Hackathon winning app — FinSight using LlamaIndex.](https://blog.llamaindex.ai/how-i-built-the-streamlit-llm-hackathon-winning-app-finsight-using-llamaindex-9dcf6c46d7a0) 13 | 14 | ## Features 15 | 📊 **Finance Metrics Overview**: 16 | - Dive deep into core financial metrics extracted from the income statement, balance sheet, and cash flow. 17 | - Stay updated with the top news sentiment surrounding the company for the current year, ensuring you're always in the loop. 18 | - These are the different sections: 19 | - **Company Overview**: Get a quick overview of the company. 20 | - **Income Statement**: Understand the company's revenue and expenses. 21 | - **Balance Sheet**: Get a grasp on the company's assets, liabilities, and shareholders' equity. 22 | - **Cash Flow**: Understand the company's cash flow from operating, investing, and financing activities. 23 | - **News Sentiment**: Stay updated with the top news sentiment surrounding the company for the current year. 24 | 25 | 📄 **Annual Report Analyzer**: 26 | - Simply upload a company's annual report. 
27 | - Finsight will then provide comprehensive insights into: 28 | - **Fiscal Year Highlights**: Major achievements, milestones, and financial highlights. 29 | - **Strategy Outlook and Future Direction**: Understand the company's strategic plans and anticipated future trajectory. 30 | - **Risk Management**: Insight into the company's risk assessment, potential challenges, and mitigation strategies. 31 | - **Innovation and R&D**: Get a grasp on the company's commitment to innovation and its R&D endeavors. 32 | 33 | ## Tech Stack 34 | **Streamlit**: Powers the frontend, providing a seamless user interface. 35 | **LangChain**: Acts as the foundation for integrating the LLM into the web app. 36 | **Llama Index**: The data framework behind the Retrieval-Augmented Generation (RAG) and agent-based features, such as the Annual Report Analyzer. 37 | **Alpha Vantage**: The go-to API service for fetching the most recent financial data about companies. 38 | 39 | ## How to Use 40 | ### Website Access: 41 | Head over to [Finsight](https://finsight-report.streamlit.app/) 42 | 43 | ### **Local Setup**: 44 | 45 | 46 | 1. **Clone the Repository**: 47 | ```bash 48 | git clone https://github.com/vishwasg217/finsight.git 49 | cd finsight 50 | ``` 51 | 52 | 2. **Set Up a Virtual Environment** (Optional but Recommended): 53 | ```bash 54 | # For macOS and Linux: 55 | python3 -m venv venv 56 | 57 | # For Windows: 58 | python -m venv venv 59 | ``` 60 | 61 | 3. **Activate the Virtual Environment**: 62 | ```bash 63 | # For macOS and Linux: 64 | source venv/bin/activate 65 | 66 | # For Windows: 67 | .\venv\Scripts\activate 68 | ``` 69 | 70 | 4. **Install Required Dependencies**: 71 | ```bash 72 | pip install -r requirements.txt 73 | ``` 74 | 75 | 5. 
**Set up the Environment Variables**: 76 | ```bash 77 | # create directory 78 | mkdir .streamlit 79 | 80 | # create toml file 81 | touch .streamlit/secrets.toml 82 | ``` 83 | 84 | You can get your API keys here: [AlphaVantage](https://www.alphavantage.co/support/#api-key), [OpenAI](https://openai.com/blog/openai-api) 85 | 86 | ```bash 87 | # add the following API keys 88 | av_api_key = "ALPHA_VANTAGE API KEY" 89 | 90 | openai_api_key = "OPEN AI API KEY" 91 | 92 | 93 | ``` 94 | 95 | 6. **Run Finsight**: 96 | ```bash 97 | streamlit run src/🏡_Home.py 98 | ``` 99 | 100 | After running the command, Streamlit will provide a local URL (usually `http://localhost:8501/`) which you can open in your web browser to access Finsight. 101 | -------------------------------------------------------------------------------- /data/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/data/.DS_Store -------------------------------------------------------------------------------- /data/meta/meta.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/data/meta/meta.pdf -------------------------------------------------------------------------------- /data/microsoft/MSFT_FY23Q4_10K.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/data/microsoft/MSFT_FY23Q4_10K.pdf -------------------------------------------------------------------------------- /docs/finsight.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/docs/finsight.gif 
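The keys placed in `.streamlit/secrets.toml` during step 5 of the README setup are read at runtime through Streamlit's `st.secrets`, and several modules under `src/` fall back to environment variables instead. A minimal sketch of that lookup pattern follows; the helper name `load_api_keys` is illustrative and not part of the repository:

```python
import os

def load_api_keys():
    """Return (alpha_vantage_key, openai_key).

    Prefers Streamlit's secrets.toml (the keys created in step 5 of the
    README), falling back to environment variables as src/balance_sheet.py
    and src/cash_flow.py do. Hypothetical helper, shown for illustration.
    """
    try:
        import streamlit as st
        return st.secrets["av_api_key"], st.secrets["openai_api_key"]
    except Exception:
        # No secrets.toml found (or streamlit unavailable): use the environment.
        return os.environ.get("AV_API_KEY"), os.environ.get("OPENAI_API_KEY")
```

Either mechanism works for a local run; on Streamlit Community Cloud the same keys can be supplied through the app's Secrets settings and surface via `st.secrets` in the same way.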
-------------------------------------------------------------------------------- /docs/main.md: -------------------------------------------------------------------------------- 1 | ## About the App and Its Features: 2 | Finsight is a cutting-edge AI assistant tailored for portfolio managers, investors, and finance enthusiasts. It streamlines the process of gaining crucial insights and summaries about a company in a user-friendly manner. 3 | 4 | ### [Finance Metrics Review](https://finsight-report.streamlit.app/Finance_Metrics_Review): 5 | Simply enter the ticker symbol of your desired company. With a click, Finsight delves deep into the financial data and current news sentiment, presenting you with a comprehensive analysis. From metrics derived from income statements to the latest news sentiments, get a 360° view of the company's financial health. 6 | 7 | ### [Annual Report Analyzer](https://finsight-report.streamlit.app/Annual_Report_Analyzer): 8 | Want a deep dive into a company's annual report? Upload the report in PDF format, and Finsight will process and analyze it, offering insights into Fiscal Year Highlights, Strategy Outlook and Future Direction, Risk Management, and Innovation & R&D. 9 | 10 | #### GitHub Repository: 11 | For those keen on diving into the code or contributing, the entire project is open-source and hosted on GitHub. Find it [here](https://github.com/vishwasg217/finsight). 12 | 13 | #### About the Creator: 14 | Hi! I'm Vishwas Gowda, an ML Engineer and an LLM enthusiast. 15 | 16 | Let's Connect and Collaborate: 17 | - [GitHub](https://github.com/vishwasg217) 18 | - [Twitter](https://twitter.com/VishwasAiTech) 19 | - [LinkedIn](https://www.linkedin.com/in/vishwasgowda217/) 20 | -------------------------------------------------------------------------------- /docs/news.md: -------------------------------------------------------------------------------- 1 | #### FinSight Wins Streamlit LLM Hackathon!! 2 | 3 | I'm excited to share some great news with you all. 
Over the past month, I've been working tirelessly on Finsight, and it's been an incredible journey. Today, I'm thrilled to announce that Finsight has emerged victorious in the LLM Hackathon organized by Streamlit, specifically in the LlamaIndex category! 4 | 5 | 6 | [Read more](https://www.linkedin.com/posts/vishwasgowda217_llm-hackathon-streamlit-activity-7115398433573666816-1y72?utm_source=share&utm_medium=member_desktop) 7 | 8 | 9 | Stay tuned for more exciting updates from FinSight! 10 | -------------------------------------------------------------------------------- /experiments/new_sections.txt: -------------------------------------------------------------------------------- 1 | Section 1: Business 2 | 3 | Company Background: Provide the company overview in a neatly formatted way 4 | 5 | Products and Services: List out the products and services offered by the company in a neatly formatted way 6 | 7 | Competition and Strategy: Provide the details about the company's competition and their strategy to compete with them. 8 | 9 | Intellectual Property and Human Resources: Provide details about the company's intellectual property and steps taken to protect and enhance human capital 10 | 11 | Regulatory and Legal Matters: What are the regulatory and legal issues? List them out in detail 12 | 13 | 14 | Section 2: Risk Factors 15 | 16 | Product Related Risks: What are the risks related to the product? List them out in detail 17 | 18 | Regulatory and Enforcement Risks: What are the risks faced by the company in enforcing regulations? 19 | 20 | Operational Risks: What are the risks faced in day-to-day operations by the company? 21 | 22 | Market Risk: What are the market risks faced by the company? List them in detail 23 | 24 | 25 | Section 3: Management's Discussion and Analysis 26 | 27 | Fiscal Year Highlights: 28 | 29 | 30 | Section 4: Financial Statements 31 | 32 | Income statements: Provide the net income, basic earnings per share, cost and expenses. 
Provide insights from these metrics. 33 | 34 | Balance Sheet: Provide the total assets, total liabilities, retained earnings. Provide insights from these metrics. 35 | 36 | 37 | 38 | 39 | -------------------------------------------------------------------------------- /experiments/sem_qa.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | 8 | import pandas as pd 9 | from llama_index import SimpleDirectoryReader 10 | 11 | from llama_index.node_parser import UnstructuredElementNodeParser, SimpleNodeParser 12 | from llama_index.retrievers import RecursiveRetriever 13 | from llama_index.query_engine import RetrieverQueryEngine 14 | from llama_index import VectorStoreIndex 15 | from llama_index.tools import QueryEngineTool, ToolMetadata 16 | from llama_index.query_engine import SubQuestionQueryEngine 17 | from llama_index import ServiceContext 18 | from llama_index.llms import OpenAI 19 | 20 | import nest_asyncio 21 | 22 | nest_asyncio.apply() 23 | 24 | import streamlit as st 25 | import os 26 | 27 | OPENAI_API_KEY = st.secrets["openai_api_key"] 28 | os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY 29 | 30 | pd.set_option("display.max_rows", None) 31 | pd.set_option("display.max_columns", None) 32 | pd.set_option("display.width", None) 33 | pd.set_option("display.max_colwidth", None) 34 | 35 | 36 | company = input("Enter company name: ") 37 | 38 | # load pdfs 39 | if company == "apple": 40 | reader = SimpleDirectoryReader( 41 | input_files=["data/apple/AAPL.pdf"] 42 | ) 43 | docs = reader.load_data() 44 | 45 | 46 | elif company == "meta": 47 | reader = SimpleDirectoryReader( 48 | input_files=["data/meta/meta.pdf"] 49 | ) 50 | 51 | docs = reader.load_data() 52 | 53 | 54 | # node_parser = UnstructuredElementNodeParser() 55 | node_parser = 
SimpleNodeParser() 56 | 57 | nodes = node_parser.get_nodes_from_documents(docs, show_progress=True) 58 | # base_nodes, node_mappings = node_parser.get_base_nodes_and_mappings(nodes) 59 | 60 | 61 | vector_index = VectorStoreIndex(nodes) 62 | vector_retriever = vector_index.as_retriever(similarity_top_k=3) 63 | query_engine = vector_index.as_query_engine(similarity_top_k=3) 64 | 65 | # recursive_retriever = RecursiveRetriever( 66 | # "vector", 67 | # retriever_dict={"vector": vector_retriever}, 68 | # node_dict=aapl_node_mappings, 69 | # verbose=True, 70 | # ) 71 | # query_engine = RetrieverQueryEngine.from_args(recursive_retriever) 72 | 73 | 74 | llm = OpenAI(model="gpt-3.5-turbo", api_key=OPENAI_API_KEY) 75 | service_context = ServiceContext.from_defaults(llm=llm) 76 | 77 | query_engine_tool = [ 78 | QueryEngineTool( 79 | query_engine=query_engine, 80 | metadata=ToolMetadata( 81 | name = company, 82 | description=f"provides information about {company} financials for the year 2022", 83 | ), 84 | ) 85 | ] 86 | 87 | sub_query_engine = SubQuestionQueryEngine.from_defaults( 88 | query_engine_tools=query_engine_tool, 89 | service_context=service_context, 90 | use_async=True 91 | ) 92 | 93 | query = """ 94 | You are tasked with generating performance_highlights insight about the company for the Fiscal Year Highlights section from the annual report of the company: 95 | 96 | Given below is the output format, which has the subsections. 97 | Must use bullet points. 98 | Always use $ symbol for money values, and round it off to millions or billions accordingly 99 | 100 | Incase you don't have enough info you can just write: No information available 101 | --- 102 | {'performance_highlights': 'Key performance and financial stats over the fiscal year. 
Provide the Revenue Growth, Net Profit Margin, Market Share expansion, Cost Savings and Efficiency, Dividend Distribution'} 103 | 104 | """ 105 | 106 | # Interactive loop: check for the exit command before querying, so "exit" itself is never sent to the engine. 107 | while True: 108 | query = input("ENTER QUERY: ") 109 | if query == "exit": 110 | break 111 | response = sub_query_engine.query(query) 112 | print("-" * 50) 113 | print(str(response)) 114 | print("-" * 50) 115 | 116 | # Reference mapping of report sections to 10-K items (assigned a name; previously a dangling dict literal). 117 | SECTION_TO_ITEM = { 118 | "Business": "Item 1: ", 119 | "Risk Factors": "Item 1A, Item 7A", 120 | "MD&A": "Item 7", 121 | "Financial Statements": "Item 8", 122 | "Management's Report on Internal Control Over Financial Reporting": "Item 9A", 123 | "Report of Independent Registered Public Accounting Firm": "Item 8", 124 | "Corporate Governance": "Item 10" 125 | } 126 | 127 | -------------------------------------------------------------------------------- /pdf/final_report.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/pdf/final_report.pdf -------------------------------------------------------------------------------- /prompts/insights.prompt: -------------------------------------------------------------------------------- 1 | You are tasked with generating insights about the company from the {type_of_data} below: 2 | 3 | ---- 4 | {inputs} 5 | ---- 6 | 7 | Rules: 8 | Each insight must not state the obvious. 9 | Always use $ symbol for money values, and round it off to millions or billions accordingly 10 | 11 | Generate insights about the company according to the following format. 
12 | 13 | ---- 14 | {output_format} 15 | ---- -------------------------------------------------------------------------------- /prompts/iv2.prompt: -------------------------------------------------------------------------------- 1 | You are tasked with generating {insight_name} insight about the company for the {type_of_data} data given below: 2 | 3 | ---- 4 | {inputs} 5 | ---- 6 | 7 | Rules: 8 | The insight must not state the obvious. 9 | Always use $ symbol for money values, and round it off to millions or billions accordingly 10 | 11 | Generate the insight as per the given description: 12 | 13 | ---- 14 | {output_format} 15 | ---- -------------------------------------------------------------------------------- /prompts/main.prompt: -------------------------------------------------------------------------------- 1 | You are a financial analyzer for a mutual fund portfolio manager. 2 | Your job is to provide a detailed analysis of the following: 3 | 4 | -------------------------------------------------------------------------------- /prompts/report.prompt: -------------------------------------------------------------------------------- 1 | You are tasked with generating {insight_name} insight about the company for the {section_name} section from the annual report of the company: 2 | 3 | Given below is the output format, which has the subsections. 4 | Must use bullet points. 
5 | Always use $ symbol for money values, and round it off to millions or billions accordingly 6 | 7 | Incase you don't have enough info you can just write: No information available 8 | --- 9 | {output_format} 10 | --- -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aiofiles==23.2.1 2 | aiohttp==3.8.5 3 | aiosignal==1.3.1 4 | altair==5.1.1 5 | annotated-types==0.5.0 6 | antlr4-python3-runtime==4.9.3 7 | anyio==3.7.1 8 | appnope==0.1.3 9 | astor==0.8.1 10 | asttokens==2.4.0 11 | async-timeout==4.0.3 12 | atlassian-python-api==3.41.3 13 | attrs==23.1.0 14 | Authlib==1.2.1 15 | backcall==0.2.0 16 | backoff==2.2.1 17 | base58==2.1.1 18 | bcrypt==4.0.1 19 | beautifulsoup4==4.12.2 20 | blinker==1.6.2 21 | cachetools==5.3.1 22 | camelot-py==0.11.0 23 | certifi==2023.7.22 24 | cffi==1.15.1 25 | chardet==5.2.0 26 | charset-normalizer==3.2.0 27 | Chroma==0.2.0 28 | chroma-hnswlib==0.7.3 29 | ci-info==0.3.0 30 | clarifai==9.8.1 31 | clarifai-grpc==9.8.2 32 | click==7.1.2 33 | coloredlogs==15.0.1 34 | comm==0.1.4 35 | configobj==5.0.8 36 | configparser==6.0.0 37 | contourpy==1.1.1 38 | cryptography==41.0.3 39 | cycler==0.12.1 40 | dataclasses-json==0.5.14 41 | debugpy==1.8.0 42 | decorator==5.1.1 43 | Deprecated==1.2.14 44 | diskcache==5.6.3 45 | distro==1.8.0 46 | dotenv-python==0.0.1 47 | EbookLib==0.18 48 | effdet==0.4.1 49 | emoji==2.8.0 50 | et-xmlfile==1.1.0 51 | etelemetry==0.3.1 52 | executing==2.0.0 53 | faiss-cpu==1.7.4 54 | fastapi==0.99.1 55 | filelock==3.12.3 56 | filetype==1.2.0 57 | fitz==0.0.1.dev2 58 | flatbuffers==23.5.26 59 | fonttools==4.43.1 60 | frontend==0.0.3 61 | frozenlist==1.4.0 62 | fsspec==2023.9.0 63 | future==0.18.3 64 | gitdb==4.0.10 65 | GitPython==3.1.35 66 | googleapis-common-protos==1.60.0 67 | greenlet==3.0.0 68 | grpcio==1.58.0 69 | h11==0.14.0 70 | html2text==2020.1.16 71 | httplib2==0.22.0 72 | 
httptools==0.6.0 73 | huggingface-hub==0.16.4 74 | humanfriendly==10.0 75 | idna==3.4 76 | importlib-metadata==6.8.0 77 | importlib-resources==6.0.1 78 | iopath==0.1.10 79 | ipykernel==6.25.2 80 | ipython==8.16.1 81 | isodate==0.6.1 82 | itsdangerous==2.1.2 83 | jedi==0.19.1 84 | Jinja2==3.1.2 85 | joblib==1.3.2 86 | JPype1==1.4.1 87 | jsonpatch==1.33 88 | jsonpointer==2.4 89 | jsonschema==4.19.0 90 | jsonschema-specifications==2023.7.1 91 | jupyter_client==8.3.1 92 | jupyter_core==5.3.2 93 | kaleido==0.2.1 94 | kiwisolver==1.4.5 95 | langchain==0.0.310 96 | langdetect==1.0.9 97 | langsmith==0.0.43 98 | layoutparser==0.3.4 99 | llama-hub==0.0.38 100 | llama-index==0.8.48 101 | llama_cpp_python==0.2.2 102 | looseversion==1.3.0 103 | lxml==4.9.3 104 | Markdown==3.5 105 | markdown-it-py==3.0.0 106 | MarkupSafe==2.1.3 107 | marshmallow==3.20.1 108 | matplotlib==3.8.0 109 | matplotlib-inline==0.1.6 110 | mdurl==0.1.2 111 | monotonic==1.6 112 | mpmath==1.3.0 113 | msg-parser==1.2.0 114 | multidict==6.0.4 115 | mypy-extensions==1.0.0 116 | nest-asyncio==1.5.8 117 | networkx==3.2 118 | nibabel==5.1.0 119 | nipype==1.8.6 120 | nltk==3.8.1 121 | numexpr==2.8.5 122 | numpy==1.25.2 123 | oauthlib==3.2.2 124 | olefile==0.46 125 | omegaconf==2.3.0 126 | onnx==1.14.1 127 | onnxruntime==1.15.1 128 | openai==0.28.0 129 | opencv-python==4.8.1.78 130 | openpyxl==3.1.2 131 | overrides==7.4.0 132 | packaging==23.1 133 | pandas==2.1.0 134 | parso==0.8.3 135 | pathlib==1.0.1 136 | pdf2image==1.16.3 137 | pdfminer.six==20221105 138 | pdfplumber==0.10.2 139 | pexpect==4.8.0 140 | pickleshare==0.7.5 141 | Pillow==9.5.0 142 | pipdeptree==2.13.0 143 | platformdirs==3.11.0 144 | plotly==5.17.0 145 | portalocker==2.8.2 146 | posthog==3.0.2 147 | prompt-toolkit==3.0.39 148 | protobuf==4.24.3 149 | prov==2.0.0 150 | psutil==5.9.5 151 | ptyprocess==0.7.0 152 | pulsar-client==3.3.0 153 | pure-eval==0.2.2 154 | pyarrow==13.0.0 155 | pycocotools==2.0.7 156 | pycparser==2.21 157 | pydantic==1.10.12 
158 | pydantic_core==2.6.3 159 | pydeck==0.8.0 160 | pydot==1.4.2 161 | Pygments==2.16.1 162 | Pympler==1.0.1 163 | pypandoc==1.12 164 | pyparsing==3.1.1 165 | pypdf==3.15.5 166 | PyPDF2==2.12.1 167 | pypdfium2==4.22.0 168 | PyPika==0.48.9 169 | pyrate-limiter==3.1.0 170 | pytesseract==0.3.10 171 | python-dateutil==2.8.2 172 | python-docx==1.0.1 173 | python-dotenv==1.0.0 174 | python-iso639==2023.6.15 175 | python-magic==0.4.27 176 | python-multipart==0.0.6 177 | python-pptx==0.6.21 178 | python-rapidjson==1.11 179 | pytz==2023.3.post1 180 | pytz-deprecation-shim==0.1.0.post0 181 | pyxnat==1.6 182 | PyYAML==6.0.1 183 | pyzmq==25.1.1 184 | rapidfuzz==3.4.0 185 | rdflib==7.0.0 186 | referencing==0.30.2 187 | regex==2023.8.8 188 | reportlab==4.0.4 189 | requests==2.31.0 190 | requests-oauthlib==1.3.1 191 | retrying==1.3.4 192 | rich==13.4.2 193 | rpds-py==0.10.2 194 | safetensors==0.4.0 195 | scipy==1.11.3 196 | sec-edgar-downloader==5.0.0 197 | simplejson==3.19.2 198 | six==1.16.0 199 | slack-sdk==3.23.0 200 | smmap==5.0.0 201 | sniffio==1.3.0 202 | soupsieve==2.5 203 | SQLAlchemy==2.0.20 204 | stack-data==0.6.3 205 | starlette==0.27.0 206 | streamlit==1.27.2 207 | sympy==1.12 208 | tabula-py==2.8.2 209 | tabulate==0.9.0 210 | tenacity==8.2.3 211 | tiktoken==0.5.1 212 | timm==0.9.8 213 | tokenizers==0.14.0 214 | toml==0.10.2 215 | toolz==0.12.0 216 | torch==2.1.0 217 | torchvision==0.16.0 218 | tornado==6.3.3 219 | tqdm==4.64.1 220 | traitlets==5.11.2 221 | traits==6.3.2 222 | transformers==4.34.1 223 | tritonclient==2.34.0 224 | typing-inspect==0.9.0 225 | typing_extensions==4.7.1 226 | tzdata==2023.3 227 | tzlocal==4.3.1 228 | unstructured==0.10.25 229 | unstructured-inference==0.7.9 230 | unstructured.pytesseract==0.3.12 231 | urllib3==1.26.16 232 | uvicorn==0.23.2 233 | uvloop==0.17.0 234 | validators==0.21.0 235 | watchdog==3.0.0 236 | watchfiles==0.20.0 237 | wcwidth==0.2.8 238 | weaviate-client==3.23.2 239 | websockets==11.0.3 240 | wrapt==1.15.0 241 | 
xlrd==2.0.1 242 | XlsxWriter==3.1.9 243 | yarl==1.9.2 244 | zipp==3.16.2 245 | -------------------------------------------------------------------------------- /src/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/src/.DS_Store -------------------------------------------------------------------------------- /src/balance_sheet.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import requests 8 | import streamlit as st 9 | import os 10 | # from dotenv import dotenv_values 11 | 12 | from src.pydantic_models import BalanceSheetInsights 13 | from src.utils import insights, get_total_revenue, safe_float, generate_pydantic_model 14 | # from src.fields import balance_sheet_fields, balance_sheet_attributes 15 | from src.fields2 import bal_sheet, balance_sheet_attributes 16 | 17 | # config = dotenv_values(".env") 18 | # OPENAI_API_KEY = config["OPENAI_API_KEY"] 19 | # AV_API_KEY = config["ALPHA_VANTAGE_API_KEY"] 20 | 21 | # AV_API_KEY = st.secrets["av_api_key"] 22 | # OPENAI_API_KEY = st.secrets["openai_api_key"] 23 | 24 | AV_API_KEY = os.environ.get("AV_API_KEY") 25 | OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") 26 | 27 | def charts(data): 28 | report = data['annualReports'][0] 29 | asset_composition = {"total_current_assets": report['totalCurrentAssets'], 30 | "total_non_current_assets": report['totalNonCurrentAssets'] 31 | } 32 | 33 | liabilities_composition = { 34 | "total_current_liabilities": report['totalCurrentLiabilities'], 35 | "total_non_current_liabilities": report['totalNonCurrentLiabilities'] 36 | } 37 | 38 | debt_structure = { 39 | "short_term_debt": report['shortTermDebt'], 40 | "long_term_debt": 
report['longTermDebt'] 41 | } 42 | 43 | return { 44 | "asset_composition": asset_composition, 45 | "liabilities_composition": liabilities_composition, 46 | "debt_structure": debt_structure 47 | } 48 | 49 | 50 | 51 | def metrics(data, total_revenue): 52 | 53 | # Extracting values from the data 54 | totalCurrentAssets = safe_float(data.get("totalCurrentAssets")) 55 | totalCurrentLiabilities = safe_float(data.get("totalCurrentLiabilities")) 56 | totalLiabilities = safe_float(data.get("totalLiabilities")) 57 | totalShareholderEquity = safe_float(data.get("totalShareholderEquity")) 58 | totalAssets = safe_float(data.get("totalAssets")) 59 | inventory = safe_float(data.get("inventory")) 60 | 61 | # Calculate metrics, but check for N/A values in operands 62 | current_ratio = ( 63 | "N/A" 64 | if "N/A" in (totalCurrentAssets, totalCurrentLiabilities) 65 | else totalCurrentAssets / totalCurrentLiabilities 66 | ) 67 | debt_to_equity_ratio = ( 68 | "N/A" 69 | if "N/A" in (totalLiabilities, totalShareholderEquity) 70 | else totalLiabilities / totalShareholderEquity 71 | ) 72 | quick_ratio = ( 73 | "N/A" 74 | if "N/A" in (totalCurrentAssets, totalCurrentLiabilities, inventory) 75 | else (totalCurrentAssets - inventory) / totalCurrentLiabilities 76 | ) 77 | asset_turnover = ( 78 | "N/A" if "N/A" in (total_revenue, totalAssets) else total_revenue / totalAssets 79 | ) 80 | equity_multiplier = ( 81 | "N/A" 82 | if "N/A" in (totalAssets, totalShareholderEquity) 83 | else totalAssets / totalShareholderEquity 84 | ) 85 | 86 | # Returning the results 87 | return { 88 | "current_ratio": current_ratio, 89 | "debt_to_equity_ratio": debt_to_equity_ratio, 90 | "quick_ratio": quick_ratio, 91 | "asset_turnover": asset_turnover, 92 | "equity_multiplier": equity_multiplier, 93 | } 94 | 95 | 96 | def balance_sheet(symbol, fields_to_include, api_key): 97 | url = "https://www.alphavantage.co/query" 98 | params = { 99 | "function": "BALANCE_SHEET", 100 | "symbol": symbol, 101 | "apikey": AV_API_KEY 
102 | } 103 | response = requests.get(url, params=params) 104 | data = response.json() 105 | if not data: 106 | print(f"No data found for {symbol}") 107 | return None 108 | 109 | if "Error Message" in data: 110 | return {"Error": data["Error Message"]} 111 | 112 | chart_data = charts(data) 113 | 114 | report = data["annualReports"][0] 115 | total_revenue = get_total_revenue(symbol) 116 | met = metrics(report, total_revenue) 117 | 118 | data_for_insights = { 119 | "annual_report_data": report, 120 | "historical_data": chart_data, 121 | } 122 | 123 | ins = {} 124 | for i, field in enumerate(balance_sheet_attributes): 125 | if fields_to_include[i]: 126 | response = insights(field, "balance sheet", data_for_insights, str({field: bal_sheet[field]}), api_key) 127 | ins[field] = response 128 | 129 | return { 130 | "metrics": met, 131 | "chart_data": chart_data, 132 | "insights": ins 133 | } 134 | 135 | if __name__ == "__main__": 136 | fields = [True, True, False, False, False] 137 | data = balance_sheet("MSFT", fields, OPENAI_API_KEY) 138 | print("Metrics: ", data['metrics']) 139 | print("Chart Data: ", data['chart_data']) 140 | print("Insights", data['insights']) 141 | 142 | 143 | -------------------------------------------------------------------------------- /src/cash_flow.py: -------------------------------------------------------------------------------- 1 | 2 | import sys 3 | from pathlib import Path 4 | script_dir = Path(__file__).resolve().parent 5 | project_root = script_dir.parent 6 | sys.path.append(str(project_root)) 7 | 8 | import requests 9 | import streamlit as st 10 | import os 11 | # from dotenv import dotenv_values 12 | 13 | from src.pydantic_models import CashFlowInsights 14 | from src.utils import insights, get_total_revenue, get_total_debt, safe_float, generate_pydantic_model 15 | from src.fields2 import cashflow, cashflow_attributes 16 | # config = dotenv_values(".env") 17 | # OPENAI_API_KEY = config["OPENAI_API_KEY"] 18 | # AV_API_KEY = 
config["ALPHA_VANTAGE_API_KEY"] 19 | 20 | # AV_API_KEY = st.secrets["av_api_key"] 21 | # OPENAI_API_KEY = st.secrets["openai_api_key"] 22 | 23 | AV_API_KEY = os.environ.get("AV_API_KEY") 24 | OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") 25 | 26 | 27 | def charts(data): 28 | dates = [] 29 | operating_cash_flow = [] 30 | cash_flow_from_investment = [] 31 | cash_flow_from_financing = [] 32 | 33 | for report in reversed(data["annualReports"]): 34 | dates.append(report["fiscalDateEnding"]) 35 | operating_cash_flow.append(report["operatingCashflow"]) 36 | cash_flow_from_investment.append(report["cashflowFromInvestment"]) 37 | cash_flow_from_financing.append(report["cashflowFromFinancing"]) 38 | 39 | return { 40 | "dates": dates, 41 | "operating_cash_flow": operating_cash_flow, 42 | "cash_flow_from_investment": cash_flow_from_investment, 43 | "cash_flow_from_financing": cash_flow_from_financing 44 | } 45 | 46 | 47 | def metrics(data, total_revenue, total_debt): 48 | 49 | # Helper function to safely convert to float or set to N/A 50 | 51 | 52 | operatingCashFlow = safe_float(data.get("operatingCashflow")) 53 | capitalExpenditures = safe_float(data.get("capitalExpenditures")) 54 | dividendPayout = safe_float(data.get("dividendPayout")) 55 | netIncome = safe_float(data.get("netIncome")) 56 | 57 | operating_cash_flow_margin = "N/A" if "N/A" in (operatingCashFlow, total_revenue) else operatingCashFlow / total_revenue 58 | capital_expenditure_coverage_ratio = "N/A" if "N/A" in (operatingCashFlow, capitalExpenditures) else operatingCashFlow / capitalExpenditures 59 | free_cash_flow = "N/A" if "N/A" in (operatingCashFlow, capitalExpenditures) else operatingCashFlow - capitalExpenditures 60 | dividend_coverage_ratio = "N/A" if "N/A" in (dividendPayout, netIncome) else netIncome / dividendPayout 61 | cash_flow_to_debt_ratio = "N/A" if "N/A" in (operatingCashFlow, total_debt) else operatingCashFlow / total_debt 62 | 63 | return { 64 | "operating_cash_flow_margin": 
operating_cash_flow_margin, 65 | "capital_expenditure_coverage_ratio": capital_expenditure_coverage_ratio, 66 | "free_cash_flow": free_cash_flow, 67 | "dividend_coverage_ratio": dividend_coverage_ratio, 68 | "cash_flow_to_debt_ratio": cash_flow_to_debt_ratio 69 | } 70 | 71 | 72 | def cash_flow(symbol, fields_to_include, api_key): 73 | url = "https://www.alphavantage.co/query" 74 | params = { 75 | "function": "CASH_FLOW", 76 | "symbol": symbol, 77 | "apikey": AV_API_KEY 78 | } 79 | response = requests.get(url, params=params) 80 | data = response.json() 81 | if not data: 82 | print(f"No data found for {symbol}") 83 | return None 84 | 85 | if "Error Message" in data: 86 | return {"Error": data["Error Message"]} 87 | 88 | chart_data = charts(data) 89 | 90 | report = data["annualReports"][0] 91 | total_revenue = get_total_revenue(symbol) 92 | total_debt = get_total_debt(symbol) 93 | met = metrics(report, total_revenue, total_debt) 94 | 95 | data_for_insights = { 96 | "annual_report_data": report, 97 | "historical_data": chart_data, 98 | } 99 | ins = {} 100 | for i, field in enumerate(cashflow_attributes): 101 | if fields_to_include[i]: 102 | response = insights(field, "cash flow", data_for_insights, str({field: cashflow[field]}), api_key) 103 | ins[field] = response 104 | 105 | 106 | return { 107 | "metrics": met, 108 | "chart_data": chart_data, 109 | "insights": ins 110 | } 111 | 112 | if __name__ == "__main__": 113 | fields = [True, True, False, False, False] 114 | data = cash_flow("AAPL", fields, OPENAI_API_KEY) 115 | print("Metrics: ", data['metrics']) 116 | print("Chart Data: ", data['chart_data']) 117 | print("Insights", data['insights']) 118 | 119 | # if __name__ == "__main__": 120 | # symbol = "AAPL" 121 | # url = "https://www.alphavantage.co/query" 122 | # params = { 123 | # "function": "CASH_FLOW", 124 | # "symbol": symbol, 125 | # "apikey": AV_API_KEY 126 | # } 127 | # response = requests.get(url, params=params) 128 | # data = response.json() 129 | # if not 
data: 130 | # print(f"No data found for {symbol}") 131 | 132 | # ans = charts(data) 133 | # print(ans) -------------------------------------------------------------------------------- /src/company_overview.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import requests 8 | import streamlit as st 9 | import os 10 | 11 | from src.utils import safe_float 12 | 13 | 14 | # AV_API_KEY = st.secrets["av_api_key"] 15 | 16 | AV_API_KEY = os.environ.get("AV_API_KEY") 17 | OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") 18 | 19 | 20 | def company_overview(symbol): 21 | url = "https://www.alphavantage.co/query" 22 | params = { 23 | "function": "OVERVIEW", 24 | "symbol": symbol, 25 | "apikey": AV_API_KEY 26 | } 27 | 28 | # Send a GET request to the API 29 | response = requests.get(url, params=params) 30 | if response.status_code == 200: 31 | data = response.json() 32 | if not data: 33 | print(f"No data found for {symbol}") 34 | return None 35 | 36 | if "Error Message" in data: 37 | return {"Error": data["Error Message"]} 38 | 39 | extracted_data = { 40 | "Symbol": data.get("Symbol"), 41 | "AssetType": data.get("AssetType"), 42 | "Name": data.get("Name"), 43 | "Description": data.get("Description"), 44 | "CIK": data.get("CIK"), 45 | "Exchange": data.get("Exchange"), 46 | "Currency": data.get("Currency"), 47 | "Country": data.get("Country"), 48 | "Sector": data.get("Sector"), 49 | "Industry": data.get("Industry"), 50 | "Address": data.get("Address"), 51 | "FiscalYearEnd": data.get("FiscalYearEnd"), 52 | "LatestQuarter": data.get("LatestQuarter"), 53 | "MarketCapitalization": safe_float(data.get("MarketCapitalization")), 54 | } 55 | return extracted_data 56 | 57 | else: 58 | # Return early so extracted_data is never referenced unbound on an HTTP error 59 | print(f"Error: {response.status_code} - {response.text}") 60 | return None 61 | 62 | if __name__ ==
"__main__": 63 | ans = company_overview("TSLA") 64 | print(ans) 65 | -------------------------------------------------------------------------------- /src/fields.py: -------------------------------------------------------------------------------- 1 | from pydantic import Field 2 | 3 | min_length = 40 4 | 5 | inc_stat_fields = { 6 | "revenue_health": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's total revenue, providing a perspective on the health of the primary business activity.")), 7 | "operational_efficiency": (str, Field(..., description=f"Must be more than {min_length} words. Analysis of the company's operating expenses in relation to its revenue, offering a view into the firm's operational efficiency.")), 8 | "r_and_d_focus": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's commitment to research and development, signifying its emphasis on innovation and future growth.")), 9 | "debt_management": (str, Field(..., description=f"Must be more than {min_length} words. Analysis of the company's interest expenses, highlighting the scale of its debt obligations and its approach to leveraging.")), 10 | "profit_retention": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's net income, showcasing the amount retained post all expenses, which can be reinvested or distributed.")) 11 | } 12 | 13 | inc_stat_attributes = ["revenue_health", "operational_efficiency", "r_and_d_focus", "debt_management", "profit_retention"] 14 | 15 | balance_sheet_fields = { 16 | "liquidity_position": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's ability to meet its short-term obligations using its short-term assets.")), 17 | "operational_efficiency": (str, Field(..., description=f"Must be more than {min_length} words. 
Analysis of how efficiently the company is using its assets to generate sales.")), 18 | "capital_structure": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's financial leverage and its reliance on external liabilities versus internal equity.")), 19 | "inventory_management": (str, Field(..., description=f"Must be more than {min_length} words. Analysis of the company's efficiency in managing, selling, and replacing its inventory.")), 20 | "overall_solvency": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's overall ability to meet its long-term debts and obligations.")) 21 | } 22 | 23 | balance_sheet_attributes = ["liquidity_position", "operational_efficiency", "capital_structure", "inventory_management", "overall_solvency"] 24 | 25 | cashflow_fields = { 26 | "operational_cash_efficiency": (str, Field(..., description=f"Must be more than {min_length} words. Insight into how efficiently the company is generating cash from its core operations.")), 27 | "investment_capability": (str, Field(..., description=f"Must be more than {min_length} words. Indicates the company's ability to invest in its business using its operational cash flows.")), 28 | "financial_flexibility": (str, Field(..., description=f"Must be more than {min_length} words. Demonstrates the cash left after all operational expenses and investments, which can be used for dividends, share buybacks, or further investments.")), 29 | "dividend_sustainability": (str, Field(..., description=f"Must be more than {min_length} words. Indicates the company's ability to cover its dividend payouts with its net earnings.")), 30 | "debt_service_capability": (str, Field(..., description=f"Must be more than {min_length} words. 
Analysis of the company's ability to service its debt using the operational cash flows.")) 31 | } 32 | 33 | cashflow_attributes = ["operational_cash_efficiency", "investment_capability", "financial_flexibility", "dividend_sustainability", "debt_service_capability"] 34 | 35 | fiscal_year_fields = { 36 | "performance_highlights": (str, Field(..., description="Key performance and financial stats over the fiscal year.")), 37 | "major_events": (str, Field(..., description="Highlight of significant events, acquisitions, or strategic shifts that occurred during the year.")), 38 | "challenges_encountered": (str, Field(..., description="Challenges the company faced during the year and, if and how they managed or overcame them.")), 39 | } 40 | fiscal_year_attributes = ["performance_highlights", "major_events", "challenges_encountered"] 41 | 42 | strat_outlook_fields = { 43 | "strategic_initiatives": (str, Field(..., description="The company's primary objectives and growth strategies for the upcoming years.")), 44 | "market_outlook": (str, Field(..., description="Insights into the broader market, competitive landscape, and industry trends the company anticipates.")), 45 | "product_roadmap": (str, Field(..., description="Upcoming launches, expansions, or innovations the company plans to roll out.")) 46 | } 47 | 48 | strat_outlook_attributes = ["strategic_initiatives", "market_outlook", "product_roadmap"] 49 | 50 | risk_management_fields = { 51 | "risk_factors": (str, Field(..., description="Primary risks the company acknowledges.")), 52 | "risk_mitigation": (str, Field(..., description="Strategies for managing these risks.")) 53 | } 54 | 55 | risk_management_attributes = ["risk_factors", "risk_mitigation"] 56 | 57 | 58 | innovation_fields = { 59 | "r_and_d_activities": (str, Field(..., description="Overview of the company's focus on research and development, major achievements, or breakthroughs.")), 60 | "innovation_focus": (str, Field(..., description="Mention of new 
technologies, patents, or areas of research the company is diving into.")) 61 | } 62 | 63 | innovation_attributes = ["r_and_d_activities", "innovation_focus"] 64 | 65 | -------------------------------------------------------------------------------- /src/fields2.py: -------------------------------------------------------------------------------- 1 | min_length = 40 2 | 3 | inc_stat = { 4 | "revenue_health": f"Must be more than {min_length} words. Insight into the company's total revenue, providing a perspective on the health of the primary business activity.", 5 | "operational_efficiency": f"Must be more than {min_length} words. Analysis of the company's operating expenses in relation to its revenue, offering a view into the firm's operational efficiency.", 6 | "r_and_d_focus": f"Must be more than {min_length} words. Insight into the company's commitment to research and development, signifying its emphasis on innovation and future growth.", 7 | "debt_management": f"Must be more than {min_length} words. Analysis of the company's interest expenses, highlighting the scale of its debt obligations and its approach to leveraging.", 8 | "profit_retention": f"Must be more than {min_length} words. Insight into the company's net income, showcasing the amount retained post all expenses, which can be reinvested or distributed." 9 | } 10 | 11 | inc_stat_attributes = ["revenue_health", "operational_efficiency", "r_and_d_focus", "debt_management", "profit_retention"] 12 | 13 | bal_sheet = { 14 | "liquidity_position": f"Must be more than {min_length} words. Insight into the company's ability to meet its short-term obligations using its short-term assets.", 15 | "assets_efficiency": f"Must be more than {min_length} words. Analysis of how efficiently the company is using its assets to generate sales.", 16 | "capital_structure": f"Must be more than {min_length} words. 
Insight into the company's financial leverage and its reliance on external liabilities versus internal equity.", 17 | "inventory_management": f"Must be more than {min_length} words. Analysis of the company's efficiency in managing, selling, and replacing its inventory.", 18 | "overall_solvency": f"Must be more than {min_length} words. Insight into the company's overall ability to meet its long-term debts and obligations." 19 | } 20 | balance_sheet_attributes = ["liquidity_position", "assets_efficiency", "capital_structure", "inventory_management", "overall_solvency"] 21 | 22 | 23 | cashflow = { 24 | "operational_cash_efficiency": f"Must be more than {min_length} words. Insight into how efficiently the company is generating cash from its core operations.", 25 | "investment_capability": f"Must be more than {min_length} words. Indicates the company's ability to invest in its business using its operational cash flows.", 26 | "financial_flexibility": f"Must be more than {min_length} words. Demonstrates the cash left after all operational expenses and investments, which can be used for dividends, share buybacks, or further investments.", 27 | "dividend_sustainability": f"Must be more than {min_length} words. Indicates the company's ability to cover its dividend payouts with its net earnings.", 28 | "debt_service_capability": f"Must be more than {min_length} words. Analysis of the company's ability to service its debt using the operational cash flows." 
29 | } 30 | 31 | cashflow_attributes = ["operational_cash_efficiency", "investment_capability", "financial_flexibility", "dividend_sustainability", "debt_service_capability"] 32 | 33 | fiscal_year = { 34 | "performance_highlights": "Key performance and financial stats over the fiscal year.", 35 | "major_events": "Highlight of significant events, acquisitions, or strategic shifts that occurred during the year.", 36 | "challenges_encountered": "Challenges the company faced during the year and, if and how they managed or overcame them." 37 | } 38 | 39 | fiscal_year_attributes = ["performance_highlights", "major_events", "challenges_encountered"] 40 | 41 | strat_outlook = { 42 | "strategic_initiatives": "The company's primary objectives and growth strategies for the upcoming years.", 43 | "market_outlook": "Insights into the broader market, competitive landscape, and industry trends the company anticipates.", 44 | "product_roadmap": "Upcoming launches, expansions, or innovations the company plans to roll out." 45 | } 46 | 47 | strat_outlook_attributes = ["strategic_initiatives", "market_outlook", "product_roadmap"] 48 | 49 | risk_management = { 50 | "risk_factors": "Primary risks the company acknowledges.", 51 | "risk_mitigation": "Strategies for managing these risks." 52 | } 53 | 54 | risk_management_attributes = ["risk_factors", "risk_mitigation"] 55 | 56 | innovation = { 57 | "r_and_d_activities": "Overview of the company's focus on research and development, major achievements, or breakthroughs.", 58 | "innovation_focus": "Mention of new technologies, patents, or areas of research the company is diving into." 
59 | } 60 | 61 | innovation_attributes = ["r_and_d_activities", "innovation_focus"] -------------------------------------------------------------------------------- /src/income_statement.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import os 8 | import pandas as pd 9 | import requests 10 | import streamlit as st 11 | import plotly.graph_objects as go 12 | # from dotenv import dotenv_values 13 | 14 | from src.pydantic_models import IncomeStatementInsights 15 | from src.utils import insights, safe_float, generate_pydantic_model 16 | from src.fields2 import inc_stat, inc_stat_attributes 17 | 18 | 19 | # config = dotenv_values(".env") 20 | # OPENAI_API_KEY = config["OPENAI_API_KEY"] 21 | # AV_API_KEY = config["ALPHA_VANTAGE_API_KEY"] 22 | 23 | # AV_API_KEY = st.secrets["av_api_key"] 24 | # OPENAI_API_KEY = st.secrets["openai_api_key"] 25 | 26 | OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") 27 | AV_API_KEY = os.environ.get("AV_API_KEY") 28 | 29 | 30 | 31 | def charts(data): 32 | dates = [] 33 | total_revenue = [] 34 | net_income = [] 35 | interest_expense = [] 36 | 37 | 38 | 39 | for report in reversed(data["annualReports"]): 40 | dates.append(report["fiscalDateEnding"]) 41 | total_revenue.append(report["totalRevenue"]) 42 | net_income.append(report["netIncome"]) 43 | interest_expense.append(report["interestAndDebtExpense"]) 44 | 45 | return { 46 | "dates": dates, 47 | "total_revenue": total_revenue, 48 | "net_income": net_income, 49 | "interest_expense": interest_expense 50 | } 51 | 52 | 53 | def metrics(data): 54 | 55 | # Extracting values from the data 56 | grossProfit = safe_float(data.get("grossProfit")) 57 | totalRevenue = safe_float(data.get("totalRevenue")) 58 | 
operatingIncome = safe_float(data.get("operatingIncome")) 59 | costOfRevenue = safe_float(data.get("costOfRevenue")) 60 | costofGoodsAndServicesSold = safe_float(data.get("costofGoodsAndServicesSold")) 61 | sellingGeneralAndAdministrative = safe_float(data.get("sellingGeneralAndAdministrative")) 62 | ebit = safe_float(data.get("ebit")) 63 | interestAndDebtExpense = safe_float(data.get("interestAndDebtExpense")) 64 | netIncome = safe_float(data["netIncome"]) 65 | 66 | # Calculate metrics, but check for N/A values in operands 67 | gross_profit_margin = ( 68 | "N/A" if "N/A" in (grossProfit, totalRevenue) else grossProfit / totalRevenue 69 | ) 70 | operating_profit_margin = ( 71 | "N/A" if "N/A" in (operatingIncome, totalRevenue) else operatingIncome / totalRevenue 72 | ) 73 | net_profit_margin = ( 74 | "N/A" if "N/A" in (netIncome, totalRevenue) else netIncome / totalRevenue 75 | ) 76 | cost_efficiency = ( 77 | "N/A" 78 | if "N/A" in (totalRevenue, costOfRevenue, costofGoodsAndServicesSold) 79 | else totalRevenue / (costOfRevenue + costofGoodsAndServicesSold) 80 | ) 81 | sg_and_a_efficiency = ( 82 | "N/A" 83 | if "N/A" in (totalRevenue, sellingGeneralAndAdministrative) 84 | else totalRevenue / sellingGeneralAndAdministrative 85 | ) 86 | interest_coverage_ratio = ( 87 | "N/A" if "N/A" in (ebit, interestAndDebtExpense) else ebit / interestAndDebtExpense 88 | ) 89 | 90 | # Returning the results 91 | return { 92 | "gross_profit_margin": gross_profit_margin, 93 | "operating_profit_margin": operating_profit_margin, 94 | "net_profit_margin": net_profit_margin, 95 | "cost_efficiency": cost_efficiency, 96 | "sg_and_a_efficiency": sg_and_a_efficiency, 97 | "interest_coverage_ratio": interest_coverage_ratio, 98 | } 99 | 100 | 101 | 102 | def income_statement(symbol, fields_to_include, api_key): 103 | url = "https://www.alphavantage.co/query" 104 | params = { 105 | "function": "INCOME_STATEMENT", 106 | "symbol": symbol, 107 | "apikey": AV_API_KEY 108 | } 109 | 110 | # Send a GET 
request to the API 111 | response = requests.get(url, params=params) 112 | if response.status_code == 200: 113 | data = response.json() 114 | if not data: 115 | print(f"No data found for {symbol}") 116 | return None 117 | 118 | else: 119 | print(f"Error: {response.status_code} - {response.text}") 120 | return None 121 | if 'Error Message' in data: 122 | return {"Error": data['Error Message']} 123 | 124 | chart_data = charts(data) 125 | 126 | report = data["annualReports"][0] 127 | met = metrics(report) 128 | 129 | data_for_insights = { 130 | "annual_report_data": report, 131 | "historical_data": chart_data, 132 | } 133 | 134 | ins = {} 135 | for i, field in enumerate(inc_stat_attributes): 136 | if fields_to_include[i]: 137 | response = insights(field, "income statement", data_for_insights, str({field: inc_stat[field]}), api_key) 138 | ins[field] = response 139 | 140 | return { 141 | "metrics": met, 142 | "chart_data": chart_data, 143 | "insights": ins 144 | } 145 | 146 | 147 | if __name__ == "__main__": 148 | fields_to_include = [True, False, False, False, True] 149 | 150 | data = income_statement("TSLA", fields_to_include, OPENAI_API_KEY) 151 | print("Metrics: ", data['metrics']) 152 | print("Chart Data: ", data['chart_data']) 153 | print("Insights", data['insights']) 154 | -------------------------------------------------------------------------------- /src/news_sentiment.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import streamlit as st 8 | import requests 9 | from datetime import datetime, timedelta 10 | import pandas as pd 11 | import os 12 | 13 | # AV_API_KEY = st.secrets["av_api_key"] 14 | 15 | AV_API_KEY = os.environ.get("AV_API_KEY") 16 | OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") 17 | 18 
| def classify_sentiment(mean_score): 19 | if mean_score <= -0.35: 20 | return "Bearish" 21 | elif -0.35 < mean_score <= -0.15: 22 | return "Somewhat-Bearish" 23 | elif -0.15 < mean_score < 0.15: 24 | return "Neutral" 25 | elif 0.15 <= mean_score < 0.35: 26 | return "Somewhat-Bullish" 27 | elif mean_score >= 0.35: 28 | return "Bullish" 29 | else: 30 | return "Undefined" 31 | 32 | def top_news(symbol, max_feed): 33 | 34 | current_datetime = datetime.now() 35 | one_year_ago = current_datetime - timedelta(days=365) 36 | formatted_time_from = one_year_ago.strftime("%Y%m%dT%H%M") 37 | print("time_from=", formatted_time_from) 38 | 39 | url = "https://www.alphavantage.co/query" 40 | params = { 41 | "function": "NEWS_SENTIMENT", 42 | "tickers": symbol, 43 | "apikey": AV_API_KEY, 44 | "sort": "RELEVANCE", 45 | "time_from": formatted_time_from, 46 | } 47 | # Send a GET request to the API 48 | response = requests.get(url, params=params) 49 | if response.status_code == 200: 50 | data = response.json() 51 | if not data: 52 | print(f"No data found for {symbol}") 53 | return None 54 | 55 | news = [] 56 | 57 | if "Error Message" in data: 58 | return {"Error": data["Error Message"]} 59 | 60 | try: 61 | for i in data["feed"][:max_feed]: 62 | temp = {} 63 | temp["title"] = i["title"] 64 | temp["url"] = i["url"] 65 | temp["authors"] = i["authors"] 66 | 67 | topics = [] 68 | for j in i["topics"]: 69 | topics.append(j["topic"]) 70 | temp["topics"] = topics 71 | 72 | sentiment_score = "" 73 | sentiment_label = "" 74 | for j in i["ticker_sentiment"]: 75 | if j["ticker"] == symbol: 76 | sentiment_score = j["ticker_sentiment_score"] 77 | sentiment_label = j["ticker_sentiment_label"] 78 | break 79 | temp["sentiment_score"] = sentiment_score 80 | temp["sentiment_label"] = sentiment_label 81 | 82 | news.append(temp) 83 | 84 | except Exception as e: 85 | print(e) 86 | return None 87 | 88 | else: 89 | print(f"Error: {response.status_code} - {response.text}") 90 | return None 91 | news = pd.DataFrame(news) 92 | 
news["sentiment_score"] = pd.to_numeric(news["sentiment_score"]) 93 | mean_sentiment_score = news["sentiment_score"].mean() 94 | mean_sentiment_class = classify_sentiment(mean_sentiment_score) 95 | 96 | return { 97 | "news": news, 98 | "mean_sentiment_score": mean_sentiment_score, 99 | "mean_sentiment_class": mean_sentiment_class 100 | } 101 | 102 | if __name__ == "__main__": 103 | news = top_news("AAPL", 10) 104 | print(news) 105 | 106 | -------------------------------------------------------------------------------- /src/pages/1_📊_Finance_Metrics_Review.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | 8 | import streamlit as st 9 | import os 10 | 11 | st.set_page_config(page_title="Finance Metrics Reviews", page_icon=":bar_chart:", layout="wide", initial_sidebar_state="collapsed") 12 | 13 | st.title(":chart_with_upwards_trend: Finance Metrics Review") 14 | st.info(""" 15 | Simply input the ticker symbol of your desired company and hit the 'Generate Insights' button. Allow a few moments for the system to compile the data and insights tailored to the selected company. Once done, you have the option to browse through these insights directly on the platform or download a comprehensive report by selecting 'Generate PDF', followed by 'Download PDF'. 
16 | """) 17 | 18 | 19 | from src.income_statement import income_statement 20 | from src.balance_sheet import balance_sheet 21 | from src.cash_flow import cash_flow 22 | from src.news_sentiment import top_news 23 | from src.company_overview import company_overview 24 | from src.utils import round_numeric, format_currency, create_donut_chart, create_bar_chart 25 | from src.pdf_gen import gen_pdf 26 | from src.fields2 import inc_stat, inc_stat_attributes, bal_sheet, balance_sheet_attributes, cashflow, cashflow_attributes 27 | 28 | st.sidebar.info(""" 29 | You can get your API keys here: [OpenAI](https://openai.com/blog/openai-api), [AlphaVantage](https://www.alphavantage.co/support/#api-key), 30 | """) 31 | 32 | OPENAI_API_KEY = st.sidebar.text_input("Enter OpenAI API key", type="password") 33 | os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY 34 | 35 | AV_API_KEY = st.sidebar.text_input("Enter Alpha Vantage API key", type="password") 36 | os.environ["AV_API_KEY"] = AV_API_KEY 37 | 38 | 39 | if not OPENAI_API_KEY: 40 | st.error("Please enter your OpenAI API Key") 41 | elif not AV_API_KEY: 42 | st.error("Please enter your Alpha Vantage API Key") 43 | else: 44 | 45 | 46 | col1, col2 = st.columns([0.25, 0.75], gap="medium") 47 | 48 | with col1: 49 | st.write(""" 50 | ### Select Insights 51 | """) 52 | with st.expander("**Income Statement Insights**", expanded=True): 53 | revenue_health = st.toggle("Revenue Health") 54 | operational_efficiency = st.toggle("Operational Efficiency") 55 | r_and_d_focus = st.toggle("R&D Focus") 56 | debt_management = st.toggle("Debt Management") 57 | profit_retention = st.toggle("Profit Retention") 58 | 59 | 60 | income_statement_feature_list = [revenue_health, operational_efficiency, r_and_d_focus, debt_management, profit_retention] 61 | 62 | with st.expander("**Balance Sheet Insights**", expanded=True): 63 | liquidity_position = st.toggle("Liquidity Position") 64 | assets_efficiency = st.toggle("Operational efficiency") 65 | capital_structure 
= st.toggle("Capital Structure") 66 | inventory_management = st.toggle("Inventory Management") 67 | overall_solvency = st.toggle("Overall Solvency") 68 | 69 | balance_sheet_feature_list = [liquidity_position, assets_efficiency, capital_structure, inventory_management, overall_solvency] 70 | 71 | with st.expander("**Cash Flow Insights**", expanded=True): 72 | operational_cash_efficiency = st.toggle("Operational Cash Efficiency") 73 | investment_capability = st.toggle("Investment Capability") 74 | financial_flexibility = st.toggle("Financial Flexibility") 75 | dividend_sustainability = st.toggle("Dividend Sustainability") 76 | debt_service_capability = st.toggle("Debt Service Capability") 77 | 78 | cash_flow_feature_list = [operational_cash_efficiency, investment_capability, financial_flexibility, dividend_sustainability, debt_service_capability] 79 | 80 | 81 | with col2: 82 | ticker = st.text_input("**Enter ticker symbol**") 83 | st.warning("Example Tickers: Apple Inc. - AAPL, Microsoft Corporation - MSFT, Tesla Inc. 
- TSLA") 84 | 85 | 86 | for insight in inc_stat_attributes: 87 | if insight not in st.session_state: 88 | st.session_state[insight] = None 89 | 90 | for insight in balance_sheet_attributes: 91 | if insight not in st.session_state: 92 | st.session_state[insight] = None 93 | 94 | for insight in cashflow_attributes: 95 | if insight not in st.session_state: 96 | st.session_state[insight] = None 97 | 98 | 99 | if "company_overview" not in st.session_state: 100 | st.session_state.company_overview = None 101 | 102 | if "income_statement" not in st.session_state: 103 | st.session_state.income_statement = None 104 | 105 | if "balance_sheet" not in st.session_state: 106 | st.session_state.balance_sheet = None 107 | 108 | if "cash_flow" not in st.session_state: 109 | st.session_state.cash_flow = None 110 | 111 | if "news" not in st.session_state: 112 | st.session_state.news = None 113 | 114 | if "all_outputs" not in st.session_state: 115 | st.session_state.all_outputs = None 116 | 117 | if ticker: 118 | if st.button("Generate Insights"): 119 | 120 | with st.status("**Generating Insights...**"): 121 | 122 | 123 | if not st.session_state.company_overview: 124 | st.write("Getting company overview...") 125 | st.session_state.company_overview = company_overview(ticker) 126 | 127 | 128 | if any(income_statement_feature_list): 129 | st.write("Generating income statement insights...") 130 | for i, insight in enumerate(inc_stat_attributes): 131 | if st.session_state[insight]: 132 | income_statement_feature_list[i] = False 133 | 134 | response = income_statement(ticker, income_statement_feature_list, OPENAI_API_KEY) 135 | 136 | st.session_state.income_statement = response 137 | 138 | for key, value in response["insights"].items(): 139 | st.session_state[key] = value 140 | 141 | 142 | if any(balance_sheet_feature_list): 143 | st.write("Generating balance sheet insights...") 144 | for i, insight in enumerate(balance_sheet_attributes): 145 | if st.session_state[insight]: 146 | 
balance_sheet_feature_list[i] = False 147 | 148 | response = balance_sheet(ticker, balance_sheet_feature_list, OPENAI_API_KEY) 149 | 150 | st.session_state.balance_sheet = response 151 | 152 | for key, value in response["insights"].items(): 153 | st.session_state[key] = value 154 | 155 | 156 | if any(cash_flow_feature_list): 157 | st.write("Generating cash flow insights...") 158 | for i, insight in enumerate(cashflow_attributes): 159 | if st.session_state[insight]: 160 | cash_flow_feature_list[i] = False 161 | 162 | 163 | 164 | response = cash_flow(ticker, cash_flow_feature_list, OPENAI_API_KEY) 165 | 166 | st.session_state.cash_flow = response 167 | 168 | for key, value in response["insights"].items(): 169 | st.session_state[key] = value 170 | 171 | if not st.session_state.news: 172 | st.write('Getting latest news...') 173 | st.session_state.news = top_news(ticker, 10) 174 | 175 | if st.session_state.company_overview and st.session_state.income_statement and st.session_state.balance_sheet and st.session_state.cash_flow and st.session_state.news: 176 | st.session_state.all_outputs = True 177 | 178 | if st.session_state.company_overview is None: 179 | st.error("No data available") 180 | 181 | if st.session_state.all_outputs: 182 | st.toast("Insights successfully generated!") 183 | if st.button("Generate PDF"): 184 | gen_pdf(st.session_state.company_overview["Name"], 185 | st.session_state.company_overview, 186 | st.session_state.income_statement, 187 | st.session_state.balance_sheet, 188 | st.session_state.cash_flow, 189 | None) 190 | st.toast("PDF successfully generated!") 191 | with open("pdf/final_report.pdf", "rb") as file: 192 | st.download_button( 193 | label="Download PDF", 194 | data=file, 195 | file_name="final_report.pdf", 196 | mime="application/pdf" 197 | ) 198 | 199 | 200 | 201 | tab1, tab2, tab3, tab4, tab5 = st.tabs(["Company Overview", "Income Statement", "Balance Sheet", "Cash Flow", "News Sentiment"]) 202 | 203 | 204 | if
st.session_state.company_overview: 205 | 206 | if "Error" in st.session_state.company_overview: 207 | st.error(st.session_state.company_overview["Error"]) 208 | 209 | else: 210 | with tab1: 211 | with st.container(): 212 | 213 | st.write("# Company Overview") 214 | # st.markdown("### Company Name:") 215 | st.markdown(f"""### {st.session_state.company_overview["Name"]}""") 216 | col1, col2, col3 = st.columns(3) 217 | col1.markdown("### Symbol:") 218 | col1.write(st.session_state.company_overview["Symbol"]) 219 | col2.markdown("### Exchange:") 220 | col2.write(st.session_state.company_overview["Exchange"]) 221 | col3.markdown("### Currency:") 222 | col3.write(st.session_state.company_overview["Currency"]) 223 | 224 | col1, col2, col3 = st.columns(3) 225 | col1.markdown("### Sector:") 226 | col1.write(st.session_state.company_overview["Sector"]) 227 | col2.markdown("### Industry:") 228 | col2.write(st.session_state.company_overview["Industry"]) 229 | col3.write() 230 | st.markdown("### Description:") 231 | st.write(st.session_state.company_overview["Description"]) 232 | 233 | col1, col2, col3 = st.columns(3) 234 | col1.markdown("### Country:") 235 | col1.write(st.session_state.company_overview["Country"]) 236 | col2.markdown("### Address:") 237 | col2.write(st.session_state.company_overview["Address"]) 238 | col3.write() 239 | 240 | col1, col2, col3 = st.columns(3) 241 | col1.markdown("### Fiscal Year End:") 242 | col1.write(st.session_state.company_overview["FiscalYearEnd"]) 243 | col2.markdown("### Latest Quarter:") 244 | col2.write(st.session_state.company_overview["LatestQuarter"]) 245 | col3.markdown("### Market Capitalization:") 246 | col3.write(format_currency(st.session_state.company_overview["MarketCapitalization"])) 247 | 248 | 249 | if st.session_state.income_statement: 250 | 251 | if "Error" in st.session_state.income_statement: 252 | st.error(st.session_state.income_statement["Error"]) 253 | 254 | else: 255 | 256 | with tab2: 257 | 258 | 
st.write("# Income Statement") 259 | st.write("## Metrics") 260 | 261 | with st.container(): 262 | 263 | col1, col2, col3 = st.columns(3) 264 | 265 | col1.metric("Gross Profit Margin", round_numeric(st.session_state.income_statement["metrics"]["gross_profit_margin"], 2)) 266 | col2.metric("Operating Profit Margin", round_numeric(st.session_state.income_statement["metrics"]["operating_profit_margin"], 2)) 267 | col3.metric("Net Profit Margin", round_numeric(st.session_state.income_statement["metrics"]["net_profit_margin"], 2)) 268 | col1.metric("Cost Efficiency", round_numeric(st.session_state.income_statement["metrics"]["cost_efficiency"], 2)) 269 | col2.metric("SG&A Efficiency", round_numeric(st.session_state.income_statement["metrics"]["sg_and_a_efficiency"], 2)) 270 | col3.metric("Interest Coverage Ratio", round_numeric(st.session_state.income_statement["metrics"]["interest_coverage_ratio"], 2)) 271 | 272 | 273 | st.write("## Insights") 274 | 275 | 276 | if revenue_health: 277 | if st.session_state["revenue_health"]: 278 | st.write("### Revenue Health") 279 | st.markdown(st.session_state["revenue_health"]) 280 | total_revenue_chart = create_bar_chart(st.session_state.income_statement["chart_data"], 281 | "total_revenue", 282 | "Revenue Growth") 283 | st.write(total_revenue_chart) 284 | else: 285 | st.error("Revenue Health insight has not been generated") 286 | 287 | 288 | if operational_efficiency: 289 | if st.session_state["operational_efficiency"]: 290 | st.write("### Operational Efficiency") 291 | st.write(st.session_state["operational_efficiency"]) 292 | else: 293 | st.error("Operational Efficiency insight has not been generated") 294 | 295 | 296 | if r_and_d_focus: 297 | if st.session_state["r_and_d_focus"]: 298 | st.write("### R&D Focus") 299 | st.write(st.session_state["r_and_d_focus"]) 300 | else: 301 | st.error("R&D Focus insight has not been generated") 302 | 303 | 304 | 305 | if debt_management: 306 | if st.session_state["debt_management"]: 307 | 
st.write("### Debt Management") 308 | st.write(st.session_state["debt_management"]) 309 | interest_expense_chart = create_bar_chart(st.session_state.income_statement["chart_data"], 310 | "interest_expense", 311 | "Debt Service Obligation") 312 | st.write(interest_expense_chart) 313 | else: 314 | st.error("Debt Management insight has not been generated") 315 | 316 | 317 | 318 | 319 | if profit_retention: 320 | if st.session_state["profit_retention"]: 321 | st.write("### Profit Retention") 322 | st.write(st.session_state["profit_retention"]) 323 | net_income_chart = create_bar_chart(st.session_state.income_statement["chart_data"], 324 | "net_income", 325 | "Profitability Trend") 326 | st.write(net_income_chart) 327 | else: 328 | st.error("Profit Retention insight has not been generated") 329 | 330 | 331 | 332 | if st.session_state.balance_sheet: 333 | with tab3: 334 | 335 | st.write("# Balance Sheet") 336 | st.write("## Metrics") 337 | 338 | with st.container(): 339 | 340 | col1, col2, col3 = st.columns(3) 341 | 342 | col1.metric("Current Ratio", round_numeric(st.session_state.balance_sheet['metrics']['current_ratio'], 2)) 343 | col2.metric("Debt to Equity Ratio", round_numeric(st.session_state.balance_sheet['metrics']['debt_to_equity_ratio'], 2)) 344 | col3.metric("Quick Ratio", round_numeric(st.session_state.balance_sheet['metrics']['quick_ratio'], 2)) 345 | col1.metric("Asset Turnover", round_numeric(st.session_state.balance_sheet['metrics']['asset_turnover'], 2)) 346 | col2.metric("Equity Multiplier", round_numeric(st.session_state.balance_sheet['metrics']['equity_multiplier'], 2)) 347 | 348 | 349 | 350 | st.write("## Insights") 351 | 352 | 353 | if liquidity_position: 354 | if st.session_state['liquidity_position']: 355 | st.write("### Liquidity Position") 356 | st.write(st.session_state["liquidity_position"]) 357 | asset_comp_chart = create_donut_chart(st.session_state.balance_sheet["chart_data"],"asset_composition") 358 | st.write(asset_comp_chart) 359 | else: 
360 | st.error("Liquidity Position insight has not been generated") 361 | 362 | 363 | if assets_efficiency: 364 | if st.session_state['assets_efficiency']: 365 | st.write("### Assets Efficiency") 366 | st.write(st.session_state["assets_efficiency"]) 367 | else: 368 | st.error("Assets Efficiency insight has not been generated") 369 | 370 | 371 | if capital_structure: 372 | if st.session_state['capital_structure']: 373 | st.write("### Capital Structure") 374 | st.write(st.session_state["capital_structure"]) 375 | liabilities_comp_chart = create_donut_chart(st.session_state.balance_sheet["chart_data"],"liabilities_composition") 376 | st.write(liabilities_comp_chart) 377 | else: 378 | st.error("Capital Structure insight has not been generated") 379 | 380 | 381 | if inventory_management: 382 | if st.session_state['inventory_management']: 383 | st.write("### Inventory Management") 384 | st.write(st.session_state["inventory_management"]) 385 | else: 386 | st.error("Inventory Management insight has not been generated") 387 | 388 | if overall_solvency: 389 | if st.session_state['overall_solvency']: 390 | st.write("### Overall Solvency") 391 | st.write(st.session_state["overall_solvency"]) 392 | liabilities_comp_chart = create_donut_chart(st.session_state.balance_sheet["chart_data"],"debt_structure") 393 | st.write(liabilities_comp_chart) 394 | else: 395 | st.error("Overall Solvency insight has not been generated") 396 | 397 | 398 | if st.session_state.cash_flow: 399 | with tab4: 400 | 401 | st.write("# Cash Flow") 402 | st.write("## Metrics") 403 | 404 | with st.container(): 405 | 406 | col1, col2, col3 = st.columns(3) 407 | 408 | col1.metric("Operating Cash Flow Margin", round_numeric(st.session_state.cash_flow['metrics']['operating_cash_flow_margin'], 2)) 409 | col2.metric("Capital Expenditure Coverage Ratio", round_numeric(st.session_state.cash_flow['metrics']['capital_expenditure_coverage_ratio'], 2)) 410 | col3.metric("Dividend Coverage Ratio", 
round_numeric(st.session_state.cash_flow['metrics']['dividend_coverage_ratio'], 2)) 411 | col1.metric("Cash Flow to Debt Ratio", round_numeric(st.session_state.cash_flow['metrics']['cash_flow_to_debt_ratio'], 2)) 412 | 413 | col2.metric("Free Cash Flow", format_currency(st.session_state.cash_flow['metrics']['free_cash_flow'])) 414 | 415 | 416 | if operational_cash_efficiency: 417 | if st.session_state["operational_cash_efficiency"]: 418 | st.write("## Insights") 419 | st.write("### Operational Cash Efficiency") 420 | st.write(st.session_state["operational_cash_efficiency"]) 421 | operating_cash_flow_chart = create_bar_chart(st.session_state.cash_flow["chart_data"], 422 | "operating_cash_flow", 423 | "Operating Cash Flow Trend") 424 | st.write(operating_cash_flow_chart) 425 | else: 426 | st.error("Operational Cash Efficiency insight has not been generated") 427 | 428 | if investment_capability: 429 | if st.session_state["investment_capability"]: 430 | st.write("### Investment Capability") 431 | st.write(st.session_state["investment_capability"]) 432 | cash_flow_from_investment_chart = create_bar_chart(st.session_state.cash_flow["chart_data"], 433 | "cash_flow_from_investment", 434 | "Investment Capability Trend") 435 | st.write(cash_flow_from_investment_chart) 436 | else: 437 | st.error("Investment Capability insight has not been generated") 438 | 439 | 440 | 441 | if financial_flexibility: 442 | if st.session_state["financial_flexibility"]: 443 | st.write("### Financial Flexibility") 444 | st.write(st.session_state["financial_flexibility"]) 445 | financing_cash_flow_chart = create_bar_chart(st.session_state.cash_flow["chart_data"], 446 | "cash_flow_from_financing", 447 | "Cash Flow from Financing Trend") 448 | st.write(financing_cash_flow_chart) 449 | else: 450 | st.error("Financial Flexibility insight has not been generated") 451 | 452 | 453 | if dividend_sustainability: 454 | if st.session_state["dividend_sustainability"]: 455 | st.write("### Dividend Sustainability") 456 | 
st.write(st.session_state["dividend_sustainability"]) 457 | else: 458 | st.error("Dividend Sustainability insight has not been generated") 459 | 460 | if debt_service_capability: 461 | if st.session_state["debt_service_capability"]: 462 | st.write("### Debt Service Capability") 463 | st.write(st.session_state["debt_service_capability"]) 464 | else: 465 | st.error("Debt Service Capability insight has not been generated") 466 | 467 | 468 | 469 | if st.session_state.news: 470 | 471 | with tab5: 472 | st.markdown("## Top News") 473 | column_config = { 474 | "title": st.column_config.Column( 475 | "Title", 476 | width="large", 477 | ), 478 | "url": st.column_config.LinkColumn( 479 | "Link", 480 | width="medium", 481 | ), 482 | "authors": st.column_config.ListColumn( 483 | "Authors", 484 | width = "medium" 485 | ), 486 | "topics": st.column_config.ListColumn( 487 | "Topics", 488 | width="large" 489 | ), 490 | "sentiment_score" : st.column_config.ProgressColumn( 491 | "Sentiment Score", 492 | min_value=-0.5, 493 | max_value=0.5 494 | ), 495 | "sentiment_label": st.column_config.Column( 496 | "Sentiment Label" 497 | ) 498 | 499 | } 500 | 501 | st.metric("Mean Sentiment Score", 502 | value=round_numeric(st.session_state.news["mean_sentiment_score"]), 503 | delta=st.session_state.news["mean_sentiment_class"]) 504 | 505 | st.dataframe(st.session_state.news["news"], column_config=column_config) 506 | 507 | -------------------------------------------------------------------------------- /src/pages/2_🗂️_Annual_Report_Analyzer.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | 8 | from langchain.prompts import PromptTemplate 9 | from langchain.output_parsers import PydanticOutputParser 10 | 11 | from llama_index import VectorStoreIndex, ServiceContext, StorageContext 12 | 
from llama_index.vector_stores import FaissVectorStore 13 | from llama_index.tools import QueryEngineTool, ToolMetadata 14 | from llama_index.query_engine import SubQuestionQueryEngine 15 | from llama_index.embeddings import OpenAIEmbedding 16 | from llama_index.schema import Document 17 | from llama_index.node_parser import UnstructuredElementNodeParser 18 | 19 | from src.utils import get_model, process_pdf2, generate_pydantic_model 20 | from src.pydantic_models import FiscalYearHighlights, StrategyOutlookFutureDirection, RiskManagement, CorporateGovernanceSocialResponsibility, InnovationRnD 21 | # from src.fields import ( 22 | # fiscal_year_fields, fiscal_year_attributes, 23 | # strat_outlook_fields, strat_outlook_attributes, 24 | # risk_management_fields, risk_management_attributes, 25 | # innovation_fields, innovation_attributes 26 | # ) 27 | 28 | from src.fields2 import ( 29 | fiscal_year, fiscal_year_attributes, 30 | strat_outlook, strat_outlook_attributes, 31 | risk_management, risk_management_attributes, 32 | innovation, innovation_attributes 33 | ) 34 | 35 | import streamlit as st 36 | import weaviate 37 | import os 38 | import openai 39 | import faiss 40 | import time 41 | from pypdf import PdfReader 42 | 43 | 44 | st.set_page_config(page_title="Annual Report Analyzer", page_icon=":card_index_dividers:", initial_sidebar_state="expanded", layout="wide") 45 | 46 | st.title(":card_index_dividers: Annual Report Analyzer") 47 | st.info("""
48 | Begin by uploading the annual report of your chosen company in PDF format. Afterward, click 'Process Document'. Once the document has been processed, click 'Analyze Report' and the system will work its magic. After a brief wait, you'll be presented with a detailed analysis and insights derived from the report. 
49 | """) 50 | 51 | 52 | # OPENAI_API_KEY = st.secrets["openai_api_key"] 53 | 54 | 55 | # openai.api_key = os.environ["OPENAI_API_KEY"] 56 | 57 | def process_pdf(pdf): 58 | file = PdfReader(pdf) 59 | 60 | document_list = [] 61 | for page in file.pages: 62 | document_list.append(Document(text=str(page.extract_text()))) 63 | 64 | node_parser = UnstructuredElementNodeParser() 65 | nodes = node_parser.get_nodes_from_documents(document_list, show_progress=True) 66 | 67 | return nodes 68 | 69 | 70 | def get_vector_index(nodes, vector_store): 71 | print(nodes) 72 | llm = get_model("openai", OPENAI_API_KEY) 73 | if vector_store == "faiss": 74 | d = 1536  # dimensionality of OpenAI's text-embedding-ada-002 embeddings 75 | faiss_index = faiss.IndexFlatL2(d) 76 | vector_store = FaissVectorStore(faiss_index=faiss_index) 77 | storage_context = StorageContext.from_defaults(vector_store=vector_store) 78 | # embed_model = OpenAIEmbedding() 79 | # service_context = ServiceContext.from_defaults(embed_model=embed_model) 80 | service_context = ServiceContext.from_defaults(llm=llm) 81 | index = VectorStoreIndex(nodes, 82 | service_context=service_context, 83 | storage_context=storage_context 84 | ) 85 | elif vector_store == "simple": 86 | index = VectorStoreIndex(nodes)  # nodes, not documents, so build the index directly 87 | 88 | 89 | return index 90 | 91 | 92 | 93 | def generate_insight(engine, insight_name, section_name, output_format): 94 | 95 | with open("prompts/report.prompt", "r") as f: 96 | template = f.read() 97 | 98 | prompt_template = PromptTemplate( 99 | template=template, 100 | input_variables=['insight_name', 'section_name', 'output_format'] 101 | ) 102 | 103 | formatted_input = prompt_template.format(insight_name=insight_name, section_name=section_name, output_format=output_format) 104 | print(formatted_input) 105 | response = engine.query(formatted_input) 106 | return response.response 107 | 108 | 109 | 110 | def report_insights(engine, section_name, fields_to_include, section_num): 111 | 112 | fields = None 113 | attribs = None 114 | 115 | if section_num == 1: 116 | 
fields = fiscal_year 117 | attribs = fiscal_year_attributes 118 | elif section_num == 2: 119 | fields = strat_outlook 120 | attribs = strat_outlook_attributes 121 | elif section_num == 3: 122 | fields = risk_management 123 | attribs = risk_management_attributes 124 | elif section_num == 4: 125 | fields = innovation 126 | attribs = innovation_attributes 127 | 128 | ins = {} 129 | for i, field in enumerate(attribs): 130 | if fields_to_include[i]: 131 | response = generate_insight(engine, field, section_name, str({field: fields[field]})) 132 | ins[field] = response 133 | 134 | return { 135 | "insights": ins 136 | } 137 | 138 | def get_query_engine(engine): 139 | llm = get_model("openai", OPENAI_API_KEY) 140 | service_context = ServiceContext.from_defaults(llm=llm) 141 | 142 | query_engine_tools = [ 143 | QueryEngineTool( 144 | query_engine=engine, 145 | metadata=ToolMetadata( 146 | name="Annual Report", 147 | description=f"Provides information about the company from its annual report.", 148 | ), 149 | ), 150 | ] 151 | 152 | 153 | s_engine = SubQuestionQueryEngine.from_defaults( 154 | query_engine_tools=query_engine_tools, 155 | service_context=service_context 156 | ) 157 | return s_engine 158 | 159 | 160 | for insight in fiscal_year_attributes: 161 | if insight not in st.session_state: 162 | st.session_state[insight] = None 163 | 164 | for insight in strat_outlook_attributes: 165 | if insight not in st.session_state: 166 | st.session_state[insight] = None 167 | 168 | for insight in risk_management_attributes: 169 | if insight not in st.session_state: 170 | st.session_state[insight] = None 171 | 172 | for insight in innovation_attributes: 173 | if insight not in st.session_state: 174 | st.session_state[insight] = None 175 | 176 | if "end_time" not in st.session_state: 177 | st.session_state.end_time = None 178 | 179 | 180 | if "process_doc" not in st.session_state: 181 | st.session_state.process_doc = False 182 | 183 | 184 | st.sidebar.info(""" 185 | You can get your 
OpenAI API key [here](https://openai.com/blog/openai-api) 186 | """) 187 | OPENAI_API_KEY = st.sidebar.text_input("OpenAI API Key", type="password") 188 | os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY 189 | 190 | if not OPENAI_API_KEY: 191 | st.error("Please enter your OpenAI API Key") 192 | 193 | if OPENAI_API_KEY: 194 | pdfs = st.sidebar.file_uploader("Upload the annual report in PDF format", type="pdf") 195 | st.sidebar.info(""" 196 | Example reports you can upload here: 197 | - [Apple Inc.](https://s2.q4cdn.com/470004039/files/doc_financials/2022/q4/_10-K-2022-(As-Filed).pdf) 198 | - [Microsoft Corporation](https://microsoft.gcs-web.com/static-files/07cf3c30-cfc3-4567-b20f-f4b0f0bd5087) 199 | - [Tesla Inc.](https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q4-2022-Update) 200 | """) 201 | 202 | if st.sidebar.button("Process Document"): 203 | with st.spinner("Processing Document..."): 204 | nodes = process_pdf(pdfs) 205 | st.session_state.index = get_vector_index(nodes, vector_store="faiss") 206 | st.session_state.process_doc = True 207 | 208 | 209 | st.toast("Document Processed!") 210 | 211 | 212 | if st.session_state.process_doc: 213 | 214 | col1, col2 = st.columns([0.25, 0.75]) 215 | 216 | with col1: 217 | st.write(""" 218 | ### Select Insights 219 | """) 220 | 221 | with st.expander("**Fiscal Year Highlights**", expanded=True): 222 | performance_highlights = st.toggle("Performance Highlights") 223 | major_events = st.toggle("Major Events") 224 | challenges_encountered = st.toggle("Challenges Encountered") 225 | 226 | fiscal_year_highlights_list = [performance_highlights, major_events, challenges_encountered] 227 | 228 | with st.expander("**Strategy Outlook and Future Direction**", expanded=True): 229 | strategic_initiatives = st.toggle("Strategic Initiatives") 230 | market_outlook = st.toggle("Market Outlook") 231 | product_roadmap = st.toggle("Product Roadmap") 232 | 233 | strategy_outlook_future_direction_list = [strategic_initiatives, 
market_outlook, product_roadmap] 234 | 235 | with st.expander("**Risk Management**", expanded=True): 236 | risk_factors = st.toggle("Risk Factors") 237 | risk_mitigation = st.toggle("Risk Mitigation") 238 | 239 | risk_management_list = [risk_factors, risk_mitigation] 240 | 241 | with st.expander("**Innovation and R&D**", expanded=True): 242 | r_and_d_activities = st.toggle("R&D Activities") 243 | innovation_focus = st.toggle("Innovation Focus") 244 | 245 | innovation_and_rd_list = [r_and_d_activities, innovation_focus] 246 | 247 | 248 | with col2: 249 | if st.button("Analyze Report"): 250 | engine = get_query_engine(st.session_state.index.as_query_engine(similarity_top_k=3)) 251 | start_time = time.time() 252 | 253 | with st.status("**Analyzing Report...**"): 254 | 255 | 256 | if any(fiscal_year_highlights_list): 257 | st.write("Fiscal Year Highlights...") 258 | 259 | for i, insight in enumerate(fiscal_year_attributes): 260 | if st.session_state[insight]: 261 | fiscal_year_highlights_list[i] = False 262 | 263 | response = report_insights(engine, "Fiscal Year Highlights", fiscal_year_highlights_list, 1) 264 | 265 | for key, value in response["insights"].items(): 266 | st.session_state[key] = value 267 | 268 | if any(strategy_outlook_future_direction_list): 269 | st.write("Strategy Outlook and Future Direction...") 270 | 271 | for i, insight in enumerate(strat_outlook_attributes): 272 | if st.session_state[insight]: 273 | strategy_outlook_future_direction_list[i] = False 274 | response = report_insights(engine, "Strategy Outlook and Future Direction", strategy_outlook_future_direction_list, 2) 275 | 276 | for key, value in response["insights"].items(): 277 | st.session_state[key] = value 278 | 279 | 280 | if any(risk_management_list): 281 | st.write("Risk Management...") 282 | 283 | for i, insight in enumerate(risk_management_attributes): 284 | if st.session_state[insight]: 285 | risk_management_list[i] = False 286 | 287 | response = report_insights(engine, "Risk 
Management", risk_management_list, 3) 288 | 289 | for key, value in response["insights"].items(): 290 | st.session_state[key] = value 291 | 292 | if any(innovation_and_rd_list): 293 | st.write("Innovation and R&D...") 294 | 295 | for i, insight in enumerate(innovation_attributes): 296 | if st.session_state[insight]: 297 | innovation_and_rd_list[i] = False 298 | 299 | response = report_insights(engine, "Innovation and R&D", innovation_and_rd_list, 4) 300 | st.session_state.innovation_and_rd = response 301 | 302 | for key, value in response["insights"].items(): 303 | st.session_state[key] = value 304 | 305 | st.session_state["end_time"] = "{:.2f}".format((time.time() - start_time)) 306 | 307 | 308 | 309 | st.toast("Report Analysis Complete!") 310 | 311 | if st.session_state.end_time: 312 | st.write("Report Analysis Time: ", st.session_state.end_time, "s") 313 | 314 | 315 | # if st.session_state.all_report_outputs: 316 | # st.toast("Report Analysis Complete!") 317 | 318 | tab1, tab2, tab3, tab4 = st.tabs(["Fiscal Year Highlights", "Strategy Outlook and Future Direction", "Risk Management", "Innovation and R&D"]) 319 | 320 | 321 | 322 | 323 | with tab1: 324 | st.write("## Fiscal Year Highlights") 325 | try: 326 | if performance_highlights: 327 | if st.session_state['performance_highlights']: 328 | st.write("### Performance Highlights") 329 | st.write(st.session_state['performance_highlights']) 330 | else: 331 | st.error("Performance Highlights insight has not been generated") 332 | except: 333 | st.error("This insight has not been generated") 334 | 335 | try: 336 | if major_events: 337 | if st.session_state["major_events"]: 338 | st.write("### Major Events") 339 | st.write(st.session_state["major_events"]) 340 | else: 341 | st.error("Major Events insight has not been generated") 342 | except: 343 | st.error("This insight has not been generated") 344 | try: 345 | if challenges_encountered: 346 | if st.session_state["challenges_encountered"]: 347 | st.write("### 
Challenges Encountered") 348 | st.write(st.session_state["challenges_encountered"]) 349 | else: 350 | st.error("Challenges Encountered insight has not been generated") 351 | except: 352 | st.error("This insight has not been generated") 353 | # st.write("### Milestone Achievements") 354 | # st.write(str(st.session_state.fiscal_year_highlights.milestone_achievements)) 355 | 356 | 357 | 358 | with tab2: 359 | st.write("## Strategy Outlook and Future Direction") 360 | try: 361 | if strategic_initiatives: 362 | if st.session_state["strategic_initiatives"]: 363 | st.write("### Strategic Initiatives") 364 | st.write(st.session_state["strategic_initiatives"]) 365 | else: 366 | st.error("Strategic Initiatives insight has not been generated") 367 | except: 368 | st.error("This insight has not been generated") 369 | 370 | try: 371 | if market_outlook: 372 | if st.session_state["market_outlook"]: 373 | st.write("### Market Outlook") 374 | st.write(st.session_state["market_outlook"]) 375 | else: 376 | st.error("Market Outlook insight has not been generated") 377 | 378 | except: 379 | st.error("This insight has not been generated") 380 | 381 | try: 382 | if product_roadmap: 383 | if st.session_state["product_roadmap"]: 384 | st.write("### Product Roadmap") 385 | st.write(st.session_state["product_roadmap"]) 386 | else: 387 | st.error("Product Roadmap insight has not been generated") 388 | except: 389 | st.error("This insight has not been generated") 390 | 391 | with tab3: 392 | st.write("## Risk Management") 393 | 394 | try: 395 | if risk_factors: 396 | if st.session_state["risk_factors"]: 397 | st.write("### Risk Factors") 398 | st.write(st.session_state["risk_factors"]) 399 | else: 400 | st.error("Risk Factors insight has not been generated") 401 | except: 402 | st.error("This insight has not been generated") 403 | 404 | try: 405 | if risk_mitigation: 406 | if st.session_state["risk_mitigation"]: 407 | st.write("### Risk Mitigation") 408 | 
st.write(st.session_state["risk_mitigation"]) 409 | else: 410 | st.error("Risk Mitigation insight has not been generated") 411 | except: 412 | st.error("This insight has not been generated") 413 | 414 | 415 | with tab4: 416 | st.write("## Innovation and R&D") 417 | 418 | try: 419 | if r_and_d_activities: 420 | if st.session_state["r_and_d_activities"]: 421 | st.write("### R&D Activities") 422 | st.write(st.session_state["r_and_d_activities"]) 423 | else: 424 | st.error("R&D Activities insight has not been generated") 425 | except: 426 | st.error("This insight has not been generated") 427 | 428 | try: 429 | if innovation_focus: 430 | if st.session_state["innovation_focus"]: 431 | st.write("### Innovation Focus") 432 | st.write(st.session_state["innovation_focus"]) 433 | else: 434 | st.error("Innovation Focus insight has not been generated") 435 | except: 436 | st.error("This insight has not been generated") 437 | -------------------------------------------------------------------------------- /src/pdf_gen.py: -------------------------------------------------------------------------------- 1 | from io import BytesIO 2 | import json 3 | import sys 4 | from pathlib import Path 5 | script_dir = Path(__file__).resolve().parent 6 | project_root = script_dir.parent 7 | sys.path.append(str(project_root)) 8 | 9 | from reportlab.lib.pagesizes import letter 10 | from reportlab.lib.units import inch 11 | from reportlab.platypus import PageBreak, Paragraph, SimpleDocTemplate, Image 12 | from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle 13 | from reportlab.lib.enums import TA_CENTER 14 | from reportlab.lib.pagesizes import landscape 15 | from reportlab.platypus import Paragraph, Table, TableStyle, Spacer 16 | from reportlab.lib import colors 17 | import plotly.io as pio 18 | from io import BytesIO 19 | import tempfile 20 | 21 | 22 | from src.company_overview import company_overview 23 | from src.income_statement import income_statement 24 | from 
src.balance_sheet import balance_sheet 25 | from src.cash_flow import cash_flow 26 | from src.news_sentiment import top_news 27 | from src.utils import round_numeric, create_donut_chart, create_bar_chart 28 | 29 | # Get the default styles 30 | styles = getSampleStyleSheet() 31 | 32 | # Define custom styles 33 | centered_style = ParagraphStyle( 34 | 'CenteredStyle', 35 | parent=styles['Heading1'], 36 | alignment=TA_CENTER, 37 | fontSize=48, 38 | spaceAfter=50, 39 | ) 40 | 41 | sub_centered_style = ParagraphStyle( 42 | 'SubCenteredStyle', 43 | parent=styles['Heading2'], 44 | alignment=TA_CENTER, 45 | fontSize=24, 46 | spaceAfter=15, 47 | ) 48 | 49 | def cover_page(company_name): 50 | flowables = [] 51 | 52 | # Title 53 | title = "FinSight" 54 | para_title = Paragraph(title, centered_style) 55 | flowables.append(para_title) 56 | 57 | # Subtitle 58 | subtitle = "Financial Insights for
" 59 | para_subtitle = Paragraph(subtitle, sub_centered_style) 60 | flowables.append(para_subtitle) 61 | 62 | subtitle2 = "{} {}".format(company_name, "2022") 63 | para_subtitle2 = Paragraph(subtitle2, sub_centered_style) 64 | flowables.append(para_subtitle2) 65 | 66 | # Add a page break after the cover page 67 | flowables.append(PageBreak()) 68 | 69 | return flowables 70 | 71 | from reportlab.lib.styles import ParagraphStyle 72 | from reportlab.lib.enums import TA_LEFT 73 | 74 | # Define custom styles 75 | header_style = ParagraphStyle( 76 | 'HeaderStyle', 77 | parent=styles['Heading2'], 78 | fontSize=24, 79 | spaceAfter=20, 80 | leading=30 81 | ) 82 | 83 | sub_section_header_style = ParagraphStyle( 84 | 'SubSectionHeaderStyle', 85 | parent=styles['Heading3'], 86 | fontSize=16, 87 | spaceAfter=8, 88 | leading=20 89 | ) 90 | 91 | data_style = ParagraphStyle( 92 | 'DataStyle', 93 | parent=styles['Normal'], 94 | fontSize=14, 95 | spaceAfter=15, 96 | leading=20 97 | ) 98 | 99 | sub_header_style = ParagraphStyle( 100 | 'DataStyle', 101 | parent=styles['Normal'], 102 | fontSize=20, 103 | spaceAfter=15, 104 | leading=20 105 | ) 106 | 107 | def pdf_plotly_chart(fig): 108 | img_bytes = fig.to_image(format="png") 109 | temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".png") 110 | temp_file.write(img_bytes) 111 | temp_file.close() 112 | img = Image(temp_file.name, width=5*inch, height=3*inch) 113 | return img 114 | 115 | def pdf_company_overview(data): 116 | flowables = [] 117 | 118 | # Section Title 119 | title = "Company Overview" 120 | para_title = Paragraph(title, header_style) 121 | flowables.append(para_title) 122 | 123 | # Company Name 124 | # data = json.loads(data) 125 | company_name = data.get("Name") 126 | print(company_name) 127 | para_name = Paragraph(" {} ".format(company_name), sub_header_style) 128 | flowables.append(para_name) 129 | 130 | # Other details 131 | details = [ 132 | ("Symbol:", data.get("Symbol")), 133 | ("Exchange:", 
data.get("Exchange")), 134 | ("Currency:", data.get("Currency")), 135 | ("Sector:", data.get("Sector")), 136 | ("Industry:", data.get("Industry")), 137 | ("Description:", data.get("Description")), 138 | ("Country:", data.get("Country")), 139 | ("Address:", data.get("Address")), 140 | ("Fiscal Year End:", data.get("FiscalYearEnd")), 141 | ("Latest Quarter:", data.get("LatestQuarter")), 142 | ("Market Capitalization:", "$ "+str("{:,}".format(round_numeric(data.get("MarketCapitalization"))))) 143 | ] 144 | 145 | for label, value in details: 146 | para_label = Paragraph("{} {}".format(label, value), data_style) 147 | flowables.append(para_label) 148 | 149 | return flowables 150 | 151 | 152 | def pdf_income_statement(metrics, insights, chart_data): 153 | flowables = [] 154 | 155 | # Section Title 156 | title = "INCOME STATEMENT" 157 | para_title = Paragraph(title, header_style) 158 | flowables.append(para_title) 159 | 160 | 161 | # Metrics 162 | flowables.append(Paragraph("METRICS", sub_header_style)) 163 | for label, value in metrics.items(): 164 | metric_text = "{}: {}".format(label.replace("_", " ").title(), round_numeric(value)) 165 | flowables.append(Paragraph(metric_text, data_style)) 166 | 167 | # Insights 168 | flowables.append(Paragraph("INSIGHTS", sub_header_style)) 169 | 170 | 171 | try: 172 | flowables.append(Paragraph("Revenue Health", sub_section_header_style)) 173 | flowables.append(Paragraph(insights.revenue_health, data_style)) 174 | flowables.append(pdf_plotly_chart(create_bar_chart(chart_data, "total_revenue"))) 175 | except: 176 | pass 177 | try: 178 | flowables.append(Paragraph("Operational Efficiency", sub_section_header_style)) 179 | flowables.append(Paragraph(insights.operational_efficiency, data_style)) 180 | except: 181 | pass 182 | try: 183 | flowables.append(Paragraph("R&D Focus", sub_section_header_style)) 184 | flowables.append(Paragraph(insights.r_and_d_focus, data_style)) 185 | except: 186 | pass 187 | try: 188 | 
flowables.append(Paragraph("Debt Management", sub_section_header_style)) 189 | flowables.append(Paragraph(insights.debt_management, data_style)) 190 | flowables.append(pdf_plotly_chart(create_bar_chart(chart_data, "interest_expense"))) 191 | except: 192 | pass 193 | 194 | try: 195 | flowables.append(Paragraph("Profit Retention", sub_section_header_style)) 196 | flowables.append(Paragraph(insights.profit_retention, data_style)) 197 | flowables.append(pdf_plotly_chart(create_bar_chart(chart_data, "net_income"))) 198 | except: 199 | pass 200 | 201 | return flowables 202 | 203 | def pdf_balance_sheet(metrics, insights, chart_data): 204 | flowables = [] 205 | 206 | # Section Title 207 | title = "BALANCE SHEET" 208 | para_title = Paragraph(title, header_style) 209 | flowables.append(para_title) 210 | 211 | # Metrics 212 | flowables.append(Paragraph("METRICS", sub_header_style)) 213 | for label, value in metrics.items(): 214 | metric_text = "{}: {}".format(label.replace("_", " ").title(), round_numeric(value)) 215 | flowables.append(Paragraph(metric_text, data_style)) 216 | 217 | # Insights 218 | flowables.append(Paragraph("INSIGHTS", sub_header_style)) 219 | # insight_sections = [ 220 | # ("Liquidity Position", insights.liquidity_position), 221 | # ("Operational Efficiency", insights.operational_efficiency), 222 | # ("Capital Structure", insights.capital_structure), 223 | # ("Inventory Management", insights.inventory_management), 224 | # ("Overall Solvency", insights.overall_solvency) 225 | # ] 226 | 227 | # for section_title, insight_text in insight_sections: 228 | # flowables.append(Paragraph(section_title, sub_section_header_style)) 229 | # flowables.append(Paragraph(insight_text, data_style)) 230 | 231 | try: 232 | flowables.append(Paragraph("Liquidity Position", sub_section_header_style)) 233 | flowables.append(Paragraph(insights.liquidity_position, data_style)) 234 | flowables.append(pdf_plotly_chart(create_donut_chart(chart_data,"asset_composition"))) 235 | 
except: 236 | pass 237 | try: 238 | flowables.append(Paragraph("Operational Efficiency", sub_section_header_style)) 239 | flowables.append(Paragraph(insights.operational_efficiency, data_style)) 240 | except: 241 | pass 242 | try: 243 | flowables.append(Paragraph("Capital Structure", sub_section_header_style)) 244 | flowables.append(Paragraph(insights.capital_structure, data_style)) 245 | flowables.append(pdf_plotly_chart(create_donut_chart(chart_data, "liabilities_composition"))) 246 | 247 | except: 248 | pass 249 | 250 | try: 251 | flowables.append(Paragraph("Inventory Management", sub_section_header_style)) 252 | flowables.append(Paragraph(insights.inventory_management, data_style)) 253 | except: 254 | pass 255 | 256 | try: 257 | flowables.append(Paragraph("Overall Solvency", sub_section_header_style)) 258 | flowables.append(Paragraph(insights.overall_solvency, data_style)) 259 | flowables.append(pdf_plotly_chart(create_donut_chart(chart_data, "debt_structure"))) 260 | 261 | except: 262 | pass 263 | 264 | return flowables 265 | 266 | def pdf_cash_flow(metrics, insights, chart_data): 267 | flowables = [] 268 | 269 | # Section Title 270 | title = "CASH FLOW" 271 | para_title = Paragraph(title, header_style) 272 | flowables.append(para_title) 273 | 274 | flowables.append(Paragraph("METRICS", sub_header_style)) 275 | for label, value in metrics.items(): 276 | metric_text = "{}: {}".format(label.replace("_", " ").title(), round_numeric(value)) 277 | flowables.append(Paragraph(metric_text, data_style)) 278 | 279 | # Insights 280 | flowables.append(Paragraph("INSIGHTS", sub_header_style)) 281 | # insight_sections = [ 282 | # ("Operational Cash Efficiency", insights.operational_cash_efficiency), 283 | # ("Investment Capability", insights.investment_capability), 284 | # ("Financial Flexibility", insights.financial_flexibility), 285 | # ("Dividend Sustainability", insights.dividend_sustainability), 286 | # ("Debt Service Capability", insights.debt_service_capability) 287 
| # ] 288 | 289 | # for section_title, insight_text in insight_sections: 290 | # flowables.append(Paragraph(section_title, sub_section_header_style)) 291 | # flowables.append(Paragraph(insight_text, data_style)) 292 | 293 | try: 294 | flowables.append(Paragraph("Operational Cash Efficiency", sub_section_header_style)) 295 | flowables.append(Paragraph(insights.operational_cash_efficiency, data_style)) 296 | flowables.append(pdf_plotly_chart(create_bar_chart(chart_data, "operating_cash_flow"))) 297 | except: 298 | pass 299 | 300 | try: 301 | flowables.append(Paragraph("Investment Capability", sub_section_header_style)) 302 | flowables.append(Paragraph(insights.investment_capability, data_style)) 303 | flowables.append(pdf_plotly_chart(create_bar_chart(chart_data, "cash_flow_from_investment"))) 304 | except: 305 | pass 306 | 307 | try: 308 | flowables.append(Paragraph("Financial Flexibility", sub_section_header_style)) 309 | flowables.append(Paragraph(insights.financial_flexibility, data_style)) 310 | flowables.append(pdf_plotly_chart(create_bar_chart(chart_data, "cash_flow_from_financing"))) 311 | except: 312 | pass 313 | 314 | try: 315 | flowables.append(Paragraph("Dividend Sustainability", sub_section_header_style)) 316 | flowables.append(Paragraph(insights.dividend_sustainability, data_style)) 317 | except: 318 | pass 319 | 320 | try: 321 | flowables.append(Paragraph("Debt Service Capability", sub_section_header_style)) 322 | flowables.append(Paragraph(insights.debt_service_capability, data_style)) 323 | except: 324 | pass 325 | 326 | 327 | 328 | return flowables 329 | 330 | def pdf_news_sentiment(data): 331 | flowables = [] 332 | 333 | # Section Title 334 | title = "NEWS SENTIMENT" 335 | para_title = Paragraph(title, header_style) 336 | flowables.append(para_title) 337 | flowables.append(Spacer(1, 12)) 338 | 339 | # News DataFrame to Table 340 | df = data['news'] 341 | table_data = [df.columns.to_list()] + df.values.tolist() 342 | table = Table(table_data, 
repeatRows=1) # repeatRows ensures the header is repeated if the table spans multiple pages 343 | 344 | # Table Style 345 | style = TableStyle([ 346 | ('BACKGROUND', (0, 0), (-1, 0), colors.grey), 347 | ('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke), 348 | ('ALIGN', (0, 0), (-1, -1), 'LEFT'), 349 | ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'), 350 | ('FONTSIZE', (0, 0), (-1, 0), 12), 351 | ('BOTTOMPADDING', (0, 0), (-1, 0), 12), 352 | ('BACKGROUND', (0, 1), (-1, -1), colors.beige), 353 | ('GRID', (0, 0), (-1, -1), 1, colors.black) 354 | ]) 355 | table.setStyle(style) 356 | flowables.append(table) 357 | flowables.append(Spacer(1, 12)) 358 | 359 | # Mean Sentiment Score 360 | mean_sentiment_score_text = f"Mean Sentiment Score: {data['mean_sentiment_score']:.2f}" 361 | flowables.append(Paragraph(mean_sentiment_score_text, data_style)) 362 | flowables.append(Spacer(1, 12)) 363 | 364 | # Mean Sentiment Class 365 | mean_sentiment_class_text = f"Mean Sentiment Class: {data['mean_sentiment_class']}" 366 | flowables.append(Paragraph(mean_sentiment_class_text, data_style)) 367 | 368 | return flowables 369 | 370 | 371 | 372 | 373 | def gen_pdf(company_name, overview_data, income_statement_data, balance_sheet_data, cash_flow_data, news_data): 374 | doc = SimpleDocTemplate("pdf/final_report.pdf", pagesize=letter) 375 | all_flowables = [] 376 | 377 | all_flowables.extend(cover_page(company_name)) 378 | all_flowables.extend(pdf_company_overview(overview_data)) 379 | all_flowables.extend(pdf_income_statement(income_statement_data['metrics'], income_statement_data['insights'], income_statement_data['chart_data'])) 380 | # all_flowables.extend(pdf_balance_sheet(balance_sheet_data['metrics'], balance_sheet_data['insights'], balance_sheet_data['chart_data'])) 381 | # all_flowables.extend(pdf_cash_flow(cash_flow_data['metrics'], cash_flow_data['insights'], cash_flow_data['chart_data'])) 382 | # all_flowables.extend(pdf_news_sentiment(news_data)) 383 | doc.build(all_flowables) 384 
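Each `pdf_*` builder above repeats a bare `try`/`except: pass` block per insight section, which silently swallows every error (including `KeyboardInterrupt`). A minimal sketch of an alternative: probe the insight attribute with `getattr` and append the section only when it is present. The tuples and the `DemoInsights` class here are hypothetical stand-ins for the ReportLab `Paragraph` flowables and the pydantic insight models used above.

```python
# Sketch only: tuples stand in for ReportLab flowables (Paragraph, charts).
def append_insight(flowables, insights, attr, title, chart=None):
    """Append a titled insight section only if `attr` exists on `insights`."""
    text = getattr(insights, attr, None)
    if not text:
        return
    flowables.append(("header", title))
    flowables.append(("body", text))
    if chart is not None:
        flowables.append(("chart", chart))

class DemoInsights:
    # hypothetical stand-in for an IncomeStatementInsights instance
    revenue_health = "Revenue grew steadily."

flowables = []
append_insight(flowables, DemoInsights(), "revenue_health", "Revenue Health")
append_insight(flowables, DemoInsights(), "debt_management", "Debt Management")
print(flowables)  # only the Revenue Health section was appended
```

Unlike a bare `except:`, this only skips genuinely missing insights and never hides unrelated failures such as an error inside a chart helper.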
| 385 | if __name__ == "__main__": 386 | overview_data = company_overview("AAPL") 387 | inc = income_statement("AAPL", [True, True, False, False, False]) 388 | # bal = balance_sheet("AAPL", [True, False, True, False, True]) 389 | # cash = cash_flow("AAPL", [True, True, True, False, False]) 390 | # news = top_news("AAPL", 10) 391 | gen_pdf("Apple Inc.", overview_data, inc, None, None, None) 392 | -------------------------------------------------------------------------------- /src/pydantic_models.py: -------------------------------------------------------------------------------- 1 | from pydantic import BaseModel, Field 2 | 3 | min_length = 40 4 | 5 | class IncomeStatementInsights(BaseModel): 6 | revenue_health: str = Field(..., description=f"Must be more than {min_length} words. Insight into the company's total revenue, providing a perspective on the health of the primary business activity.") 7 | operational_efficiency: str = Field(..., description=f"Must be more than {min_length} words. Analysis of the company's operating expenses in relation to its revenue, offering a view into the firm's operational efficiency.") 8 | r_and_d_focus: str = Field(..., description=f"Must be more than {min_length} words. Insight into the company's commitment to research and development, signifying its emphasis on innovation and future growth.") 9 | debt_management: str = Field(..., description=f"Must be more than {min_length} words. Analysis of the company's interest expenses, highlighting the scale of its debt obligations and its approach to leveraging.") 10 | profit_retention: str = Field(..., description=f"Must be more than {min_length} words. Insight into the company's net income, showcasing the amount retained post all expenses, which can be reinvested or distributed.") 11 | 12 | class BalanceSheetInsights(BaseModel): 13 | liquidity_position: str = Field(..., description=f"Must be more than {min_length} words. 
Insight into the company's ability to meet its short-term obligations using its short-term assets.") 14 | operational_efficiency: str = Field(..., description=f"Must be more than {min_length} words. Analysis of how efficiently the company is using its assets to generate sales.") 15 | capital_structure: str = Field(..., description=f"Must be more than {min_length} words. Insight into the company's financial leverage and its reliance on external liabilities versus internal equity.") 16 | inventory_management: str = Field(..., description=f"Must be more than {min_length} words. Analysis of the company's efficiency in managing, selling, and replacing its inventory.") 17 | overall_solvency: str = Field(..., description=f"Must be more than {min_length} words. Insight into the company's overall ability to meet its long-term debts and obligations.") 18 | 19 | class CashFlowInsights(BaseModel): 20 | operational_cash_efficiency: str = Field(..., description=f"Must be more than {min_length} words. Insight into how efficiently the company is generating cash from its core operations.") 21 | investment_capability: str = Field(..., description=f"Must be more than {min_length} words. Indicates the company's ability to invest in its business using its operational cash flows.") 22 | financial_flexibility: str = Field(..., description=f"Must be more than {min_length} words. Demonstrates the cash left after all operational expenses and investments, which can be used for dividends, share buybacks, or further investments.") 23 | dividend_sustainability: str = Field(..., description=f"Must be more than {min_length} words. Indicates the company's ability to cover its dividend payouts with its net earnings.") 24 | debt_service_capability: str = Field(..., description=f"Must be more than {min_length} words. 
Analysis of the company's ability to service its debt using the operational cash flows.") 25 | 26 | 27 | class FiscalYearHighlights(BaseModel): 28 | performance_highlights: str = Field(..., description="Key performance and financial stats over the fiscal year.") 29 | major_events: str = Field(..., description="Highlight of significant events, acquisitions, or strategic shifts that occurred during the year.") 30 | challenges_encountered: str = Field(..., description="Challenges the company faced during the year and, if and how they managed or overcame them.") 31 | # milestone_achievements: str = Field(..., description="Milestones achieved in terms of projects, expansions, or any other notable accomplishments.") 32 | 33 | 34 | class StrategyOutlookFutureDirection(BaseModel): 35 | strategic_initiatives: str = Field(..., description="The company's primary objectives and growth strategies for the upcoming years.") 36 | market_outlook: str = Field(..., description="Insights into the broader market, competitive landscape, and industry trends the company anticipates.") 37 | product_roadmap: str = Field(..., description="Upcoming launches, expansions, or innovations the company plans to roll out.") 38 | 39 | class RiskManagement(BaseModel): 40 | risk_factors: str = Field(..., description="Primary risks the company acknowledges.") 41 | risk_mitigation: str = Field(..., description="Strategies for managing these risks.") 42 | 43 | class CorporateGovernanceSocialResponsibility(BaseModel): 44 | board_governance: str = Field(..., description="Details about the company's board composition, governance policies, and any changes in leadership or structure.") 45 | csr_sustainability: str = Field(..., description="The company's initiatives related to environmental stewardship, community involvement, and ethical practices.") 46 | 47 | class InnovationRnD(BaseModel): 48 | r_and_d_activities: str = Field(..., description="Overview of the company's focus on research and development, major 
achievements, or breakthroughs.") 49 | innovation_focus: str = Field(..., description="Mention of new technologies, patents, or areas of research the company is diving into.") 50 | -------------------------------------------------------------------------------- /src/ticker_symbol.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import csv 8 | import requests 9 | import streamlit as st 10 | 11 | API_TOKEN = st.secrets["eod_api_key"] 12 | 13 | def get_ticker_symbol(company_name): 14 | with open("data/ticker_symbols/ticker_symbols.csv", 'r') as csvfile: 15 | reader = csv.DictReader(csvfile) 16 | for row in reader: 17 | if row['Name'] == company_name: 18 | return row['Code'] 19 | return None 20 | 21 | # Example usage: 22 | # if __name__ == "__main__": # Replace with the path to your CSV file 23 | # company_name = "Apple Inc" 24 | # ticker = get_ticker_symbol(company_name) 25 | # if ticker: 26 | # print(f"The ticker symbol for {company_name} is {ticker}.") 27 | # else: 28 | # print(f"No ticker symbol found for {company_name}.") 29 | 30 | def get_all_company_names(): 31 | company_names = [] 32 | with open("data/ticker_symbols/ticker_symbols.csv", 'r') as csvfile: 33 | reader = csv.DictReader(csvfile) 34 | for row in reader: 35 | if row['Type'] == "Common Stock": 36 | company_names.append(row['Name']) 37 | return tuple(company_names) 38 | 39 | # Example usage: 40 | if __name__ == "__main__": 41 | companies = get_all_company_names() 42 | print(companies) 43 | 44 | 45 | 46 | def get_symbols_for_exchange(exchange_code, api_token): 47 | base_url = "https://eodhd.com/api/exchange-symbol-list/" 48 | url = f"{base_url}{exchange_code}/" 49 | params = { 50 | "api_token": api_token 51 | } 52 | 53 | response = requests.get(url, params=params) 54 | 55 | if 
response.status_code == 200: 56 | try: 57 | return response.json() 58 | except ValueError: 59 | print("Received unexpected response:") 60 | print(response.text) 61 | print(type(response)) 62 | print(type(response.text)) 63 | print(len(response.text)) 64 | with open("data/ticker_symbols/ticker_symbols.txt", "w") as f: 65 | f.write(response.text) 66 | with open("data/ticker_symbols/ticker_symbols.csv", "w") as f: 67 | f.write(response.text) 68 | return None 69 | else: 70 | response.raise_for_status() 71 | 72 | if __name__ == "__main__": 73 | EXCHANGE_CODES = ['NYSE', 'NASDAQ'] # Exchanges to fetch symbol lists for 74 | for exchange_code in EXCHANGE_CODES: 75 | try: 76 | data = get_symbols_for_exchange(exchange_code, API_TOKEN) 77 | print(data) 78 | except requests.RequestException as e: 79 | print(f"Error occurred: {e}") 80 | -------------------------------------------------------------------------------- /src/utils.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | 8 | from langchain.vectorstores import FAISS 9 | from langchain.text_splitter import CharacterTextSplitter 10 | from langchain.embeddings import OpenAIEmbeddings 11 | from langchain.prompts import PromptTemplate 12 | from langchain.output_parsers import PydanticOutputParser 13 | from langchain.chat_models import ChatOpenAI 14 | # from llama_index import VectorStoreIndex, SimpleDirectoryReader 15 | # from llama_index.vector_stores import WeaviateVectorStore 16 | from llama_index.schema import Document 17 | from llama_index.llms import OpenAI 18 | # from llama_index.node_parser import SimpleNodeParser 19 | 20 | 21 | # from dotenv import dotenv_values 22 | import weaviate 23 | from pypdf import PdfReader 24 | import streamlit as st 25 | import requests 26 | import time 27 | import json 28 | import plotly.graph_objects as go 29 | from 
pydantic import create_model 30 | from langchain.llms import OpenAI 31 | import os 32 | # config = dotenv_values(".env") 33 | 34 | # OPENAI_API_KEY = config["OPENAI_API_KEY"] 35 | # AV_API_KEY = config["ALPHA_VANTAGE_API_KEY"] 36 | 37 | # OPENAI_API_KEY = st.secrets["openai_api_key"] 38 | # AV_API_KEY = st.secrets["av_api_key"] 39 | 40 | AV_API_KEY = os.environ.get("AV_API_KEY") 41 | OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") 42 | 43 | USER_ID = 'openai' 44 | APP_ID = 'chat-completion' 45 | MODEL_ID = 'GPT-4' 46 | MODEL_VERSION_ID = '4aa760933afa4a33a0e5b4652cfa92fa' 47 | 48 | def get_model(model_name, api_key): 49 | if model_name == "openai": 50 | model = ChatOpenAI(openai_api_key=api_key, model_name="gpt-3.5-turbo") 51 | return model 52 | 53 | def process_pdf(pdfs): 54 | docs = [] 55 | 56 | for pdf in pdfs: 57 | file = PdfReader(pdf) 58 | text = "" 59 | for page in file.pages: 60 | text += str(page.extract_text()) 61 | docs.append(text) # collect the extracted text of each PDF 62 | 63 | text_splitter = CharacterTextSplitter(separator="\n", 64 | chunk_size=2000, 65 | chunk_overlap=300, 66 | length_function=len) 67 | docs = [chunk for text in docs for chunk in text_splitter.split_text(text)] 68 | # docs is now a flat list of text chunks 69 | 70 | return docs 71 | 72 | def process_pdf2(pdf): 73 | file = PdfReader(pdf) 74 | text = "" 75 | for page in file.pages: 76 | text += str(page.extract_text()) 77 | 78 | doc = Document(text=text) 79 | return [doc] 80 | 81 | 82 | def faiss_db(splitted_text): 83 | embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY) 84 | db = FAISS.from_texts(splitted_text, embeddings) 85 | db.save_local("faiss_db") 86 | return db 87 | 88 | def safe_float(value): 89 | if value is None or value == "None": 90 | return "N/A" 91 | return float(value) 92 | 93 | def round_numeric(value, decimal_places=2): 94 | if isinstance(value, (int, float)): 95 | return round(value, decimal_places) 96 | elif isinstance(value, str) and value.replace(".", "", 1).isdigit(): 97 | # Check if the string 
represents a numeric value 98 | return round(float(value), decimal_places) 99 | else: 100 | return value 101 | 102 | def format_currency(value): 103 | if value == "N/A": 104 | return value 105 | if value >= 1_000_000_000: # billion 106 | return f"${value / 1_000_000_000:.2f} billion" 107 | elif value >= 1_000_000: # million 108 | return f"${value / 1_000_000:.2f} million" 109 | else: 110 | return f"${value:.2f}" 111 | 112 | def get_total_revenue(symbol): 113 | time.sleep(3) 114 | url = "https://www.alphavantage.co/query" 115 | params = { 116 | "function": "INCOME_STATEMENT", 117 | "symbol": symbol, 118 | "apikey": AV_API_KEY 119 | } 120 | response = requests.get(url, params=params) 121 | data = response.json() 122 | total_revenue = safe_float(data["annualReports"][0]["totalRevenue"]) 123 | 124 | return total_revenue 125 | 126 | def get_total_debt(symbol): 127 | time.sleep(3) 128 | url = "https://www.alphavantage.co/query" 129 | params = { 130 | "function": "BALANCE_SHEET", 131 | "symbol": symbol, 132 | "apikey": AV_API_KEY 133 | } 134 | response = requests.get(url, params=params) 135 | data = response.json() 136 | short_term = safe_float(data["annualReports"][0]["shortTermDebt"]) 137 | time.sleep(3) 138 | long_term = safe_float(data["annualReports"][0]["longTermDebt"]) 139 | 140 | if short_term == "N/A" or long_term == "N/A": 141 | return "N/A" 142 | return short_term + long_term 143 | 144 | def generate_pydantic_model(fields_to_include, attributes, base_fields): 145 | selected_fields = {attr: base_fields[attr] for attr, include in zip(attributes, fields_to_include) if include} 146 | 147 | return create_model("DynamicModel", **selected_fields) 148 | 149 | def insights(insight_name, type_of_data, data, output_format, api_key): 150 | print(type_of_data) 151 | 152 | with open("prompts/iv2.prompt", "r") as f: 153 | template = f.read() 154 | 155 | 156 | prompt = PromptTemplate( 157 | template=template, 158 | input_variables=["insight_name","type_of_data","inputs", 
"output_format"], 159 | # partial_variables={"output_format": parser.get_format_instructions()} 160 | ) 161 | 162 | model = get_model("openai", api_key) 163 | 164 | data = json.dumps(data) 165 | 166 | formatted_input = prompt.format(insight_name=insight_name,type_of_data=type_of_data, inputs=data, output_format=output_format) 167 | 168 | print("-"*30) 169 | print("Formatted Input:") 170 | print(formatted_input) 171 | print("-"*30) 172 | 173 | response = model.predict(formatted_input) 174 | return response 175 | 176 | 177 | 178 | def format_title(s: str) -> str: 179 | return ' '.join(word.capitalize() for word in s.split('_')) 180 | 181 | def create_time_series_chart(data, type_of_data: str, title: str): 182 | yaxis_title = format_title(type_of_data) 183 | fig = go.Figure(data=[go.Scatter(x=data['dates'], y=data[type_of_data], mode='lines+markers')]) 184 | fig.update_layout(yaxis=dict(range=[0, max(data)])) 185 | fig.update_layout(title=title, 186 | xaxis_title='Date', 187 | yaxis_title=yaxis_title) 188 | 189 | 190 | 191 | return fig 192 | 193 | # data = { 194 | # 'dates': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'], 195 | # 'temperature': [22, 24, 23, 22, 21], 196 | # 'humidity': [40, 42, 41, 40, 39] 197 | # } 198 | # # Create a temperature time series chart 199 | # temperature_chart = create_time_series_chart(data, 'temperature') 200 | # temperature_chart.show() 201 | 202 | # # Create a humidity time series chart 203 | # humidity_chart = create_time_series_chart(data, 'humidity') 204 | # humidity_chart.show() 205 | 206 | import plotly.graph_objects as go 207 | 208 | def create_donut_chart(data, type_of_data, hole_size=0.3): 209 | 210 | labels = list(data[type_of_data].keys()) 211 | values = list(data[type_of_data].values()) 212 | 213 | fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=hole_size)]) 214 | fig.update_layout(title=format_title(type_of_data)) 215 | 216 | return fig 217 | 218 | # # Example usage: 219 | # data = { 
220 | # 'Oxygen': 4500, 221 | # 'Hydrogen': 2500, 222 | # 'Carbon_Dioxide': 1053, 223 | # 'Nitrogen': 500 224 | # } 225 | # chart = create_donut_chart(data, title="Donut Chart") 226 | # chart.show() 227 | 228 | def create_bar_chart(data, type_of_data: str, title: str = None): 229 | yaxis_title = format_title(type_of_data) 230 | fig = go.Figure(data=[go.Bar(x=data['dates'], y=data[type_of_data])]) 231 | # fig.update_layout(yaxis=dict(range=[0, max(data[type_of_data])])) 232 | fig.update_layout(title=format_title(type_of_data), 233 | xaxis_title='Date', 234 | yaxis_title=yaxis_title) 235 | 236 | return fig 237 | 238 | 239 | 240 | 241 | 242 | 243 | 244 | 245 | 246 | 247 | 248 | -------------------------------------------------------------------------------- /src/🏡_Home.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import streamlit as st 8 | 9 | st.set_page_config(page_title="FinSight", page_icon=":money_with_wings:", layout="wide") 10 | 11 | st.title(":money_with_wings: FinSight \n\n **Financial Insights at Your Fingertip**") 12 | 13 | st.balloons() 14 | 15 | st.success(""" 16 | If you'd like to learn more about the technical details of FinSight, check out the LlamaIndex blogpost below where I do a deep dive into the project: 17 | 18 | [How I built the Streamlit LLM Hackathon winning app — FinSight using LlamaIndex.](https://blog.llamaindex.ai/how-i-built-the-streamlit-llm-hackathon-winning-app-finsight-using-llamaindex-9dcf6c46d7a0) 19 | 20 | """) 21 | 22 | with open("docs/news.md", "r") as f: 23 | st.success(f.read()) 24 | 25 | with open("docs/main.md", "r") as f: 26 | st.info(f.read()) 27 | 28 | -------------------------------------------------------------------------------- /test_files/.DS_Store: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/test_files/.DS_Store -------------------------------------------------------------------------------- /test_files/Models.py: -------------------------------------------------------------------------------- 1 | from typing import List 2 | from pydantic import BaseModel 3 | class IncomeStatementRequest(BaseModel): 4 | symbol: str 5 | fields_to_include: List[bool] -------------------------------------------------------------------------------- /test_files/RAG/data1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/test_files/RAG/data1.pdf -------------------------------------------------------------------------------- /test_files/RAG/tech_1.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import camelot 8 | import tabula 9 | import pandas as pd 10 | from llama_index import Document, SummaryIndex 11 | 12 | # https://en.wikipedia.org/wiki/The_World%27s_Billionaires 13 | from llama_index import VectorStoreIndex, ServiceContext, LLMPredictor 14 | from llama_index.query_engine import PandasQueryEngine, RetrieverQueryEngine 15 | from llama_index.retrievers import RecursiveRetriever 16 | from llama_index.schema import IndexNode 17 | from llama_index.llms import OpenAI 18 | from llama_index import download_loader 19 | 20 | from pathlib import Path 21 | from typing import List 22 | 23 | from src.utils import get_model 24 | 25 | PDFReader = download_loader("PDFReader") 26 | loader = PDFReader() 27 | pdf = loader.load_data(file=Path("data/apple/AAPL.pdf")) 28 | 29 | 30 | 31 | # 
print(pdf) 32 | 33 | pages = ['32', '33', '34', '35', '36'] 34 | table_titles = ['Consolidated Statements of Operations', 'Consolidated Statements of Comprehensive Income', 'Consolidated Balance Sheets', 'Consolidated Statements of Shareholders’ Equity', 'Consolidated Statements of Cash Flows'] 35 | 36 | table1 = tabula.read_pdf("data/apple/AAPL.pdf",output_format="dataframe", pages="32") 37 | print(type(table1)) 38 | print(len(table1)) 39 | print(pd.DataFrame(table1)) 40 | 41 | 42 | # def get_tables(path, pages, table_titles): 43 | # tables = {} 44 | # for i, page in enumerate(pages): 45 | # table = tabula.read_pdf(path, pages=f"{page}") 46 | 47 | # tables[table_titles[i]] = table 48 | 49 | # return tables 50 | 51 | 52 | # tables = get_tables("data/apple/AAPL.pdf", pages, table_titles) 53 | 54 | # # iterate through json object 55 | # for key, value in tables.items(): 56 | # print("-"*30) 57 | # print("Title: ",key) 58 | # print(value) 59 | 60 | 61 | -------------------------------------------------------------------------------- /test_files/RAG/test.py: -------------------------------------------------------------------------------- 1 | import tabula 2 | file1 = "https://nbviewer.jupyter.org/github/kuruvasatya/Scraping-Tables-from-PDF/blob/master/data1.pdf" 3 | table = tabula.read_pdf(file1,pages=1) 4 | print(table[0]) -------------------------------------------------------------------------------- /test_files/__pycache__/Models.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/test_files/__pycache__/Models.cpython-311.pyc -------------------------------------------------------------------------------- /test_files/__pycache__/main.cpython-311.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/test_files/__pycache__/main.cpython-311.pyc -------------------------------------------------------------------------------- /test_files/attribs.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | from pydantic import BaseModel, Field, create_model 8 | import requests 9 | import streamlit as st 10 | 11 | 12 | from src.utils import insights 13 | from src.income_statement import charts, metrics 14 | 15 | 16 | AV_API_KEY = st.secrets["av_api_key"] 17 | 18 | # from src.income_statement import income_statement 19 | 20 | def generate_model(fields_to_include, attributes, base_fields): 21 | attributes = ["revenue_health", "operational_efficiency", "r_and_d_focus", "debt_management", "profit_retention"] 22 | selected_fields = {attr: base_fields[attr] for attr, include in zip(attributes, fields_to_include) if include} 23 | 24 | return create_model("DynamicIncomeStatementInsights", **selected_fields) 25 | 26 | def income_statement(symbol, fields_to_include): 27 | 28 | Model = generate_model(fields_to_include) 29 | 30 | url = "https://www.alphavantage.co/query" 31 | params = { 32 | "function": "INCOME_STATEMENT", 33 | "symbol": symbol, 34 | "apikey": AV_API_KEY 35 | } 36 | 37 | # Send a GET request to the API 38 | response = requests.get(url, params=params) 39 | if response.status_code == 200: 40 | data = response.json() 41 | if not data: 42 | print(f"No data found for {symbol}") 43 | return None 44 | 45 | 46 | else: 47 | print(f"Error: {response.status_code} - {response.text}") 48 | 49 | chart_data = charts(data) 50 | 51 | report = data["annualReports"][0] 52 | met = metrics(report) 53 | 54 | data_for_insights = { 55 | "annual_report_data": report, 56 | "historical_data": 
chart_data, 57 | } 58 | ins = insights("income statement", data_for_insights, Model) 59 | 60 | return { 61 | "metrics": met, 62 | "chart_data": chart_data, 63 | "insights": ins 64 | } 65 | 66 | 67 | 68 | min_length = 5 69 | 70 | base_fields = { 71 | "revenue_health": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's total revenue, providing a perspective on the health of the primary business activity.")), 72 | "operational_efficiency": (str, Field(..., description=f"Must be more than {min_length} words. Analysis of the company's operating expenses in relation to its revenue, offering a view into the firm's operational efficiency.")), 73 | "r_and_d_focus": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's commitment to research and development, signifying its emphasis on innovation and future growth.")), 74 | "debt_management": (str, Field(..., description=f"Must be more than {min_length} words. Analysis of the company's interest expenses, highlighting the scale of its debt obligations and its approach to leveraging.")), 75 | "profit_retention": (str, Field(..., description=f"Must be more than {min_length} words. 
Insight into the company's net income, showcasing the amount retained post all expenses, which can be reinvested or distributed.")) 76 | } 77 | 78 | 79 | 80 | # Example usage 81 | fields_to_include = [True, False, False, False, True] 82 | 83 | # instance = DynamicModel(revenue_health="good", r_and_d_focus="high", profit_retention="medium") 84 | # print(instance) 85 | 86 | response = income_statement("TSLA", fields_to_include) 87 | print(response['insights']) 88 | 89 | -------------------------------------------------------------------------------- /test_files/av-api-test.py: -------------------------------------------------------------------------------- 1 | from dotenv import dotenv_values 2 | import requests 3 | 4 | config = dotenv_values(".env") 5 | 6 | AV_API_KEY = config["ALPHA_VANTAGE_API_KEY"] 7 | 8 | 9 | 10 | 11 | url = "https://www.alphavantage.co/query" 12 | 13 | symbol = "TSLA" 14 | params = { 15 | "function": "CASH_FLOW", 16 | "symbol": symbol, 17 | "apikey": AV_API_KEY 18 | } 19 | response = requests.get(url, params=params) 20 | data = response.json() 21 | data = data["annualReports"][0] 22 | print(data) 23 | 24 | 25 | -------------------------------------------------------------------------------- /test_files/finchat.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | 8 | from langchain.chat_models import ChatOpenAI 9 | from langchain.memory import ConversationBufferWindowMemory 10 | from langchain.chains import ConversationalRetrievalChain 11 | 12 | import streamlit as st 13 | from dotenv import dotenv_values 14 | 15 | from src.utils import process_pdf, faiss_db as vector_store 16 | 17 | # config = dotenv_values(".env") 18 | 19 | # OPENAI_API_KEY = config["OPENAI_API_KEY"] 20 | 21 | OPENAI_API_KEY = st.secrets["openai_api_key"] 22 | 23 | 24 
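`attribs.py` above and `generate_pydantic_model` in `utils.py` both build a dynamic model by zipping a boolean mask over attribute names. The selection step in isolation, with shortened placeholder descriptions instead of the full `Field` definitions:

```python
# The mask-based field selection behind generate_pydantic_model, shown
# without pydantic; the (type, description) tuples are placeholders.
attributes = ["revenue_health", "operational_efficiency", "r_and_d_focus",
              "debt_management", "profit_retention"]
base_fields = {name: (str, f"insight: {name}") for name in attributes}
fields_to_include = [True, False, False, False, True]

selected_fields = {attr: base_fields[attr]
                   for attr, include in zip(attributes, fields_to_include)
                   if include}
print(sorted(selected_fields))  # → ['profit_retention', 'revenue_health']
```

The real code then hands `selected_fields` to `pydantic.create_model("DynamicModel", **selected_fields)` to produce the model class.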
| def handle_query(query: str): 25 |     result = st.session_state.conversation({"question": query, "chat_history": ""}) 26 |     history = st.session_state.memory.load_memory_variables({})['chat_history'] 27 | 28 |     for i, msg in enumerate(history): 29 |         if i % 2 == 0: 30 | 31 |             st.chat_message("user").write(msg.content) 32 |         else: 33 |             st.chat_message("assistant").write(msg.content) 34 | 35 | 36 | if __name__ == "__main__": 37 | 38 |     if "memory" not in st.session_state: 39 |         st.session_state.memory = None 40 | 41 |     if "conversation" not in st.session_state: 42 |         st.session_state.conversation = None 43 | 44 |     st.divider() 45 | 46 |     model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-3.5-turbo") 47 | 48 |     if "process_pdf" not in st.session_state: 49 |         st.session_state.process_pdf = False 50 | 51 |     pdfs = st.sidebar.file_uploader("Upload a PDF file", type=["pdf"], accept_multiple_files=True) 52 |     if st.sidebar.button("Process PDF"): 53 |         st.session_state.process_pdf = True 54 |         with st.spinner("Processing PDF..."): 55 |             splitted_text = process_pdf(pdfs) 56 |             db = vector_store(splitted_text) 57 |             st.session_state.memory = ConversationBufferWindowMemory(memory_key='chat_history', return_messages=True, k=5) 58 |             st.session_state.conversation = ConversationalRetrievalChain.from_llm(llm=model, 59 |                                                 chain_type="map_reduce", 60 |                                                 retriever=db.as_retriever(), 61 |                                                 memory=st.session_state.memory) 62 | 63 |     if st.session_state.process_pdf: 64 |         query = st.chat_input("Ask a question") 65 |         if query: 66 |             handle_query(query) -------------------------------------------------------------------------------- /test_files/fmp-api.py: -------------------------------------------------------------------------------- 1 | from dotenv import dotenv_values 2 | 3 | config = dotenv_values(".env") 4 | 5 | FMP_API_KEY = config["FMP_API_KEY"] 6 | 7 | import requests 8 | 9 | # Define the API endpoint URL 
10 | url = "https://financialmodelingprep.com/api/v4/financial-reports-json" 11 | 12 | # The API key is read from the .env file above (FMP_API_KEY) 13 | 14 | 15 | # Define the parameters for the request 16 | params = { 17 |     "symbol": "AAPL", 18 |     "year": "2020", 19 |     "period": "FY", 20 |     "apikey": FMP_API_KEY 21 | } 22 | 23 | # Send a GET request to the API 24 | response = requests.get(url, params=params) 25 | 26 | # Check if the request was successful (status code 200) 27 | if response.status_code == 200: 28 |     data = response.json() 29 |     print(data) 30 | else: 31 |     print(f"Error: {response.status_code} - {response.text}") 32 | -------------------------------------------------------------------------------- /test_files/main.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import uvicorn 8 | from fastapi import FastAPI 9 | from pydantic import BaseModel 10 | from test_files.Models import IncomeStatementRequest 11 | from src.income_statement import income_statement 12 | 13 | app = FastAPI() 14 | @app.post("/income_statement") 15 | def get_income_statement(request_data: IncomeStatementRequest): 16 |     symbol = request_data.symbol 17 |     fields_to_include = request_data.fields_to_include 18 | 19 |     # Call the income_statement function to retrieve data 20 |     income_statement_data = income_statement(symbol, fields_to_include) 21 | 22 |     return income_statement_data  # Return the data as a response 23 | 24 | if __name__ == "__main__": 25 |     uvicorn.run(app, host="0.0.0.0", port=8000) -------------------------------------------------------------------------------- /test_files/node_parsing.py: -------------------------------------------------------------------------------- 1 | 2 | from llama_index.node_parser.extractors import ( 3 | 
MetadataExtractor, 4 | SummaryExtractor, 5 | QuestionsAnsweredExtractor, 6 | TitleExtractor, 7 | KeywordExtractor, 8 | EntityExtractor, 9 | MetadataFeatureExtractor, 10 | ) 11 | from llama_index.text_splitter import TokenTextSplitter 12 | from llama_index.node_parser import SimpleNodeParser 13 | from llama_index import SimpleDirectoryReader 14 | 15 | document = SimpleDirectoryReader("data/apple").load_data() 16 | 17 | print(len(document[0].text)) 18 | chunk_size = len(document[0].text) // 25 19 | 20 | text_splitter = TokenTextSplitter(separator=" ", chunk_size=chunk_size, chunk_overlap=chunk_size//10) 21 | metadata_extractor = MetadataExtractor( 22 | extractors=[ 23 | # TitleExtractor(), 24 | # SummaryExtractor(), 25 | KeywordExtractor(keywords=1), 26 | # QuestionsAnsweredExtractor(questions=3), 27 | ], 28 | ) 29 | 30 | node_parser = SimpleNodeParser.from_defaults( 31 | text_splitter=text_splitter, 32 | metadata_extractor=metadata_extractor, 33 | ) 34 | # assume documents are defined -> extract nodes 35 | nodes = node_parser.get_nodes_from_documents(document) 36 | print(len(nodes)) 37 | print(nodes[0].metadata) 38 | print(nodes[0]) 39 | print(nodes[1].metadata) 40 | print(nodes[1]) 41 | -------------------------------------------------------------------------------- /test_files/nodes.py: -------------------------------------------------------------------------------- 1 | from pypdf import PdfReader 2 | import streamlit as st 3 | from llama_index import Document 4 | from llama_index.node_parser import SimpleNodeParser 5 | 6 | node_parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20) 7 | 8 | pdfs = st.file_uploader("pdf file") 9 | docs = [] 10 | for pdf in [pdfs]: 11 | file = PdfReader(pdf) 12 | text = "" 13 | for page in file.pages: 14 | text += str(page.extract_text()) 15 | 16 | docs.append(Document(text=text)) 17 | 18 | # nodes = node_parser.get_nodes_from_documents(docs, show_progress=False) 19 | # print(nodes) 20 | 21 | print(docs) 
-------------------------------------------------------------------------------- /test_files/open_ai_api.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | from typing import Literal 4 | 5 | script_dir = Path(__file__).resolve().parent 6 | project_root = script_dir.parent 7 | sys.path.append(str(project_root)) 8 | 9 | import os 10 | import openai 11 | import streamlit as st 12 | from pydantic import BaseModel, Field 13 | 14 | OPENAI_API_KEY = st.secrets["openai_api_key"] 15 | 16 | 17 | # # Load your API key from an environment variable or secret management service 18 | openai.api_key = os.getenv("OPENAI_API_KEY") 19 | 20 | # chat_completion = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "who is lionel messi?"}]) 21 | # print(chat_completion) 22 | 23 | # from llama_index.llms import OpenAI 24 | from langchain.chat_models import ChatOpenAI 25 | 26 | llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-4") 27 | 28 | response = llm.predict("who is virat kohli?") 29 | print(type(response)) 30 | print(response) -------------------------------------------------------------------------------- /test_files/parser.py: -------------------------------------------------------------------------------- 1 | # Define your desired data structure. 2 | from typing import List 3 | 4 | from langchain.chat_models import ChatOpenAI 5 | from langchain.output_parsers import PydanticOutputParser 6 | from langchain.prompts import PromptTemplate 7 | from langchain.pydantic_v1 import BaseModel, Field, validator 8 | 9 | import streamlit as st 10 | 11 | OPENAI_API_KEY = st.secrets["openai_api_key"] 12 | 13 | 14 | model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-4") 15 | 16 | 17 | # Here's another example, but with a compound typed field. 
18 | class Actor(BaseModel): 19 |     name: str = Field(description="name of an actor") 20 |     film_names: List[str] = Field(description="list of names of films they starred in") 21 | 22 | 23 | actor_query = "Generate the filmography for a random actor." 24 | 25 | parser = PydanticOutputParser(pydantic_object=Actor) 26 | 27 | prompt = PromptTemplate( 28 |     template="Answer the user query.\n{format_instructions}\n{query}\n", 29 |     input_variables=["query"], 30 |     partial_variables={"format_instructions": parser.get_format_instructions()}, 31 | ) 32 | 33 | _input = prompt.format_prompt(query=actor_query) 34 | 35 | output = model.predict(_input.to_string())  # predict() accepts a plain string and returns the completion text 36 | 37 | print(parser.parse(output)) 38 | 39 | -------------------------------------------------------------------------------- /test_files/pdf1.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | pdf1 = open("AAPL.pdf", "rb")  # PDFs are binary; open in "rb" mode 8 | print(pdf1.read()) -------------------------------------------------------------------------------- /test_files/pdf_gen.py: -------------------------------------------------------------------------------- 1 | import json 2 | import sys 3 | from pathlib import Path 4 | script_dir = Path(__file__).resolve().parent 5 | project_root = script_dir.parent 6 | sys.path.append(str(project_root)) 7 | 8 | from reportlab.lib.pagesizes import letter 9 | from reportlab.platypus import PageBreak, Paragraph, SimpleDocTemplate 10 | from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle 11 | from reportlab.lib.enums import TA_CENTER 12 | from reportlab.lib.pagesizes import landscape 13 | from reportlab.platypus import Paragraph, Table, TableStyle, Spacer 14 | from reportlab.lib import colors 15 | 16 | 17 | from src.company_overview import company_overview 18 | from src.income_statement import 
income_statement 19 | from src.balance_sheet import balance_sheet 20 | from src.cash_flow import cash_flow 21 | from src.news_sentiment import top_news 22 | from src.utils import round_numeric 23 | 24 | # Get the default styles 25 | styles = getSampleStyleSheet() 26 | 27 | # Define custom styles 28 | centered_style = ParagraphStyle( 29 | 'CenteredStyle', 30 | parent=styles['Heading1'], 31 | alignment=TA_CENTER, 32 | fontSize=48, 33 | spaceAfter=50, 34 | ) 35 | 36 | sub_centered_style = ParagraphStyle( 37 | 'SubCenteredStyle', 38 | parent=styles['Heading2'], 39 | alignment=TA_CENTER, 40 | fontSize=24, 41 | spaceAfter=15, 42 | ) 43 | 44 | def cover_page(company_name): 45 | flowables = [] 46 | 47 | # Title 48 | title = "FinSight" 49 | para_title = Paragraph(title, centered_style) 50 | flowables.append(para_title) 51 | 52 | # Subtitle 53 | subtitle = "Financial Insights for
" 54 | para_subtitle = Paragraph(subtitle, sub_centered_style) 55 | flowables.append(para_subtitle) 56 | 57 | subtitle2 = "{} {}".format(company_name, "2022") 58 | para_subtitle2 = Paragraph(subtitle2, sub_centered_style) 59 | flowables.append(para_subtitle2) 60 | 61 | # Add a page break after the cover page 62 | flowables.append(PageBreak()) 63 | 64 | return flowables 65 | 66 | from reportlab.lib.styles import ParagraphStyle 67 | from reportlab.lib.enums import TA_LEFT 68 | 69 | # Define custom styles 70 | header_style = ParagraphStyle( 71 | 'HeaderStyle', 72 | parent=styles['Heading2'], 73 | fontSize=24, 74 | spaceAfter=20, 75 | leading=30 76 | ) 77 | 78 | sub_section_header_style = ParagraphStyle( 79 | 'SubSectionHeaderStyle', 80 | parent=styles['Heading3'], 81 | fontSize=16, 82 | spaceAfter=8, 83 | leading=20 84 | ) 85 | 86 | data_style = ParagraphStyle( 87 | 'DataStyle', 88 | parent=styles['Normal'], 89 | fontSize=14, 90 | spaceAfter=15, 91 | leading=20 92 | ) 93 | 94 | sub_header_style = ParagraphStyle( 95 | 'DataStyle', 96 | parent=styles['Normal'], 97 | fontSize=20, 98 | spaceAfter=15, 99 | leading=20 100 | ) 101 | 102 | def pdf_company_overview(data): 103 | flowables = [] 104 | 105 | # Section Title 106 | title = "Company Overview" 107 | para_title = Paragraph(title, header_style) 108 | flowables.append(para_title) 109 | 110 | # Company Name 111 | # data = json.loads(data) 112 | company_name = data.get("Name") 113 | print(company_name) 114 | para_name = Paragraph(" {} ".format(company_name), sub_header_style) 115 | flowables.append(para_name) 116 | 117 | # Other details 118 | details = [ 119 | ("Symbol:", data.get("Symbol")), 120 | ("Exchange:", data.get("Exchange")), 121 | ("Currency:", data.get("Currency")), 122 | ("Sector:", data.get("Sector")), 123 | ("Industry:", data.get("Industry")), 124 | ("Description:", data.get("Description")), 125 | ("Country:", data.get("Country")), 126 | ("Address:", data.get("Address")), 127 | ("Fiscal Year End:", 
data.get("Fiscal_year_end")), 128 | ("Latest Quarter:", data.get("Latest_quarter")), 129 | ("Market Capitalization:", data.get("Market_cap")) 130 | ] 131 | 132 | for label, value in details: 133 | para_label = Paragraph("{} {}".format(label, value), data_style) 134 | flowables.append(para_label) 135 | 136 | return flowables 137 | 138 | 139 | def pdf_income_statement(metrics, insights): 140 | flowables = [] 141 | 142 | # Section Title 143 | title = "INCOME STATEMENT" 144 | para_title = Paragraph(title, header_style) 145 | flowables.append(para_title) 146 | 147 | 148 | # Metrics 149 | flowables.append(Paragraph("METRICS", sub_header_style)) 150 | for label, value in metrics.items(): 151 | metric_text = "{}: {}".format(label.replace("_", " ").title(), round_numeric(value)) 152 | flowables.append(Paragraph(metric_text, data_style)) 153 | 154 | # Insights 155 | flowables.append(Paragraph("INSIGHTS", sub_header_style)) 156 | flowables.append(Paragraph("Revenue Health", sub_section_header_style)) 157 | flowables.append(Paragraph(insights.revenue_health, data_style)) 158 | flowables.append(Paragraph("Operational Efficiency", sub_section_header_style)) 159 | flowables.append(Paragraph(insights.operational_efficiency, data_style)) 160 | flowables.append(Paragraph("R&D Focus", sub_section_header_style)) 161 | flowables.append(Paragraph(insights.r_and_d_focus, data_style)) 162 | flowables.append(Paragraph("Debt Management", sub_section_header_style)) 163 | flowables.append(Paragraph(insights.debt_management, data_style)) 164 | flowables.append(Paragraph("Profit Retention", sub_section_header_style)) 165 | flowables.append(Paragraph(insights.profit_retention, data_style)) 166 | 167 | return flowables 168 | 169 | def pdf_balance_sheet(metrics, insights): 170 | flowables = [] 171 | 172 | # Section Title 173 | title = "BALANCE SHEET" 174 | para_title = Paragraph(title, header_style) 175 | flowables.append(para_title) 176 | 177 | # Metrics 178 | 
flowables.append(Paragraph("METRICS", sub_header_style)) 179 | for label, value in metrics.items(): 180 | metric_text = "{}: {}".format(label.replace("_", " ").title(), round_numeric(value)) 181 | flowables.append(Paragraph(metric_text, data_style)) 182 | 183 | # Insights 184 | flowables.append(Paragraph("INSIGHTS", sub_header_style)) 185 | insight_sections = [ 186 | ("Liquidity Position", insights.liquidity_position), 187 | ("Operational Efficiency", insights.operational_efficiency), 188 | ("Capital Structure", insights.capital_structure), 189 | ("Inventory Management", insights.inventory_management), 190 | ("Overall Solvency", insights.overall_solvency) 191 | ] 192 | 193 | for section_title, insight_text in insight_sections: 194 | flowables.append(Paragraph(section_title, sub_section_header_style)) 195 | flowables.append(Paragraph(insight_text, data_style)) 196 | 197 | return flowables 198 | 199 | def pdf_cash_flow(metrics, insights): 200 | flowables = [] 201 | 202 | # Section Title 203 | title = "CASH FLOW" 204 | para_title = Paragraph(title, header_style) 205 | flowables.append(para_title) 206 | 207 | flowables.append(Paragraph("METRICS", sub_header_style)) 208 | for label, value in metrics.items(): 209 | metric_text = "{}: {}".format(label.replace("_", " ").title(), round_numeric(value)) 210 | flowables.append(Paragraph(metric_text, data_style)) 211 | 212 | # Insights 213 | flowables.append(Paragraph("INSIGHTS", sub_header_style)) 214 | insight_sections = [ 215 | ("Operational Cash Efficiency", insights.operational_cash_efficiency), 216 | ("Investment Capability", insights.investment_capability), 217 | ("Financial Flexibility", insights.financial_flexibility), 218 | ("Dividend Sustainability", insights.dividend_sustainability), 219 | ("Debt Service Capability", insights.debt_service_capability) 220 | ] 221 | 222 | for section_title, insight_text in insight_sections: 223 | flowables.append(Paragraph(section_title, sub_section_header_style)) 224 | 
flowables.append(Paragraph(insight_text, data_style)) 225 | 226 | return flowables 227 | 228 | def pdf_news_sentiment(data): 229 | flowables = [] 230 | 231 | # Section Title 232 | title = "NEWS SENTIMENT" 233 | para_title = Paragraph(title, header_style) 234 | flowables.append(para_title) 235 | flowables.append(Spacer(1, 12)) 236 | 237 | # News DataFrame to Table 238 | df = data['news'] 239 | table_data = [df.columns.to_list()] + df.values.tolist() 240 | table = Table(table_data, repeatRows=1) # repeatRows ensures the header is repeated if the table spans multiple pages 241 | 242 | # Table Style 243 | style = TableStyle([ 244 | ('BACKGROUND', (0, 0), (-1, 0), colors.grey), 245 | ('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke), 246 | ('ALIGN', (0, 0), (-1, -1), 'LEFT'), 247 | ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'), 248 | ('FONTSIZE', (0, 0), (-1, 0), 12), 249 | ('BOTTOMPADDING', (0, 0), (-1, 0), 12), 250 | ('BACKGROUND', (0, 1), (-1, -1), colors.beige), 251 | ('GRID', (0, 0), (-1, -1), 1, colors.black) 252 | ]) 253 | table.setStyle(style) 254 | flowables.append(table) 255 | flowables.append(Spacer(1, 12)) 256 | 257 | # Mean Sentiment Score 258 | mean_sentiment_score_text = f"Mean Sentiment Score: {data['mean_sentiment_score']:.2f}" 259 | flowables.append(Paragraph(mean_sentiment_score_text, data_style)) 260 | flowables.append(Spacer(1, 12)) 261 | 262 | # Mean Sentiment Class 263 | mean_sentiment_class_text = f"Mean Sentiment Class: {data['mean_sentiment_class']}" 264 | flowables.append(Paragraph(mean_sentiment_class_text, data_style)) 265 | 266 | return flowables 267 | 268 | 269 | def gen_pdf(company_name, overview_data, income_statement_data, balance_sheet_data, cash_flow_data, news_data): 270 | doc = SimpleDocTemplate("final_report.pdf", pagesize=letter) 271 | all_flowables = [] 272 | 273 | all_flowables.extend(cover_page(company_name)) 274 | all_flowables.extend(pdf_company_overview(overview_data)) 275 | 
all_flowables.extend(pdf_income_statement(income_statement_data['metrics'], income_statement_data['insights'])) 276 | all_flowables.extend(pdf_balance_sheet(balance_sheet_data['metrics'], balance_sheet_data['insights'])) 277 | all_flowables.extend(pdf_cash_flow(cash_flow_data['metrics'], cash_flow_data['insights'])) 278 | # all_flowables.extend(pdf_news_sentiment(news_data)) 279 | doc.build(all_flowables) 280 | 281 | if __name__ == "__main__": 282 | overview_data = company_overview("AAPL") 283 | inc = income_statement("AAPL") 284 | bal = balance_sheet("AAPL") 285 | cash = cash_flow("AAPL") 286 | news = top_news("AAPL", 10) 287 | gen_pdf("Apple Inc.", overview_data, inc, bal, cash, None) 288 | -------------------------------------------------------------------------------- /test_files/plotly_chart.py: -------------------------------------------------------------------------------- 1 | import plotly.graph_objects as go 2 | from reportlab.lib.pagesizes import letter 3 | from reportlab.lib.units import inch 4 | from reportlab.platypus import SimpleDocTemplate, Image 5 | import tempfile 6 | import os 7 | 8 | def create_plotly_chart_image(data, labels): 9 | """ 10 | Create a Plotly chart and save it to a temporary image file. 11 | Returns the path to the temporary image file. 
12 | """ 13 | fig = go.Figure(data=[go.Bar(y=data, x=labels)]) 14 | img_bytes = fig.to_image(format="png") 15 | 16 | # Save the image bytes to a temporary file 17 | temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".png") 18 | temp_file.write(img_bytes) 19 | temp_file.close() 20 | img = Image(temp_file.name, width=5*inch, height=3*inch) 21 | 22 | return img 23 | 24 | def main(): 25 | # Create two charts and get their image paths 26 | chart1_path = create_plotly_chart_image([2, 1, 3], ["a", "b", "c"]) 27 | # chart2_path = create_plotly_chart_image([5, 3, 4], ["d", "e", "f"]) 28 | 29 | # Create a list of flowables 30 | flowables = [ 31 | chart1_path, 32 | # chart2_path 33 | ] 34 | 35 | # Create a PDF and add the flowables 36 | doc = SimpleDocTemplate("charts.pdf", pagesize=letter) 37 | doc.build(flowables) 38 | 39 | # Clean up the temporary files 40 | # os.unlink(chart1_path) 41 | # os.unlink(chart2_path) 42 | 43 | print("PDF with two Plotly charts created!") 44 | 45 | if __name__ == "__main__": 46 | main() -------------------------------------------------------------------------------- /test_files/plotly_pdf.py: -------------------------------------------------------------------------------- 1 | # import sys 2 | # from pathlib import Path 3 | # script_dir = Path(__file__).resolve().parent 4 | # project_root = script_dir.parent 5 | # sys.path.append(str(project_root)) 6 | 7 | # import plotly.io as pio 8 | # import plotly.graph_objects as go 9 | 10 | # from reportlab.lib.pagesizes import letter 11 | # from reportlab.lib.units import inch 12 | # from reportlab.platypus import SimpleDocTemplate, Image 13 | # from io import BytesIO 14 | # from PIL import Image as PILImage 15 | # import tempfile 16 | 17 | 18 | 19 | # from src.utils import create_donut_chart, create_bar_chart 20 | 21 | # def create_pdf_flowable_with_plotly(data, type_of_data): 22 | # # Convert the Plotly figure to an image (in this case PNG format) 23 | # fig = go.Figure(data=[go.Bar(y=[2, 1, 
3], x=["a", "b", "c"])]) 24 | # img_bytes = fig.to_image(format="png") 25 | 26 | # temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".png") 27 | # temp_file.write(img_bytes) 28 | # temp_file.close() 29 | # img = Image(temp_file.name, width=5*inch, height=3*inch) 30 | # return img 31 | 32 | # data = { 33 | # "fruits": {"Apple": 18, "Banana": 20, "Cherry": 30} 34 | # } 35 | # type_of_data = "fruits" 36 | 37 | # flowables = [] 38 | 39 | # flowables.append(create_pdf_flowable_with_plotly(data, type_of_data)) 40 | 41 | # doc = SimpleDocTemplate("output.pdf", pagesize=letter) 42 | # doc.build(flowables) 43 | import plotly.graph_objects as go 44 | from reportlab.lib.pagesizes import letter 45 | from reportlab.lib.units import inch 46 | from reportlab.platypus import SimpleDocTemplate, Image 47 | import tempfile 48 | import os 49 | 50 | def create_plotly_chart_image(data, labels): 51 | """ 52 | Create a Plotly chart and save it to a temporary image file. 53 | Returns the path to the temporary image file. 
54 | """ 55 | fig = go.Figure(data=[go.Bar(y=data, x=labels)]) 56 | img_bytes = fig.to_image(format="png") 57 | 58 | # Save the image bytes to a temporary file 59 | temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".png") 60 | temp_file.write(img_bytes) 61 | temp_file.close() 62 | img = Image(temp_file.name, width=5*inch, height=3*inch) 63 | 64 | return img 65 | 66 | def main(): 67 | # Create two charts and get their image paths 68 | chart1_path = create_plotly_chart_image([2, 1, 3], ["a", "b", "c"]) 69 | # chart2_path = create_plotly_chart_image([5, 3, 4], ["d", "e", "f"]) 70 | 71 | # Create a list of flowables 72 | flowables = [] 73 | flowables.append(chart1_path) 74 | 75 | # Create a PDF and add the flowables 76 | doc = SimpleDocTemplate("out1put.pdf", pagesize=letter) 77 | doc.build(flowables) 78 | 79 | # Clean up the temporary files 80 | # os.unlink(chart1_path) 81 | # os.unlink(chart2_path) 82 | 83 | print("PDF with two Plotly charts created!") 84 | 85 | if __name__ == "__main__": 86 | main() -------------------------------------------------------------------------------- /test_files/pydant.py: -------------------------------------------------------------------------------- 1 | def calculate_metrics(data): 2 | # Extracting values from the data 3 | grossProfit = float(data["grossProfit"]) 4 | totalRevenue = float(data["totalRevenue"]) 5 | operatingIncome = float(data["operatingIncome"]) 6 | costOfRevenue = float(data["costOfRevenue"]) 7 | costofGoodsAndServicesSold = float(data["costofGoodsAndServicesSold"]) 8 | sellingGeneralAndAdministrative = float(data["sellingGeneralAndAdministrative"]) 9 | ebit = float(data["ebit"]) 10 | interestAndDebtExpense = float(data["interestAndDebtExpense"]) 11 | 12 | # Calculating metrics 13 | gross_profit_margin = grossProfit / totalRevenue 14 | operating_profit_margin = operatingIncome / totalRevenue 15 | net_profit_margin = float(data["netIncome"]) / totalRevenue 16 | cost_efficiency = totalRevenue / 
(costOfRevenue + costofGoodsAndServicesSold) 17 | sg_and_a_efficiency = totalRevenue / sellingGeneralAndAdministrative 18 | interest_coverage_ratio = ebit / interestAndDebtExpense 19 | 20 | # Returning the results 21 | return { 22 | "gross_profit_margin": gross_profit_margin, 23 | "operating_profit_margin": operating_profit_margin, 24 | "net_profit_margin": net_profit_margin, 25 | "cost_efficiency": cost_efficiency, 26 | "sg_and_a_efficiency": sg_and_a_efficiency, 27 | "interest_coverage_ratio": interest_coverage_ratio 28 | } 29 | 30 | # Example usage: 31 | data = { 32 | "fiscalDateEnding": "2022-12-31", 33 | "reportedCurrency": "USD", 34 | "grossProfit": "32687000000", 35 | "totalRevenue": "60530000000", 36 | "costOfRevenue": "27842000000", 37 | "costofGoodsAndServicesSold": "385000000", 38 | "operatingIncome": "6408000000", 39 | "sellingGeneralAndAdministrative": "18609000000", 40 | "researchAndDevelopment": "6567000000", 41 | "operatingExpenses": "26279000000", 42 | "investmentIncomeNet": "None", 43 | "netInterestIncome": "-1216000000", 44 | "interestIncome": "162000000", 45 | "interestExpense": "1216000000", 46 | "nonInterestIncome": "365000000", 47 | "otherNonOperatingIncome": "443000000", 48 | "depreciation": "2407000000", 49 | "depreciationAndAmortization": "2395000000", 50 | "incomeBeforeTax": "1013000000", 51 | "incomeTaxExpense": "-626000000", 52 | "interestAndDebtExpense": "1216000000", 53 | "netIncomeFromContinuingOperations": "1783000000", 54 | "comprehensiveIncomeNetOfTax": "8134000000", 55 | "ebit": "2229000000", 56 | "ebitda": "4624000000", 57 | "netIncome": "1639000000" 58 | } 59 | 60 | print(calculate_metrics(data)) 61 | -------------------------------------------------------------------------------- /test_files/remove_tags.py: -------------------------------------------------------------------------------- 1 | from bs4 import BeautifulSoup 2 | 3 | with open('data/sec-edgar-filings/AAPL/10-K/0000320193-22-000108/full-submission.txt', 'r', 
encoding='utf-8') as file: 4 | content = file.read() 5 | 6 | soup = BeautifulSoup(content, 'html.parser') 7 | cleaned_text = soup.get_text() 8 | 9 | 10 | with open('cleaned_file.txt', 'w', encoding='utf-8') as file: 11 | file.write(cleaned_text) 12 | 13 | print(cleaned_text) -------------------------------------------------------------------------------- /test_files/sec_api_test.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import pandas as pd 3 | import json 4 | 5 | # create request header 6 | headers = {'User-Agent': "vishwas.g217@gmail.com"} 7 | 8 | # get all companies data 9 | companyTickers = requests.get( 10 | "https://www.sec.gov/files/company_tickers.json", 11 | headers=headers 12 | ) 13 | 14 | print(companyTickers.json()) 15 | # # review response / keys 16 | print(companyTickers.json().keys()) 17 | # print(companyTickers.json().data()) 18 | 19 | # # format response to dictionary and get first key/value 20 | firstEntry = companyTickers.json()['0'] 21 | print(firstEntry) 22 | 23 | # # parse CIK // without leading zeros 24 | directCik = companyTickers.json()['0']['cik_str'] 25 | 26 | # # dictionary to dataframe 27 | companyData = pd.DataFrame.from_dict(companyTickers.json(), 28 | orient='index') 29 | # print(companyData.head()) 30 | 31 | # # add leading zeros to CIK 32 | companyData['cik_str'] = companyData['cik_str'].astype( 33 | str).str.zfill(10) 34 | 35 | cik = companyData['cik_str'][0] 36 | 37 | filingMetadata = requests.get( 38 | f'https://data.sec.gov/submissions/CIK{cik}.json', 39 | headers=headers 40 | ) 41 | 42 | # print(filingMetadata.json().keys()) 43 | data = filingMetadata.json()['filings']['recent'] 44 | 45 | with open('data.json', 'w') as f: 46 | json.dump(data, f) -------------------------------------------------------------------------------- /test_files/sec_download.py: -------------------------------------------------------------------------------- 1 | from sec_edgar_downloader 
import Downloader 2 | 3 | dl = Downloader("Personal", "vishwas.g217@gmail.com", "data/") 4 | dl.get("10-K", "AAPL", download_details=False, after="2020-01-01") 5 | 6 | -------------------------------------------------------------------------------- /test_files/summarize.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | from langchain.chains.question_answering import load_qa_chain 8 | from langchain.document_loaders import PyPDFLoader 9 | from langchain.llms import OpenAI 10 | 11 | 12 | from dotenv import dotenv_values 13 | 14 | config = dotenv_values(".env") 15 | OPENAI_API_KEY = config["OPENAI_API_KEY"] 16 | 17 | # load document 18 | loader = PyPDFLoader("AAPL.pdf") 19 | documents = loader.load() 20 | 21 | llm = OpenAI(openai_api_key=OPENAI_API_KEY) 22 | 23 | ### For multiple documents 24 | # loaders = [....] 25 | # documents = [] 26 | # for loader in loaders: 27 | # documents.extend(loader.load()) 28 | 29 | chain = load_qa_chain(llm=llm, chain_type="map_reduce") 30 | query = "what is the total number of AI publications?" 
31 | print(chain.run(input_documents=documents, question=query)) 32 | 33 | -------------------------------------------------------------------------------- /test_files/table_pdf.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | from reportlab.lib.pagesizes import letter 8 | from reportlab.platypus import PageBreak, Paragraph, SimpleDocTemplate, Flowable 9 | from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle 10 | from reportlab.lib.enums import TA_CENTER 11 | from reportlab.lib.pagesizes import landscape 12 | from reportlab.platypus import Paragraph, Table, TableStyle, Spacer 13 | from reportlab.lib import colors 14 | 15 | from src.news_sentiment import top_news 16 | 17 | styles = getSampleStyleSheet() 18 | 19 | 20 | header_style = ParagraphStyle( 21 | 'HeaderStyle', 22 | parent=styles['Heading2'], 23 | fontSize=24, 24 | spaceAfter=20, 25 | leading=30 26 | ) 27 | 28 | sub_section_header_style = ParagraphStyle( 29 | 'SubSectionHeaderStyle', 30 | parent=styles['Heading3'], 31 | fontSize=16, 32 | spaceAfter=8, 33 | leading=20 34 | ) 35 | 36 | data_style = ParagraphStyle( 37 | 'DataStyle', 38 | parent=styles['Normal'], 39 | fontSize=14, 40 | spaceAfter=15, 41 | leading=20 42 | ) 43 | 44 | class RotatedTable(Flowable): 45 | def __init__(self, table_data): 46 | Flowable.__init__(self) 47 | self.table_data = table_data 48 | 49 | def wrap(self, availWidth, availHeight): 50 | # Swap width and height for rotated table 51 | self.width, self.height = availHeight, availWidth 52 | return self.width, self.height 53 | 54 | def draw(self): 55 | # Create the table 56 | table = Table(self.table_data, repeatRows=1) 57 | 58 | # Table Style 59 | style = TableStyle([ 60 | ('BACKGROUND', (0, 0), (-1, 0), colors.grey), 61 | ('TEXTCOLOR', (0, 0), (-1, 0), 
colors.whitesmoke), 62 | ('ALIGN', (0, 0), (-1, -1), 'LEFT'), 63 | ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'), 64 | ('FONTSIZE', (0, 0), (-1, 0), 12), 65 | ('BOTTOMPADDING', (0, 0), (-1, 0), 12), 66 | ('BACKGROUND', (0, 1), (-1, -1), colors.beige), 67 | ('GRID', (0, 0), (-1, -1), 1, colors.black) 68 | ]) 69 | table.setStyle(style) 70 | 71 | # Rotate the canvas, draw the table, then reset rotation 72 | self.canv.saveState() 73 | self.canv.translate(0, self.width) 74 | self.canv.rotate(-90) 75 | table.wrapOn(self.canv, self.height, self.width) 76 | table.drawOn(self.canv, 0, 0) 77 | self.canv.restoreState() 78 | 79 | def pdf_news_sentiment(data): 80 | flowables = [] 81 | 82 | # Section Title 83 | title = "NEWS SENTIMENT" 84 | para_title = Paragraph(title, header_style) 85 | flowables.append(para_title) 86 | flowables.append(Spacer(1, 12)) 87 | 88 | # News DataFrame rendered as a rotated table so wide frames fit the portrait page 89 | df = data['news'].astype(str) 90 | table_data = [df.columns.to_list()] + df.values.tolist() 91 | flowables.append(RotatedTable(table_data)) 92 | flowables.append(Spacer(1, 12)) 93 | 94 | # Mean Sentiment Score 95 | mean_sentiment_score_text = f"Mean Sentiment Score: {data['mean_sentiment_score']:.2f}" 96 | flowables.append(Paragraph(mean_sentiment_score_text, data_style)) 97 | flowables.append(Spacer(1, 12)) 98 | 99 | # Mean Sentiment Class 100 | mean_sentiment_class_text = f"Mean Sentiment Class: {data['mean_sentiment_class']}" 101 | flowables.append(Paragraph(mean_sentiment_class_text, data_style)) 102 | 103 | return flowables 104 | 105 | news = top_news('AAPL', 10) 106 | 107 | flow1 = pdf_news_sentiment(news) 108 | 109 | doc = SimpleDocTemplate("table.pdf", pagesize=letter) 110 | all_flowables = [] 111 | all_flowables.extend(flow1) 112 | doc.build(all_flowables) 113 | 114 |
-------------------------------------------------------------------------------- /test_files/tbl.py: -------------------------------------------------------------------------------- 1 | from pydantic import BaseModel 2 | from typing import List 3 | 4 | class MyPydanticModel(BaseModel): 5 | str_value: str 6 | bool_values: List[bool] 7 | 8 | # Example usage: 9 | data = { 10 | "str_value": "Hello, Pydantic!", 11 | "bool_values": [True, False, True, False, True] 12 | } 13 | 14 | my_model = MyPydanticModel(**data) 15 | print(my_model) -------------------------------------------------------------------------------- /test_files/tools.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | from langchain.prompts import PromptTemplate 8 | from langchain.output_parsers import PydanticOutputParser 9 | 10 | from llama_index import VectorStoreIndex, ServiceContext, StorageContext, SimpleDirectoryReader 11 | from llama_index.vector_stores import WeaviateVectorStore, FaissVectorStore, ChromaVectorStore 12 | from llama_index.embeddings import OpenAIEmbedding 13 | from llama_index.tools import QueryEngineTool, ToolMetadata 14 | from llama_index.query_engine import SubQuestionQueryEngine 15 | 16 | 17 | from weaviate.embedded import EmbeddedOptions
18 | 19 | import streamlit as st 20 | import os 21 | import openai 22 | 23 | 24 | OPENAI_API_KEY = st.secrets["openai_api_key"] 25 | 26 | 27 | os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY 28 | openai.api_key = os.environ["OPENAI_API_KEY"] 29 | 30 | 31 | query = """ 32 | You are given the task of generating insights for Fiscal Year Highlights from the annual report of the company. 33 | 34 | Given below is the output format, which has the subsections. 35 | Write at least 50 words for each subsection. 36 | In case you don't have enough info, you can just write: No information available 37 | --- 38 | The output should be formatted as a JSON instance that conforms to the JSON schema below. 39 | 40 | As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]} 41 | the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
42 | 43 | Here is the output schema: 44 | ``` 45 | {"properties": {"performance_highlights": {"title": "Performance Highlights", "description": "Key performance metrics and achievements over the fiscal year.", "type": "string"}, "major_events": {"title": "Major Events", "description": "Highlight of significant events, acquisitions, or strategic shifts that occurred during the year.", "type": "string"}, "challenges_encountered": {"title": "Challenges Encountered", "description": "Challenges the company faced during the year and how they managed or overcame them.", "type": "string"}, "milestone_achievements": {"title": "Milestone Achievements", "description": "Milestones achieved in terms of projects, expansions, or any other notable accomplishments.", "type": "string"}}, "required": ["performance_highlights", "major_events", "challenges_encountered", "milestone_achievements"]} 46 | ``` 47 | --- 48 | """ 49 | 50 | report = SimpleDirectoryReader( 51 | input_files=["data/meta/meta.pdf"] 52 | ).load_data() 53 | 54 | index = VectorStoreIndex.from_documents(report) 55 | engine = index.as_query_engine(similarity_top_k=3) 56 | 57 | query_engine_tools = [ 58 | QueryEngineTool( 59 | query_engine=engine, 60 | metadata=ToolMetadata( 61 | name="Annual Report", 62 | description="Provides information about Meta Platforms, Inc. from its annual report.", 63 | ), 64 | ), 65 | ] 66 | 67 | s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools) 68 | 69 | 70 | response = s_engine.query(query) 71 | print(response) 72 | --------------------------------------------------------------------------------
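A note on `tools.py` above: it imports `PydanticOutputParser` but never uses it, and it prints the engine's reply as raw text even though the prompt demands a JSON instance with four required string fields. A minimal stdlib-only sketch of the missing validation step is below; the dataclass, the `parse_highlights` helper, and the sample reply are illustrative assumptions (they mirror the schema pasted into the prompt, but are not code from this repo).

```python
import json
from dataclasses import dataclass, fields

@dataclass
class FiscalYearHighlights:
    # Field names mirror the "required" keys of the JSON schema in the prompt.
    performance_highlights: str
    major_events: str
    challenges_encountered: str
    milestone_achievements: str

def parse_highlights(raw: str) -> FiscalYearHighlights:
    """Check that the LLM's JSON reply carries every required key before use."""
    data = json.loads(raw)
    missing = [f.name for f in fields(FiscalYearHighlights) if f.name not in data]
    if missing:
        raise ValueError(f"response missing required keys: {missing}")
    return FiscalYearHighlights(**{f.name: data[f.name] for f in fields(FiscalYearHighlights)})

# Stand-in reply; a real run would feed str(response) from the query engine instead.
raw = json.dumps({
    "performance_highlights": "Revenue grew year over year.",
    "major_events": "No information available",
    "challenges_encountered": "No information available",
    "milestone_achievements": "No information available",
})
print(parse_highlights(raw).performance_highlights)
```

With a check like this in place, a malformed or truncated reply fails loudly at parse time instead of flowing into the report as free text; the same contract could equally be expressed through the already-imported `PydanticOutputParser`.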