├── .DS_Store ├── .gitignore ├── .vscode └── settings.json ├── README.md ├── data ├── .DS_Store ├── meta │ └── meta.pdf ├── microsoft │ └── MSFT_FY23Q4_10K.pdf └── ticker_symbols │ ├── ticker_symbols.csv │ └── ticker_symbols.txt ├── docs ├── finsight.gif ├── main.md └── news.md ├── experiments ├── new_sections.txt ├── sem_qa.py └── semantic_qa_over_tables.ipynb ├── pdf └── final_report.pdf ├── prompts ├── insights.prompt ├── iv2.prompt ├── main.prompt └── report.prompt ├── requirements.txt ├── src ├── .DS_Store ├── balance_sheet.py ├── cash_flow.py ├── company_overview.py ├── fields.py ├── fields2.py ├── income_statement.py ├── news_sentiment.py ├── pages │ ├── 1_📊_Finance_Metrics_Review.py │ └── 2_🗂️_Annual_Report_Analyzer.py ├── pdf_gen.py ├── pydantic_models.py ├── ticker_symbol.py ├── utils.py └── 🏡_Home.py └── test_files ├── .DS_Store ├── Models.py ├── RAG ├── data1.pdf ├── tech_1.py └── test.py ├── __pycache__ ├── Models.cpython-311.pyc └── main.cpython-311.pyc ├── attribs.py ├── av-api-test.py ├── finchat.py ├── fmp-api.py ├── main.py ├── node_parsing.py ├── nodes.py ├── open_ai_api.py ├── parser.py ├── pdf1.py ├── pdf_gen.py ├── plotly_chart.py ├── plotly_pdf.py ├── pydant.py ├── remove_tags.py ├── sec_api_test.py ├── sec_download.py ├── summarize.py ├── table_pdf.py ├── tbl.py └── tools.py /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/.DS_Store -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | venv 2 | .env 3 | src/__pycache__ 4 | AAPL.pdf 5 | example.pdf 6 | .streamlit -------------------------------------------------------------------------------- /.vscode/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | 
"python.analysis.typeCheckingMode": "off", 3 | "python.analysis.extraPaths": [ 4 | "./venv/lib/python3.11/site-packages" 5 | ] 6 | } -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | # 💸 FinSight: 4 | **Financial Insights at Your Fingertips** 5 | 6 | Finsight is a cutting-edge finance AI assistant tailored to meet the needs of portfolio managers, investors, and finance enthusiasts. By leveraging `GPT-4` and financial data, Finsight provides deep insights and actionable summaries about a company, aiding in more informed investment decisions. 7 | 8 | ![demo](docs/demo.gif) 9 | 10 | If you'd like to learn more about the technical details of FinSight, check out the LlamaIndex blogpost below where I do a deep dive into the project: 11 | 12 | [How I built the Streamlit LLM Hackathon winning app — FinSight using LlamaIndex.](https://blog.llamaindex.ai/how-i-built-the-streamlit-llm-hackathon-winning-app-finsight-using-llamaindex-9dcf6c46d7a0) 13 | 14 | ## Features 15 | 📊 **Finance Metrics Overview**: 16 | - Dive deep into core financial metrics extracted from the income statement, balance sheet, and cash flow. 17 | - Stay updated with the top news sentiment surrounding the company for the current year, ensuring you're always in the loop. 18 | - These are the different sections: 19 | - **Company Overview**: Get a quick overview of the company. 20 | - **Income Statement**: Understand the company's revenue and expenses. 21 | - **Balance Sheet**: Get a grasp on the company's assets, liabilities, and shareholders' equity. 22 | - **Cash Flow**: Understand the company's cash flow from operating, investing, and financing activities. 23 | - **News Sentiment**: Stay updated with the top news sentiment surrounding the company for the current year. 24 | 25 | 📄 **Annual Report Analyzer**: 26 | - Simply upload a company's annual report. 
27 | - Finsight will then provide comprehensive insights into: 28 | - **Fiscal Year Highlights**: Major achievements, milestones, and financial highlights. 29 | - **Strategy Outlook and Future Direction**: Understand the company's strategic plans and anticipated future trajectory. 30 | - **Risk Management**: Insight into the company's risk assessment, potential challenges, and mitigation strategies. 31 | - **Innovation and R&D**: Get a grasp on the company's commitment to innovation and its R&D endeavors. 32 | 33 | ## Tech Stack 34 | **Streamlit**: Powers the frontend, providing a seamless user interface. 35 | **LangChain**: Acts as the foundation for integrating the LLM into the web app. 36 | **Llama Index**: The data framework behind the Retrieval-Augmented Generation (RAG) and agent-based features, such as the Annual Report Analyzer. 37 | **Alpha Vantage**: The go-to API service for fetching the most recent financial data about companies. 38 | 39 | ## How to Use 40 | ### Website Access: 41 | Head over to [Finsight](https://finsight-report.streamlit.app/) 42 | 43 | ### **Local Setup**: 44 | 45 | 46 | 1. **Clone the Repository**: 47 | ```bash 48 | git clone https://github.com/vishwasg217/finsight.git 49 | cd finsight 50 | ``` 51 | 52 | 2. **Set Up a Virtual Environment** (Optional but Recommended): 53 | ```bash 54 | # For macOS and Linux: 55 | python3 -m venv venv 56 | 57 | # For Windows: 58 | python -m venv venv 59 | ``` 60 | 61 | 3. **Activate the Virtual Environment**: 62 | ```bash 63 | # For macOS and Linux: 64 | source venv/bin/activate 65 | 66 | # For Windows: 67 | .\venv\Scripts\activate 68 | ``` 69 | 70 | 4. **Install Required Dependencies**: 71 | ```bash 72 | pip install -r requirements.txt 73 | ``` 74 | 75 | 5. 
**Set up the Environment Variables**: 76 | ```bash 77 | # create directory 78 | mkdir .streamlit 79 | 80 | # create toml file 81 | touch .streamlit/secrets.toml 82 | ``` 83 | 84 | You can get your API keys here: [AlphaVantage](https://www.alphavantage.co/support/#api-key), [OpenAI](https://openai.com/blog/openai-api) 85 | 86 | ```bash 87 | # add the following API keys 88 | av_api_key = "ALPHA_VANTAGE API KEY" 89 | 90 | openai_api_key = "OPEN AI API KEY" 91 | 92 | 93 | ``` 94 | 95 | 6. **Run Finsight**: 96 | ```bash 97 | streamlit run src/🏡_Home.py 98 | ``` 99 | 100 | After running the command, Streamlit will provide a local URL (usually `http://localhost:8501/`) which you can open in your web browser to access Finsight. 101 | -------------------------------------------------------------------------------- /data/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/data/.DS_Store -------------------------------------------------------------------------------- /data/meta/meta.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/data/meta/meta.pdf -------------------------------------------------------------------------------- /data/microsoft/MSFT_FY23Q4_10K.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/data/microsoft/MSFT_FY23Q4_10K.pdf -------------------------------------------------------------------------------- /docs/finsight.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/docs/finsight.gif 
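The keys placed in `.streamlit/secrets.toml` during step 5 of the README setup are read at runtime through Streamlit's `st.secrets`, and several modules under `src/` fall back to environment variables instead. A minimal sketch of that lookup pattern follows; the helper name `load_api_keys` is illustrative and not part of the repository:

```python
import os

def load_api_keys():
    """Return (alpha_vantage_key, openai_key).

    Prefers Streamlit's secrets.toml (the keys created in step 5 of the
    README), falling back to environment variables as src/balance_sheet.py
    and src/cash_flow.py do. Hypothetical helper, shown for illustration.
    """
    try:
        import streamlit as st
        return st.secrets["av_api_key"], st.secrets["openai_api_key"]
    except Exception:
        # No secrets.toml found (or streamlit unavailable): use the environment.
        return os.environ.get("AV_API_KEY"), os.environ.get("OPENAI_API_KEY")
```

Either mechanism works for a local run; on Streamlit Community Cloud the same keys can be supplied through the app's Secrets settings and surface via `st.secrets` in the same way.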
-------------------------------------------------------------------------------- /docs/main.md: -------------------------------------------------------------------------------- 1 | ## About the App and Its Features: 2 | Finsight is a cutting-edge AI assistant tailored for portfolio managers, investors, and finance enthusiasts. It streamlines the process of gaining crucial insights and summaries about a company in a user-friendly manner. 3 | 4 | ### [Finance Metrics Review](https://finsight-report.streamlit.app/Finance_Metrics_Review): 5 | Simply enter the ticker symbol of your desired company. With a click, Finsight delves deep into the financial data and current news sentiment, presenting you with a comprehensive analysis. From metrics derived from income statements to the latest news sentiments, get a 360° view of the company's financial health. 6 | 7 | ### [Annual Report Analyzer](https://finsight-report.streamlit.app/Annual_Report_Analyzer): 8 | Want a deep dive into a company's annual report? Upload the report in PDF format, and Finsight will process and analyze it, offering insights into Fiscal Year Highlights, Strategy Outlook and Future Direction, Risk Management, and Innovation & R&D. 9 | 10 | #### GitHub Repository: 11 | For those keen on diving into the code or contributing, the entire project is open-source and hosted on GitHub. Find it [here](https://github.com/vishwasg217/finsight). 12 | 13 | #### About the Creator: 14 | Hi! I'm Vishwas Gowda, an ML Engineer and an LLM enthusiast. 15 | 16 | Let's Connect and Collaborate: 17 | - [GitHub](https://github.com/vishwasg217) 18 | - [Twitter](https://twitter.com/VishwasAiTech) 19 | - [LinkedIn](https://www.linkedin.com/in/vishwasgowda217/) 20 | -------------------------------------------------------------------------------- /docs/news.md: -------------------------------------------------------------------------------- 1 | #### FinSight Wins Streamlit LLM Hackathon!! 2 | 3 | I'm excited to share some great news with you all. 
Over the past month, I've been working tirelessly on Finsight, and it's been an incredible journey. Today, I'm thrilled to announce that Finsight has emerged victorious in the LLM Hackathon organized by Streamlit, specifically in the LlamaIndex category! 4 | 5 | 6 | [Read more](https://www.linkedin.com/posts/vishwasgowda217_llm-hackathon-streamlit-activity-7115398433573666816-1y72?utm_source=share&utm_medium=member_desktop) 7 | 8 | 9 | Stay tuned for more exciting updates from FinSight! 10 | -------------------------------------------------------------------------------- /experiments/new_sections.txt: -------------------------------------------------------------------------------- 1 | Section 1: Business 2 | 3 | Company Background: Provide the company overview in a neatly formatted way 4 | 5 | Products and Services: List out the products and services offered by the company in a neatly formatted way 6 | 7 | Competition and Strategy: Provide the details about the company's competition and their strategy to compete with them. 8 | 9 | Intellectual Property and Human Resources: Provide details about the company's intellectual property and steps taken to protect and enhance human capital 10 | 11 | Regulatory and Legal Matters: What are the regulatory and legal issues? List them out in detail 12 | 13 | 14 | Section 2: Risk Factors 15 | 16 | Product Related Risks: What are the risks related to the product? List them out in detail 17 | 18 | Regulatory and Enforcement Risks: What are the risks faced by the company in enforcing regulations? 19 | 20 | Operational Risks: What are the risks faced in day-to-day operations by the company? 21 | 22 | Market Risk: What are the market risks faced by the company? List them in detail 23 | 24 | 25 | Section 3: Management's Discussion and Analysis 26 | 27 | Fiscal Year Highlights: 28 | 29 | 30 | Section 4: Financial Statements 31 | 32 | Income statements: Provide the net income, basic earnings per share, cost and expenses. 
Provide insights from these metrics. 33 | 34 | Balance Sheet: Provide the total assets, total liabilities, retained earnings. Provide insights from these metrics. 35 | 36 | 37 | 38 | 39 | -------------------------------------------------------------------------------- /experiments/sem_qa.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | 8 | import pandas as pd 9 | from llama_index import SimpleDirectoryReader 10 | 11 | from llama_index.node_parser import UnstructuredElementNodeParser, SimpleNodeParser 12 | from llama_index.retrievers import RecursiveRetriever 13 | from llama_index.query_engine import RetrieverQueryEngine 14 | from llama_index import VectorStoreIndex 15 | from llama_index.tools import QueryEngineTool, ToolMetadata 16 | from llama_index.query_engine import SubQuestionQueryEngine 17 | from llama_index import ServiceContext 18 | from llama_index.llms import OpenAI 19 | 20 | import nest_asyncio 21 | 22 | nest_asyncio.apply() 23 | 24 | import streamlit as st 25 | import os 26 | 27 | OPENAI_API_KEY = st.secrets["openai_api_key"] 28 | os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY 29 | 30 | pd.set_option("display.max_rows", None) 31 | pd.set_option("display.max_columns", None) 32 | pd.set_option("display.width", None) 33 | pd.set_option("display.max_colwidth", None) 34 | 35 | 36 | company = input("Enter company name: ") 37 | 38 | # load pdfs 39 | if company == "apple": 40 | reader = SimpleDirectoryReader( 41 | input_files=["data/apple/AAPL.pdf"] 42 | ) 43 | docs = reader.load_data() 44 | 45 | 46 | elif company == "meta": 47 | reader = SimpleDirectoryReader( 48 | input_files=["data/meta/meta.pdf"] 49 | ) 50 | 51 | docs = reader.load_data() 52 | 53 | 54 | # node_parser = UnstructuredElementNodeParser() 55 | node_parser = 
SimpleNodeParser() 56 | 57 | nodes = node_parser.get_nodes_from_documents(docs, show_progress=True) 58 | # base_nodes, node_mappings = node_parser.get_base_nodes_and_mappings(nodes) 59 | 60 | 61 | vector_index = VectorStoreIndex(nodes) 62 | vector_retriever = vector_index.as_retriever(similarity_top_k=3) 63 | query_engine = vector_index.as_query_engine(similarity_top_k=3) 64 | 65 | # recursive_retriever = RecursiveRetriever( 66 | # "vector", 67 | # retriever_dict={"vector": vector_retriever}, 68 | # node_dict=aapl_node_mappings, 69 | # verbose=True, 70 | # ) 71 | # query_engine = RetrieverQueryEngine.from_args(recursive_retriever) 72 | 73 | 74 | llm = OpenAI(model="gpt-3.5-turbo", api_key=OPENAI_API_KEY) 75 | service_context = ServiceContext.from_defaults(llm=llm) 76 | 77 | query_engine_tool = [ 78 | QueryEngineTool( 79 | query_engine=query_engine, 80 | metadata=ToolMetadata( 81 | name = company, 82 | description=f"provides information about {company} financials for the year 2022", 83 | ), 84 | ) 85 | ] 86 | 87 | sub_query_engine = SubQuestionQueryEngine.from_defaults( 88 | query_engine_tools=query_engine_tool, 89 | service_context=service_context, 90 | use_async=True 91 | ) 92 | 93 | query = """ 94 | You are tasked with generating performance_highlights insight about the company for the Fiscal Year Highlights section from the annual report of the company: 95 | 96 | Given below is the output format, which has the subsections. 97 | Must use bullet points. 98 | Always use $ symbol for money values, and round it off to millions or billions accordingly 99 | 100 | Incase you don't have enough info you can just write: No information available 101 | --- 102 | {'performance_highlights': 'Key performance and financial stats over the fiscal year. 
Provide the Revenue Growth, Net Profit Margin, Market Share expansion, Cost Savings and Efficiency, Dividend Distribution'} 103 | 104 | """ 105 | 106 | # Interactive loop: check for the exit command before querying, so "exit" itself is never sent to the engine. 107 | while True: 108 | query = input("ENTER QUERY: ") 109 | if query == "exit": 110 | break 111 | response = sub_query_engine.query(query) 112 | print("-" * 50) 113 | print(str(response)) 114 | print("-" * 50) 115 | 116 | # Reference mapping of report sections to 10-K items (assigned a name; previously a dangling dict literal). 117 | SECTION_TO_ITEM = { 118 | "Business": "Item 1: ", 119 | "Risk Factors": "Item 1A, Item 7A", 120 | "MD&A": "Item 7", 121 | "Financial Statements": "Item 8", 122 | "Management's Report on Internal Control Over Financial Reporting": "Item 9A", 123 | "Report of Independent Registered Public Accounting Firm": "Item 8", 124 | "Corporate Governance": "Item 10" 125 | } 126 | 127 | -------------------------------------------------------------------------------- /pdf/final_report.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/pdf/final_report.pdf -------------------------------------------------------------------------------- /prompts/insights.prompt: -------------------------------------------------------------------------------- 1 | You are tasked with generating insights about the company from the {type_of_data} below: 2 | 3 | ---- 4 | {inputs} 5 | ---- 6 | 7 | Rules: 8 | Each insight must not state the obvious. 9 | Always use $ symbol for money values, and round it off to millions or billions accordingly 10 | 11 | Generate insights about the company according to the following format. 
12 | 13 | ---- 14 | {output_format} 15 | ---- -------------------------------------------------------------------------------- /prompts/iv2.prompt: -------------------------------------------------------------------------------- 1 | You are tasked with generating {insight_name} insight about the company for the {type_of_data} data given below: 2 | 3 | ---- 4 | {inputs} 5 | ---- 6 | 7 | Rules: 8 | The insight must not state the obvious. 9 | Always use $ symbol for money values, and round it off to millions or billions accordingly 10 | 11 | Generate the insight as per the given description: 12 | 13 | ---- 14 | {output_format} 15 | ---- -------------------------------------------------------------------------------- /prompts/main.prompt: -------------------------------------------------------------------------------- 1 | You are a financial analyzer for a mutual fund portfolio manager. 2 | Your job is to provide a detailed analysis of the following: 3 | 4 | -------------------------------------------------------------------------------- /prompts/report.prompt: -------------------------------------------------------------------------------- 1 | You are tasked with generating {insight_name} insight about the company for the {section_name} section from the annual report of the company: 2 | 3 | Given below is the output format, which has the subsections. 4 | Must use bullet points. 
5 | Always use $ symbol for money values, and round it off to millions or billions accordingly 6 | 7 | Incase you don't have enough info you can just write: No information available 8 | --- 9 | {output_format} 10 | --- -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aiofiles==23.2.1 2 | aiohttp==3.8.5 3 | aiosignal==1.3.1 4 | altair==5.1.1 5 | annotated-types==0.5.0 6 | antlr4-python3-runtime==4.9.3 7 | anyio==3.7.1 8 | appnope==0.1.3 9 | astor==0.8.1 10 | asttokens==2.4.0 11 | async-timeout==4.0.3 12 | atlassian-python-api==3.41.3 13 | attrs==23.1.0 14 | Authlib==1.2.1 15 | backcall==0.2.0 16 | backoff==2.2.1 17 | base58==2.1.1 18 | bcrypt==4.0.1 19 | beautifulsoup4==4.12.2 20 | blinker==1.6.2 21 | cachetools==5.3.1 22 | camelot-py==0.11.0 23 | certifi==2023.7.22 24 | cffi==1.15.1 25 | chardet==5.2.0 26 | charset-normalizer==3.2.0 27 | Chroma==0.2.0 28 | chroma-hnswlib==0.7.3 29 | ci-info==0.3.0 30 | clarifai==9.8.1 31 | clarifai-grpc==9.8.2 32 | click==7.1.2 33 | coloredlogs==15.0.1 34 | comm==0.1.4 35 | configobj==5.0.8 36 | configparser==6.0.0 37 | contourpy==1.1.1 38 | cryptography==41.0.3 39 | cycler==0.12.1 40 | dataclasses-json==0.5.14 41 | debugpy==1.8.0 42 | decorator==5.1.1 43 | Deprecated==1.2.14 44 | diskcache==5.6.3 45 | distro==1.8.0 46 | dotenv-python==0.0.1 47 | EbookLib==0.18 48 | effdet==0.4.1 49 | emoji==2.8.0 50 | et-xmlfile==1.1.0 51 | etelemetry==0.3.1 52 | executing==2.0.0 53 | faiss-cpu==1.7.4 54 | fastapi==0.99.1 55 | filelock==3.12.3 56 | filetype==1.2.0 57 | fitz==0.0.1.dev2 58 | flatbuffers==23.5.26 59 | fonttools==4.43.1 60 | frontend==0.0.3 61 | frozenlist==1.4.0 62 | fsspec==2023.9.0 63 | future==0.18.3 64 | gitdb==4.0.10 65 | GitPython==3.1.35 66 | googleapis-common-protos==1.60.0 67 | greenlet==3.0.0 68 | grpcio==1.58.0 69 | h11==0.14.0 70 | html2text==2020.1.16 71 | httplib2==0.22.0 72 | 
httptools==0.6.0 73 | huggingface-hub==0.16.4 74 | humanfriendly==10.0 75 | idna==3.4 76 | importlib-metadata==6.8.0 77 | importlib-resources==6.0.1 78 | iopath==0.1.10 79 | ipykernel==6.25.2 80 | ipython==8.16.1 81 | isodate==0.6.1 82 | itsdangerous==2.1.2 83 | jedi==0.19.1 84 | Jinja2==3.1.2 85 | joblib==1.3.2 86 | JPype1==1.4.1 87 | jsonpatch==1.33 88 | jsonpointer==2.4 89 | jsonschema==4.19.0 90 | jsonschema-specifications==2023.7.1 91 | jupyter_client==8.3.1 92 | jupyter_core==5.3.2 93 | kaleido==0.2.1 94 | kiwisolver==1.4.5 95 | langchain==0.0.310 96 | langdetect==1.0.9 97 | langsmith==0.0.43 98 | layoutparser==0.3.4 99 | llama-hub==0.0.38 100 | llama-index==0.8.48 101 | llama_cpp_python==0.2.2 102 | looseversion==1.3.0 103 | lxml==4.9.3 104 | Markdown==3.5 105 | markdown-it-py==3.0.0 106 | MarkupSafe==2.1.3 107 | marshmallow==3.20.1 108 | matplotlib==3.8.0 109 | matplotlib-inline==0.1.6 110 | mdurl==0.1.2 111 | monotonic==1.6 112 | mpmath==1.3.0 113 | msg-parser==1.2.0 114 | multidict==6.0.4 115 | mypy-extensions==1.0.0 116 | nest-asyncio==1.5.8 117 | networkx==3.2 118 | nibabel==5.1.0 119 | nipype==1.8.6 120 | nltk==3.8.1 121 | numexpr==2.8.5 122 | numpy==1.25.2 123 | oauthlib==3.2.2 124 | olefile==0.46 125 | omegaconf==2.3.0 126 | onnx==1.14.1 127 | onnxruntime==1.15.1 128 | openai==0.28.0 129 | opencv-python==4.8.1.78 130 | openpyxl==3.1.2 131 | overrides==7.4.0 132 | packaging==23.1 133 | pandas==2.1.0 134 | parso==0.8.3 135 | pathlib==1.0.1 136 | pdf2image==1.16.3 137 | pdfminer.six==20221105 138 | pdfplumber==0.10.2 139 | pexpect==4.8.0 140 | pickleshare==0.7.5 141 | Pillow==9.5.0 142 | pipdeptree==2.13.0 143 | platformdirs==3.11.0 144 | plotly==5.17.0 145 | portalocker==2.8.2 146 | posthog==3.0.2 147 | prompt-toolkit==3.0.39 148 | protobuf==4.24.3 149 | prov==2.0.0 150 | psutil==5.9.5 151 | ptyprocess==0.7.0 152 | pulsar-client==3.3.0 153 | pure-eval==0.2.2 154 | pyarrow==13.0.0 155 | pycocotools==2.0.7 156 | pycparser==2.21 157 | pydantic==1.10.12 
158 | pydantic_core==2.6.3 159 | pydeck==0.8.0 160 | pydot==1.4.2 161 | Pygments==2.16.1 162 | Pympler==1.0.1 163 | pypandoc==1.12 164 | pyparsing==3.1.1 165 | pypdf==3.15.5 166 | PyPDF2==2.12.1 167 | pypdfium2==4.22.0 168 | PyPika==0.48.9 169 | pyrate-limiter==3.1.0 170 | pytesseract==0.3.10 171 | python-dateutil==2.8.2 172 | python-docx==1.0.1 173 | python-dotenv==1.0.0 174 | python-iso639==2023.6.15 175 | python-magic==0.4.27 176 | python-multipart==0.0.6 177 | python-pptx==0.6.21 178 | python-rapidjson==1.11 179 | pytz==2023.3.post1 180 | pytz-deprecation-shim==0.1.0.post0 181 | pyxnat==1.6 182 | PyYAML==6.0.1 183 | pyzmq==25.1.1 184 | rapidfuzz==3.4.0 185 | rdflib==7.0.0 186 | referencing==0.30.2 187 | regex==2023.8.8 188 | reportlab==4.0.4 189 | requests==2.31.0 190 | requests-oauthlib==1.3.1 191 | retrying==1.3.4 192 | rich==13.4.2 193 | rpds-py==0.10.2 194 | safetensors==0.4.0 195 | scipy==1.11.3 196 | sec-edgar-downloader==5.0.0 197 | simplejson==3.19.2 198 | six==1.16.0 199 | slack-sdk==3.23.0 200 | smmap==5.0.0 201 | sniffio==1.3.0 202 | soupsieve==2.5 203 | SQLAlchemy==2.0.20 204 | stack-data==0.6.3 205 | starlette==0.27.0 206 | streamlit==1.27.2 207 | sympy==1.12 208 | tabula-py==2.8.2 209 | tabulate==0.9.0 210 | tenacity==8.2.3 211 | tiktoken==0.5.1 212 | timm==0.9.8 213 | tokenizers==0.14.0 214 | toml==0.10.2 215 | toolz==0.12.0 216 | torch==2.1.0 217 | torchvision==0.16.0 218 | tornado==6.3.3 219 | tqdm==4.64.1 220 | traitlets==5.11.2 221 | traits==6.3.2 222 | transformers==4.34.1 223 | tritonclient==2.34.0 224 | typing-inspect==0.9.0 225 | typing_extensions==4.7.1 226 | tzdata==2023.3 227 | tzlocal==4.3.1 228 | unstructured==0.10.25 229 | unstructured-inference==0.7.9 230 | unstructured.pytesseract==0.3.12 231 | urllib3==1.26.16 232 | uvicorn==0.23.2 233 | uvloop==0.17.0 234 | validators==0.21.0 235 | watchdog==3.0.0 236 | watchfiles==0.20.0 237 | wcwidth==0.2.8 238 | weaviate-client==3.23.2 239 | websockets==11.0.3 240 | wrapt==1.15.0 241 | 
xlrd==2.0.1 242 | XlsxWriter==3.1.9 243 | yarl==1.9.2 244 | zipp==3.16.2 245 | -------------------------------------------------------------------------------- /src/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/src/.DS_Store -------------------------------------------------------------------------------- /src/balance_sheet.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import requests 8 | import streamlit as st 9 | import os 10 | # from dotenv import dotenv_values 11 | 12 | from src.pydantic_models import BalanceSheetInsights 13 | from src.utils import insights, get_total_revenue, safe_float, generate_pydantic_model 14 | # from src.fields import balance_sheet_fields, balance_sheet_attributes 15 | from src.fields2 import bal_sheet, balance_sheet_attributes 16 | 17 | # config = dotenv_values(".env") 18 | # OPENAI_API_KEY = config["OPENAI_API_KEY"] 19 | # AV_API_KEY = config["ALPHA_VANTAGE_API_KEY"] 20 | 21 | # AV_API_KEY = st.secrets["av_api_key"] 22 | # OPENAI_API_KEY = st.secrets["openai_api_key"] 23 | 24 | AV_API_KEY = os.environ.get("AV_API_KEY") 25 | OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") 26 | 27 | def charts(data): 28 | report = data['annualReports'][0] 29 | asset_composition = {"total_current_assets": report['totalCurrentAssets'], 30 | "total_non_current_assets": report['totalNonCurrentAssets'] 31 | } 32 | 33 | liabilities_composition = { 34 | "total_current_liabilities": report['totalCurrentLiabilities'], 35 | "total_non_current_liabilities": report['totalNonCurrentLiabilities'] 36 | } 37 | 38 | debt_structure = { 39 | "short_term_debt": report['shortTermDebt'], 40 | "long_term_debt": 
report['longTermDebt'] 41 | } 42 | 43 | return { 44 | "asset_composition": asset_composition, 45 | "liabilities_composition": liabilities_composition, 46 | "debt_structure": debt_structure 47 | } 48 | 49 | 50 | 51 | def metrics(data, total_revenue): 52 | 53 | # Extracting values from the data 54 | totalCurrentAssets = safe_float(data.get("totalCurrentAssets")) 55 | totalCurrentLiabilities = safe_float(data.get("totalCurrentLiabilities")) 56 | totalLiabilities = safe_float(data.get("totalLiabilities")) 57 | totalShareholderEquity = safe_float(data.get("totalShareholderEquity")) 58 | totalAssets = safe_float(data.get("totalAssets")) 59 | inventory = safe_float(data.get("inventory")) 60 | 61 | # Calculate metrics, but check for N/A values in operands 62 | current_ratio = ( 63 | "N/A" 64 | if "N/A" in (totalCurrentAssets, totalCurrentLiabilities) 65 | else totalCurrentAssets / totalCurrentLiabilities 66 | ) 67 | debt_to_equity_ratio = ( 68 | "N/A" 69 | if "N/A" in (totalLiabilities, totalShareholderEquity) 70 | else totalLiabilities / totalShareholderEquity 71 | ) 72 | quick_ratio = ( 73 | "N/A" 74 | if "N/A" in (totalCurrentAssets, totalCurrentLiabilities, inventory) 75 | else (totalCurrentAssets - inventory) / totalCurrentLiabilities 76 | ) 77 | asset_turnover = ( 78 | "N/A" if "N/A" in (total_revenue, totalAssets) else total_revenue / totalAssets 79 | ) 80 | equity_multiplier = ( 81 | "N/A" 82 | if "N/A" in (totalAssets, totalShareholderEquity) 83 | else totalAssets / totalShareholderEquity 84 | ) 85 | 86 | # Returning the results 87 | return { 88 | "current_ratio": current_ratio, 89 | "debt_to_equity_ratio": debt_to_equity_ratio, 90 | "quick_ratio": quick_ratio, 91 | "asset_turnover": asset_turnover, 92 | "equity_multiplier": equity_multiplier, 93 | } 94 | 95 | 96 | def balance_sheet(symbol, fields_to_include, api_key): 97 | url = "https://www.alphavantage.co/query" 98 | params = { 99 | "function": "BALANCE_SHEET", 100 | "symbol": symbol, 101 | "apikey": AV_API_KEY 
102 | } 103 | response = requests.get(url, params=params) 104 | data = response.json() 105 | if not data: 106 | print(f"No data found for {symbol}") 107 | return None 108 | 109 | if "Error Message" in data: 110 | return {"Error": data["Error Message"]} 111 | 112 | chart_data = charts(data) 113 | 114 | report = data["annualReports"][0] 115 | total_revenue = get_total_revenue(symbol) 116 | met = metrics(report, total_revenue) 117 | 118 | data_for_insights = { 119 | "annual_report_data": report, 120 | "historical_data": chart_data, 121 | } 122 | 123 | ins = {} 124 | for i, field in enumerate(balance_sheet_attributes): 125 | if fields_to_include[i]: 126 | response = insights(field, "balance sheet", data_for_insights, str({field: bal_sheet[field]}), api_key) 127 | ins[field] = response 128 | 129 | return { 130 | "metrics": met, 131 | "chart_data": chart_data, 132 | "insights": ins 133 | } 134 | 135 | if __name__ == "__main__": 136 | fields = [True, True, False, False, False] 137 | data = balance_sheet("MSFT", fields, OPENAI_API_KEY) 138 | print("Metrics: ", data['metrics']) 139 | print("Chart Data: ", data['chart_data']) 140 | print("Insights", data['insights']) 141 | 142 | 143 | -------------------------------------------------------------------------------- /src/cash_flow.py: -------------------------------------------------------------------------------- 1 | 2 | import sys 3 | from pathlib import Path 4 | script_dir = Path(__file__).resolve().parent 5 | project_root = script_dir.parent 6 | sys.path.append(str(project_root)) 7 | 8 | import requests 9 | import streamlit as st 10 | import os 11 | # from dotenv import dotenv_values 12 | 13 | from src.pydantic_models import CashFlowInsights 14 | from src.utils import insights, get_total_revenue, get_total_debt, safe_float, generate_pydantic_model 15 | from src.fields2 import cashflow, cashflow_attributes 16 | # config = dotenv_values(".env") 17 | # OPENAI_API_KEY = config["OPENAI_API_KEY"] 18 | # AV_API_KEY = 
config["ALPHA_VANTAGE_API_KEY"] 19 | 20 | # AV_API_KEY = st.secrets["av_api_key"] 21 | # OPENAI_API_KEY = st.secrets["openai_api_key"] 22 | 23 | AV_API_KEY = os.environ.get("AV_API_KEY") 24 | OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") 25 | 26 | 27 | def charts(data): 28 | dates = [] 29 | operating_cash_flow = [] 30 | cash_flow_from_investment = [] 31 | cash_flow_from_financing = [] 32 | 33 | for report in reversed(data["annualReports"]): 34 | dates.append(report["fiscalDateEnding"]) 35 | operating_cash_flow.append(report["operatingCashflow"]) 36 | cash_flow_from_investment.append(report["cashflowFromInvestment"]) 37 | cash_flow_from_financing.append(report["cashflowFromFinancing"]) 38 | 39 | return { 40 | "dates": dates, 41 | "operating_cash_flow": operating_cash_flow, 42 | "cash_flow_from_investment": cash_flow_from_investment, 43 | "cash_flow_from_financing": cash_flow_from_financing 44 | } 45 | 46 | 47 | def metrics(data, total_revenue, total_debt): 48 | 49 | # Helper function to safely convert to float or set to N/A 50 | 51 | 52 | operatingCashFlow = safe_float(data.get("operatingCashflow")) 53 | capitalExpenditures = safe_float(data.get("capitalExpenditures")) 54 | dividendPayout = safe_float(data.get("dividendPayout")) 55 | netIncome = safe_float(data.get("netIncome")) 56 | 57 | operating_cash_flow_margin = "N/A" if "N/A" in (operatingCashFlow, total_revenue) else operatingCashFlow / total_revenue 58 | capital_expenditure_coverage_ratio = "N/A" if "N/A" in (operatingCashFlow, capitalExpenditures) else operatingCashFlow / capitalExpenditures 59 | free_cash_flow = "N/A" if "N/A" in (operatingCashFlow, capitalExpenditures) else operatingCashFlow - capitalExpenditures 60 | dividend_coverage_ratio = "N/A" if "N/A" in (dividendPayout, netIncome) else netIncome / dividendPayout 61 | cash_flow_to_debt_ratio = "N/A" if "N/A" in (operatingCashFlow, total_debt) else operatingCashFlow / total_debt 62 | 63 | return { 64 | "operating_cash_flow_margin": 
operating_cash_flow_margin, 65 | "capital_expenditure_coverage_ratio": capital_expenditure_coverage_ratio, 66 | "free_cash_flow": free_cash_flow, 67 | "dividend_coverage_ratio": dividend_coverage_ratio, 68 | "cash_flow_to_debt_ratio": cash_flow_to_debt_ratio 69 | } 70 | 71 | 72 | def cash_flow(symbol, fields_to_include, api_key): 73 | url = "https://www.alphavantage.co/query" 74 | params = { 75 | "function": "CASH_FLOW", 76 | "symbol": symbol, 77 | "apikey": AV_API_KEY 78 | } 79 | response = requests.get(url, params=params) 80 | data = response.json() 81 | if not data: 82 | print(f"No data found for {symbol}") 83 | return None 84 | 85 | if "Error Message" in data: 86 | return {"Error": data["Error Message"]} 87 | 88 | chart_data = charts(data) 89 | 90 | report = data["annualReports"][0] 91 | total_revenue = get_total_revenue(symbol) 92 | total_debt = get_total_debt(symbol) 93 | met = metrics(report, total_revenue, total_debt) 94 | 95 | data_for_insights = { 96 | "annual_report_data": report, 97 | "historical_data": chart_data, 98 | } 99 | ins = {} 100 | for i, field in enumerate(cashflow_attributes): 101 | if fields_to_include[i]: 102 | response = insights(field, "cash flow", data_for_insights, str({field: cashflow[field]}), api_key) 103 | ins[field] = response 104 | 105 | 106 | return { 107 | "metrics": met, 108 | "chart_data": chart_data, 109 | "insights": ins 110 | } 111 | 112 | if __name__ == "__main__": 113 | fields = [True, True, False, False, False] 114 | data = cash_flow("AAPL", fields, OPENAI_API_KEY) 115 | print("Metrics: ", data['metrics']) 116 | print("Chart Data: ", data['chart_data']) 117 | print("Insights", data['insights']) 118 | 119 | # if __name__ == "__main__": 120 | # symbol = "AAPL" 121 | # url = "https://www.alphavantage.co/query" 122 | # params = { 123 | # "function": "CASH_FLOW", 124 | # "symbol": symbol, 125 | # "apikey": AV_API_KEY 126 | # } 127 | # response = requests.get(url, params=params) 128 | # data = response.json() 129 | # if not 
data: 130 | # print(f"No data found for {symbol}") 131 | 132 | # ans = charts(data) 133 | # print(ans) -------------------------------------------------------------------------------- /src/company_overview.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import requests 8 | import streamlit as st 9 | import os 10 | 11 | from src.utils import safe_float 12 | 13 | 14 | # AV_API_KEY = st.secrets["av_api_key"] 15 | 16 | AV_API_KEY = os.environ.get("AV_API_KEY") 17 | OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") 18 | 19 | 20 | def company_overview(symbol): 21 | url = "https://www.alphavantage.co/query" 22 | params = { 23 | "function": "OVERVIEW", 24 | "symbol": symbol, 25 | "apikey": AV_API_KEY 26 | } 27 | 28 | # Send a GET request to the API 29 | response = requests.get(url, params=params) 30 | if response.status_code == 200: 31 | data = response.json() 32 | if not data: 33 | print(f"No data found for {symbol}") 34 | return None 35 | 36 | if "Error Message" in data: 37 | return {"Error": data["Error Message"]} 38 | 39 | extracted_data = { 40 | "Symbol": data.get("Symbol"), 41 | "AssetType": data.get("AssetType"), 42 | "Name": data.get("Name"), 43 | "Description": data.get("Description"), 44 | "CIK": data.get("CIK"), 45 | "Exchange": data.get("Exchange"), 46 | "Currency": data.get("Currency"), 47 | "Country": data.get("Country"), 48 | "Sector": data.get("Sector"), 49 | "Industry": data.get("Industry"), 50 | "Address": data.get("Address"), 51 | "FiscalYearEnd": data.get("FiscalYearEnd"), 52 | "LatestQuarter": data.get("LatestQuarter"), 53 | "MarketCapitalization": safe_float(data.get("MarketCapitalization")), 54 | } 55 | return extracted_data 56 | 57 | else: 58 | # Return early so extracted_data is never referenced unbound on an HTTP error 59 | print(f"Error: {response.status_code} - {response.text}") 60 | return None 61 | 62 | if __name__ ==
"__main__": 63 | ans = company_overview("TSLA") 64 | print(ans) 65 | -------------------------------------------------------------------------------- /src/fields.py: -------------------------------------------------------------------------------- 1 | from pydantic import Field 2 | 3 | min_length = 40 4 | 5 | inc_stat_fields = { 6 | "revenue_health": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's total revenue, providing a perspective on the health of the primary business activity.")), 7 | "operational_efficiency": (str, Field(..., description=f"Must be more than {min_length} words. Analysis of the company's operating expenses in relation to its revenue, offering a view into the firm's operational efficiency.")), 8 | "r_and_d_focus": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's commitment to research and development, signifying its emphasis on innovation and future growth.")), 9 | "debt_management": (str, Field(..., description=f"Must be more than {min_length} words. Analysis of the company's interest expenses, highlighting the scale of its debt obligations and its approach to leveraging.")), 10 | "profit_retention": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's net income, showcasing the amount retained post all expenses, which can be reinvested or distributed.")) 11 | } 12 | 13 | inc_stat_attributes = ["revenue_health", "operational_efficiency", "r_and_d_focus", "debt_management", "profit_retention"] 14 | 15 | balance_sheet_fields = { 16 | "liquidity_position": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's ability to meet its short-term obligations using its short-term assets.")), 17 | "operational_efficiency": (str, Field(..., description=f"Must be more than {min_length} words. 
Analysis of how efficiently the company is using its assets to generate sales.")), 18 | "capital_structure": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's financial leverage and its reliance on external liabilities versus internal equity.")), 19 | "inventory_management": (str, Field(..., description=f"Must be more than {min_length} words. Analysis of the company's efficiency in managing, selling, and replacing its inventory.")), 20 | "overall_solvency": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's overall ability to meet its long-term debts and obligations.")) 21 | } 22 | 23 | balance_sheet_attributes = ["liquidity_position", "operational_efficiency", "capital_structure", "inventory_management", "overall_solvency"] 24 | 25 | cashflow_fields = { 26 | "operational_cash_efficiency": (str, Field(..., description=f"Must be more than {min_length} words. Insight into how efficiently the company is generating cash from its core operations.")), 27 | "investment_capability": (str, Field(..., description=f"Must be more than {min_length} words. Indicates the company's ability to invest in its business using its operational cash flows.")), 28 | "financial_flexibility": (str, Field(..., description=f"Must be more than {min_length} words. Demonstrates the cash left after all operational expenses and investments, which can be used for dividends, share buybacks, or further investments.")), 29 | "dividend_sustainability": (str, Field(..., description=f"Must be more than {min_length} words. Indicates the company's ability to cover its dividend payouts with its net earnings.")), 30 | "debt_service_capability": (str, Field(..., description=f"Must be more than {min_length} words. 
Analysis of the company's ability to service its debt using the operational cash flows.")) 31 | } 32 | 33 | cashflow_attributes = ["operational_cash_efficiency", "investment_capability", "financial_flexibility", "dividend_sustainability", "debt_service_capability"] 34 | 35 | fiscal_year_fields = { 36 | "performance_highlights": (str, Field(..., description="Key performance and financial stats over the fiscal year.")), 37 | "major_events": (str, Field(..., description="Highlight of significant events, acquisitions, or strategic shifts that occurred during the year.")), 38 | "challenges_encountered": (str, Field(..., description="Challenges the company faced during the year and, if and how they managed or overcame them.")), 39 | } 40 | fiscal_year_attributes = ["performance_highlights", "major_events", "challenges_encountered"] 41 | 42 | strat_outlook_fields = { 43 | "strategic_initiatives": (str, Field(..., description="The company's primary objectives and growth strategies for the upcoming years.")), 44 | "market_outlook": (str, Field(..., description="Insights into the broader market, competitive landscape, and industry trends the company anticipates.")), 45 | "product_roadmap": (str, Field(..., description="Upcoming launches, expansions, or innovations the company plans to roll out.")) 46 | } 47 | 48 | strat_outlook_attributes = ["strategic_initiatives", "market_outlook", "product_roadmap"] 49 | 50 | risk_management_fields = { 51 | "risk_factors": (str, Field(..., description="Primary risks the company acknowledges.")), 52 | "risk_mitigation": (str, Field(..., description="Strategies for managing these risks.")) 53 | } 54 | 55 | risk_management_attributes = ["risk_factors", "risk_mitigation"] 56 | 57 | 58 | innovation_fields = { 59 | "r_and_d_activities": (str, Field(..., description="Overview of the company's focus on research and development, major achievements, or breakthroughs.")), 60 | "innovation_focus": (str, Field(..., description="Mention of new 
technologies, patents, or areas of research the company is diving into.")) 61 | } 62 | 63 | innovation_attributes = ["r_and_d_activities", "innovation_focus"] 64 | 65 | -------------------------------------------------------------------------------- /src/fields2.py: -------------------------------------------------------------------------------- 1 | min_length = 40 2 | 3 | inc_stat = { 4 | "revenue_health": f"Must be more than {min_length} words. Insight into the company's total revenue, providing a perspective on the health of the primary business activity.", 5 | "operational_efficiency": f"Must be more than {min_length} words. Analysis of the company's operating expenses in relation to its revenue, offering a view into the firm's operational efficiency.", 6 | "r_and_d_focus": f"Must be more than {min_length} words. Insight into the company's commitment to research and development, signifying its emphasis on innovation and future growth.", 7 | "debt_management": f"Must be more than {min_length} words. Analysis of the company's interest expenses, highlighting the scale of its debt obligations and its approach to leveraging.", 8 | "profit_retention": f"Must be more than {min_length} words. Insight into the company's net income, showcasing the amount retained post all expenses, which can be reinvested or distributed." 9 | } 10 | 11 | inc_stat_attributes = ["revenue_health", "operational_efficiency", "r_and_d_focus", "debt_management", "profit_retention"] 12 | 13 | bal_sheet = { 14 | "liquidity_position": f"Must be more than {min_length} words. Insight into the company's ability to meet its short-term obligations using its short-term assets.", 15 | "assets_efficiency": f"Must be more than {min_length} words. Analysis of how efficiently the company is using its assets to generate sales.", 16 | "capital_structure": f"Must be more than {min_length} words. 
Insight into the company's financial leverage and its reliance on external liabilities versus internal equity.", 17 | "inventory_management": f"Must be more than {min_length} words. Analysis of the company's efficiency in managing, selling, and replacing its inventory.", 18 | "overall_solvency": f"Must be more than {min_length} words. Insight into the company's overall ability to meet its long-term debts and obligations." 19 | } 20 | balance_sheet_attributes = ["liquidity_position", "assets_efficiency", "capital_structure", "inventory_management", "overall_solvency"] 21 | 22 | 23 | cashflow = { 24 | "operational_cash_efficiency": f"Must be more than {min_length} words. Insight into how efficiently the company is generating cash from its core operations.", 25 | "investment_capability": f"Must be more than {min_length} words. Indicates the company's ability to invest in its business using its operational cash flows.", 26 | "financial_flexibility": f"Must be more than {min_length} words. Demonstrates the cash left after all operational expenses and investments, which can be used for dividends, share buybacks, or further investments.", 27 | "dividend_sustainability": f"Must be more than {min_length} words. Indicates the company's ability to cover its dividend payouts with its net earnings.", 28 | "debt_service_capability": f"Must be more than {min_length} words. Analysis of the company's ability to service its debt using the operational cash flows." 
29 | } 30 | 31 | cashflow_attributes = ["operational_cash_efficiency", "investment_capability", "financial_flexibility", "dividend_sustainability", "debt_service_capability"] 32 | 33 | fiscal_year = { 34 | "performance_highlights": "Key performance and financial stats over the fiscal year.", 35 | "major_events": "Highlight of significant events, acquisitions, or strategic shifts that occurred during the year.", 36 | "challenges_encountered": "Challenges the company faced during the year and, if and how they managed or overcame them." 37 | } 38 | 39 | fiscal_year_attributes = ["performance_highlights", "major_events", "challenges_encountered"] 40 | 41 | strat_outlook = { 42 | "strategic_initiatives": "The company's primary objectives and growth strategies for the upcoming years.", 43 | "market_outlook": "Insights into the broader market, competitive landscape, and industry trends the company anticipates.", 44 | "product_roadmap": "Upcoming launches, expansions, or innovations the company plans to roll out." 45 | } 46 | 47 | strat_outlook_attributes = ["strategic_initiatives", "market_outlook", "product_roadmap"] 48 | 49 | risk_management = { 50 | "risk_factors": "Primary risks the company acknowledges.", 51 | "risk_mitigation": "Strategies for managing these risks." 52 | } 53 | 54 | risk_management_attributes = ["risk_factors", "risk_mitigation"] 55 | 56 | innovation = { 57 | "r_and_d_activities": "Overview of the company's focus on research and development, major achievements, or breakthroughs.", 58 | "innovation_focus": "Mention of new technologies, patents, or areas of research the company is diving into." 
59 | } 60 | 61 | innovation_attributes = ["r_and_d_activities", "innovation_focus"] -------------------------------------------------------------------------------- /src/income_statement.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import os 8 | import pandas as pd 9 | import requests 10 | import streamlit as st 11 | import plotly.graph_objects as go 12 | # from dotenv import dotenv_values 13 | 14 | from src.pydantic_models import IncomeStatementInsights 15 | from src.utils import insights, safe_float, generate_pydantic_model 16 | from src.fields2 import inc_stat, inc_stat_attributes 17 | 18 | 19 | # config = dotenv_values(".env") 20 | # OPENAI_API_KEY = config["OPENAI_API_KEY"] 21 | # AV_API_KEY = config["ALPHA_VANTAGE_API_KEY"] 22 | 23 | # AV_API_KEY = st.secrets["av_api_key"] 24 | # OPENAI_API_KEY = st.secrets["openai_api_key"] 25 | 26 | OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") 27 | AV_API_KEY = os.environ.get("AV_API_KEY") 28 | 29 | 30 | 31 | def charts(data): 32 | dates = [] 33 | total_revenue = [] 34 | net_income = [] 35 | interest_expense = [] 36 | 37 | 38 | 39 | for report in reversed(data["annualReports"]): 40 | dates.append(report["fiscalDateEnding"]) 41 | total_revenue.append(report["totalRevenue"]) 42 | net_income.append(report["netIncome"]) 43 | interest_expense.append(report["interestAndDebtExpense"]) 44 | 45 | return { 46 | "dates": dates, 47 | "total_revenue": total_revenue, 48 | "net_income": net_income, 49 | "interest_expense": interest_expense 50 | } 51 | 52 | 53 | def metrics(data): 54 | 55 | # Extracting values from the data 56 | grossProfit = safe_float(data.get("grossProfit")) 57 | totalRevenue = safe_float(data.get("totalRevenue")) 58 | 
operatingIncome = safe_float(data.get("operatingIncome")) 59 | costOfRevenue = safe_float(data.get("costOfRevenue")) 60 | costofGoodsAndServicesSold = safe_float(data.get("costofGoodsAndServicesSold")) 61 | sellingGeneralAndAdministrative = safe_float(data.get("sellingGeneralAndAdministrative")) 62 | ebit = safe_float(data.get("ebit")) 63 | interestAndDebtExpense = safe_float(data.get("interestAndDebtExpense")) 64 | netIncome = safe_float(data["netIncome"]) 65 | 66 | # Calculate metrics, but check for N/A values in operands 67 | gross_profit_margin = ( 68 | "N/A" if "N/A" in (grossProfit, totalRevenue) else grossProfit / totalRevenue 69 | ) 70 | operating_profit_margin = ( 71 | "N/A" if "N/A" in (operatingIncome, totalRevenue) else operatingIncome / totalRevenue 72 | ) 73 | net_profit_margin = ( 74 | "N/A" if "N/A" in (netIncome, totalRevenue) else netIncome / totalRevenue 75 | ) 76 | cost_efficiency = ( 77 | "N/A" 78 | if "N/A" in (totalRevenue, costOfRevenue, costofGoodsAndServicesSold) 79 | else totalRevenue / (costOfRevenue + costofGoodsAndServicesSold) 80 | ) 81 | sg_and_a_efficiency = ( 82 | "N/A" 83 | if "N/A" in (totalRevenue, sellingGeneralAndAdministrative) 84 | else totalRevenue / sellingGeneralAndAdministrative 85 | ) 86 | interest_coverage_ratio = ( 87 | "N/A" if "N/A" in (ebit, interestAndDebtExpense) else ebit / interestAndDebtExpense 88 | ) 89 | 90 | # Returning the results 91 | return { 92 | "gross_profit_margin": gross_profit_margin, 93 | "operating_profit_margin": operating_profit_margin, 94 | "net_profit_margin": net_profit_margin, 95 | "cost_efficiency": cost_efficiency, 96 | "sg_and_a_efficiency": sg_and_a_efficiency, 97 | "interest_coverage_ratio": interest_coverage_ratio, 98 | } 99 | 100 | 101 | 102 | def income_statement(symbol, fields_to_include, api_key): 103 | url = "https://www.alphavantage.co/query" 104 | params = { 105 | "function": "INCOME_STATEMENT", 106 | "symbol": symbol, 107 | "apikey": AV_API_KEY 108 | } 109 | 110 | # Send a GET 
request to the API 111 | response = requests.get(url, params=params) 112 | if response.status_code == 200: 113 | data = response.json() 114 | if not data: 115 | print(f"No data found for {symbol}") 116 | return None 117 | 118 | else: 119 | print(f"Error: {response.status_code} - {response.text}") 120 | return None 121 | if 'Error Message' in data: 122 | return {"Error": data['Error Message']} 123 | 124 | chart_data = charts(data) 125 | 126 | report = data["annualReports"][0] 127 | met = metrics(report) 128 | 129 | data_for_insights = { 130 | "annual_report_data": report, 131 | "historical_data": chart_data, 132 | } 133 | 134 | ins = {} 135 | for i, field in enumerate(inc_stat_attributes): 136 | if fields_to_include[i]: 137 | response = insights(field, "income statement", data_for_insights, str({field: inc_stat[field]}), api_key) 138 | ins[field] = response 139 | 140 | return { 141 | "metrics": met, 142 | "chart_data": chart_data, 143 | "insights": ins 144 | } 145 | 146 | 147 | if __name__ == "__main__": 148 | fields_to_include = [True, False, False, False, True] 149 | 150 | data = income_statement("TSLA", fields_to_include, OPENAI_API_KEY) 151 | print("Metrics: ", data['metrics']) 152 | print("Chart Data: ", data['chart_data']) 153 | print("Insights", data['insights']) 154 | -------------------------------------------------------------------------------- /src/news_sentiment.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import streamlit as st 8 | import requests 9 | from datetime import datetime, timedelta 10 | import pandas as pd 11 | import os 12 | 13 | # AV_API_KEY = st.secrets["av_api_key"] 14 | 15 | AV_API_KEY = os.environ.get("AV_API_KEY") 16 | OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") 17 | 18 
| def classify_sentiment(mean_score): 19 | if mean_score <= -0.35: 20 | return "Bearish" 21 | elif -0.35 < mean_score <= -0.15: 22 | return "Somewhat-Bearish" 23 | elif -0.15 < mean_score < 0.15: 24 | return "Neutral" 25 | elif 0.15 <= mean_score < 0.35: 26 | return "Somewhat-Bullish" 27 | elif mean_score >= 0.35: 28 | return "Bullish" 29 | else: 30 | return "Undefined" 31 | 32 | def top_news(symbol, max_feed): 33 | 34 | current_datetime = datetime.now() 35 | one_year_ago = current_datetime - timedelta(days=365) 36 | formatted_time_from = one_year_ago.strftime("%Y%m%dT%H%M") 37 | print("time_from=", formatted_time_from) 38 | 39 | url = "https://www.alphavantage.co/query" 40 | params = { 41 | "function": "NEWS_SENTIMENT", 42 | "tickers": symbol, 43 | "apikey": AV_API_KEY, 44 | "sort": "RELEVANCE", 45 | "time_from": formatted_time_from, 46 | } 47 | # Send a GET request to the API 48 | response = requests.get(url, params=params) 49 | if response.status_code == 200: 50 | data = response.json() 51 | if not data: 52 | print(f"No data found for {symbol}") 53 | return None 54 | 55 | news = [] 56 | 57 | if "Error Message" in data: 58 | return {"Error": data["Error Message"]} 59 | 60 | try: 61 | for i in data["feed"][:max_feed]: 62 | temp = {} 63 | temp["title"] = i["title"] 64 | temp["url"] = i["url"] 65 | temp["authors"] = i["authors"] 66 | 67 | topics = [] 68 | for j in i["topics"]: 69 | topics.append(j["topic"]) 70 | temp["topics"] = topics 71 | 72 | sentiment_score = "" 73 | sentiment_label = "" 74 | for j in i["ticker_sentiment"]: 75 | if j["ticker"] == symbol: 76 | sentiment_score = j["ticker_sentiment_score"] 77 | sentiment_label = j["ticker_sentiment_label"] 78 | break 79 | temp["sentiment_score"] = sentiment_score 80 | temp["sentiment_label"] = sentiment_label 81 | 82 | news.append(temp) 83 | 84 | except Exception as e: 85 | print(e) 86 | return None 87 | 88 | else: 89 | print(f"Error: {response.status_code} - {response.text}") 90 | return None 91 | news = pd.DataFrame(news) 92 | 
news["sentiment_score"] = pd.to_numeric(news["sentiment_score"]) 93 | mean_sentiment_score = news["sentiment_score"].mean() 94 | mean_sentiment_class = classify_sentiment(mean_sentiment_score) 95 | 96 | return { 97 | "news": news, 98 | "mean_sentiment_score": mean_sentiment_score, 99 | "mean_sentiment_class": mean_sentiment_class 100 | } 101 | 102 | if __name__ == "__main__": 103 | news = top_news("AAPL", 10) 104 | print(news) 105 | 106 | -------------------------------------------------------------------------------- /src/pages/1_📊_Finance_Metrics_Review.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | 8 | import streamlit as st 9 | import os 10 | 11 | st.set_page_config(page_title="Finance Metrics Reviews", page_icon=":bar_chart:", layout="wide", initial_sidebar_state="collapsed") 12 | 13 | st.title(":chart_with_upwards_trend: Finance Metrics Review") 14 | st.info(""" 15 | Simply input the ticker symbol of your desired company and hit the 'Generate Insights' button. Allow a few moments for the system to compile the data and insights tailored to the selected company. Once done, you have the option to browse through these insights directly on the platform or download a comprehensive report by selecting 'Generate PDF', followed by 'Download PDF'. 
16 | """) 17 | 18 | 19 | from src.income_statement import income_statement 20 | from src.balance_sheet import balance_sheet 21 | from src.cash_flow import cash_flow 22 | from src.news_sentiment import top_news 23 | from src.company_overview import company_overview 24 | from src.utils import round_numeric, format_currency, create_donut_chart, create_bar_chart 25 | from src.pdf_gen import gen_pdf 26 | from src.fields2 import inc_stat, inc_stat_attributes, bal_sheet, balance_sheet_attributes, cashflow, cashflow_attributes 27 | 28 | st.sidebar.info(""" 29 | You can get your API keys here: [OpenAI](https://openai.com/blog/openai-api), [AlphaVantage](https://www.alphavantage.co/support/#api-key), 30 | """) 31 | 32 | OPENAI_API_KEY = st.sidebar.text_input("Enter OpenAI API key", type="password") 33 | os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY 34 | 35 | AV_API_KEY = st.sidebar.text_input("Enter Alpha Vantage API key", type="password") 36 | os.environ["AV_API_KEY"] = AV_API_KEY 37 | 38 | 39 | if not OPENAI_API_KEY: 40 | st.error("Please enter your OpenAI API Key") 41 | elif not AV_API_KEY: 42 | st.error("Please enter your Alpha Vantage API Key") 43 | else: 44 | 45 | 46 | col1, col2 = st.columns([0.25, 0.75], gap="medium") 47 | 48 | with col1: 49 | st.write(""" 50 | ### Select Insights 51 | """) 52 | with st.expander("**Income Statement Insights**", expanded=True): 53 | revenue_health = st.toggle("Revenue Health") 54 | operational_efficiency = st.toggle("Operational Efficiency") 55 | r_and_d_focus = st.toggle("R&D Focus") 56 | debt_management = st.toggle("Debt Management") 57 | profit_retention = st.toggle("Profit Retention") 58 | 59 | 60 | income_statement_feature_list = [revenue_health, operational_efficiency, r_and_d_focus, debt_management, profit_retention] 61 | 62 | with st.expander("**Balance Sheet Insights**", expanded=True): 63 | liquidity_position = st.toggle("Liquidity Position") 64 | assets_efficiency = st.toggle("Operational efficiency") 65 | capital_structure 
= st.toggle("Capital Structure") 66 | inventory_management = st.toggle("Inventory Management") 67 | overall_solvency = st.toggle("Overall Solvency") 68 | 69 | balance_sheet_feature_list = [liquidity_position, assets_efficiency, capital_structure, inventory_management, overall_solvency] 70 | 71 | with st.expander("**Cash Flow Insights**", expanded=True): 72 | operational_cash_efficiency = st.toggle("Operational Cash Efficiency") 73 | investment_capability = st.toggle("Investment Capability") 74 | financial_flexibility = st.toggle("Financial Flexibility") 75 | dividend_sustainability = st.toggle("Dividend Sustainability") 76 | debt_service_capability = st.toggle("Debt Service Capability") 77 | 78 | cash_flow_feature_list = [operational_cash_efficiency, investment_capability, financial_flexibility, dividend_sustainability, debt_service_capability] 79 | 80 | 81 | with col2: 82 | ticker = st.text_input("**Enter ticker symbol**") 83 | st.warning("Example Tickers: Apple Inc. - AAPL, Microsoft Corporation - MSFT, Tesla Inc. 
- TSLA") 84 | 85 | 86 | for insight in inc_stat_attributes: 87 | if insight not in st.session_state: 88 | st.session_state[insight] = None 89 | 90 | for insight in balance_sheet_attributes: 91 | if insight not in st.session_state: 92 | st.session_state[insight] = None 93 | 94 | for insight in cashflow_attributes: 95 | if insight not in st.session_state: 96 | st.session_state[insight] = None 97 | 98 | 99 | if "company_overview" not in st.session_state: 100 | st.session_state.company_overview = None 101 | 102 | if "income_statement" not in st.session_state: 103 | st.session_state.income_statement = None 104 | 105 | if "balance_sheet" not in st.session_state: 106 | st.session_state.balance_sheet = None 107 | 108 | if "cash_flow" not in st.session_state: 109 | st.session_state.cash_flow = None 110 | 111 | if "news" not in st.session_state: 112 | st.session_state.news = None 113 | 114 | if "all_outputs" not in st.session_state: 115 | st.session_state.all_outputs = None 116 | 117 | if ticker: 118 | if st.button("Generate Insights"): 119 | 120 | with st.status("**Generating Insights...**"): 121 | 122 | 123 | if not st.session_state.company_overview: 124 | st.write("Getting company overview...") 125 | st.session_state.company_overview = company_overview(ticker) 126 | 127 | 128 | if any(income_statement_feature_list): 129 | st.write("Generating income statement insights...") 130 | for i, insight in enumerate(inc_stat_attributes): 131 | if st.session_state[insight]: 132 | income_statement_feature_list[i] = False 133 | 134 | response = income_statement(ticker, income_statement_feature_list, OPENAI_API_KEY) 135 | 136 | st.session_state.income_statement = response 137 | 138 | for key, value in response["insights"].items(): 139 | st.session_state[key] = value 140 | 141 | 142 | if any(balance_sheet_feature_list): 143 | st.write("Generating balance sheet insights...") 144 | for i, insight in enumerate(balance_sheet_attributes): 145 | if st.session_state[insight]: 146 | 
balance_sheet_feature_list[i] = False 147 | 148 | response = balance_sheet(ticker, balance_sheet_feature_list, OPENAI_API_KEY) 149 | 150 | st.session_state.balance_sheet = response 151 | 152 | for key, value in response["insights"].items(): 153 | st.session_state[key] = value 154 | 155 | 156 | if any(cash_flow_feature_list): 157 | st.write("Generating cash flow insights...") 158 | for i, insight in enumerate(cashflow_attributes): 159 | if st.session_state[insight]: 160 | cash_flow_feature_list[i] = False 161 | 162 | 163 | 164 | response = cash_flow(ticker, cash_flow_feature_list, OPENAI_API_KEY) 165 | 166 | st.session_state.cash_flow = response 167 | 168 | for key, value in response["insights"].items(): 169 | st.session_state[key] = value 170 | 171 | if not st.session_state.news: 172 | st.write('Getting latest news...') 173 | st.session_state.news = top_news(ticker, 10) 174 | 175 | if st.session_state.company_overview and st.session_state.income_statement and st.session_state.balance_sheet and st.session_state.cash_flow and st.session_state.news: 176 | st.session_state.all_outputs = True 177 | 178 | if st.session_state.company_overview is None: 179 | st.error("No data available") 180 | 181 | if st.session_state.all_outputs: 182 | st.toast("Insights successfully generated!") 183 | if st.button("Generate PDF"): 184 | gen_pdf(st.session_state.company_overview["Name"], 185 | st.session_state.company_overview, 186 | st.session_state.income_statement, 187 | st.session_state.balance_sheet, 188 | st.session_state.cash_flow, 189 | None) 190 | st.toast("PDF successfully generated!") 191 | with open("pdf/final_report.pdf", "rb") as file: 192 | st.download_button( 193 | label="Download PDF", 194 | data=file, 195 | file_name="final_report.pdf", 196 | mime="application/pdf" 197 | ) 198 | 199 | 200 | 201 | tab1, tab2, tab3, tab4, tab5 = st.tabs(["Company Overview", "Income Statement", "Balance Sheet", "Cash Flow", "News Sentiment"]) 202 | 203 | 204 | if
st.session_state.company_overview: 205 | 206 | if "Error" in st.session_state.company_overview: 207 | st.error(st.session_state.company_overview["Error"]) 208 | 209 | else: 210 | with tab1: 211 | with st.container(): 212 | 213 | st.write("# Company Overview") 214 | # st.markdown("### Company Name:") 215 | st.markdown(f"""### {st.session_state.company_overview["Name"]}""") 216 | col1, col2, col3 = st.columns(3) 217 | col1.markdown("### Symbol:") 218 | col1.write(st.session_state.company_overview["Symbol"]) 219 | col2.markdown("### Exchange:") 220 | col2.write(st.session_state.company_overview["Exchange"]) 221 | col3.markdown("### Currency:") 222 | col3.write(st.session_state.company_overview["Currency"]) 223 | 224 | col1, col2, col3 = st.columns(3) 225 | col1.markdown("### Sector:") 226 | col1.write(st.session_state.company_overview["Sector"]) 227 | col2.markdown("### Industry:") 228 | col2.write(st.session_state.company_overview["Industry"]) 229 | col3.write() 230 | st.markdown("### Description:") 231 | st.write(st.session_state.company_overview["Description"]) 232 | 233 | col1, col2, col3 = st.columns(3) 234 | col1.markdown("### Country:") 235 | col1.write(st.session_state.company_overview["Country"]) 236 | col2.markdown("### Address:") 237 | col2.write(st.session_state.company_overview["Address"]) 238 | col3.write() 239 | 240 | col1, col2, col3 = st.columns(3) 241 | col1.markdown("### Fiscal Year End:") 242 | col1.write(st.session_state.company_overview["FiscalYearEnd"]) 243 | col2.markdown("### Latest Quarter:") 244 | col2.write(st.session_state.company_overview["LatestQuarter"]) 245 | col3.markdown("### Market Capitalization:") 246 | col3.write(format_currency(st.session_state.company_overview["MarketCapitalization"])) 247 | 248 | 249 | if st.session_state.income_statement: 250 | 251 | if "Error" in st.session_state.income_statement: 252 | st.error(st.session_state.income_statement["Error"]) 253 | 254 | else: 255 | 256 | with tab2: 257 | 258 | 
st.write("# Income Statement") 259 | st.write("## Metrics") 260 | 261 | with st.container(): 262 | 263 | col1, col2, col3 = st.columns(3) 264 | 265 | col1.metric("Gross Profit Margin", round_numeric(st.session_state.income_statement["metrics"]["gross_profit_margin"], 2)) 266 | col2.metric("Operating Profit Margin", round_numeric(st.session_state.income_statement["metrics"]["operating_profit_margin"], 2)) 267 | col3.metric("Net Profit Margin", round_numeric(st.session_state.income_statement["metrics"]["net_profit_margin"], 2)) 268 | col1.metric("Cost Efficiency", round_numeric(st.session_state.income_statement["metrics"]["cost_efficiency"], 2)) 269 | col2.metric("SG&A Efficiency", round_numeric(st.session_state.income_statement["metrics"]["sg_and_a_efficiency"], 2)) 270 | col3.metric("Interest Coverage Ratio", round_numeric(st.session_state.income_statement["metrics"]["interest_coverage_ratio"], 2)) 271 | 272 | 273 | st.write("## Insights") 274 | 275 | 276 | if revenue_health: 277 | if st.session_state["revenue_health"]: 278 | st.write("### Revenue Health") 279 | st.markdown(st.session_state["revenue_health"]) 280 | total_revenue_chart = create_bar_chart(st.session_state.income_statement["chart_data"], 281 | "total_revenue", 282 | "Revenue Growth") 283 | st.write(total_revenue_chart) 284 | else: 285 | st.error("Revenue Health insight has not been generated") 286 | 287 | 288 | if operational_efficiency: 289 | if st.session_state["operational_efficiency"]: 290 | st.write("### Operational Efficiency") 291 | st.write(st.session_state["operational_efficiency"]) 292 | else: 293 | st.error("Operational Efficiency insight has not been generated") 294 | 295 | 296 | if r_and_d_focus: 297 | if st.session_state["r_and_d_focus"]: 298 | st.write("### R&D Focus") 299 | st.write(st.session_state["r_and_d_focus"]) 300 | else: 301 | st.error("R&D Focus insight has not been generated") 302 | 303 | 304 | 305 | if debt_management: 306 | if st.session_state["debt_management"]: 307 | 
st.write("### Debt Management") 308 | st.write(st.session_state["debt_management"]) 309 | interest_expense_chart = create_bar_chart(st.session_state.income_statement["chart_data"], 310 | "interest_expense", 311 | "Debt Service Obligation") 312 | st.write(interest_expense_chart) 313 | else: 314 | st.error("Debt Management insight has not been generated") 315 | 316 | 317 | 318 | 319 | if profit_retention: 320 | if st.session_state["profit_retention"]: 321 | st.write("### Profit Retention") 322 | st.write(st.session_state["profit_retention"]) 323 | net_income_chart = create_bar_chart(st.session_state.income_statement["chart_data"], 324 | "net_income", 325 | "Profitability Trend") 326 | st.write(net_income_chart) 327 | else: 328 | st.error("Profit Retention insight has not been generated") 329 | 330 | 331 | 332 | if st.session_state.balance_sheet: 333 | with tab3: 334 | 335 | st.write("# Balance Sheet") 336 | st.write("## Metrics") 337 | 338 | with st.container(): 339 | 340 | col1, col2, col3 = st.columns(3) 341 | 342 | col1.metric("Current Ratio", round_numeric(st.session_state.balance_sheet['metrics']['current_ratio'], 2)) 343 | col2.metric("Debt to Equity Ratio", round_numeric(st.session_state.balance_sheet['metrics']['debt_to_equity_ratio'], 2)) 344 | col3.metric("Quick Ratio", round_numeric(st.session_state.balance_sheet['metrics']['quick_ratio'], 2)) 345 | col1.metric("Asset Turnover", round_numeric(st.session_state.balance_sheet['metrics']['asset_turnover'], 2)) 346 | col2.metric("Equity Multiplier", round_numeric(st.session_state.balance_sheet['metrics']['equity_multiplier'], 2)) 347 | 348 | 349 | 350 | st.write("## Insights") 351 | 352 | 353 | if liquidity_position: 354 | if st.session_state['liquidity_position']: 355 | st.write("### Liquidity Position") 356 | st.write(st.session_state["liquidity_position"]) 357 | asset_comp_chart = create_donut_chart(st.session_state.balance_sheet["chart_data"],"asset_composition") 358 | st.write(asset_comp_chart) 359 | else: 
360 | st.error("Liquidity Position insight has not been generated") 361 | 362 | 363 | if assets_efficiency: 364 | if st.session_state['assets_efficiency']: 365 | st.write("### Assets Efficiency") 366 | st.write(st.session_state["assets_efficiency"]) 367 | else: 368 | st.error("Assets Efficiency insight has not been generated") 369 | 370 | 371 | if capital_structure: 372 | if st.session_state['capital_structure']: 373 | st.write("### Capital Structure") 374 | st.write(st.session_state["capital_structure"]) 375 | liabilities_comp_chart = create_donut_chart(st.session_state.balance_sheet["chart_data"],"liabilities_composition") 376 | st.write(liabilities_comp_chart) 377 | else: 378 | st.error("Capital Structure insight has not been generated") 379 | 380 | 381 | if inventory_management: 382 | if st.session_state['inventory_management']: 383 | st.write("### Inventory Management") 384 | st.write(st.session_state["inventory_management"]) 385 | else: 386 | st.error("Inventory Management insight has not been generated") 387 | 388 | if overall_solvency: 389 | if st.session_state['overall_solvency']: 390 | st.write("### Overall Solvency") 391 | st.write(st.session_state["overall_solvency"]) 392 | liabilities_comp_chart = create_donut_chart(st.session_state.balance_sheet["chart_data"],"debt_structure") 393 | st.write(liabilities_comp_chart) 394 | else: 395 | st.error("Overall Solvency insight has not been generated") 396 | 397 | 398 | if st.session_state.cash_flow: 399 | with tab4: 400 | 401 | st.write("# Cash Flow") 402 | st.write("## Metrics") 403 | 404 | with st.container(): 405 | 406 | col1, col2, col3 = st.columns(3) 407 | 408 | col1.metric("Operating Cash Flow Margin", round_numeric(st.session_state.cash_flow['metrics']['operating_cash_flow_margin'], 2)) 409 | col2.metric("Capital Expenditure Coverage Ratio", round_numeric(st.session_state.cash_flow['metrics']['capital_expenditure_coverage_ratio'], 2)) 410 | col3.metric("Dividend Coverage Ratio", 
round_numeric(st.session_state.cash_flow['metrics']['dividend_coverage_ratio'], 2)) 411 | col1.metric("Cash Flow to Debt Ratio", round_numeric(st.session_state.cash_flow['metrics']['cash_flow_to_debt_ratio'], 2)) 412 | 413 | col2.metric("Free Cash Flow", format_currency(st.session_state.cash_flow['metrics']['free_cash_flow'])) 414 | 415 | 416 | if operational_cash_efficiency: 417 | if st.session_state["operational_cash_efficiency"]: 418 | st.write("## Insights") 419 | st.write("### Operational Cash Efficiency") 420 | st.write(st.session_state["operational_cash_efficiency"]) 421 | operating_cash_flow_chart = create_bar_chart(st.session_state.cash_flow["chart_data"], 422 | "operating_cash_flow", 423 | "Operating Cash Flow Trend") 424 | st.write(operating_cash_flow_chart) 425 | else: 426 | st.error("Operational Cash Efficiency insight has not been generated") 427 | 428 | if investment_capability: 429 | if st.session_state["investment_capability"]: 430 | st.write("### Investment Capability") 431 | st.write(st.session_state["investment_capability"]) 432 | cash_flow_from_investment_chart = create_bar_chart(st.session_state.cash_flow["chart_data"], 433 | "cash_flow_from_investment", 434 | "Investment Capability Trend") 435 | st.write(cash_flow_from_investment_chart) 436 | else: 437 | st.error("Investment Capability insight has not been generated") 438 | 439 | 440 | 441 | if financial_flexibility: 442 | if st.session_state["financial_flexibility"]: 443 | st.write("### Financial Flexibility") 444 | st.write(st.session_state["financial_flexibility"]) 445 | financing_cash_flow_chart = create_bar_chart(st.session_state.cash_flow["chart_data"], 446 | "cash_flow_from_financing", 447 | "Cash Flow from Financing Trend") 448 | st.write(financing_cash_flow_chart) 449 | else: 450 | st.error("Financial Flexibility insight has not been generated") 451 | 452 | 453 | if dividend_sustainability: 454 | if st.session_state["dividend_sustainability"]: 455 | st.write("### Dividend Sustainability") 456 | 
st.write(st.session_state["dividend_sustainability"]) 457 | else: 458 | st.error("Dividend Sustainability insight has not been generated") 459 | 460 | if debt_service_capability: 461 | if st.session_state["debt_service_capability"]: 462 | st.write("### Debt Service Capability") 463 | st.write(st.session_state["debt_service_capability"]) 464 | else: 465 | st.error("Debt Service Capability insight has not been generated") 466 | 467 | 468 | 469 | if st.session_state.news: 470 | 471 | with tab5: 472 | st.markdown("## Top News") 473 | column_config = { 474 | "title": st.column_config.Column( 475 | "Title", 476 | width="large", 477 | ), 478 | "url": st.column_config.LinkColumn( 479 | "Link", 480 | width="medium", 481 | ), 482 | "authors": st.column_config.ListColumn( 483 | "Authors", 484 | width = "medium" 485 | ), 486 | "topics": st.column_config.ListColumn( 487 | "Topics", 488 | width="large" 489 | ), 490 | "sentiment_score" : st.column_config.ProgressColumn( 491 | "Sentiment Score", 492 | min_value=-0.5, 493 | max_value=0.5 494 | ), 495 | "sentiment_label": st.column_config.Column( 496 | "Sentiment Label" 497 | ) 498 | 499 | } 500 | 501 | st.metric("Mean Sentiment Score", 502 | value=round_numeric(st.session_state.news["mean_sentiment_score"]), 503 | delta=st.session_state.news["mean_sentiment_class"]) 504 | 505 | st.dataframe(st.session_state.news["news"], column_config=column_config) 506 | 507 | -------------------------------------------------------------------------------- /src/pages/2_🗂️_Annual_Report_Analyzer.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | 8 | from langchain.prompts import PromptTemplate 9 | from langchain.output_parsers import PydanticOutputParser 10 | 11 | from llama_index import VectorStoreIndex, ServiceContext, StorageContext 12 | 
from llama_index.vector_stores import FaissVectorStore 13 | from llama_index.tools import QueryEngineTool, ToolMetadata 14 | from llama_index.query_engine import SubQuestionQueryEngine 15 | from llama_index.embeddings import OpenAIEmbedding 16 | from llama_index.schema import Document 17 | from llama_index.node_parser import UnstructuredElementNodeParser 18 | 19 | from src.utils import get_model, process_pdf2, generate_pydantic_model 20 | from src.pydantic_models import FiscalYearHighlights, StrategyOutlookFutureDirection, RiskManagement, CorporateGovernanceSocialResponsibility, InnovationRnD 21 | # from src.fields import ( 22 | # fiscal_year_fields, fiscal_year_attributes, 23 | # strat_outlook_fields, strat_outlook_attributes, 24 | # risk_management_fields, risk_management_attributes, 25 | # innovation_fields, innovation_attributes 26 | # ) 27 | 28 | from src.fields2 import ( 29 | fiscal_year, fiscal_year_attributes, 30 | strat_outlook, strat_outlook_attributes, 31 | risk_management, risk_management_attributes, 32 | innovation, innovation_attributes 33 | ) 34 | 35 | import streamlit as st 36 | import weaviate 37 | import os 38 | import openai 39 | import faiss 40 | import time 41 | from pypdf import PdfReader 42 | 43 | 44 | st.set_page_config(page_title="Annual Report Analyzer", page_icon=":card_index_dividers:", initial_sidebar_state="expanded", layout="wide") 45 | 46 | st.title(":card_index_dividers: Annual Report Analyzer") 47 | st.info("""
48 | Begin by uploading the annual report of your chosen company in PDF format. Afterward, click 'Process Document'. Once the document has been processed, click 'Analyze Report' and the system will work its magic. After a brief wait, you'll be presented with a detailed analysis and insights derived from the report. 
49 | """) 50 | 51 | 52 | # OPENAI_API_KEY = st.secrets["openai_api_key"] 53 | 54 | 55 | # openai.api_key = os.environ["OPENAI_API_KEY"] 56 | 57 | def process_pdf(pdf): 58 | file = PdfReader(pdf) 59 | 60 | document_list = [] 61 | for page in file.pages: 62 | document_list.append(Document(text=str(page.extract_text()))) 63 | 64 | node_parser = UnstructuredElementNodeParser() 65 | nodes = node_parser.get_nodes_from_documents(document_list, show_progress=True) 66 | 67 | return nodes 68 | 69 | 70 | def get_vector_index(nodes, vector_store): 71 | print(nodes) 72 | llm = get_model("openai", OPENAI_API_KEY) 73 | if vector_store == "faiss": 74 | d = 1536  # dimensionality of OpenAI's text-embedding-ada-002 embeddings 75 | faiss_index = faiss.IndexFlatL2(d) 76 | vector_store = FaissVectorStore(faiss_index=faiss_index) 77 | storage_context = StorageContext.from_defaults(vector_store=vector_store) 78 | # embed_model = OpenAIEmbedding() 79 | # service_context = ServiceContext.from_defaults(embed_model=embed_model) 80 | service_context = ServiceContext.from_defaults(llm=llm) 81 | index = VectorStoreIndex(nodes, 82 | service_context=service_context, 83 | storage_context=storage_context 84 | ) 85 | elif vector_store == "simple": 86 | index = VectorStoreIndex(nodes)  # nodes, not documents, so build the index directly 87 | 88 | 89 | return index 90 | 91 | 92 | 93 | def generate_insight(engine, insight_name, section_name, output_format): 94 | 95 | with open("prompts/report.prompt", "r") as f: 96 | template = f.read() 97 | 98 | prompt_template = PromptTemplate( 99 | template=template, 100 | input_variables=['insight_name', 'section_name', 'output_format'] 101 | ) 102 | 103 | formatted_input = prompt_template.format(insight_name=insight_name, section_name=section_name, output_format=output_format) 104 | print(formatted_input) 105 | response = engine.query(formatted_input) 106 | return response.response 107 | 108 | 109 | 110 | def report_insights(engine, section_name, fields_to_include, section_num): 111 | 112 | fields = None 113 | attribs = None 114 | 115 | if section_num == 1: 116 | 
fields = fiscal_year 117 | attribs = fiscal_year_attributes 118 | elif section_num == 2: 119 | fields = strat_outlook 120 | attribs = strat_outlook_attributes 121 | elif section_num == 3: 122 | fields = risk_management 123 | attribs = risk_management_attributes 124 | elif section_num == 4: 125 | fields = innovation 126 | attribs = innovation_attributes 127 | 128 | ins = {} 129 | for i, field in enumerate(attribs): 130 | if fields_to_include[i]: 131 | response = generate_insight(engine, field, section_name, str({field: fields[field]})) 132 | ins[field] = response 133 | 134 | return { 135 | "insights": ins 136 | } 137 | 138 | def get_query_engine(engine): 139 | llm = get_model("openai", OPENAI_API_KEY) 140 | service_context = ServiceContext.from_defaults(llm=llm) 141 | 142 | query_engine_tools = [ 143 | QueryEngineTool( 144 | query_engine=engine, 145 | metadata=ToolMetadata( 146 | name="Annual Report", 147 | description=f"Provides information about the company from its annual report.", 148 | ), 149 | ), 150 | ] 151 | 152 | 153 | s_engine = SubQuestionQueryEngine.from_defaults( 154 | query_engine_tools=query_engine_tools, 155 | service_context=service_context 156 | ) 157 | return s_engine 158 | 159 | 160 | for insight in fiscal_year_attributes: 161 | if insight not in st.session_state: 162 | st.session_state[insight] = None 163 | 164 | for insight in strat_outlook_attributes: 165 | if insight not in st.session_state: 166 | st.session_state[insight] = None 167 | 168 | for insight in risk_management_attributes: 169 | if insight not in st.session_state: 170 | st.session_state[insight] = None 171 | 172 | for insight in innovation_attributes: 173 | if insight not in st.session_state: 174 | st.session_state[insight] = None 175 | 176 | if "end_time" not in st.session_state: 177 | st.session_state.end_time = None 178 | 179 | 180 | if "process_doc" not in st.session_state: 181 | st.session_state.process_doc = False 182 | 183 | 184 | st.sidebar.info(""" 185 | You can get your 
OpenAI API key [here](https://openai.com/blog/openai-api) 186 | """) 187 | OPENAI_API_KEY = st.sidebar.text_input("OpenAI API Key", type="password") 188 | os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY 189 | 190 | if not OPENAI_API_KEY: 191 | st.error("Please enter your OpenAI API Key") 192 | 193 | if OPENAI_API_KEY: 194 | pdfs = st.sidebar.file_uploader("Upload the annual report in PDF format", type="pdf") 195 | st.sidebar.info(""" 196 | Example reports you can upload here: 197 | - [Apple Inc.](https://s2.q4cdn.com/470004039/files/doc_financials/2022/q4/_10-K-2022-(As-Filed).pdf) 198 | - [Microsoft Corporation](https://microsoft.gcs-web.com/static-files/07cf3c30-cfc3-4567-b20f-f4b0f0bd5087) 199 | - [Tesla Inc.](https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q4-2022-Update) 200 | """) 201 | 202 | if st.sidebar.button("Process Document"): 203 | with st.spinner("Processing Document..."): 204 | nodes = process_pdf(pdfs) 205 | st.session_state.index = get_vector_index(nodes, vector_store="faiss") 206 | st.session_state.process_doc = True 207 | 208 | 209 | st.toast("Document Processed!") 210 | 211 | 212 | if st.session_state.process_doc: 213 | 214 | col1, col2 = st.columns([0.25, 0.75]) 215 | 216 | with col1: 217 | st.write(""" 218 | ### Select Insights 219 | """) 220 | 221 | with st.expander("**Fiscal Year Highlights**", expanded=True): 222 | performance_highlights = st.toggle("Performance Highlights") 223 | major_events = st.toggle("Major Events") 224 | challenges_encountered = st.toggle("Challenges Encountered") 225 | 226 | fiscal_year_highlights_list = [performance_highlights, major_events, challenges_encountered] 227 | 228 | with st.expander("**Strategy Outlook and Future Direction**", expanded=True): 229 | strategic_initiatives = st.toggle("Strategic Initiatives") 230 | market_outlook = st.toggle("Market Outlook") 231 | product_roadmap = st.toggle("Product Roadmap") 232 | 233 | strategy_outlook_future_direction_list = [strategic_initiatives, 
market_outlook, product_roadmap] 234 | 235 | with st.expander("**Risk Management**", expanded=True): 236 | risk_factors = st.toggle("Risk Factors") 237 | risk_mitigation = st.toggle("Risk Mitigation") 238 | 239 | risk_management_list = [risk_factors, risk_mitigation] 240 | 241 | with st.expander("**Innovation and R&D**", expanded=True): 242 | r_and_d_activities = st.toggle("R&D Activities") 243 | innovation_focus = st.toggle("Innovation Focus") 244 | 245 | innovation_and_rd_list = [r_and_d_activities, innovation_focus] 246 | 247 | 248 | with col2: 249 | if st.button("Analyze Report"): 250 | engine = get_query_engine(st.session_state.index.as_query_engine(similarity_top_k=3)) 251 | start_time = time.time() 252 | 253 | with st.status("**Analyzing Report...**"): 254 | 255 | 256 | if any(fiscal_year_highlights_list): 257 | st.write("Fiscal Year Highlights...") 258 | 259 | for i, insight in enumerate(fiscal_year_attributes): 260 | if st.session_state[insight]: 261 | fiscal_year_highlights_list[i] = False 262 | 263 | response = report_insights(engine, "Fiscal Year Highlights", fiscal_year_highlights_list, 1) 264 | 265 | for key, value in response["insights"].items(): 266 | st.session_state[key] = value 267 | 268 | if any(strategy_outlook_future_direction_list): 269 | st.write("Strategy Outlook and Future Direction...") 270 | 271 | for i, insight in enumerate(strat_outlook_attributes): 272 | if st.session_state[insight]: 273 | strategy_outlook_future_direction_list[i] = False 274 | response = report_insights(engine, "Strategy Outlook and Future Direction", strategy_outlook_future_direction_list, 2) 275 | 276 | for key, value in response["insights"].items(): 277 | st.session_state[key] = value 278 | 279 | 280 | if any(risk_management_list): 281 | st.write("Risk Management...") 282 | 283 | for i, insight in enumerate(risk_management_attributes): 284 | if st.session_state[insight]: 285 | risk_management_list[i] = False 286 | 287 | response = report_insights(engine, "Risk 
Management", risk_management_list, 3) 288 | 289 | for key, value in response["insights"].items(): 290 | st.session_state[key] = value 291 | 292 | if any(innovation_and_rd_list): 293 | st.write("Innovation and R&D...") 294 | 295 | for i, insight in enumerate(innovation_attributes): 296 | if st.session_state[insight]: 297 | innovation_and_rd_list[i] = False 298 | 299 | response = report_insights(engine, "Innovation and R&D", innovation_and_rd_list, 4) 300 | st.session_state.innovation_and_rd = response 301 | 302 | for key, value in response["insights"].items(): 303 | st.session_state[key] = value 304 | 305 | st.session_state["end_time"] = "{:.2f}".format((time.time() - start_time)) 306 | 307 | 308 | 309 | st.toast("Report Analysis Complete!") 310 | 311 | if st.session_state.end_time: 312 | st.write("Report Analysis Time: ", st.session_state.end_time, "s") 313 | 314 | 315 | # if st.session_state.all_report_outputs: 316 | # st.toast("Report Analysis Complete!") 317 | 318 | tab1, tab2, tab3, tab4 = st.tabs(["Fiscal Year Highlights", "Strategy Outlook and Future Direction", "Risk Management", "Innovation and R&D"]) 319 | 320 | 321 | 322 | 323 | with tab1: 324 | st.write("## Fiscal Year Highlights") 325 | try: 326 | if performance_highlights: 327 | if st.session_state['performance_highlights']: 328 | st.write("### Performance Highlights") 329 | st.write(st.session_state['performance_highlights']) 330 | else: 331 | st.error("Performance Highlights insight has not been generated") 332 | except: 333 | st.error("This insight has not been generated") 334 | 335 | try: 336 | if major_events: 337 | if st.session_state["major_events"]: 338 | st.write("### Major Events") 339 | st.write(st.session_state["major_events"]) 340 | else: 341 | st.error("Major Events insight has not been generated") 342 | except: 343 | st.error("This insight has not been generated") 344 | try: 345 | if challenges_encountered: 346 | if st.session_state["challenges_encountered"]: 347 | st.write("### 
Challenges Encountered") 348 | st.write(st.session_state["challenges_encountered"]) 349 | else: 350 | st.error("Challenges Encountered insight has not been generated") 351 | except: 352 | st.error("This insight has not been generated") 353 | # st.write("### Milestone Achievements") 354 | # st.write(str(st.session_state.fiscal_year_highlights.milestone_achievements)) 355 | 356 | 357 | 358 | with tab2: 359 | st.write("## Strategy Outlook and Future Direction") 360 | try: 361 | if strategic_initiatives: 362 | if st.session_state["strategic_initiatives"]: 363 | st.write("### Strategic Initiatives") 364 | st.write(st.session_state["strategic_initiatives"]) 365 | else: 366 | st.error("Strategic Initiatives insight has not been generated") 367 | except: 368 | st.error("This insight has not been generated") 369 | 370 | try: 371 | if market_outlook: 372 | if st.session_state["market_outlook"]: 373 | st.write("### Market Outlook") 374 | st.write(st.session_state["market_outlook"]) 375 | else: 376 | st.error("Market Outlook insight has not been generated") 377 | 378 | except: 379 | st.error("This insight has not been generated") 380 | 381 | try: 382 | if product_roadmap: 383 | if st.session_state["product_roadmap"]: 384 | st.write("### Product Roadmap") 385 | st.write(st.session_state["product_roadmap"]) 386 | else: 387 | st.error("Product Roadmap insight has not been generated") 388 | except: 389 | st.error("This insight has not been generated") 390 | 391 | with tab3: 392 | st.write("## Risk Management") 393 | 394 | try: 395 | if risk_factors: 396 | if st.session_state["risk_factors"]: 397 | st.write("### Risk Factors") 398 | st.write(st.session_state["risk_factors"]) 399 | else: 400 | st.error("Risk Factors insight has not been generated") 401 | except: 402 | st.error("This insight has not been generated") 403 | 404 | try: 405 | if risk_mitigation: 406 | if st.session_state["risk_mitigation"]: 407 | st.write("### Risk Mitigation") 408 | 
st.write(st.session_state["risk_mitigation"]) 409 | else: 410 | st.error("Risk Mitigation insight has not been generated") 411 | except: 412 | st.error("This insight has not been generated") 413 | 414 | 415 | with tab4: 416 | st.write("## Innovation and R&D") 417 | 418 | try: 419 | if r_and_d_activities: 420 | if st.session_state["r_and_d_activities"]: 421 | st.write("### R&D Activities") 422 | st.write(st.session_state["r_and_d_activities"]) 423 | else: 424 | st.error("R&D Activities insight has not been generated") 425 | except: 426 | st.error("This insight has not been generated") 427 | 428 | try: 429 | if innovation_focus: 430 | if st.session_state["innovation_focus"]: 431 | st.write("### Innovation Focus") 432 | st.write(st.session_state["innovation_focus"]) 433 | else: 434 | st.error("Innovation Focus insight has not been generated") 435 | except: 436 | st.error("This insight has not been generated") 437 | -------------------------------------------------------------------------------- /src/pdf_gen.py: -------------------------------------------------------------------------------- 1 | from io import BytesIO 2 | import json 3 | import sys 4 | from pathlib import Path 5 | script_dir = Path(__file__).resolve().parent 6 | project_root = script_dir.parent 7 | sys.path.append(str(project_root)) 8 | 9 | from reportlab.lib.pagesizes import letter 10 | from reportlab.lib.units import inch 11 | from reportlab.platypus import PageBreak, Paragraph, SimpleDocTemplate, Image 12 | from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle 13 | from reportlab.lib.enums import TA_CENTER 14 | from reportlab.lib.pagesizes import landscape 15 | from reportlab.platypus import Paragraph, Table, TableStyle, Spacer 16 | from reportlab.lib import colors 17 | import plotly.io as pio 18 | from io import BytesIO 19 | import tempfile 20 | 21 | 22 | from src.company_overview import company_overview 23 | from src.income_statement import income_statement 24 | from 
src.balance_sheet import balance_sheet 25 | from src.cash_flow import cash_flow 26 | from src.news_sentiment import top_news 27 | from src.utils import round_numeric, create_donut_chart, create_bar_chart 28 | 29 | # Get the default styles 30 | styles = getSampleStyleSheet() 31 | 32 | # Define custom styles 33 | centered_style = ParagraphStyle( 34 | 'CenteredStyle', 35 | parent=styles['Heading1'], 36 | alignment=TA_CENTER, 37 | fontSize=48, 38 | spaceAfter=50, 39 | ) 40 | 41 | sub_centered_style = ParagraphStyle( 42 | 'SubCenteredStyle', 43 | parent=styles['Heading2'], 44 | alignment=TA_CENTER, 45 | fontSize=24, 46 | spaceAfter=15, 47 | ) 48 | 49 | def cover_page(company_name): 50 | flowables = [] 51 | 52 | # Title 53 | title = "FinSight" 54 | para_title = Paragraph(title, centered_style) 55 | flowables.append(para_title) 56 | 57 | # Subtitle 58 | subtitle = "Financial Insights for
" 59 | para_subtitle = Paragraph(subtitle, sub_centered_style) 60 | flowables.append(para_subtitle) 61 | 62 | subtitle2 = "{} {}".format(company_name, "2022") 63 | para_subtitle2 = Paragraph(subtitle2, sub_centered_style) 64 | flowables.append(para_subtitle2) 65 | 66 | # Add a page break after the cover page 67 | flowables.append(PageBreak()) 68 | 69 | return flowables 70 | 71 | from reportlab.lib.styles import ParagraphStyle 72 | from reportlab.lib.enums import TA_LEFT 73 | 74 | # Define custom styles 75 | header_style = ParagraphStyle( 76 | 'HeaderStyle', 77 | parent=styles['Heading2'], 78 | fontSize=24, 79 | spaceAfter=20, 80 | leading=30 81 | ) 82 | 83 | sub_section_header_style = ParagraphStyle( 84 | 'SubSectionHeaderStyle', 85 | parent=styles['Heading3'], 86 | fontSize=16, 87 | spaceAfter=8, 88 | leading=20 89 | ) 90 | 91 | data_style = ParagraphStyle( 92 | 'DataStyle', 93 | parent=styles['Normal'], 94 | fontSize=14, 95 | spaceAfter=15, 96 | leading=20 97 | ) 98 | 99 | sub_header_style = ParagraphStyle( 100 | 'DataStyle', 101 | parent=styles['Normal'], 102 | fontSize=20, 103 | spaceAfter=15, 104 | leading=20 105 | ) 106 | 107 | def pdf_plotly_chart(fig): 108 | img_bytes = fig.to_image(format="png") 109 | temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".png") 110 | temp_file.write(img_bytes) 111 | temp_file.close() 112 | img = Image(temp_file.name, width=5*inch, height=3*inch) 113 | return img 114 | 115 | def pdf_company_overview(data): 116 | flowables = [] 117 | 118 | # Section Title 119 | title = "Company Overview" 120 | para_title = Paragraph(title, header_style) 121 | flowables.append(para_title) 122 | 123 | # Company Name 124 | # data = json.loads(data) 125 | company_name = data.get("Name") 126 | print(company_name) 127 | para_name = Paragraph(" {} ".format(company_name), sub_header_style) 128 | flowables.append(para_name) 129 | 130 | # Other details 131 | details = [ 132 | ("Symbol:", data.get("Symbol")), 133 | ("Exchange:", 
data.get("Exchange")), 134 | ("Currency:", data.get("Currency")), 135 | ("Sector:", data.get("Sector")), 136 | ("Industry:", data.get("Industry")), 137 | ("Description:", data.get("Description")), 138 | ("Country:", data.get("Country")), 139 | ("Address:", data.get("Address")), 140 | ("Fiscal Year End:", data.get("FiscalYearEnd")), 141 | ("Latest Quarter:", data.get("LatestQuarter")), 142 | ("Market Capitalization:", "$ "+str("{:,}".format(round_numeric(data.get("MarketCapitalization"))))) 143 | ] 144 | 145 | for label, value in details: 146 | para_label = Paragraph("{} {}".format(label, value), data_style) 147 | flowables.append(para_label) 148 | 149 | return flowables 150 | 151 | 152 | def pdf_income_statement(metrics, insights, chart_data): 153 | flowables = [] 154 | 155 | # Section Title 156 | title = "INCOME STATEMENT" 157 | para_title = Paragraph(title, header_style) 158 | flowables.append(para_title) 159 | 160 | 161 | # Metrics 162 | flowables.append(Paragraph("METRICS", sub_header_style)) 163 | for label, value in metrics.items(): 164 | metric_text = "{}: {}".format(label.replace("_", " ").title(), round_numeric(value)) 165 | flowables.append(Paragraph(metric_text, data_style)) 166 | 167 | # Insights 168 | flowables.append(Paragraph("INSIGHTS", sub_header_style)) 169 | 170 | 171 | try: 172 | flowables.append(Paragraph("Revenue Health", sub_section_header_style)) 173 | flowables.append(Paragraph(insights.revenue_health, data_style)) 174 | flowables.append(pdf_plotly_chart(create_bar_chart(chart_data, "total_revenue"))) 175 | except: 176 | pass 177 | try: 178 | flowables.append(Paragraph("Operational Efficiency", sub_section_header_style)) 179 | flowables.append(Paragraph(insights.operational_efficiency, data_style)) 180 | except: 181 | pass 182 | try: 183 | flowables.append(Paragraph("R&D Focus", sub_section_header_style)) 184 | flowables.append(Paragraph(insights.r_and_d_focus, data_style)) 185 | except: 186 | pass 187 | try: 188 | 
flowables.append(Paragraph("Debt Management", sub_section_header_style)) 189 | flowables.append(Paragraph(insights.debt_management, data_style)) 190 | flowables.append(pdf_plotly_chart(create_bar_chart(chart_data, "interest_expense"))) 191 | except: 192 | pass 193 | 194 | try: 195 | flowables.append(Paragraph("Profit Retention", sub_section_header_style)) 196 | flowables.append(Paragraph(insights.profit_retention, data_style)) 197 | flowables.append(pdf_plotly_chart(create_bar_chart(chart_data, "net_income"))) 198 | except: 199 | pass 200 | 201 | return flowables 202 | 203 | def pdf_balance_sheet(metrics, insights, chart_data): 204 | flowables = [] 205 | 206 | # Section Title 207 | title = "BALANCE SHEET" 208 | para_title = Paragraph(title, header_style) 209 | flowables.append(para_title) 210 | 211 | # Metrics 212 | flowables.append(Paragraph("METRICS", sub_header_style)) 213 | for label, value in metrics.items(): 214 | metric_text = "{}: {}".format(label.replace("_", " ").title(), round_numeric(value)) 215 | flowables.append(Paragraph(metric_text, data_style)) 216 | 217 | # Insights 218 | flowables.append(Paragraph("INSIGHTS", sub_header_style)) 219 | # insight_sections = [ 220 | # ("Liquidity Position", insights.liquidity_position), 221 | # ("Operational Efficiency", insights.operational_efficiency), 222 | # ("Capital Structure", insights.capital_structure), 223 | # ("Inventory Management", insights.inventory_management), 224 | # ("Overall Solvency", insights.overall_solvency) 225 | # ] 226 | 227 | # for section_title, insight_text in insight_sections: 228 | # flowables.append(Paragraph(section_title, sub_section_header_style)) 229 | # flowables.append(Paragraph(insight_text, data_style)) 230 | 231 | try: 232 | flowables.append(Paragraph("Liquidity Position", sub_section_header_style)) 233 | flowables.append(Paragraph(insights.liquidity_position, data_style)) 234 | flowables.append(pdf_plotly_chart(create_donut_chart(chart_data,"asset_composition"))) 235 | 
except: 236 | pass 237 | try: 238 | flowables.append(Paragraph("Operational Efficiency", sub_section_header_style)) 239 | flowables.append(Paragraph(insights.operational_efficiency, data_style)) 240 | except: 241 | pass 242 | try: 243 | flowables.append(Paragraph("Capital Structure", sub_section_header_style)) 244 | flowables.append(Paragraph(insights.capital_structure, data_style)) 245 | flowables.append(pdf_plotly_chart(create_donut_chart(chart_data, "liabilities_composition"))) 246 | 247 | except: 248 | pass 249 | 250 | try: 251 | flowables.append(Paragraph("Inventory Management", sub_section_header_style)) 252 | flowables.append(Paragraph(insights.inventory_management, data_style)) 253 | except: 254 | pass 255 | 256 | try: 257 | flowables.append(Paragraph("Overall Solvency", sub_section_header_style)) 258 | flowables.append(Paragraph(insights.overall_solvency, data_style)) 259 | flowables.append(pdf_plotly_chart(create_donut_chart(chart_data, "debt_structure"))) 260 | 261 | except: 262 | pass 263 | 264 | return flowables 265 | 266 | def pdf_cash_flow(metrics, insights, chart_data): 267 | flowables = [] 268 | 269 | # Section Title 270 | title = "CASH FLOW" 271 | para_title = Paragraph(title, header_style) 272 | flowables.append(para_title) 273 | 274 | flowables.append(Paragraph("METRICS", sub_header_style)) 275 | for label, value in metrics.items(): 276 | metric_text = "{}: {}".format(label.replace("_", " ").title(), round_numeric(value)) 277 | flowables.append(Paragraph(metric_text, data_style)) 278 | 279 | # Insights 280 | flowables.append(Paragraph("INSIGHTS", sub_header_style)) 281 | # insight_sections = [ 282 | # ("Operational Cash Efficiency", insights.operational_cash_efficiency), 283 | # ("Investment Capability", insights.investment_capability), 284 | # ("Financial Flexibility", insights.financial_flexibility), 285 | # ("Dividend Sustainability", insights.dividend_sustainability), 286 | # ("Debt Service Capability", insights.debt_service_capability) 287 
| # ] 288 | 289 | # for section_title, insight_text in insight_sections: 290 | # flowables.append(Paragraph(section_title, sub_section_header_style)) 291 | # flowables.append(Paragraph(insight_text, data_style)) 292 | 293 | try: 294 | flowables.append(Paragraph("Operational Cash Efficiency", sub_section_header_style)) 295 | flowables.append(Paragraph(insights.operational_cash_efficiency, data_style)) 296 | flowables.append(pdf_plotly_chart(create_bar_chart(chart_data, "operating_cash_flow"))) 297 | except: 298 | pass 299 | 300 | try: 301 | flowables.append(Paragraph("Investment Capability", sub_section_header_style)) 302 | flowables.append(Paragraph(insights.investment_capability, data_style)) 303 | flowables.append(pdf_plotly_chart(create_bar_chart(chart_data, "cash_flow_from_investment"))) 304 | except: 305 | pass 306 | 307 | try: 308 | flowables.append(Paragraph("Financial Flexibility", sub_section_header_style)) 309 | flowables.append(Paragraph(insights.financial_flexibility, data_style)) 310 | flowables.append(pdf_plotly_chart(create_bar_chart(chart_data, "cash_flow_from_financing"))) 311 | except: 312 | pass 313 | 314 | try: 315 | flowables.append(Paragraph("Dividend Sustainability", sub_section_header_style)) 316 | flowables.append(Paragraph(insights.dividend_sustainability, data_style)) 317 | except: 318 | pass 319 | 320 | try: 321 | flowables.append(Paragraph("Debt Service Capability", sub_section_header_style)) 322 | flowables.append(Paragraph(insights.debt_service_capability, data_style)) 323 | except: 324 | pass 325 | 326 | 327 | 328 | return flowables 329 | 330 | def pdf_news_sentiment(data): 331 | flowables = [] 332 | 333 | # Section Title 334 | title = "NEWS SENTIMENT" 335 | para_title = Paragraph(title, header_style) 336 | flowables.append(para_title) 337 | flowables.append(Spacer(1, 12)) 338 | 339 | # News DataFrame to Table 340 | df = data['news'] 341 | table_data = [df.columns.to_list()] + df.values.tolist() 342 | table = Table(table_data, 
repeatRows=1) # repeatRows ensures the header is repeated if the table spans multiple pages 343 | 344 | # Table Style 345 | style = TableStyle([ 346 | ('BACKGROUND', (0, 0), (-1, 0), colors.grey), 347 | ('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke), 348 | ('ALIGN', (0, 0), (-1, -1), 'LEFT'), 349 | ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'), 350 | ('FONTSIZE', (0, 0), (-1, 0), 12), 351 | ('BOTTOMPADDING', (0, 0), (-1, 0), 12), 352 | ('BACKGROUND', (0, 1), (-1, -1), colors.beige), 353 | ('GRID', (0, 0), (-1, -1), 1, colors.black) 354 | ]) 355 | table.setStyle(style) 356 | flowables.append(table) 357 | flowables.append(Spacer(1, 12)) 358 | 359 | # Mean Sentiment Score 360 | mean_sentiment_score_text = f"Mean Sentiment Score: {data['mean_sentiment_score']:.2f}" 361 | flowables.append(Paragraph(mean_sentiment_score_text, data_style)) 362 | flowables.append(Spacer(1, 12)) 363 | 364 | # Mean Sentiment Class 365 | mean_sentiment_class_text = f"Mean Sentiment Class: {data['mean_sentiment_class']}" 366 | flowables.append(Paragraph(mean_sentiment_class_text, data_style)) 367 | 368 | return flowables 369 | 370 | 371 | 372 | 373 | def gen_pdf(company_name, overview_data, income_statement_data, balance_sheet_data, cash_flow_data, news_data): 374 | doc = SimpleDocTemplate("pdf/final_report.pdf", pagesize=letter) 375 | all_flowables = [] 376 | 377 | all_flowables.extend(cover_page(company_name)) 378 | all_flowables.extend(pdf_company_overview(overview_data)) 379 | all_flowables.extend(pdf_income_statement(income_statement_data['metrics'], income_statement_data['insights'], income_statement_data['chart_data'])) 380 | # all_flowables.extend(pdf_balance_sheet(balance_sheet_data['metrics'], balance_sheet_data['insights'], balance_sheet_data['chart_data'])) 381 | # all_flowables.extend(pdf_cash_flow(cash_flow_data['metrics'], cash_flow_data['insights'], cash_flow_data['chart_data'])) 382 | # all_flowables.extend(pdf_news_sentiment(news_data)) 383 | doc.build(all_flowables) 384 
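Each `pdf_*` builder above repeats a bare `try`/`except: pass` block per insight section, which silently swallows every error (including `KeyboardInterrupt`). A minimal sketch of an alternative: probe the insight attribute with `getattr` and append the section only when it is present. The tuples and the `DemoInsights` class here are hypothetical stand-ins for the ReportLab `Paragraph` flowables and the pydantic insight models used above.

```python
# Sketch only: tuples stand in for ReportLab flowables (Paragraph, charts).
def append_insight(flowables, insights, attr, title, chart=None):
    """Append a titled insight section only if `attr` exists on `insights`."""
    text = getattr(insights, attr, None)
    if not text:
        return
    flowables.append(("header", title))
    flowables.append(("body", text))
    if chart is not None:
        flowables.append(("chart", chart))

class DemoInsights:
    # hypothetical stand-in for an IncomeStatementInsights instance
    revenue_health = "Revenue grew steadily."

flowables = []
append_insight(flowables, DemoInsights(), "revenue_health", "Revenue Health")
append_insight(flowables, DemoInsights(), "debt_management", "Debt Management")
print(flowables)  # only the Revenue Health section was appended
```

Unlike a bare `except:`, this only skips genuinely missing insights and never hides unrelated failures such as an error inside a chart helper.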
| 385 | if __name__ == "__main__": 386 | overview_data = company_overview("AAPL") 387 | inc = income_statement("AAPL", [True, True, False, False, False]) 388 | # bal = balance_sheet("AAPL", [True, False, True, False, True]) 389 | # cash = cash_flow("AAPL", [True, True, True, False, False]) 390 | # news = top_news("AAPL", 10) 391 | gen_pdf("Apple Inc.", overview_data, inc, None, None, None) 392 | -------------------------------------------------------------------------------- /src/pydantic_models.py: -------------------------------------------------------------------------------- 1 | from pydantic import BaseModel, Field 2 | 3 | min_length = 40 4 | 5 | class IncomeStatementInsights(BaseModel): 6 | revenue_health: str = Field(..., description=f"Must be more than {min_length} words. Insight into the company's total revenue, providing a perspective on the health of the primary business activity.") 7 | operational_efficiency: str = Field(..., description=f"Must be more than {min_length} words. Analysis of the company's operating expenses in relation to its revenue, offering a view into the firm's operational efficiency.") 8 | r_and_d_focus: str = Field(..., description=f"Must be more than {min_length} words. Insight into the company's commitment to research and development, signifying its emphasis on innovation and future growth.") 9 | debt_management: str = Field(..., description=f"Must be more than {min_length} words. Analysis of the company's interest expenses, highlighting the scale of its debt obligations and its approach to leveraging.") 10 | profit_retention: str = Field(..., description=f"Must be more than {min_length} words. Insight into the company's net income, showcasing the amount retained post all expenses, which can be reinvested or distributed.") 11 | 12 | class BalanceSheetInsights(BaseModel): 13 | liquidity_position: str = Field(..., description=f"Must be more than {min_length} words. 
Insight into the company's ability to meet its short-term obligations using its short-term assets.") 14 | operational_efficiency: str = Field(..., description=f"Must be more than {min_length} words. Analysis of how efficiently the company is using its assets to generate sales.") 15 | capital_structure: str = Field(..., description=f"Must be more than {min_length} words. Insight into the company's financial leverage and its reliance on external liabilities versus internal equity.") 16 | inventory_management: str = Field(..., description=f"Must be more than {min_length} words. Analysis of the company's efficiency in managing, selling, and replacing its inventory.") 17 | overall_solvency: str = Field(..., description=f"Must be more than {min_length} words. Insight into the company's overall ability to meet its long-term debts and obligations.") 18 | 19 | class CashFlowInsights(BaseModel): 20 | operational_cash_efficiency: str = Field(..., description=f"Must be more than {min_length} words. Insight into how efficiently the company is generating cash from its core operations.") 21 | investment_capability: str = Field(..., description=f"Must be more than {min_length} words. Indicates the company's ability to invest in its business using its operational cash flows.") 22 | financial_flexibility: str = Field(..., description=f"Must be more than {min_length} words. Demonstrates the cash left after all operational expenses and investments, which can be used for dividends, share buybacks, or further investments.") 23 | dividend_sustainability: str = Field(..., description=f"Must be more than {min_length} words. Indicates the company's ability to cover its dividend payouts with its net earnings.") 24 | debt_service_capability: str = Field(..., description=f"Must be more than {min_length} words. 
Analysis of the company's ability to service its debt using the operational cash flows.") 25 | 26 | 27 | class FiscalYearHighlights(BaseModel): 28 | performance_highlights: str = Field(..., description="Key performance and financial stats over the fiscal year.") 29 | major_events: str = Field(..., description="Highlight of significant events, acquisitions, or strategic shifts that occurred during the year.") 30 | challenges_encountered: str = Field(..., description="Challenges the company faced during the year and, if and how they managed or overcame them.") 31 | # milestone_achievements: str = Field(..., description="Milestones achieved in terms of projects, expansions, or any other notable accomplishments.") 32 | 33 | 34 | class StrategyOutlookFutureDirection(BaseModel): 35 | strategic_initiatives: str = Field(..., description="The company's primary objectives and growth strategies for the upcoming years.") 36 | market_outlook: str = Field(..., description="Insights into the broader market, competitive landscape, and industry trends the company anticipates.") 37 | product_roadmap: str = Field(..., description="Upcoming launches, expansions, or innovations the company plans to roll out.") 38 | 39 | class RiskManagement(BaseModel): 40 | risk_factors: str = Field(..., description="Primary risks the company acknowledges.") 41 | risk_mitigation: str = Field(..., description="Strategies for managing these risks.") 42 | 43 | class CorporateGovernanceSocialResponsibility(BaseModel): 44 | board_governance: str = Field(..., description="Details about the company's board composition, governance policies, and any changes in leadership or structure.") 45 | csr_sustainability: str = Field(..., description="The company's initiatives related to environmental stewardship, community involvement, and ethical practices.") 46 | 47 | class InnovationRnD(BaseModel): 48 | r_and_d_activities: str = Field(..., description="Overview of the company's focus on research and development, major 
achievements, or breakthroughs.") 49 | innovation_focus: str = Field(..., description="Mention of new technologies, patents, or areas of research the company is diving into.") 50 | -------------------------------------------------------------------------------- /src/ticker_symbol.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import csv 8 | import requests 9 | import streamlit as st 10 | 11 | API_TOKEN = st.secrets["eod_api_key"] 12 | 13 | def get_ticker_symbol(company_name): 14 | with open("data/ticker_symbols/ticker_symbols.csv", 'r') as csvfile: 15 | reader = csv.DictReader(csvfile) 16 | for row in reader: 17 | if row['Name'] == company_name: 18 | return row['Code'] 19 | return None 20 | 21 | # Example usage: 22 | # if __name__ == "__main__": # Replace with the path to your CSV file 23 | # company_name = "Apple Inc" 24 | # ticker = get_ticker_symbol(company_name) 25 | # if ticker: 26 | # print(f"The ticker symbol for {company_name} is {ticker}.") 27 | # else: 28 | # print(f"No ticker symbol found for {company_name}.") 29 | 30 | def get_all_company_names(): 31 | company_names = [] 32 | with open("data/ticker_symbols/ticker_symbols.csv", 'r') as csvfile: 33 | reader = csv.DictReader(csvfile) 34 | for row in reader: 35 | if row['Type'] == "Common Stock": 36 | company_names.append(row['Name']) 37 | return tuple(company_names) 38 | 39 | # Example usage: 40 | if __name__ == "__main__": 41 | companies = get_all_company_names() 42 | print(companies) 43 | 44 | 45 | 46 | def get_symbols_for_exchange(exchange_code, api_token): 47 | base_url = "https://eodhd.com/api/exchange-symbol-list/" 48 | url = f"{base_url}{exchange_code}/" 49 | params = { 50 | "api_token": api_token 51 | } 52 | 53 | response = requests.get(url, params=params) 54 | 55 | if 
response.status_code == 200: 56 | try: 57 | return response.json() 58 | except ValueError: 59 | print("Received unexpected response:") 60 | print(response.text) 61 | print(type(response)) 62 | print(type(response.text)) 63 | print(len(response.text)) 64 | with open("data/ticker_symbols/ticker_symbols.txt", "w") as f: 65 | f.write(response.text) 66 | with open("data/ticker_symbols/ticker_symbols.csv", "w") as f: 67 | f.write(response.text) 68 | return None 69 | else: 70 | response.raise_for_status() 71 | 72 | if __name__ == "__main__": 73 | EXCHANGE_CODES = ['NYSE', 'NASDAQ'] # Exchanges to fetch symbol lists for 74 | for exchange_code in EXCHANGE_CODES: 75 | try: 76 | data = get_symbols_for_exchange(exchange_code, API_TOKEN) 77 | print(data) 78 | except requests.RequestException as e: 79 | print(f"Error occurred: {e}") 80 | -------------------------------------------------------------------------------- /src/utils.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | 8 | from langchain.vectorstores import FAISS 9 | from langchain.text_splitter import CharacterTextSplitter 10 | from langchain.embeddings import OpenAIEmbeddings 11 | from langchain.prompts import PromptTemplate 12 | from langchain.output_parsers import PydanticOutputParser 13 | from langchain.chat_models import ChatOpenAI 14 | # from llama_index import VectorStoreIndex, SimpleDirectoryReader 15 | # from llama_index.vector_stores import WeaviateVectorStore 16 | from llama_index.schema import Document 17 | from llama_index.llms import OpenAI 18 | # from llama_index.node_parser import SimpleNodeParser 19 | 20 | 21 | # from dotenv import dotenv_values 22 | import weaviate 23 | from pypdf import PdfReader 24 | import streamlit as st 25 | import requests 26 | import time 27 | import json 28 | import plotly.graph_objects as go 29 | from 
pydantic import create_model 30 | from langchain.llms import OpenAI 31 | import os 32 | # config = dotenv_values(".env") 33 | 34 | # OPENAI_API_KEY = config["OPENAI_API_KEY"] 35 | # AV_API_KEY = config["ALPHA_VANTAGE_API_KEY"] 36 | 37 | # OPENAI_API_KEY = st.secrets["openai_api_key"] 38 | # AV_API_KEY = st.secrets["av_api_key"] 39 | 40 | AV_API_KEY = os.environ.get("AV_API_KEY") 41 | OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") 42 | 43 | USER_ID = 'openai' 44 | APP_ID = 'chat-completion' 45 | MODEL_ID = 'GPT-4' 46 | MODEL_VERSION_ID = '4aa760933afa4a33a0e5b4652cfa92fa' 47 | 48 | def get_model(model_name, api_key): 49 | if model_name == "openai": 50 | model = ChatOpenAI(openai_api_key=api_key, model_name="gpt-3.5-turbo") 51 | return model 52 | 53 | def process_pdf(pdfs): 54 | docs = [] 55 | 56 | for pdf in pdfs: 57 | file = PdfReader(pdf) 58 | text = "" 59 | for page in file.pages: 60 | text += str(page.extract_text()) 61 | docs.append(text) # collect the extracted text of each PDF 62 | 63 | text_splitter = CharacterTextSplitter(separator="\n", 64 | chunk_size=2000, 65 | chunk_overlap=300, 66 | length_function=len) 67 | docs = [chunk for text in docs for chunk in text_splitter.split_text(text)] 68 | # docs is now a flat list of text chunks 69 | 70 | return docs 71 | 72 | def process_pdf2(pdf): 73 | file = PdfReader(pdf) 74 | text = "" 75 | for page in file.pages: 76 | text += str(page.extract_text()) 77 | 78 | doc = Document(text=text) 79 | return [doc] 80 | 81 | 82 | def faiss_db(splitted_text): 83 | embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY) 84 | db = FAISS.from_texts(splitted_text, embeddings) 85 | db.save_local("faiss_db") 86 | return db 87 | 88 | def safe_float(value): 89 | if value is None or value == "None": 90 | return "N/A" 91 | return float(value) 92 | 93 | def round_numeric(value, decimal_places=2): 94 | if isinstance(value, (int, float)): 95 | return round(value, decimal_places) 96 | elif isinstance(value, str) and value.replace(".", "", 1).isdigit(): 97 | # Check if the string 
represents a numeric value 98 | return round(float(value), decimal_places) 99 | else: 100 | return value 101 | 102 | def format_currency(value): 103 | if value == "N/A": 104 | return value 105 | if value >= 1_000_000_000: # billion 106 | return f"${value / 1_000_000_000:.2f} billion" 107 | elif value >= 1_000_000: # million 108 | return f"${value / 1_000_000:.2f} million" 109 | else: 110 | return f"${value:.2f}" 111 | 112 | def get_total_revenue(symbol): 113 | time.sleep(3) 114 | url = "https://www.alphavantage.co/query" 115 | params = { 116 | "function": "INCOME_STATEMENT", 117 | "symbol": symbol, 118 | "apikey": AV_API_KEY 119 | } 120 | response = requests.get(url, params=params) 121 | data = response.json() 122 | total_revenue = safe_float(data["annualReports"][0]["totalRevenue"]) 123 | 124 | return total_revenue 125 | 126 | def get_total_debt(symbol): 127 | time.sleep(3) 128 | url = "https://www.alphavantage.co/query" 129 | params = { 130 | "function": "BALANCE_SHEET", 131 | "symbol": symbol, 132 | "apikey": AV_API_KEY 133 | } 134 | response = requests.get(url, params=params) 135 | data = response.json() 136 | short_term = safe_float(data["annualReports"][0]["shortTermDebt"]) 137 | time.sleep(3) 138 | long_term = safe_float(data["annualReports"][0]["longTermDebt"]) 139 | 140 | if short_term == "N/A" or long_term == "N/A": 141 | return "N/A" 142 | return short_term + long_term 143 | 144 | def generate_pydantic_model(fields_to_include, attributes, base_fields): 145 | selected_fields = {attr: base_fields[attr] for attr, include in zip(attributes, fields_to_include) if include} 146 | 147 | return create_model("DynamicModel", **selected_fields) 148 | 149 | def insights(insight_name, type_of_data, data, output_format, api_key): 150 | print(type_of_data) 151 | 152 | with open("prompts/iv2.prompt", "r") as f: 153 | template = f.read() 154 | 155 | 156 | prompt = PromptTemplate( 157 | template=template, 158 | input_variables=["insight_name","type_of_data","inputs", 
"output_format"], 159 | # partial_variables={"output_format": parser.get_format_instructions()} 160 | ) 161 | 162 | model = get_model("openai", api_key) 163 | 164 | data = json.dumps(data) 165 | 166 | formatted_input = prompt.format(insight_name=insight_name,type_of_data=type_of_data, inputs=data, output_format=output_format) 167 | 168 | print("-"*30) 169 | print("Formatted Input:") 170 | print(formatted_input) 171 | print("-"*30) 172 | 173 | response = model.predict(formatted_input) 174 | return response 175 | 176 | 177 | 178 | def format_title(s: str) -> str: 179 | return ' '.join(word.capitalize() for word in s.split('_')) 180 | 181 | def create_time_series_chart(data, type_of_data: str, title: str): 182 | yaxis_title = format_title(type_of_data) 183 | fig = go.Figure(data=[go.Scatter(x=data['dates'], y=data[type_of_data], mode='lines+markers')]) 184 | fig.update_layout(yaxis=dict(range=[0, max(data)])) 185 | fig.update_layout(title=title, 186 | xaxis_title='Date', 187 | yaxis_title=yaxis_title) 188 | 189 | 190 | 191 | return fig 192 | 193 | # data = { 194 | # 'dates': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'], 195 | # 'temperature': [22, 24, 23, 22, 21], 196 | # 'humidity': [40, 42, 41, 40, 39] 197 | # } 198 | # # Create a temperature time series chart 199 | # temperature_chart = create_time_series_chart(data, 'temperature') 200 | # temperature_chart.show() 201 | 202 | # # Create a humidity time series chart 203 | # humidity_chart = create_time_series_chart(data, 'humidity') 204 | # humidity_chart.show() 205 | 206 | import plotly.graph_objects as go 207 | 208 | def create_donut_chart(data, type_of_data, hole_size=0.3): 209 | 210 | labels = list(data[type_of_data].keys()) 211 | values = list(data[type_of_data].values()) 212 | 213 | fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=hole_size)]) 214 | fig.update_layout(title=format_title(type_of_data)) 215 | 216 | return fig 217 | 218 | # # Example usage: 219 | # data = { 
220 | # 'Oxygen': 4500, 221 | # 'Hydrogen': 2500, 222 | # 'Carbon_Dioxide': 1053, 223 | # 'Nitrogen': 500 224 | # } 225 | # chart = create_donut_chart(data, title="Donut Chart") 226 | # chart.show() 227 | 228 | def create_bar_chart(data, type_of_data: str, title: str = None): 229 | yaxis_title = format_title(type_of_data) 230 | fig = go.Figure(data=[go.Bar(x=data['dates'], y=data[type_of_data])]) 231 | # fig.update_layout(yaxis=dict(range=[0, max(data[type_of_data])])) 232 | fig.update_layout(title=format_title(type_of_data), 233 | xaxis_title='Date', 234 | yaxis_title=yaxis_title) 235 | 236 | return fig 237 | 238 | 239 | 240 | 241 | 242 | 243 | 244 | 245 | 246 | 247 | 248 | -------------------------------------------------------------------------------- /src/🏡_Home.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import streamlit as st 8 | 9 | st.set_page_config(page_title="FinSight", page_icon=":money_with_wings:", layout="wide") 10 | 11 | st.title(":money_with_wings: FinSight \n\n **Financial Insights at Your Fingertip**") 12 | 13 | st.balloons() 14 | 15 | st.success(""" 16 | If you'd like to learn more about the technical details of FinSight, check out the LlamaIndex blogpost below where I do a deep dive into the project: 17 | 18 | [How I built the Streamlit LLM Hackathon winning app — FinSight using LlamaIndex.](https://blog.llamaindex.ai/how-i-built-the-streamlit-llm-hackathon-winning-app-finsight-using-llamaindex-9dcf6c46d7a0) 19 | 20 | """) 21 | 22 | with open("docs/news.md", "r") as f: 23 | st.success(f.read()) 24 | 25 | with open("docs/main.md", "r") as f: 26 | st.info(f.read()) 27 | 28 | -------------------------------------------------------------------------------- /test_files/.DS_Store: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/test_files/.DS_Store -------------------------------------------------------------------------------- /test_files/Models.py: -------------------------------------------------------------------------------- 1 | from typing import List 2 | from pydantic import BaseModel 3 | class IncomeStatementRequest(BaseModel): 4 | symbol: str 5 | fields_to_include: List[bool] -------------------------------------------------------------------------------- /test_files/RAG/data1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/test_files/RAG/data1.pdf -------------------------------------------------------------------------------- /test_files/RAG/tech_1.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import camelot 8 | import tabula 9 | import pandas as pd 10 | from llama_index import Document, SummaryIndex 11 | 12 | # https://en.wikipedia.org/wiki/The_World%27s_Billionaires 13 | from llama_index import VectorStoreIndex, ServiceContext, LLMPredictor 14 | from llama_index.query_engine import PandasQueryEngine, RetrieverQueryEngine 15 | from llama_index.retrievers import RecursiveRetriever 16 | from llama_index.schema import IndexNode 17 | from llama_index.llms import OpenAI 18 | from llama_index import download_loader 19 | 20 | from pathlib import Path 21 | from typing import List 22 | 23 | from src.utils import get_model 24 | 25 | PDFReader = download_loader("PDFReader") 26 | loader = PDFReader() 27 | pdf = loader.load_data(file=Path("data/apple/AAPL.pdf")) 28 | 29 | 30 | 31 | # 
print(pdf) 32 | 33 | pages = ['32', '33', '34', '35', '36'] 34 | table_titles = ['Consolidated Statements of Operations', 'Consolidated Statements of Comprehensive Income', 'Consolidated Balance Sheets', 'Consolidated Statements of Shareholders’ Equity', 'Consolidated Statements of Cash Flows'] 35 | 36 | table1 = tabula.read_pdf("data/apple/AAPL.pdf",output_format="dataframe", pages="32") 37 | print(type(table1)) 38 | print(len(table1)) 39 | print(pd.DataFrame(table1)) 40 | 41 | 42 | # def get_tables(path, pages, table_titles): 43 | # tables = {} 44 | # for i, page in enumerate(pages): 45 | # table = tabula.read_pdf(path, pages=f"{page}") 46 | 47 | # tables[table_titles[i]] = table 48 | 49 | # return tables 50 | 51 | 52 | # tables = get_tables("data/apple/AAPL.pdf", pages, table_titles) 53 | 54 | # # iterate through json object 55 | # for key, value in tables.items(): 56 | # print("-"*30) 57 | # print("Title: ",key) 58 | # print(value) 59 | 60 | 61 | -------------------------------------------------------------------------------- /test_files/RAG/test.py: -------------------------------------------------------------------------------- 1 | import tabula 2 | file1 = "https://nbviewer.jupyter.org/github/kuruvasatya/Scraping-Tables-from-PDF/blob/master/data1.pdf" 3 | table = tabula.read_pdf(file1,pages=1) 4 | print(table[0]) -------------------------------------------------------------------------------- /test_files/__pycache__/Models.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/test_files/__pycache__/Models.cpython-311.pyc -------------------------------------------------------------------------------- /test_files/__pycache__/main.cpython-311.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/patchy631/RAGs/8706fd10f1c1d2dcf88ff54af25a23139cfca36c/test_files/__pycache__/main.cpython-311.pyc -------------------------------------------------------------------------------- /test_files/attribs.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | from pydantic import BaseModel, Field, create_model 8 | import requests 9 | import streamlit as st 10 | 11 | 12 | from src.utils import insights 13 | from src.income_statement import charts, metrics 14 | 15 | 16 | AV_API_KEY = st.secrets["av_api_key"] 17 | 18 | # from src.income_statement import income_statement 19 | 20 | def generate_model(fields_to_include, attributes, base_fields): 21 | attributes = ["revenue_health", "operational_efficiency", "r_and_d_focus", "debt_management", "profit_retention"] 22 | selected_fields = {attr: base_fields[attr] for attr, include in zip(attributes, fields_to_include) if include} 23 | 24 | return create_model("DynamicIncomeStatementInsights", **selected_fields) 25 | 26 | def income_statement(symbol, fields_to_include): 27 | 28 | Model = generate_model(fields_to_include) 29 | 30 | url = "https://www.alphavantage.co/query" 31 | params = { 32 | "function": "INCOME_STATEMENT", 33 | "symbol": symbol, 34 | "apikey": AV_API_KEY 35 | } 36 | 37 | # Send a GET request to the API 38 | response = requests.get(url, params=params) 39 | if response.status_code == 200: 40 | data = response.json() 41 | if not data: 42 | print(f"No data found for {symbol}") 43 | return None 44 | 45 | 46 | else: 47 | print(f"Error: {response.status_code} - {response.text}") 48 | 49 | chart_data = charts(data) 50 | 51 | report = data["annualReports"][0] 52 | met = metrics(report) 53 | 54 | data_for_insights = { 55 | "annual_report_data": report, 56 | "historical_data": 
chart_data, 57 | } 58 | ins = insights("income statement", data_for_insights, Model) 59 | 60 | return { 61 | "metrics": met, 62 | "chart_data": chart_data, 63 | "insights": ins 64 | } 65 | 66 | 67 | 68 | min_length = 5 69 | 70 | base_fields = { 71 | "revenue_health": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's total revenue, providing a perspective on the health of the primary business activity.")), 72 | "operational_efficiency": (str, Field(..., description=f"Must be more than {min_length} words. Analysis of the company's operating expenses in relation to its revenue, offering a view into the firm's operational efficiency.")), 73 | "r_and_d_focus": (str, Field(..., description=f"Must be more than {min_length} words. Insight into the company's commitment to research and development, signifying its emphasis on innovation and future growth.")), 74 | "debt_management": (str, Field(..., description=f"Must be more than {min_length} words. Analysis of the company's interest expenses, highlighting the scale of its debt obligations and its approach to leveraging.")), 75 | "profit_retention": (str, Field(..., description=f"Must be more than {min_length} words. 
Insight into the company's net income, showcasing the amount retained post all expenses, which can be reinvested or distributed.")) 76 | } 77 | 78 | 79 | 80 | # Example usage 81 | fields_to_include = [True, False, False, False, True] 82 | 83 | # instance = DynamicModel(revenue_health="good", r_and_d_focus="high", profit_retention="medium") 84 | # print(instance) 85 | 86 | response = income_statement("TSLA", fields_to_include) 87 | print(response['insights']) 88 | 89 | -------------------------------------------------------------------------------- /test_files/av-api-test.py: -------------------------------------------------------------------------------- 1 | from dotenv import dotenv_values 2 | import requests 3 | 4 | config = dotenv_values(".env") 5 | 6 | AV_API_KEY = config["ALPHA_VANTAGE_API_KEY"] 7 | 8 | 9 | 10 | 11 | url = "https://www.alphavantage.co/query" 12 | 13 | symbol = "TSLA" 14 | params = { 15 | "function": "CASH_FLOW", 16 | "symbol": symbol, 17 | "apikey": AV_API_KEY 18 | } 19 | response = requests.get(url, params=params) 20 | data = response.json() 21 | data = data["annualReports"][0] 22 | print(data) 23 | 24 | 25 | -------------------------------------------------------------------------------- /test_files/finchat.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | 8 | from langchain.chat_models import ChatOpenAI 9 | from langchain.memory import ConversationBufferWindowMemory 10 | from langchain.chains import ConversationalRetrievalChain 11 | 12 | import streamlit as st 13 | from dotenv import dotenv_values 14 | 15 | from src.utils import process_pdf, faiss_db as vector_store 16 | 17 | # config = dotenv_values(".env") 18 | 19 | # OPENAI_API_KEY = config["OPENAI_API_KEY"] 20 | 21 | OPENAI_API_KEY = st.secrets["openai_api_key"] 22 | 23 | 24 
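`attribs.py` above and `generate_pydantic_model` in `utils.py` both build a dynamic model by zipping a boolean mask over attribute names. The selection step in isolation, with shortened placeholder descriptions instead of the full `Field` definitions:

```python
# The mask-based field selection behind generate_pydantic_model, shown
# without pydantic; the (type, description) tuples are placeholders.
attributes = ["revenue_health", "operational_efficiency", "r_and_d_focus",
              "debt_management", "profit_retention"]
base_fields = {name: (str, f"insight: {name}") for name in attributes}
fields_to_include = [True, False, False, False, True]

selected_fields = {attr: base_fields[attr]
                   for attr, include in zip(attributes, fields_to_include)
                   if include}
print(sorted(selected_fields))  # → ['profit_retention', 'revenue_health']
```

The real code then hands `selected_fields` to `pydantic.create_model("DynamicModel", **selected_fields)` to produce the model class.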
| def handle_query(query: str): 25 |     result = st.session_state.conversation({"question": query, "chat_history": ""}) 26 |     history = st.session_state.memory.load_memory_variables({})['chat_history'] 27 | 28 |     for i, msg in enumerate(history): 29 |         if i % 2 == 0: 30 | 31 |             st.chat_message("user").write(msg.content) 32 |         else: 33 |             st.chat_message("assistant").write(msg.content) 34 | 35 | 36 | if __name__ == "__main__": 37 | 38 |     if "memory" not in st.session_state: 39 |         st.session_state.memory = None 40 | 41 |     if "conversation" not in st.session_state: 42 |         st.session_state.conversation = None 43 | 44 |     st.divider() 45 | 46 |     model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-3.5-turbo") 47 | 48 |     if "process_pdf" not in st.session_state: 49 |         st.session_state.process_pdf = False 50 | 51 |     pdfs = st.sidebar.file_uploader("Upload a PDF file", type=["pdf"], accept_multiple_files=True) 52 |     if st.sidebar.button("Process PDF"): 53 |         st.session_state.process_pdf = True 54 |         with st.spinner("Processing PDF..."): 55 |             splitted_text = process_pdf(pdfs) 56 |             db = vector_store(splitted_text) 57 |             st.session_state.memory = ConversationBufferWindowMemory(memory_key='chat_history', return_messages=True, k=5) 58 |             st.session_state.conversation = ConversationalRetrievalChain.from_llm(llm=model, 59 |                                                 chain_type="map_reduce", 60 |                                                 retriever=db.as_retriever(), 61 |                                                 memory=st.session_state.memory) 62 | 63 |     if st.session_state.process_pdf: 64 |         query = st.chat_input("Ask a question") 65 |         if query: 66 |             handle_query(query) -------------------------------------------------------------------------------- /test_files/fmp-api.py: -------------------------------------------------------------------------------- 1 | from dotenv import dotenv_values 2 | 3 | config = dotenv_values(".env") 4 | 5 | FMP_API_KEY = config["FMP_API_KEY"] 6 | 7 | import requests 8 | 9 | # Define the API endpoint URL 
10 | url = "https://financialmodelingprep.com/api/v4/financial-reports-json" 11 | 12 | # The API key is read from the .env file above (FMP_API_KEY) 13 | 14 | 15 | # Define the parameters for the request 16 | params = { 17 |     "symbol": "AAPL", 18 |     "year": "2020", 19 |     "period": "FY", 20 |     "apikey": FMP_API_KEY 21 | } 22 | 23 | # Send a GET request to the API 24 | response = requests.get(url, params=params) 25 | 26 | # Check if the request was successful (status code 200) 27 | if response.status_code == 200: 28 |     data = response.json() 29 |     print(data) 30 | else: 31 |     print(f"Error: {response.status_code} - {response.text}") 32 | -------------------------------------------------------------------------------- /test_files/main.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | import uvicorn 8 | from fastapi import FastAPI 9 | from pydantic import BaseModel 10 | from test_files.Models import IncomeStatementRequest 11 | from src.income_statement import income_statement 12 | 13 | app = FastAPI() 14 | @app.post("/income_statement") 15 | def get_income_statement(request_data: IncomeStatementRequest): 16 |     symbol = request_data.symbol 17 |     fields_to_include = request_data.fields_to_include 18 | 19 |     # Call the income_statement function to retrieve data 20 |     income_statement_data = income_statement(symbol, fields_to_include) 21 | 22 |     return income_statement_data  # Return the data as a response 23 | 24 | if __name__ == "__main__": 25 |     uvicorn.run(app, host="0.0.0.0", port=8000) -------------------------------------------------------------------------------- /test_files/node_parsing.py: -------------------------------------------------------------------------------- 1 | 2 | from llama_index.node_parser.extractors import ( 3 | 
MetadataExtractor, 4 | SummaryExtractor, 5 | QuestionsAnsweredExtractor, 6 | TitleExtractor, 7 | KeywordExtractor, 8 | EntityExtractor, 9 | MetadataFeatureExtractor, 10 | ) 11 | from llama_index.text_splitter import TokenTextSplitter 12 | from llama_index.node_parser import SimpleNodeParser 13 | from llama_index import SimpleDirectoryReader 14 | 15 | document = SimpleDirectoryReader("data/apple").load_data() 16 | 17 | print(len(document[0].text)) 18 | chunk_size = len(document[0].text) // 25 19 | 20 | text_splitter = TokenTextSplitter(separator=" ", chunk_size=chunk_size, chunk_overlap=chunk_size//10) 21 | metadata_extractor = MetadataExtractor( 22 | extractors=[ 23 | # TitleExtractor(), 24 | # SummaryExtractor(), 25 | KeywordExtractor(keywords=1), 26 | # QuestionsAnsweredExtractor(questions=3), 27 | ], 28 | ) 29 | 30 | node_parser = SimpleNodeParser.from_defaults( 31 | text_splitter=text_splitter, 32 | metadata_extractor=metadata_extractor, 33 | ) 34 | # assume documents are defined -> extract nodes 35 | nodes = node_parser.get_nodes_from_documents(document) 36 | print(len(nodes)) 37 | print(nodes[0].metadata) 38 | print(nodes[0]) 39 | print(nodes[1].metadata) 40 | print(nodes[1]) 41 | -------------------------------------------------------------------------------- /test_files/nodes.py: -------------------------------------------------------------------------------- 1 | from pypdf import PdfReader 2 | import streamlit as st 3 | from llama_index import Document 4 | from llama_index.node_parser import SimpleNodeParser 5 | 6 | node_parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20) 7 | 8 | pdfs = st.file_uploader("pdf file") 9 | docs = [] 10 | for pdf in [pdfs]: 11 | file = PdfReader(pdf) 12 | text = "" 13 | for page in file.pages: 14 | text += str(page.extract_text()) 15 | 16 | docs.append(Document(text=text)) 17 | 18 | # nodes = node_parser.get_nodes_from_documents(docs, show_progress=False) 19 | # print(nodes) 20 | 21 | print(docs) 
-------------------------------------------------------------------------------- /test_files/open_ai_api.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | from typing import Literal 4 | 5 | script_dir = Path(__file__).resolve().parent 6 | project_root = script_dir.parent 7 | sys.path.append(str(project_root)) 8 | 9 | import os 10 | import openai 11 | import streamlit as st 12 | from pydantic import BaseModel, Field 13 | 14 | OPENAI_API_KEY = st.secrets["openai_api_key"] 15 | 16 | 17 | # # Load your API key from an environment variable or secret management service 18 | openai.api_key = os.getenv("OPENAI_API_KEY") 19 | 20 | # chat_completion = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "who is lionel messi?"}]) 21 | # print(chat_completion) 22 | 23 | # from llama_index.llms import OpenAI 24 | from langchain.chat_models import ChatOpenAI 25 | 26 | llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-4") 27 | 28 | response = llm.predict("who is virat kohli?") 29 | print(type(response)) 30 | print(response) -------------------------------------------------------------------------------- /test_files/parser.py: -------------------------------------------------------------------------------- 1 | # Define your desired data structure. 2 | from typing import List 3 | 4 | from langchain.chat_models import ChatOpenAI 5 | from langchain.output_parsers import PydanticOutputParser 6 | from langchain.prompts import PromptTemplate 7 | from langchain.pydantic_v1 import BaseModel, Field, validator 8 | 9 | import streamlit as st 10 | 11 | OPENAI_API_KEY = st.secrets["openai_api_key"] 12 | 13 | 14 | model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-4") 15 | 16 | 17 | # Here's another example, but with a compound typed field. 
18 | class Actor(BaseModel): 19 |     name: str = Field(description="name of an actor") 20 |     film_names: List[str] = Field(description="list of names of films they starred in") 21 | 22 | 23 | actor_query = "Generate the filmography for a random actor." 24 | 25 | parser = PydanticOutputParser(pydantic_object=Actor) 26 | 27 | prompt = PromptTemplate( 28 |     template="Answer the user query.\n{format_instructions}\n{query}\n", 29 |     input_variables=["query"], 30 |     partial_variables={"format_instructions": parser.get_format_instructions()}, 31 | ) 32 | 33 | _input = prompt.format_prompt(query=actor_query) 34 | 35 | output = model.predict(_input.to_string())  # predict() accepts a plain string and returns the completion text 36 | 37 | print(parser.parse(output)) 38 | 39 | -------------------------------------------------------------------------------- /test_files/pdf1.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | pdf1 = open("AAPL.pdf", "rb")  # PDFs are binary; open in "rb" mode 8 | print(pdf1.read()) -------------------------------------------------------------------------------- /test_files/pdf_gen.py: -------------------------------------------------------------------------------- 1 | import json 2 | import sys 3 | from pathlib import Path 4 | script_dir = Path(__file__).resolve().parent 5 | project_root = script_dir.parent 6 | sys.path.append(str(project_root)) 7 | 8 | from reportlab.lib.pagesizes import letter 9 | from reportlab.platypus import PageBreak, Paragraph, SimpleDocTemplate 10 | from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle 11 | from reportlab.lib.enums import TA_CENTER 12 | from reportlab.lib.pagesizes import landscape 13 | from reportlab.platypus import Paragraph, Table, TableStyle, Spacer 14 | from reportlab.lib import colors 15 | 16 | 17 | from src.company_overview import company_overview 18 | from src.income_statement import 
income_statement 19 | from src.balance_sheet import balance_sheet 20 | from src.cash_flow import cash_flow 21 | from src.news_sentiment import top_news 22 | from src.utils import round_numeric 23 | 24 | # Get the default styles 25 | styles = getSampleStyleSheet() 26 | 27 | # Define custom styles 28 | centered_style = ParagraphStyle( 29 | 'CenteredStyle', 30 | parent=styles['Heading1'], 31 | alignment=TA_CENTER, 32 | fontSize=48, 33 | spaceAfter=50, 34 | ) 35 | 36 | sub_centered_style = ParagraphStyle( 37 | 'SubCenteredStyle', 38 | parent=styles['Heading2'], 39 | alignment=TA_CENTER, 40 | fontSize=24, 41 | spaceAfter=15, 42 | ) 43 | 44 | def cover_page(company_name): 45 | flowables = [] 46 | 47 | # Title 48 | title = "FinSight" 49 | para_title = Paragraph(title, centered_style) 50 | flowables.append(para_title) 51 | 52 | # Subtitle 53 | subtitle = "Financial Insights for
" 54 | para_subtitle = Paragraph(subtitle, sub_centered_style) 55 | flowables.append(para_subtitle) 56 | 57 | subtitle2 = "{} {}".format(company_name, "2022") 58 | para_subtitle2 = Paragraph(subtitle2, sub_centered_style) 59 | flowables.append(para_subtitle2) 60 | 61 | # Add a page break after the cover page 62 | flowables.append(PageBreak()) 63 | 64 | return flowables 65 | 66 | from reportlab.lib.styles import ParagraphStyle 67 | from reportlab.lib.enums import TA_LEFT 68 | 69 | # Define custom styles 70 | header_style = ParagraphStyle( 71 | 'HeaderStyle', 72 | parent=styles['Heading2'], 73 | fontSize=24, 74 | spaceAfter=20, 75 | leading=30 76 | ) 77 | 78 | sub_section_header_style = ParagraphStyle( 79 | 'SubSectionHeaderStyle', 80 | parent=styles['Heading3'], 81 | fontSize=16, 82 | spaceAfter=8, 83 | leading=20 84 | ) 85 | 86 | data_style = ParagraphStyle( 87 | 'DataStyle', 88 | parent=styles['Normal'], 89 | fontSize=14, 90 | spaceAfter=15, 91 | leading=20 92 | ) 93 | 94 | sub_header_style = ParagraphStyle( 95 | 'DataStyle', 96 | parent=styles['Normal'], 97 | fontSize=20, 98 | spaceAfter=15, 99 | leading=20 100 | ) 101 | 102 | def pdf_company_overview(data): 103 | flowables = [] 104 | 105 | # Section Title 106 | title = "Company Overview" 107 | para_title = Paragraph(title, header_style) 108 | flowables.append(para_title) 109 | 110 | # Company Name 111 | # data = json.loads(data) 112 | company_name = data.get("Name") 113 | print(company_name) 114 | para_name = Paragraph(" {} ".format(company_name), sub_header_style) 115 | flowables.append(para_name) 116 | 117 | # Other details 118 | details = [ 119 | ("Symbol:", data.get("Symbol")), 120 | ("Exchange:", data.get("Exchange")), 121 | ("Currency:", data.get("Currency")), 122 | ("Sector:", data.get("Sector")), 123 | ("Industry:", data.get("Industry")), 124 | ("Description:", data.get("Description")), 125 | ("Country:", data.get("Country")), 126 | ("Address:", data.get("Address")), 127 | ("Fiscal Year End:", 
data.get("Fiscal_year_end")), 128 | ("Latest Quarter:", data.get("Latest_quarter")), 129 | ("Market Capitalization:", data.get("Market_cap")) 130 | ] 131 | 132 | for label, value in details: 133 | para_label = Paragraph("{} {}".format(label, value), data_style) 134 | flowables.append(para_label) 135 | 136 | return flowables 137 | 138 | 139 | def pdf_income_statement(metrics, insights): 140 | flowables = [] 141 | 142 | # Section Title 143 | title = "INCOME STATEMENT" 144 | para_title = Paragraph(title, header_style) 145 | flowables.append(para_title) 146 | 147 | 148 | # Metrics 149 | flowables.append(Paragraph("METRICS", sub_header_style)) 150 | for label, value in metrics.items(): 151 | metric_text = "{}: {}".format(label.replace("_", " ").title(), round_numeric(value)) 152 | flowables.append(Paragraph(metric_text, data_style)) 153 | 154 | # Insights 155 | flowables.append(Paragraph("INSIGHTS", sub_header_style)) 156 | flowables.append(Paragraph("Revenue Health", sub_section_header_style)) 157 | flowables.append(Paragraph(insights.revenue_health, data_style)) 158 | flowables.append(Paragraph("Operational Efficiency", sub_section_header_style)) 159 | flowables.append(Paragraph(insights.operational_efficiency, data_style)) 160 | flowables.append(Paragraph("R&D Focus", sub_section_header_style)) 161 | flowables.append(Paragraph(insights.r_and_d_focus, data_style)) 162 | flowables.append(Paragraph("Debt Management", sub_section_header_style)) 163 | flowables.append(Paragraph(insights.debt_management, data_style)) 164 | flowables.append(Paragraph("Profit Retention", sub_section_header_style)) 165 | flowables.append(Paragraph(insights.profit_retention, data_style)) 166 | 167 | return flowables 168 | 169 | def pdf_balance_sheet(metrics, insights): 170 | flowables = [] 171 | 172 | # Section Title 173 | title = "BALANCE SHEET" 174 | para_title = Paragraph(title, header_style) 175 | flowables.append(para_title) 176 | 177 | # Metrics 178 | 
flowables.append(Paragraph("METRICS", sub_header_style)) 179 | for label, value in metrics.items(): 180 | metric_text = "{}: {}".format(label.replace("_", " ").title(), round_numeric(value)) 181 | flowables.append(Paragraph(metric_text, data_style)) 182 | 183 | # Insights 184 | flowables.append(Paragraph("INSIGHTS", sub_header_style)) 185 | insight_sections = [ 186 | ("Liquidity Position", insights.liquidity_position), 187 | ("Operational Efficiency", insights.operational_efficiency), 188 | ("Capital Structure", insights.capital_structure), 189 | ("Inventory Management", insights.inventory_management), 190 | ("Overall Solvency", insights.overall_solvency) 191 | ] 192 | 193 | for section_title, insight_text in insight_sections: 194 | flowables.append(Paragraph(section_title, sub_section_header_style)) 195 | flowables.append(Paragraph(insight_text, data_style)) 196 | 197 | return flowables 198 | 199 | def pdf_cash_flow(metrics, insights): 200 | flowables = [] 201 | 202 | # Section Title 203 | title = "CASH FLOW" 204 | para_title = Paragraph(title, header_style) 205 | flowables.append(para_title) 206 | 207 | flowables.append(Paragraph("METRICS", sub_header_style)) 208 | for label, value in metrics.items(): 209 | metric_text = "{}: {}".format(label.replace("_", " ").title(), round_numeric(value)) 210 | flowables.append(Paragraph(metric_text, data_style)) 211 | 212 | # Insights 213 | flowables.append(Paragraph("INSIGHTS", sub_header_style)) 214 | insight_sections = [ 215 | ("Operational Cash Efficiency", insights.operational_cash_efficiency), 216 | ("Investment Capability", insights.investment_capability), 217 | ("Financial Flexibility", insights.financial_flexibility), 218 | ("Dividend Sustainability", insights.dividend_sustainability), 219 | ("Debt Service Capability", insights.debt_service_capability) 220 | ] 221 | 222 | for section_title, insight_text in insight_sections: 223 | flowables.append(Paragraph(section_title, sub_section_header_style)) 224 | 
flowables.append(Paragraph(insight_text, data_style)) 225 | 226 | return flowables 227 | 228 | def pdf_news_sentiment(data): 229 | flowables = [] 230 | 231 | # Section Title 232 | title = "NEWS SENTIMENT" 233 | para_title = Paragraph(title, header_style) 234 | flowables.append(para_title) 235 | flowables.append(Spacer(1, 12)) 236 | 237 | # News DataFrame to Table 238 | df = data['news'] 239 | table_data = [df.columns.to_list()] + df.values.tolist() 240 | table = Table(table_data, repeatRows=1) # repeatRows ensures the header is repeated if the table spans multiple pages 241 | 242 | # Table Style 243 | style = TableStyle([ 244 | ('BACKGROUND', (0, 0), (-1, 0), colors.grey), 245 | ('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke), 246 | ('ALIGN', (0, 0), (-1, -1), 'LEFT'), 247 | ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'), 248 | ('FONTSIZE', (0, 0), (-1, 0), 12), 249 | ('BOTTOMPADDING', (0, 0), (-1, 0), 12), 250 | ('BACKGROUND', (0, 1), (-1, -1), colors.beige), 251 | ('GRID', (0, 0), (-1, -1), 1, colors.black) 252 | ]) 253 | table.setStyle(style) 254 | flowables.append(table) 255 | flowables.append(Spacer(1, 12)) 256 | 257 | # Mean Sentiment Score 258 | mean_sentiment_score_text = f"Mean Sentiment Score: {data['mean_sentiment_score']:.2f}" 259 | flowables.append(Paragraph(mean_sentiment_score_text, data_style)) 260 | flowables.append(Spacer(1, 12)) 261 | 262 | # Mean Sentiment Class 263 | mean_sentiment_class_text = f"Mean Sentiment Class: {data['mean_sentiment_class']}" 264 | flowables.append(Paragraph(mean_sentiment_class_text, data_style)) 265 | 266 | return flowables 267 | 268 | 269 | def gen_pdf(company_name, overview_data, income_statement_data, balance_sheet_data, cash_flow_data, news_data): 270 | doc = SimpleDocTemplate("final_report.pdf", pagesize=letter) 271 | all_flowables = [] 272 | 273 | all_flowables.extend(cover_page(company_name)) 274 | all_flowables.extend(pdf_company_overview(overview_data)) 275 | 
all_flowables.extend(pdf_income_statement(income_statement_data['metrics'], income_statement_data['insights'])) 276 | all_flowables.extend(pdf_balance_sheet(balance_sheet_data['metrics'], balance_sheet_data['insights'])) 277 | all_flowables.extend(pdf_cash_flow(cash_flow_data['metrics'], cash_flow_data['insights'])) 278 | # all_flowables.extend(pdf_news_sentiment(news_data)) 279 | doc.build(all_flowables) 280 | 281 | if __name__ == "__main__": 282 | overview_data = company_overview("AAPL") 283 | inc = income_statement("AAPL") 284 | bal = balance_sheet("AAPL") 285 | cash = cash_flow("AAPL") 286 | news = top_news("AAPL", 10) 287 | gen_pdf("Apple Inc.", overview_data, inc, bal, cash, None) 288 | -------------------------------------------------------------------------------- /test_files/plotly_chart.py: -------------------------------------------------------------------------------- 1 | import plotly.graph_objects as go 2 | from reportlab.lib.pagesizes import letter 3 | from reportlab.lib.units import inch 4 | from reportlab.platypus import SimpleDocTemplate, Image 5 | import tempfile 6 | import os 7 | 8 | def create_plotly_chart_image(data, labels): 9 | """ 10 | Create a Plotly chart and save it to a temporary image file. 11 | Returns the path to the temporary image file. 
12 | """ 13 | fig = go.Figure(data=[go.Bar(y=data, x=labels)]) 14 | img_bytes = fig.to_image(format="png") 15 | 16 | # Save the image bytes to a temporary file 17 | temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".png") 18 | temp_file.write(img_bytes) 19 | temp_file.close() 20 | img = Image(temp_file.name, width=5*inch, height=3*inch) 21 | 22 | return img 23 | 24 | def main(): 25 | # Create two charts and get their image paths 26 | chart1_path = create_plotly_chart_image([2, 1, 3], ["a", "b", "c"]) 27 | # chart2_path = create_plotly_chart_image([5, 3, 4], ["d", "e", "f"]) 28 | 29 | # Create a list of flowables 30 | flowables = [ 31 | chart1_path, 32 | # chart2_path 33 | ] 34 | 35 | # Create a PDF and add the flowables 36 | doc = SimpleDocTemplate("charts.pdf", pagesize=letter) 37 | doc.build(flowables) 38 | 39 | # Clean up the temporary files 40 | # os.unlink(chart1_path) 41 | # os.unlink(chart2_path) 42 | 43 | print("PDF with two Plotly charts created!") 44 | 45 | if __name__ == "__main__": 46 | main() -------------------------------------------------------------------------------- /test_files/plotly_pdf.py: -------------------------------------------------------------------------------- 1 | # import sys 2 | # from pathlib import Path 3 | # script_dir = Path(__file__).resolve().parent 4 | # project_root = script_dir.parent 5 | # sys.path.append(str(project_root)) 6 | 7 | # import plotly.io as pio 8 | # import plotly.graph_objects as go 9 | 10 | # from reportlab.lib.pagesizes import letter 11 | # from reportlab.lib.units import inch 12 | # from reportlab.platypus import SimpleDocTemplate, Image 13 | # from io import BytesIO 14 | # from PIL import Image as PILImage 15 | # import tempfile 16 | 17 | 18 | 19 | # from src.utils import create_donut_chart, create_bar_chart 20 | 21 | # def create_pdf_flowable_with_plotly(data, type_of_data): 22 | # # Convert the Plotly figure to an image (in this case PNG format) 23 | # fig = go.Figure(data=[go.Bar(y=[2, 1, 
3], x=["a", "b", "c"])]) 24 | # img_bytes = fig.to_image(format="png") 25 | 26 | # temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".png") 27 | # temp_file.write(img_bytes) 28 | # temp_file.close() 29 | # img = Image(temp_file.name, width=5*inch, height=3*inch) 30 | # return img 31 | 32 | # data = { 33 | # "fruits": {"Apple": 18, "Banana": 20, "Cherry": 30} 34 | # } 35 | # type_of_data = "fruits" 36 | 37 | # flowables = [] 38 | 39 | # flowables.append(create_pdf_flowable_with_plotly(data, type_of_data)) 40 | 41 | # doc = SimpleDocTemplate("output.pdf", pagesize=letter) 42 | # doc.build(flowables) 43 | import plotly.graph_objects as go 44 | from reportlab.lib.pagesizes import letter 45 | from reportlab.lib.units import inch 46 | from reportlab.platypus import SimpleDocTemplate, Image 47 | import tempfile 48 | import os 49 | 50 | def create_plotly_chart_image(data, labels): 51 | """ 52 | Create a Plotly chart and save it to a temporary image file. 53 | Returns the path to the temporary image file. 
54 | """ 55 | fig = go.Figure(data=[go.Bar(y=data, x=labels)]) 56 | img_bytes = fig.to_image(format="png") 57 | 58 | # Save the image bytes to a temporary file 59 | temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".png") 60 | temp_file.write(img_bytes) 61 | temp_file.close() 62 | img = Image(temp_file.name, width=5*inch, height=3*inch) 63 | 64 | return img 65 | 66 | def main(): 67 | # Create two charts and get their image paths 68 | chart1_path = create_plotly_chart_image([2, 1, 3], ["a", "b", "c"]) 69 | # chart2_path = create_plotly_chart_image([5, 3, 4], ["d", "e", "f"]) 70 | 71 | # Create a list of flowables 72 | flowables = [] 73 | flowables.append(chart1_path) 74 | 75 | # Create a PDF and add the flowables 76 | doc = SimpleDocTemplate("out1put.pdf", pagesize=letter) 77 | doc.build(flowables) 78 | 79 | # Clean up the temporary files 80 | # os.unlink(chart1_path) 81 | # os.unlink(chart2_path) 82 | 83 | print("PDF with two Plotly charts created!") 84 | 85 | if __name__ == "__main__": 86 | main() -------------------------------------------------------------------------------- /test_files/pydant.py: -------------------------------------------------------------------------------- 1 | def calculate_metrics(data): 2 | # Extracting values from the data 3 | grossProfit = float(data["grossProfit"]) 4 | totalRevenue = float(data["totalRevenue"]) 5 | operatingIncome = float(data["operatingIncome"]) 6 | costOfRevenue = float(data["costOfRevenue"]) 7 | costofGoodsAndServicesSold = float(data["costofGoodsAndServicesSold"]) 8 | sellingGeneralAndAdministrative = float(data["sellingGeneralAndAdministrative"]) 9 | ebit = float(data["ebit"]) 10 | interestAndDebtExpense = float(data["interestAndDebtExpense"]) 11 | 12 | # Calculating metrics 13 | gross_profit_margin = grossProfit / totalRevenue 14 | operating_profit_margin = operatingIncome / totalRevenue 15 | net_profit_margin = float(data["netIncome"]) / totalRevenue 16 | cost_efficiency = totalRevenue / 
(costOfRevenue + costofGoodsAndServicesSold) 17 | sg_and_a_efficiency = totalRevenue / sellingGeneralAndAdministrative 18 | interest_coverage_ratio = ebit / interestAndDebtExpense 19 | 20 | # Returning the results 21 | return { 22 | "gross_profit_margin": gross_profit_margin, 23 | "operating_profit_margin": operating_profit_margin, 24 | "net_profit_margin": net_profit_margin, 25 | "cost_efficiency": cost_efficiency, 26 | "sg_and_a_efficiency": sg_and_a_efficiency, 27 | "interest_coverage_ratio": interest_coverage_ratio 28 | } 29 | 30 | # Example usage: 31 | data = { 32 | "fiscalDateEnding": "2022-12-31", 33 | "reportedCurrency": "USD", 34 | "grossProfit": "32687000000", 35 | "totalRevenue": "60530000000", 36 | "costOfRevenue": "27842000000", 37 | "costofGoodsAndServicesSold": "385000000", 38 | "operatingIncome": "6408000000", 39 | "sellingGeneralAndAdministrative": "18609000000", 40 | "researchAndDevelopment": "6567000000", 41 | "operatingExpenses": "26279000000", 42 | "investmentIncomeNet": "None", 43 | "netInterestIncome": "-1216000000", 44 | "interestIncome": "162000000", 45 | "interestExpense": "1216000000", 46 | "nonInterestIncome": "365000000", 47 | "otherNonOperatingIncome": "443000000", 48 | "depreciation": "2407000000", 49 | "depreciationAndAmortization": "2395000000", 50 | "incomeBeforeTax": "1013000000", 51 | "incomeTaxExpense": "-626000000", 52 | "interestAndDebtExpense": "1216000000", 53 | "netIncomeFromContinuingOperations": "1783000000", 54 | "comprehensiveIncomeNetOfTax": "8134000000", 55 | "ebit": "2229000000", 56 | "ebitda": "4624000000", 57 | "netIncome": "1639000000" 58 | } 59 | 60 | print(calculate_metrics(data)) 61 | -------------------------------------------------------------------------------- /test_files/remove_tags.py: -------------------------------------------------------------------------------- 1 | from bs4 import BeautifulSoup 2 | 3 | with open('data/sec-edgar-filings/AAPL/10-K/0000320193-22-000108/full-submission.txt', 'r', 
encoding='utf-8') as file: 4 | content = file.read() 5 | 6 | soup = BeautifulSoup(content, 'html.parser') 7 | cleaned_text = soup.get_text() 8 | 9 | 10 | with open('cleaned_file.txt', 'w', encoding='utf-8') as file: 11 | file.write(cleaned_text) 12 | 13 | print(cleaned_text) -------------------------------------------------------------------------------- /test_files/sec_api_test.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import pandas as pd 3 | import json 4 | 5 | # create request header 6 | headers = {'User-Agent': "vishwas.g217@gmail.com"} 7 | 8 | # get all companies data 9 | companyTickers = requests.get( 10 | "https://www.sec.gov/files/company_tickers.json", 11 | headers=headers 12 | ) 13 | 14 | print(companyTickers.json()) 15 | # # review response / keys 16 | print(companyTickers.json().keys()) 17 | # print(companyTickers.json().data()) 18 | 19 | # # format response to dictionary and get first key/value 20 | firstEntry = companyTickers.json()['0'] 21 | print(firstEntry) 22 | 23 | # # parse CIK // without leading zeros 24 | directCik = companyTickers.json()['0']['cik_str'] 25 | 26 | # # dictionary to dataframe 27 | companyData = pd.DataFrame.from_dict(companyTickers.json(), 28 | orient='index') 29 | # print(companyData.head()) 30 | 31 | # # add leading zeros to CIK 32 | companyData['cik_str'] = companyData['cik_str'].astype( 33 | str).str.zfill(10) 34 | 35 | cik = companyData['cik_str'][0] 36 | 37 | filingMetadata = requests.get( 38 | f'https://data.sec.gov/submissions/CIK{cik}.json', 39 | headers=headers 40 | ) 41 | 42 | # print(filingMetadata.json().keys()) 43 | data = filingMetadata.json()['filings']['recent'] 44 | 45 | with open('data.json', 'w') as f: 46 | json.dump(data, f) -------------------------------------------------------------------------------- /test_files/sec_download.py: -------------------------------------------------------------------------------- 1 | from sec_edgar_downloader 
import Downloader 2 | 3 | dl = Downloader("Personal", "vishwas.g217@gmail.com", "data/") 4 | dl.get("10-K", "AAPL", download_details=False, after="2020-01-01") 5 | 6 | -------------------------------------------------------------------------------- /test_files/summarize.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | from langchain.chains.question_answering import load_qa_chain 8 | from langchain.document_loaders import PyPDFLoader 9 | from langchain.llms import OpenAI 10 | 11 | 12 | from dotenv import dotenv_values 13 | 14 | config = dotenv_values(".env") 15 | OPENAI_API_KEY = config["OPENAI_API_KEY"] 16 | 17 | # load document 18 | loader = PyPDFLoader("AAPL.pdf") 19 | documents = loader.load() 20 | 21 | llm = OpenAI(openai_api_key=OPENAI_API_KEY) 22 | 23 | ### For multiple documents 24 | # loaders = [....] 25 | # documents = [] 26 | # for loader in loaders: 27 | # documents.extend(loader.load()) 28 | 29 | chain = load_qa_chain(llm=llm, chain_type="map_reduce") 30 | query = "what is the total number of AI publications?" 
31 | print(chain.run(input_documents=documents, question=query)) 32 | 33 | -------------------------------------------------------------------------------- /test_files/table_pdf.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | from reportlab.lib.pagesizes import letter 8 | from reportlab.platypus import PageBreak, Paragraph, SimpleDocTemplate, Flowable 9 | from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle 10 | from reportlab.lib.enums import TA_CENTER 11 | from reportlab.lib.pagesizes import landscape 12 | from reportlab.platypus import Paragraph, Table, TableStyle, Spacer 13 | from reportlab.lib import colors 14 | 15 | from src.news_sentiment import top_news 16 | 17 | styles = getSampleStyleSheet() 18 | 19 | 20 | header_style = ParagraphStyle( 21 | 'HeaderStyle', 22 | parent=styles['Heading2'], 23 | fontSize=24, 24 | spaceAfter=20, 25 | leading=30 26 | ) 27 | 28 | sub_section_header_style = ParagraphStyle( 29 | 'SubSectionHeaderStyle', 30 | parent=styles['Heading3'], 31 | fontSize=16, 32 | spaceAfter=8, 33 | leading=20 34 | ) 35 | 36 | data_style = ParagraphStyle( 37 | 'DataStyle', 38 | parent=styles['Normal'], 39 | fontSize=14, 40 | spaceAfter=15, 41 | leading=20 42 | ) 43 | 44 | class RotatedTable(Flowable): 45 | def __init__(self, table_data): 46 | Flowable.__init__(self) 47 | self.table_data = table_data 48 | 49 | def wrap(self, availWidth, availHeight): 50 | # Swap width and height for rotated table 51 | self.width, self.height = availHeight, availWidth 52 | return self.width, self.height 53 | 54 | def draw(self): 55 | # Create the table 56 | table = Table(self.table_data, repeatRows=1) 57 | 58 | # Table Style 59 | style = TableStyle([ 60 | ('BACKGROUND', (0, 0), (-1, 0), colors.grey), 61 | ('TEXTCOLOR', (0, 0), (-1, 0), 
colors.whitesmoke), 62 | ('ALIGN', (0, 0), (-1, -1), 'LEFT'), 63 | ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'), 64 | ('FONTSIZE', (0, 0), (-1, 0), 12), 65 | ('BOTTOMPADDING', (0, 0), (-1, 0), 12), 66 | ('BACKGROUND', (0, 1), (-1, -1), colors.beige), 67 | ('GRID', (0, 0), (-1, -1), 1, colors.black) 68 | ]) 69 | table.setStyle(style) 70 | 71 | # Rotate the canvas, draw the table, then reset rotation 72 | self.canv.saveState() 73 | self.canv.translate(0, self.width) 74 | self.canv.rotate(-90) 75 | table.wrapOn(self.canv, self.height, self.width) 76 | table.drawOn(self.canv, 0, 0) 77 | self.canv.restoreState() 78 | 79 | def pdf_news_sentiment(data): 80 | flowables = [] 81 | 82 | # Section Title 83 | title = "NEWS SENTIMENT" 84 | para_title = Paragraph(title, header_style) 85 | flowables.append(para_title) 86 | flowables.append(Spacer(1, 12)) 87 | 88 | # News DataFrame rendered as a rotated table so wide frames fit the portrait page 89 | df = data['news'].astype(str) 90 | table_data = [df.columns.to_list()] + df.values.tolist() 91 | flowables.append(RotatedTable(table_data)) 92 | flowables.append(Spacer(1, 12)) 93 | 94 | # Mean Sentiment Score 95 | mean_sentiment_score_text = f"Mean Sentiment Score: {data['mean_sentiment_score']:.2f}" 96 | flowables.append(Paragraph(mean_sentiment_score_text, data_style)) 97 | flowables.append(Spacer(1, 12)) 98 | 99 | # Mean Sentiment Class 100 | mean_sentiment_class_text = f"Mean Sentiment Class: {data['mean_sentiment_class']}" 101 | flowables.append(Paragraph(mean_sentiment_class_text, data_style)) 102 | 103 | return flowables 104 | 105 | news = top_news('AAPL', 10) 106 | 107 | flow1 = pdf_news_sentiment(news) 108 | 109 | doc = SimpleDocTemplate("table.pdf", pagesize=letter) 110 | all_flowables = [] 111 | all_flowables.extend(flow1) 112 | doc.build(all_flowables) 113 | 114 |
-------------------------------------------------------------------------------- /test_files/tbl.py: -------------------------------------------------------------------------------- 1 | from pydantic import BaseModel 2 | from typing import List 3 | 4 | class MyPydanticModel(BaseModel): 5 | str_value: str 6 | bool_values: List[bool] 7 | 8 | # Example usage: 9 | data = { 10 | "str_value": "Hello, Pydantic!", 11 | "bool_values": [True, False, True, False, True] 12 | } 13 | 14 | my_model = MyPydanticModel(**data) 15 | print(my_model) -------------------------------------------------------------------------------- /test_files/tools.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from pathlib import Path 3 | script_dir = Path(__file__).resolve().parent 4 | project_root = script_dir.parent 5 | sys.path.append(str(project_root)) 6 | 7 | from langchain.prompts import PromptTemplate 8 | from langchain.output_parsers import PydanticOutputParser 9 | 10 | from llama_index import VectorStoreIndex, ServiceContext, StorageContext, SimpleDirectoryReader 11 | from llama_index.vector_stores import WeaviateVectorStore, FaissVectorStore, ChromaVectorStore 12 | from llama_index.embeddings import OpenAIEmbedding 13 | from llama_index.tools import QueryEngineTool, ToolMetadata 14 | from llama_index.query_engine import SubQuestionQueryEngine 15 | 16 | 17 | from weaviate.embedded import EmbeddedOptions
18 | 19 | import streamlit as st 20 | import os 21 | import openai 22 | 23 | 24 | OPENAI_API_KEY = st.secrets["openai_api_key"] 25 | 26 | 27 | os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY 28 | openai.api_key = os.environ["OPENAI_API_KEY"] 29 | 30 | 31 | query = """ 32 | You are given the task of generating insights for Fiscal Year Highlights from the annual report of the company. 33 | 34 | Given below is the output format, which has the subsections. 35 | Write at least 50 words for each subsection. 36 | In case you don't have enough info, you can just write: No information available 37 | --- 38 | The output should be formatted as a JSON instance that conforms to the JSON schema below. 39 | 40 | As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]} 41 | the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
42 | 43 | Here is the output schema: 44 | ``` 45 | {"properties": {"performance_highlights": {"title": "Performance Highlights", "description": "Key performance metrics and achievements over the fiscal year.", "type": "string"}, "major_events": {"title": "Major Events", "description": "Highlight of significant events, acquisitions, or strategic shifts that occurred during the year.", "type": "string"}, "challenges_encountered": {"title": "Challenges Encountered", "description": "Challenges the company faced during the year and how they managed or overcame them.", "type": "string"}, "milestone_achievements": {"title": "Milestone Achievements", "description": "Milestones achieved in terms of projects, expansions, or any other notable accomplishments.", "type": "string"}}, "required": ["performance_highlights", "major_events", "challenges_encountered", "milestone_achievements"]} 46 | ``` 47 | --- 48 | """ 49 | 50 | report = SimpleDirectoryReader( 51 | input_files=["data/meta/meta.pdf"] 52 | ).load_data() 53 | 54 | index = VectorStoreIndex.from_documents(report) 55 | engine = index.as_query_engine(similarity_top_k=3) 56 | 57 | query_engine_tools = [ 58 | QueryEngineTool( 59 | query_engine=engine, 60 | metadata=ToolMetadata( 61 | name="Annual Report", 62 | description="Provides information about Meta Platforms, Inc. from its annual report.", 63 | ), 64 | ), 65 | ] 66 | 67 | s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools) 68 | 69 | 70 | response = s_engine.query(query) 71 | print(response) 72 | --------------------------------------------------------------------------------
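A note on `tools.py` above: it imports `PydanticOutputParser` but never uses it, and it prints the engine's reply as raw text even though the prompt demands a JSON instance with four required string fields. A minimal stdlib-only sketch of the missing validation step is below; the dataclass, the `parse_highlights` helper, and the sample reply are illustrative assumptions (they mirror the schema pasted into the prompt, but are not code from this repo).

```python
import json
from dataclasses import dataclass, fields

@dataclass
class FiscalYearHighlights:
    # Field names mirror the "required" keys of the JSON schema in the prompt.
    performance_highlights: str
    major_events: str
    challenges_encountered: str
    milestone_achievements: str

def parse_highlights(raw: str) -> FiscalYearHighlights:
    """Check that the LLM's JSON reply carries every required key before use."""
    data = json.loads(raw)
    missing = [f.name for f in fields(FiscalYearHighlights) if f.name not in data]
    if missing:
        raise ValueError(f"response missing required keys: {missing}")
    return FiscalYearHighlights(**{f.name: data[f.name] for f in fields(FiscalYearHighlights)})

# Stand-in reply; a real run would feed str(response) from the query engine instead.
raw = json.dumps({
    "performance_highlights": "Revenue grew year over year.",
    "major_events": "No information available",
    "challenges_encountered": "No information available",
    "milestone_achievements": "No information available",
})
print(parse_highlights(raw).performance_highlights)
```

With a check like this in place, a malformed or truncated reply fails loudly at parse time instead of flowing into the report as free text; the same contract could equally be expressed through the already-imported `PydanticOutputParser`.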