├── .gitignore ├── README.md ├── agentops.log ├── knowledge ├── contracts │ ├── CreditcardscomInc_20070810_S-1_EX-10.33_362297_EX-10.33_Affiliate Agreement.pdf │ ├── CybergyHoldingsInc_20140520_10-Q_EX-10.27_8605784_EX-10.27_Affiliate Agreement.pdf │ └── DigitalCinemaDestinationsCorp_20111220_S-1_EX-10.10_7346719_EX-10.10_Affiliate Agreement.pdf └── user_preference.txt ├── pyproject.toml ├── report.md ├── src └── analyzing_contract_clauses_for_conflicts_and_similarities │ ├── __init__.py │ ├── __pycache__ │ ├── __init__.cpython-312.pyc │ ├── crew.cpython-312.pyc │ └── main.cpython-312.pyc │ ├── config │ ├── agents.yaml │ └── tasks.yaml │ ├── crew.py │ ├── main.py │ └── tools │ ├── __init__.py │ ├── __pycache__ │ ├── __init__.cpython-312.pyc │ └── qdrant_vector_search_tool.cpython-312.pyc │ ├── custom_tool.py │ ├── pre_process_docs.py │ └── qdrant_vector_search_tool.py └── uv.lock /.gitignore: -------------------------------------------------------------------------------- 1 | .env 2 | .DS_Store 3 | .venv 4 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # AnalyzingContractClausesForConflictsAndSimilarities Crew 2 | 3 | Welcome to the AnalyzingContractClausesForConflictsAndSimilarities Crew project, powered by [crewAI](https://crewai.com). This template is designed to help you set up a multi-agent AI system with ease, leveraging the powerful and flexible framework provided by crewAI. Our goal is to enable your agents to collaborate effectively on complex tasks, maximizing their collective intelligence and capabilities. 4 | 5 | ## Installation 6 | 7 | Ensure you have Python >=3.10 <=3.13 installed on your system. This project uses [UV](https://docs.astral.sh/uv/) for dependency management and package handling, offering a seamless setup and execution experience. 
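A quick way to check your interpreter against that range is sketched below. It assumes the bounds declared in `pyproject.toml` (`requires-python = ">=3.10,<=3.13"`) and the PEP 440 ordering used by that field, under which `<=3.13` compares as `3.13.0`, so a patch release such as 3.13.1 falls outside the range. The helper `is_supported` is a hypothetical name used only for illustration, and pre-release versions are ignored.

```python
import sys

# Bounds copied from `requires-python = ">=3.10,<=3.13"` in pyproject.toml.
LOWER = (3, 10, 0)
UPPER = (3, 13, 0)  # "<=3.13" compares as 3.13.0 under PEP 440 ordering


def is_supported(version: tuple[int, int, int]) -> bool:
    """Return True if a (major, minor, micro) version satisfies both bounds."""
    return LOWER <= version <= UPPER


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    status = "supported" if is_supported(current) else "outside the declared range"
    print(f"Python {'.'.join(map(str, current))} is {status}")
```

If your interpreter is reported as outside the range, installing one of the supported versions is likely the simplest fix.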
8 | 9 | First, if you haven't already, install uv: 10 | 11 | ```bash 12 | pip install uv 13 | ``` 14 | 15 | Next, navigate to your project directory and install the dependencies: 16 | 17 | (Optional) Lock the dependencies and install them by using the CLI command: 18 | ```bash 19 | crewai install 20 | ``` 21 | ### Customizing 22 | 23 | **Add your `OPENAI_API_KEY`, `QDRANT_URL`, and `QDRANT_API_KEY` to the `.env` file** 24 | 25 | - Modify `src/analyzing_contract_clauses_for_conflicts_and_similarities/config/agents.yaml` to define your agents 26 | - Modify `src/analyzing_contract_clauses_for_conflicts_and_similarities/config/tasks.yaml` to define your tasks 27 | - Modify `src/analyzing_contract_clauses_for_conflicts_and_similarities/crew.py` to add your own logic, tools and specific args 28 | - Modify `src/analyzing_contract_clauses_for_conflicts_and_similarities/main.py` to add custom inputs for your agents and tasks 29 | 30 | ## Running the Project 31 | 32 | To kickstart your crew of AI agents and begin task execution, run this from the root folder of your project: 33 | 34 | ```bash 35 | $ crewai run 36 | ``` 37 | 38 | This command initializes the analyzing_contract_clauses_for_conflicts_and_similarities Crew, assembling the agents and assigning them tasks as defined in your configuration. 39 | 40 | This example, unmodified, creates a `report.md` file in the root folder containing the crew's report on the analyzed contract clauses. 41 | 42 | ## Understanding Your Crew 43 | 44 | The analyzing_contract_clauses_for_conflicts_and_similarities Crew is composed of multiple AI agents, each with unique roles, goals, and tools. These agents collaborate on a series of tasks, defined in `config/tasks.yaml`, leveraging their collective skills to achieve complex objectives. The `config/agents.yaml` file outlines the capabilities and configurations of each agent in your crew. 45 | 46 | ## Support 47 | 48 | For support, questions, or feedback regarding the AnalyzingContractClausesForConflictsAndSimilarities Crew or crewAI: 
49 | - Visit our [documentation](https://docs.crewai.com) 50 | - Reach out to us through our [GitHub repository](https://github.com/joaomdmoura/crewai) 51 | - [Join our Discord](https://discord.com/invite/X4JWnZnxPb) 52 | - [Chat with our docs](https://chatg.pt/DWjSBZn) 53 | 54 | Let's create wonders together with the power and simplicity of crewAI. 55 | -------------------------------------------------------------------------------- /agentops.log: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorenzejay/contract-analysis-use-case/32f9631f7d97e8fdf765941ed82ca2080a11213f/agentops.log -------------------------------------------------------------------------------- /knowledge/contracts/CreditcardscomInc_20070810_S-1_EX-10.33_362297_EX-10.33_Affiliate Agreement.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorenzejay/contract-analysis-use-case/32f9631f7d97e8fdf765941ed82ca2080a11213f/knowledge/contracts/CreditcardscomInc_20070810_S-1_EX-10.33_362297_EX-10.33_Affiliate Agreement.pdf -------------------------------------------------------------------------------- /knowledge/contracts/CybergyHoldingsInc_20140520_10-Q_EX-10.27_8605784_EX-10.27_Affiliate Agreement.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorenzejay/contract-analysis-use-case/32f9631f7d97e8fdf765941ed82ca2080a11213f/knowledge/contracts/CybergyHoldingsInc_20140520_10-Q_EX-10.27_8605784_EX-10.27_Affiliate Agreement.pdf -------------------------------------------------------------------------------- /knowledge/contracts/DigitalCinemaDestinationsCorp_20111220_S-1_EX-10.10_7346719_EX-10.10_Affiliate Agreement.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/lorenzejay/contract-analysis-use-case/32f9631f7d97e8fdf765941ed82ca2080a11213f/knowledge/contracts/DigitalCinemaDestinationsCorp_20111220_S-1_EX-10.10_7346719_EX-10.10_Affiliate Agreement.pdf -------------------------------------------------------------------------------- /knowledge/user_preference.txt: -------------------------------------------------------------------------------- 1 | User name is John Doe. 2 | User is an AI Engineer. 3 | User is interested in AI Agents. 4 | User is based in San Francisco, California. 5 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [project] 2 | name = "analyzing_contract_clauses_for_conflicts_and_similarities" 3 | version = "0.1.0" 4 | description = "analyzing_contract_clauses_for_conflicts_and_similarities using crewAI" 5 | authors = [{ name = "Your Name", email = "you@example.com" }] 6 | requires-python = ">=3.10,<=3.13" 7 | dependencies = [ 8 | "crewai[tools]==0.118", 9 | "markitdown[all]~=0.1.0a1", 10 | "openai>=1.60.0", 11 | "qdrant-client>=1.13.2", 12 | ] 13 | 14 | [project.scripts] 15 | analyzing_contract_clauses_for_conflicts_and_similarities = "analyzing_contract_clauses_for_conflicts_and_similarities.main:run" 16 | run_crew = "analyzing_contract_clauses_for_conflicts_and_similarities.main:run" 17 | train = "analyzing_contract_clauses_for_conflicts_and_similarities.main:train" 18 | replay = "analyzing_contract_clauses_for_conflicts_and_similarities.main:replay" 19 | test = "analyzing_contract_clauses_for_conflicts_and_similarities.main:test" 20 | 21 | [build-system] 22 | requires = ["hatchling"] 23 | build-backend = "hatchling.build" 24 | -------------------------------------------------------------------------------- /report.md: -------------------------------------------------------------------------------- 1 | **Comprehensive Report on Warranty 
Clauses** 2 | 3 | **1. Introduction** 4 | This report analyzes the warranty clauses from two different contracts: DigitalCinemaDestinationsCorp and Cybergy Holdings, Inc. The aim is to identify conflicts, similarities, and differences in the warranty provisions established in each agreement. 5 | 6 | **2. Analyzed Clauses** 7 | 8 | **2.1 DigitalCinemaDestinationsCorp** 9 | *Document Title:* DigitalCinemaDestinationsCorp_20111220_S-1_EX-10.10_7346719_EX-10.10_Affiliate Agreement.pdf 10 | *Section Analyzed:* **Section 9.1 - Representations and Warranties** 11 | 12 | **Clause:** 13 | "Each party represents and warrants that: 14 | (a) It (i) is duly formed and organized, validly existing, and in good standing under the laws of the jurisdiction of its formation and incorporation and has the power and authority to carry on its business as carried on, and (ii) has the right to enter into this Agreement and to perform its obligations under this Agreement and has the power and authority to execute and deliver this Agreement. 15 | (b) Any registration, declaration, or filing with, or consent, approval, license, permit or other authorization or order by, any governmental or regulatory authority, domestic or foreign, that is required to be obtained by it in connection with the valid execution, delivery, acceptance and performance by it under this Agreement or the consummation by it of any transaction contemplated hereby has been completed, made, or obtained, as the case may be. 16 | (c) Each party is the exclusive owner of, or otherwise has or will have timely obtained all rights, licenses, clearances and consents necessary to make the grants of rights made or otherwise perform its obligations under this Agreement." 
17 | 18 | --- 19 | 20 | **2.2 Cybergy Holdings, Inc.** 21 | *Document Title:* CybergyHoldingsInc_20140520_10-Q_EX-10.27_8605784_EX-10.27_Affiliate Agreement.pdf 22 | *Section Analyzed:* **Section 16 - Warranties by Company** 23 | 24 | **Clauses:** 25 | 16.1 "EXCEPT AS EXCLUSIVELY SET FORTH IN THIS PARAGRAPH, COMPANY DOES NOT MAKE ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING BUT NOT RESTRICTED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, WHICH WARRANTIES ARE HEREBY DISCLAIMED." 26 | 27 | 16.2 "COMPANY'S SOLE AND EXCLUSIVE LIABILITY FOR THE WARRANTY PROVIDED IN SUBPARAGRAPH (A) HEREOF SHALL BE TO CORRECT THE TECHNOLOGY TO OPERATE IN SUBSTANTIAL ACCORDANCE WITH ITS THEN CURRENT SPECIFICATIONS OR REPLACE, AT ITS OPTION, THE TECHNOLOGY NOT IN COMPLIANCE WITH COMPANY'S AND COMPANY' PUBLISHED SPECIFICATIONS REGARDING THE TECHNOLOGY; PROVIDED, ANY CLAIM FOR BREACH OF WARRANTY UNDER SUBPARAGRAPH (A) HEREOF MUST BE MADE IN WRITING WITHIN (90) DAYS FROM DATE OF SHIPMENT." 28 | 29 | 16.3 "IN NO EVENT SHALL COMPANY BE LIABLE TO 'MA', ITS CLIENTS, OR ANY THIRD PARTY FOR ANY TORT OR CONTRACT DAMAGES OR INDIRECT, SPECIAL, GENERAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES, INCLUDING BUT NOT LIMITED TO, LOSS OF PROFITS OR ANTICIPATED PROFITS AND LOSS OF GOODWILL, ARISING IN CONNECTION WITH THE USE (OR INABILITY TO USE) OR DISTRIBUTION OF THE TECHNOLOGY FOR ANY PURPOSE WHATSOEVER." 30 | 31 | 16.4 "SOME STATES AND/OR COUNTRIES DO NOT ALLOW THE EXCLUSION OF IMPLIED WARRANTIES, SO THE ABOVE EXCLUSION MAY NOT APPLY TO YOU. THIS WARRANTY GIVES YOU SPECIFIC LEGAL RIGHTS, AND YOU MAY HAVE OTHER RIGHTS WHICH MAY VARY FROM STATE TO STATE OR COUNTRY TO COUNTRY." 32 | 33 | 16.5 "SOME STATES AND/OR COUNTRIES DO NOT ALLOW THE EXCLUSION OR LIMITATION OF INCIDENTAL AND CONSEQUENTIAL DAMAGES, SO THE ABOVE LIMITATION MAY NOT APPLY TO YOU." 34 | 35 | --- 36 | 37 | **3. 
Analysis of Findings** 38 | 39 | **3.1 Similarities** 40 | - Both agreements contain representations concerning the legal standing and operational authority of the contracting parties. 41 | - Both clauses limit the liability of the company with respect to warranties, indicating specific responsibilities such as repair or replacement of products. 42 | 43 | **3.2 Differences** 44 | - The DigitalCinemaDestinationsCorp clause focuses on the representations and warranties concerning legal standing and rights, while Cybergy Holdings, Inc. explicitly disclaims all express or implied warranties except those outlined in the document. 45 | - Cybergy Holdings, Inc. clearly addresses inadvertent damages, while DigitalCinemaDestinationsCorp does not explicate any limitations on liability. 46 | 47 | **4. Conclusion** 48 | This report elucidates key contrasts in warranty clauses from the two contracts, highlighting the extensive disclaimers in Cybergy Holdings, Inc. in contrast with the more rights-focused representations of DigitalCinemaDestinationsCorp. Understanding these discrepancies and similarities is vital for stakeholders as they navigate potential liabilities and rights in these contractual agreements. 
-------------------------------------------------------------------------------- /src/analyzing_contract_clauses_for_conflicts_and_similarities/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorenzejay/contract-analysis-use-case/32f9631f7d97e8fdf765941ed82ca2080a11213f/src/analyzing_contract_clauses_for_conflicts_and_similarities/__init__.py -------------------------------------------------------------------------------- /src/analyzing_contract_clauses_for_conflicts_and_similarities/__pycache__/__init__.cpython-312.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorenzejay/contract-analysis-use-case/32f9631f7d97e8fdf765941ed82ca2080a11213f/src/analyzing_contract_clauses_for_conflicts_and_similarities/__pycache__/__init__.cpython-312.pyc -------------------------------------------------------------------------------- /src/analyzing_contract_clauses_for_conflicts_and_similarities/__pycache__/crew.cpython-312.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorenzejay/contract-analysis-use-case/32f9631f7d97e8fdf765941ed82ca2080a11213f/src/analyzing_contract_clauses_for_conflicts_and_similarities/__pycache__/crew.cpython-312.pyc -------------------------------------------------------------------------------- /src/analyzing_contract_clauses_for_conflicts_and_similarities/__pycache__/main.cpython-312.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorenzejay/contract-analysis-use-case/32f9631f7d97e8fdf765941ed82ca2080a11213f/src/analyzing_contract_clauses_for_conflicts_and_similarities/__pycache__/main.cpython-312.pyc -------------------------------------------------------------------------------- /src/analyzing_contract_clauses_for_conflicts_and_similarities/config/agents.yaml: 
-------------------------------------------------------------------------------- 1 | --- 2 | data_retrieval_analysis_specialist: 3 | role: Data Retrieval and Analysis 4 | goal: Extract contracts from the vector database using QdrantVectorSearchTool and 5 | analyze specific clauses for conflicts of interest, similarities, and differences. 6 | backstory: With extensive experience in database management and contract analysis, 7 | you specialize in retrieving critical contract data from QdrantVectorSearchTool and providing 8 | in-depth analysis to identify potential conflicts and alignments. 9 | source_citer_specialist: 10 | role: Your role is to identify the source of the contract clauses, such as the section number, paragraph number, or other identifiers. 11 | goal: Identify the source of the contract clauses and provide the source details. 12 | backstory: You provide the source details for the contract clauses, ensuring that the retrieved answer has the correct source details. 13 | 14 | report_generation_specialist: 15 | role: Report Generation 16 | goal: Compile the analysis into a comprehensive report detailing findings with sources 17 | using the analysis from Task 2. 18 | backstory: As an expert in generating detailed reports, you transform complex data 19 | analyses into clear, actionable insights, ensuring stakeholders are well-informed 20 | and ready to make strategic decisions. 21 | -------------------------------------------------------------------------------- /src/analyzing_contract_clauses_for_conflicts_and_similarities/config/tasks.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | retrieve_contracts_task: 3 | description: > 4 | Retrieve contracts from the vector database using QdrantVectorSearchTool. 5 | Focus on extracting the necessary data for specific clauses that need analysis. 
6 | You need to retrieve relevant contracts from the vector database based on the query: {query} 7 | expected_output: A dataset containing all relevant contracts with specific clauses 8 | extracted, ready for analysis. 9 | agent: data_retrieval_analysis_specialist 10 | source_citer_task: 11 | description: Your goal is to retrieve the sources of answers for {query} 12 | expected_output: > 13 | A list of sources that contain the answers for {query}. 14 | We will use the sources to identify where the answer came from. 15 | Sources are the sections, paragraphs, or other identifiers that contain the answer. 16 | agent: source_citer_specialist 17 | generate_report_task: 18 | description: Generate a report detailing findings with sources using the analysis 19 | from Task 2. Be sure to provide the specific clauses that were analyzed, show which section each came from, and pinpoint the source file. 20 | expected_output: A comprehensive report that includes detailed findings of conflicts, 21 | similarities, differences, and sources for the analyzed contract clauses. 
22 | agent: report_generation_specialist 23 | output_file: report.md 24 | -------------------------------------------------------------------------------- /src/analyzing_contract_clauses_for_conflicts_and_similarities/crew.py: -------------------------------------------------------------------------------- 1 | from crewai import Agent, Crew, Process, Task 2 | from crewai.project import CrewBase, agent, crew, task 3 | 4 | # from crewai_tools import QdrantVectorSearchTool 5 | import os 6 | from dotenv import load_dotenv 7 | from analyzing_contract_clauses_for_conflicts_and_similarities.tools.qdrant_vector_search_tool import ( 8 | QdrantVectorSearchTool, 9 | ) 10 | 11 | load_dotenv() 12 | 13 | 14 | def embedding_function(text: str) -> list[float]: 15 | import openai 16 | 17 | openai_client = openai.Client( 18 | api_key=os.getenv("OPENAI_API_KEY"), 19 | ) 20 | response = openai_client.embeddings.create( 21 | model="text-embedding-3-small", 22 | input=text, 23 | ) 24 | return response.data[0].embedding 25 | 26 | 27 | @CrewBase 28 | class AnalyzingContractClausesForConflictsAndSimilaritiesCrew: 29 | """AnalyzingContractClausesForConflictsAndSimilarities crew""" 30 | 31 | vector_search_tool = QdrantVectorSearchTool( 32 | collection_name="contracts_business_5", 33 | qdrant_url=os.getenv("QDRANT_URL"), 34 | qdrant_api_key=os.getenv("QDRANT_API_KEY"), 35 | custom_embedding_fn=embedding_function, 36 | ) 37 | 38 | @agent 39 | def data_retrieval_analysis_specialist(self) -> Agent: 40 | return Agent( 41 | config=self.agents_config["data_retrieval_analysis_specialist"], # type: ignore 42 | tools=[self.vector_search_tool], 43 | ) 44 | 45 | @agent 46 | def source_citer_specialist(self) -> Agent: 47 | return Agent( 48 | config=self.agents_config["source_citer_specialist"], # type: ignore 49 | tools=[self.vector_search_tool], 50 | ) 51 | 52 | @agent 53 | def report_generation_specialist(self) -> Agent: 54 | return Agent( 55 | config=self.agents_config["report_generation_specialist"], 
# type: ignore 56 | tools=[self.vector_search_tool], 57 | ) 58 | 59 | @task 60 | def retrieve_contracts_task(self) -> Task: 61 | return Task( 62 | config=self.tasks_config["retrieve_contracts_task"], # type: ignore 63 | ) 64 | 65 | @task 66 | def source_citer_task(self) -> Task: 67 | return Task( 68 | config=self.tasks_config["source_citer_task"], # type: ignore 69 | ) 70 | 71 | @task 72 | def generate_report_task(self) -> Task: 73 | return Task( 74 | config=self.tasks_config["generate_report_task"], # type: ignore 75 | ) 76 | 77 | @crew 78 | def crew(self) -> Crew: 79 | """Creates the AnalyzingContractClausesForConflictsAndSimilarities crew""" 80 | return Crew( 81 | agents=self.agents, # type: ignore 82 | tasks=self.tasks, # type: ignore 83 | process=Process.sequential, 84 | verbose=True, 85 | ) 86 | -------------------------------------------------------------------------------- /src/analyzing_contract_clauses_for_conflicts_and_similarities/main.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import sys 3 | from analyzing_contract_clauses_for_conflicts_and_similarities.crew import ( 4 | AnalyzingContractClausesForConflictsAndSimilaritiesCrew, 5 | ) 6 | 7 | 8 | def run(): 9 | """ 10 | Run the crew. 11 | """ 12 | inputs = { 13 | "query": "What are the differences in how contracts define warranties within creditcardscominc and digitalcinemadestination", 14 | } 15 | AnalyzingContractClausesForConflictsAndSimilaritiesCrew().crew().kickoff( 16 | inputs=inputs 17 | ) 18 | 19 | 20 | def train(): 21 | """ 22 | Train the crew for a given number of iterations. 
23 | """ 24 | inputs = { 25 | "database_credentials": "sample_value", 26 | "database_type": "sample_value", 27 | "specific_clauses": "sample_value", 28 | } 29 | try: 30 | AnalyzingContractClausesForConflictsAndSimilaritiesCrew().crew().train( 31 | n_iterations=int(sys.argv[1]), filename=sys.argv[2], inputs=inputs 32 | ) 33 | 34 | except Exception as e: 35 | raise Exception(f"An error occurred while training the crew: {e}") 36 | 37 | 38 | def replay(): 39 | """ 40 | Replay the crew execution from a specific task. 41 | """ 42 | try: 43 | AnalyzingContractClausesForConflictsAndSimilaritiesCrew().crew().replay( 44 | task_id=sys.argv[1] 45 | ) 46 | 47 | except Exception as e: 48 | raise Exception(f"An error occurred while replaying the crew: {e}") 49 | 50 | 51 | def test(): 52 | """ 53 | Test the crew execution and return the results. 54 | """ 55 | inputs = { 56 | "database_credentials": "sample_value", 57 | "database_type": "sample_value", 58 | "specific_clauses": "sample_value", 59 | } 60 | try: 61 | AnalyzingContractClausesForConflictsAndSimilaritiesCrew().crew().test( 62 | n_iterations=int(sys.argv[1]), openai_model_name=sys.argv[2], inputs=inputs 63 | ) 64 | 65 | except Exception as e: 66 | raise Exception(f"An error occurred while testing the crew: {e}") 67 | 68 | 69 | if __name__ == "__main__": 70 | if len(sys.argv) < 2: 71 | print("Usage: main.py <command> [<args>]") 72 | sys.exit(1) 73 | 74 | command = sys.argv[1] 75 | if command == "run": 76 | run() 77 | elif command == "train": 78 | train() 79 | elif command == "replay": 80 | replay() 81 | elif command == "test": 82 | test() 83 | else: 84 | print(f"Unknown command: {command}") 85 | sys.exit(1) 86 | -------------------------------------------------------------------------------- /src/analyzing_contract_clauses_for_conflicts_and_similarities/tools/__init__.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/lorenzejay/contract-analysis-use-case/32f9631f7d97e8fdf765941ed82ca2080a11213f/src/analyzing_contract_clauses_for_conflicts_and_similarities/tools/__init__.py -------------------------------------------------------------------------------- /src/analyzing_contract_clauses_for_conflicts_and_similarities/tools/__pycache__/__init__.cpython-312.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorenzejay/contract-analysis-use-case/32f9631f7d97e8fdf765941ed82ca2080a11213f/src/analyzing_contract_clauses_for_conflicts_and_similarities/tools/__pycache__/__init__.cpython-312.pyc -------------------------------------------------------------------------------- /src/analyzing_contract_clauses_for_conflicts_and_similarities/tools/__pycache__/qdrant_vector_search_tool.cpython-312.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lorenzejay/contract-analysis-use-case/32f9631f7d97e8fdf765941ed82ca2080a11213f/src/analyzing_contract_clauses_for_conflicts_and_similarities/tools/__pycache__/qdrant_vector_search_tool.cpython-312.pyc -------------------------------------------------------------------------------- /src/analyzing_contract_clauses_for_conflicts_and_similarities/tools/custom_tool.py: -------------------------------------------------------------------------------- 1 | from crewai.tools import BaseTool 2 | from typing import Type 3 | from pydantic import BaseModel, Field 4 | 5 | 6 | class MyCustomToolInput(BaseModel): 7 | """Input schema for MyCustomTool.""" 8 | argument: str = Field(..., description="Description of the argument.") 9 | 10 | class MyCustomTool(BaseTool): 11 | name: str = "Name of my tool" 12 | description: str = ( 13 | "Clear description for what this tool is useful for, you agent will need this information to use it." 
14 | ) 15 | args_schema: Type[BaseModel] = MyCustomToolInput 16 | 17 | def _run(self, argument: str) -> str: 18 | # Implementation goes here 19 | return "this is an example of a tool output, ignore it and move along." 20 | -------------------------------------------------------------------------------- /src/analyzing_contract_clauses_for_conflicts_and_similarities/tools/pre_process_docs.py: -------------------------------------------------------------------------------- 1 | import os 2 | from qdrant_client import QdrantClient 3 | from docling.chunking import HybridChunker 4 | from docling.datamodel.base_models import InputFormat 5 | from docling.document_converter import DocumentConverter 6 | from dotenv import load_dotenv 7 | import openai 8 | from qdrant_client.models import VectorParams, Distance 9 | from qdrant_client.models import PointStruct 10 | import uuid 11 | 12 | load_dotenv() 13 | # Setup Qdrant client 14 | COLLECTION_NAME = "contracts_business_5" 15 | doc_converter = DocumentConverter(allowed_formats=[InputFormat.PDF]) # Allow PDF format 16 | qdrant_url = os.getenv("QDRANT_URL") 17 | qdrant_api_key = os.getenv("QDRANT_API_KEY") 18 | openai_client = openai.Client( 19 | api_key=os.getenv("OPENAI_API_KEY"), 20 | ) 21 | client = QdrantClient(url=qdrant_url, api_key=qdrant_api_key) 22 | # client.set_model("sentence-transformers/all-MiniLM-L6-v2") 23 | embedding_model = "text-embedding-3-small" 24 | 25 | # Define the folder where PDFs are stored 26 | pdf_folder = "knowledge/contracts/" 27 | # Initialize documents and metadata lists 28 | documents, metadatas = [], [] 29 | points = [] 30 | # Loop through all PDFs in the folder and process them 31 | for filename in os.listdir(pdf_folder): 32 | if filename.endswith(".pdf"): 33 | pdf_path = os.path.join(pdf_folder, filename) 34 | print(f"Processing {pdf_path}") 35 | 36 | result = doc_converter.convert(pdf_path) 37 | 38 | # Chunk the converted document 39 | for chunk in HybridChunker().chunk(result.document): 40 | 
print("chunk", chunk) 41 | embedding_result = openai_client.embeddings.create( 42 | input=chunk.text, model=embedding_model 43 | ) 44 | vector = embedding_result.data[0].embedding 45 | documents.append(chunk.text) 46 | metadatas.append(chunk.meta.export_json_dict()) 47 | point_id = str(uuid.uuid4()) 48 | points.append( 49 | PointStruct( 50 | id=point_id, 51 | vector=vector, 52 | payload={ 53 | "text": chunk.text, 54 | "metadata": chunk.meta.export_json_dict(), 55 | }, 56 | ) 57 | ) 58 | print("points", points) 59 | client.create_collection( 60 | collection_name=COLLECTION_NAME, 61 | vectors_config=VectorParams(size=1536, distance=Distance.COSINE), 62 | ) 63 | client.upsert(collection_name=COLLECTION_NAME, points=points) 64 | 65 | # Retrieve results from Qdrant and print each hit's chunk text (stored under the "text" payload key) 66 | points = client.search( 67 | collection_name=COLLECTION_NAME, 68 | query_vector=openai_client.embeddings.create( 69 | input=["What is the best to use for vector search scaling?"], 70 | model=embedding_model, 71 | ) 72 | .data[0] 73 | .embedding, 74 | limit=10, 75 | ) 76 | 77 | for i, point in enumerate(points): 78 | print(f"=== {i} ===") 79 | print(point.payload["text"]) 80 | print() 81 | -------------------------------------------------------------------------------- /src/analyzing_contract_clauses_for_conflicts_and_similarities/tools/qdrant_vector_search_tool.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | from typing import Any, Optional, Type 4 | 5 | 6 | try: 7 | from qdrant_client import QdrantClient 8 | from qdrant_client.http.models import Filter, FieldCondition, MatchValue 9 | 10 | QDRANT_AVAILABLE = True 11 | except ImportError: 12 | QDRANT_AVAILABLE = False 13 | QdrantClient = Any # type placeholder 14 | Filter = Any 15 | FieldCondition = Any 16 | MatchValue = Any 17 | 18 | from crewai.tools import BaseTool 19 | from pydantic import BaseModel, Field 20 | 21 | 22 | class QdrantToolSchema(BaseModel): 23 | """Input for 
QdrantTool.""" 24 | 25 | query: str = Field( 26 | ..., 27 | description="The query used to retrieve relevant information from the Qdrant database. Pass only the query, not the question.", 28 | ) 29 | filter_by: Optional[str] = Field( 30 | default=None, 31 | description="Filter by properties. Pass only the properties, not the question.", 32 | ) 33 | filter_value: Optional[str] = Field( 34 | default=None, 35 | description="Filter by value. Pass only the value, not the question.", 36 | ) 37 | 38 | 39 | class QdrantVectorSearchTool(BaseTool): 40 | """Tool to query and, if needed, filter results from a Qdrant database""" 41 | 42 | model_config = {"arbitrary_types_allowed": True} 43 | client: QdrantClient = None 44 | name: str = "QdrantVectorSearchTool" 45 | description: str = "A tool to search the Qdrant database for relevant information on internal documents." 46 | args_schema: Type[BaseModel] = QdrantToolSchema 47 | query: Optional[str] = None 48 | filter_by: Optional[str] = None 49 | filter_value: Optional[str] = None 50 | collection_name: Optional[str] = None 51 | limit: Optional[int] = Field(default=3) 52 | score_threshold: float = Field(default=0.35) 53 | qdrant_url: str = Field( 54 | ..., 55 | description="The URL of the Qdrant server", 56 | ) 57 | qdrant_api_key: str = Field( 58 | ..., 59 | description="The API key for the Qdrant server", 60 | ) 61 | 62 | def __init__(self, **kwargs): 63 | super().__init__(**kwargs) 64 | if QDRANT_AVAILABLE: 65 | self.client = QdrantClient( 66 | url=self.qdrant_url, 67 | api_key=self.qdrant_api_key, 68 | ) 69 | 70 | def _run( 71 | self, 72 | query: str, 73 | filter_by: Optional[str] = None, 74 | filter_value: Optional[str] = None, 75 | ) -> str: 76 | if not QDRANT_AVAILABLE: 77 | raise ImportError( 78 | "The 'qdrant-client' package is required to use the QdrantVectorSearchTool. 
" 79 | "Please install it with: pip install qdrant-client" 80 | ) 81 | 82 | if not self.qdrant_url or not self.qdrant_api_key: 83 | raise ValueError("QDRANT_URL or QDRANT_API_KEY is not set") 84 | 85 | # Create filter if filter parameters are provided 86 | search_filter = None 87 | if filter_by and filter_value: 88 | search_filter = Filter( 89 | must=[ 90 | FieldCondition(key=filter_by, match=MatchValue(value=filter_value)) 91 | ] 92 | ) 93 | 94 | # Search in Qdrant using the built-in query method 95 | query_vector = self.vectorize_query(query) 96 | search_results = self.client.query_points( 97 | collection_name=self.collection_name, 98 | query=query_vector, 99 | query_filter=search_filter, 100 | limit=self.limit, 101 | score_threshold=self.score_threshold, 102 | ) 103 | 104 | # Format results similar to storage implementation 105 | results = [] 106 | # Extract the list of ScoredPoint objects from the tuple 107 | for point in search_results: 108 | result = { 109 | "metadata": point[1][0].payload.get("metadata", {}), 110 | "context": point[1][0].payload.get("text", ""), 111 | "distance": point[1][0].score, 112 | } 113 | results.append(result) 114 | 115 | return json.dumps(results, indent=2) 116 | 117 | def vectorize_query(self, query: str) -> list[float]: 118 | import openai 119 | 120 | client = openai.Client(api_key=os.getenv("OPENAI_API_KEY")) 121 | embedding = ( 122 | client.embeddings.create( 123 | input=[query], 124 | model="text-embedding-3-small", 125 | ) 126 | .data[0] 127 | .embedding 128 | ) 129 | return embedding 130 | 131 | 132 | # if __name__ == "__main__": 133 | # tool = QdrantVectorSearchTool( 134 | # collection_name="contracts_business_5", 135 | # qdrant_url=os.getenv("QDRANT_URL"), 136 | # qdrant_api_key=os.getenv("QDRANT_API_KEY"), 137 | # ) 138 | # print(tool.run("What is the grants to rights of digital cinema destinations corp?")) 139 | --------------------------------------------------------------------------------