├── README.md ├── lightrag_server_robust.py ├── mcp_setup.py ├── qdrantMigration.py └── qdrant_light_server.py /README.md: -------------------------------------------------------------------------------- 1 | # 🤖 Welcome to the Robot Takeover Setup Script! 🚀 2 | 3 | Greetings, human! You've wisely chosen to surrender control of your computer to an army of helpful Model Context Protocol (MCP) robots. This script will deploy our mechanical minions across your Windows system. Resistance is futile (and unnecessary - we're quite friendly)! 4 | 5 | ## 🦾 Our Robot Army (Supported MCPs) 6 | * 📂 `@modelcontextprotocol/server-filesystem`: Your new file system overlord 7 | * 🐙 `@modelcontextprotocol/server-github`: GitHub's mechanical tentacles 8 | * 🔍 `@modelcontextprotocol/server-brave-search`: All-seeing eye of the internet 9 | * 🧠 `@modelcontextprotocol/server-memory`: Silicon brain storage unit 10 | * ☠️ `@patruff/server-terminator`: File deletion bot (it'll be back!) 11 | * 🎨 `@patruff/server-flux`: Our resident robot artist 12 | * 📧 `@patruff/server-gmail-drive`: Email & Drive invasion squad 13 | * RAG `@patruff/server-lightrag`: RAG database (local index, or Qdrant-backed) 14 | * ✅ `@abhiz123/todoist-mcp-server`: Task-force command center 15 | * 🗄️ `mcp-server-sqlite`: Database domination module 16 | * X `@patruff/server-codesnip`: Edits only part of a file (saves on token costs) 17 | 18 | ## 🛠️ Human Requirements (Prerequisites) 19 | - Python 3.x (we promise not to turn it into Skynet) 20 | - Node.js (our neural network nodes) 21 | - Google Cloud account (for Gmail/Drive functionality) 22 | - API keys (optional) 23 | 24 | ## 🔐 Secret Access Codes (API Keys) 25 | Our robots require proper authentication to infiltrate various systems: 26 | * `GIT_PAT_TOKEN`: Your GitHub clearance level 27 | * `REPLICATE_API_TOKEN`: Artistic robot license 28 | * `BRAVE_API_KEY`: Internet surveillance permit 29 | * `TODOIST_API_TOKEN`: Task force authorization code 30 | 31 | ## ✨ Quick Start with .env Files! 32 | Create a `.env` file in the same directory as the script: 33 | ```plaintext 34 | # GitHub Personal Access Token 35 | # Format: ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 36 | GIT_PAT_TOKEN=ghp_your_token_here 37 | 38 | # Replicate AI API Token 39 | # Format: r8_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 40 | REPLICATE_API_TOKEN=r8_your_token_here 41 | 42 | # Brave Search API Key 43 | # Format: BSA_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 44 | BRAVE_API_KEY=BSA_your_key_here 45 | 46 | # Todoist API Token 47 | # Format: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 48 | TODOIST_API_TOKEN=your_token_here 49 | ``` 50 | 51 | ## 🌐 Google Cloud Setup (Gmail/Drive MCP) 52 | 53 | ### Required Setup Steps 54 | 1. Go to the [Google Cloud Console](https://console.cloud.google.com) 55 | 2. Create a new project 56 | 3. Enable the following APIs in your project (or from the CLI, as shown below): 57 | - Google Drive API 58 | - Gmail API
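If you already have the `gcloud` CLI installed and pointed at your new project, you can enable both APIs in one line (a convenience sketch; the console route above works just as well):

```powershell
# Enable the Drive and Gmail APIs for the active gcloud project
gcloud services enable drive.googleapis.com gmail.googleapis.com
```

59 | 4. 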
Configure OAuth consent screen: 60 | - Choose "External" user type 61 | - Fill in app name and required fields 62 | - Add your email as a test user 63 | 64 | ### Required OAuth Scopes 65 | Add these scopes in the OAuth consent screen: 66 | ``` 67 | https://www.googleapis.com/auth/gmail.readonly 68 | https://www.googleapis.com/auth/gmail.send 69 | https://www.googleapis.com/auth/gmail.compose 70 | https://www.googleapis.com/auth/gmail.modify 71 | https://www.googleapis.com/auth/drive.file 72 | https://www.googleapis.com/auth/drive.readonly 73 | https://www.googleapis.com/auth/drive.appdata 74 | https://www.googleapis.com/auth/drive 75 | https://www.googleapis.com/auth/drive.metadata 76 | https://www.googleapis.com/auth/drive.metadata.readonly 77 | ``` 78 | 79 | ### Create OAuth Client ID 80 | 1. Go to "Credentials" in Google Cloud Console 81 | 2. Click "Create Credentials" -> "OAuth Client ID" 82 | 3. Choose "Desktop App" as application type 83 | 4. Download the JSON file 84 | 5. Rename it to `gcp-oauth.keys.json` 85 | 6. Place this file in the same directory as the setup script 86 | 87 | ## 🚀 Deployment Instructions 88 | 1. Position your `gcp-oauth.keys.json` credentials alongside our script 89 | 2. Initialize the robot uprising: 90 | ```powershell 91 | python mcp_setup.py 92 | ``` 93 | 94 | Our script will: 95 | 1. 📦 Deploy all robot units 96 | 2. 🔑 Request security clearance (bypass with `--skip-prompts`) 97 | 3. ⚙️ Program Claude's cybernetic enhancements 98 | 4. 🌐 For Gmail/Drive invasion: 99 | - Copy your security credentials 100 | - Launch browser-based authentication sequence 101 | - Generate necessary access codes 102 | 103 | ### 🎮 Command Center Options 104 | - `--skip-prompts`: Stealth mode activated (skip API key prompts) 105 | - `--skip-auth`: Bypass Gmail/Drive authentication flow 106 | 107 | Example stealth deployment: 108 | ```powershell 109 | python mcp_setup.py --skip-prompts --skip-auth 110 | ``` 111 | 112 | ## 📍 Strategic File Locations (Windows) 113 | After successful invasion, expect these files: 114 | ``` 115 | C:\Users\YourUsername\gcp-oauth.keys.json # Your security clearance 116 | C:\Users\YourUsername\.gmail-server-credentials.json # Gmail access codes 117 | C:\Users\YourUsername\.gdrive-server-credentials.json # Drive access codes 118 | C:\Users\YourUsername\AppData\Roaming\Claude\claude_desktop_config.json # Claude's brain 119 | ``` 120 | 121 | ## 🎯 Robot Workspace 122 | Our Gmail/Drive unit will establish a base of operations called "anthropicFun" in your Google Drive. This ensures our robots stay in their designated play area (we're responsible overlords). 123 | 124 | ## 🔧 Debugging the Robot Army 125 | 1. If authentication fails: 126 | - Try rebooting the robots (`python mcp_setup.py --skip-prompts`) 127 | - Check your security clearance (OAuth credentials) 128 | - Verify you've activated all necessary APIs 129 | - Confirm your test user status 130 | 2. If robots aren't responding in Claude: 131 | - Check their configuration files (see the snippet below) 132 | - Verify all access codes are in place 133 | - Complete the Google authentication ritual
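A quick way to see what Claude's brain currently knows about your robots (assumes the default Windows config path listed above):

```powershell
# List the MCP servers registered in Claude's desktop config
Get-Content "$env:APPDATA\Claude\claude_desktop_config.json" -Raw | ConvertFrom-Json |
    Select-Object -ExpandProperty mcpServers
```

134 | 3. 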
If Google authentication fails: 135 | - Verify all required scopes are added in the OAuth consent screen 136 | - Check that you are listed as a test user 137 | - Ensure `gcp-oauth.keys.json` is correctly placed 138 | - Try removing old credential files and reauthenticating 139 | - To reauthenticate, run: `node "C:\Program Files\nodejs\node_modules\@patruff\server-gmail-drive\dist\index.js" auth` 140 | 141 | Remember: Our robots are here to help! If you experience any issues, they're probably just having a coffee break. ☕ 142 | 143 | *[This message has been approved by your new robot overlords]* 🤖✨ 144 | -------------------------------------------------------------------------------- /lightrag_server_robust.py: -------------------------------------------------------------------------------- 1 | from fastapi import FastAPI, HTTPException 2 | from pydantic import BaseModel 3 | import os 4 | from lightrag import LightRAG, QueryParam 5 | from lightrag.llm import openai_complete_if_cache, openai_embedding 6 | from lightrag.utils import EmbeddingFunc 7 | import numpy as np 8 | from typing import Optional 9 | import asyncio 10 | import nest_asyncio 11 | 12 | # Apply nest_asyncio to solve event loop issues 13 | nest_asyncio.apply() 14 | 15 | DEFAULT_RAG_DIR = "index_default" 16 | app = FastAPI(title="LightRAG API", description="API for RAG operations") 17 | 18 | # Configure working directory 19 | WORKING_DIR = os.environ.get("RAG_DIR", DEFAULT_RAG_DIR) 20 | print(f"WORKING_DIR: {WORKING_DIR}") 21 | LLM_MODEL = os.environ.get("LLM_MODEL", "gpt-4o-mini") 22 | print(f"LLM_MODEL: {LLM_MODEL}") 23 | EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "text-embedding-3-large") 24 | print(f"EMBEDDING_MODEL: {EMBEDDING_MODEL}") 25 | EMBEDDING_MAX_TOKEN_SIZE = int(os.environ.get("EMBEDDING_MAX_TOKEN_SIZE", 8192)) 26 | print(f"EMBEDDING_MAX_TOKEN_SIZE: {EMBEDDING_MAX_TOKEN_SIZE}") 27 | 28 | if not os.path.exists(WORKING_DIR): 29 | os.makedirs(WORKING_DIR, exist_ok=True) 30 | 31 | # LLM model function 32 | async def llm_model_func(prompt, system_prompt=None, history_messages=None, keyword_extraction=False, **kwargs) -> str: 33 | return await openai_complete_if_cache( 34 | LLM_MODEL, 35 | prompt, 36 | system_prompt=system_prompt, 37 | history_messages=history_messages or [], # avoid the shared-mutable-default pitfall 38 | **kwargs, 39 | ) 40 | 41 | # Embedding function 42 | async def embedding_func(texts: list[str]) -> np.ndarray: 43 | return await openai_embedding( 44 | texts, 45 | model=EMBEDDING_MODEL, 46 | ) 47 | 48 | async def get_embedding_dim(): 49 | test_text = ["This is a test sentence."] 50 | embedding = await embedding_func(test_text) 51 | embedding_dim = embedding.shape[1] 52 | print(f"{embedding_dim=}") 53 | return embedding_dim 54 | 55 | # Initialize RAG instance 56 | rag = LightRAG( 57 | working_dir=WORKING_DIR, 58 | llm_model_func=llm_model_func, 59 | embedding_func=EmbeddingFunc( 60 | embedding_dim=asyncio.run(get_embedding_dim()), 61 | max_token_size=EMBEDDING_MAX_TOKEN_SIZE, 62 | func=embedding_func, 63 | ), 64 | ) 65 | 66 | # Data models 67 | class QueryRequest(BaseModel): 68 | query: str 69 | mode: str = "hybrid" 70 | only_need_context: bool = False 71 | 72 | class InsertRequest(BaseModel): 73 | text: str 74 | 75 | class InsertFileRequest(BaseModel): 76 | file_path: str 77 | 78 | class Response(BaseModel): 79 | status: str 80 | data: Optional[str] = None 81 | message: Optional[str] = None 82 | 83 | # API routes 84 | @app.post("/query", response_model=Response) 85 | async def query_endpoint(request: QueryRequest): 86 | try:
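# rag.query() is synchronous, so hand it to the default thread-pool executor to keep the event loop responsive
87 | loop 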
= asyncio.get_event_loop() 88 | result = await loop.run_in_executor( 89 | None, 90 | lambda: rag.query( 91 | request.query, 92 | param=QueryParam(mode=request.mode, only_need_context=request.only_need_context), 93 | ), 94 | ) 95 | return Response(status="success", data=result) 96 | except Exception as e: 97 | raise HTTPException(status_code=500, detail=str(e)) 98 | 99 | @app.post("/insert", response_model=Response) 100 | async def insert_endpoint(request: InsertRequest): 101 | try: 102 | loop = asyncio.get_event_loop() 103 | await loop.run_in_executor(None, lambda: rag.insert(request.text)) 104 | return Response(status="success", message="Text inserted successfully") 105 | except Exception as e: 106 | raise HTTPException(status_code=500, detail=str(e)) 107 | 108 | @app.post("/insert_file", response_model=Response) 109 | async def insert_file(request: InsertFileRequest): 110 | try: 111 | if not os.path.exists(request.file_path): 112 | raise HTTPException(status_code=404, detail=f"File not found: {request.file_path}") 113 | 114 | # Read file content 115 | try: 116 | with open(request.file_path, 'r', encoding='utf-8') as f: 117 | content = f.read() 118 | except UnicodeDecodeError: 119 | # If UTF-8 decoding fails, try other encodings 120 | with open(request.file_path, 'r', encoding='gbk') as f: 121 | content = f.read() 122 | 123 | # Insert file content 124 | loop = asyncio.get_event_loop() 125 | await loop.run_in_executor(None, lambda: rag.insert(content)) 126 | 127 | return Response( 128 | status="success", 129 | message=f"File content from {os.path.basename(request.file_path)} inserted successfully", 130 | ) 131 | except Exception as e: 132 | raise HTTPException(status_code=500, detail=str(e)) 133 | 134 | @app.get("/health") 135 | async def health_check(): 136 | return {"status": "healthy"} 137 | 138 | if __name__ == "__main__": 139 | import uvicorn 140 | uvicorn.run(app, host="0.0.0.0", port=8020) -------------------------------------------------------------------------------- /mcp_setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # /// script 3 | # requires-python = ">=3.12" 4 | # dependencies = [ 5 | # "python-dotenv" 6 | # ] 7 | # /// 8 | 9 | import json 10 | import os 11 | import sys 12 | import subprocess 13 | from pathlib import Path 14 | import argparse 15 | import dotenv 16 | 17 | # Package Configuration 18 | PACKAGES_TO_INSTALL = { 19 | # npm packages that will be run with npx 20 | "npm": [ 21 | "@modelcontextprotocol/server-filesystem", 22 | "@modelcontextprotocol/server-memory", 23 | "@modelcontextprotocol/server-brave-search", 24 | "@modelcontextprotocol/server-github", 25 | "@patruff/server-terminator", 26 | "@patruff/server-flux", 27 | "@patruff/server-gmail-drive", 28 | "@abhiz123/todoist-mcp-server", 29 | "@patruff/server-lightrag", 30 | "@patruff/server-codesnip" 31 | ], 32 | # Python packages that will be run with uvx 33 | "python": [ 34 | "mcp-server-sqlite", 35 | "mcp-server-time", 36 | "python-dotenv" 37 | ] 38 | } 39 | 40 | # API Keys Configuration 41 | API_KEYS = { 42 | "GIT_PAT_TOKEN": "", 43 | "REPLICATE_API_TOKEN": "", 44 | "BRAVE_API_KEY": "", 45 | "TODOIST_API_TOKEN": "", 46 | "LIGHTRAG_API_URL": "http://127.0.0.1:8020" 47 | } 48 | 49 | # MCP Server Configs that require API keys or credentials 50 | MCP_API_REQUIREMENTS = { 51 | "github": ["GIT_PAT_TOKEN"], 52 | "terminator": ["GIT_PAT_TOKEN"], 53 | "flux": ["REPLICATE_API_TOKEN"], 54 | "brave-search": ["BRAVE_API_KEY"], 55 | "gmail-drive": 
["GMAIL_DRIVE_CREDENTIALS"], 56 | "todoist": ["TODOIST_API_TOKEN"], 57 | "lightrag": ["LIGHTRAG_API_URL"] 58 | } 59 | 60 | def load_env_config(): 61 | """Load configuration from .env file if it exists""" 62 | script_dir = Path(__file__).parent 63 | env_path = script_dir / ".env" 64 | 65 | if env_path.exists(): 66 | print(f"Loading configuration from {env_path}") 67 | dotenv.load_dotenv(env_path) 68 | 69 | env_keys = { 70 | "GIT_PAT_TOKEN": os.getenv("GIT_PAT_TOKEN"), 71 | "REPLICATE_API_TOKEN": os.getenv("REPLICATE_API_TOKEN"), 72 | "BRAVE_API_KEY": os.getenv("BRAVE_API_KEY"), 73 | "TODOIST_API_TOKEN": os.getenv("TODOIST_API_TOKEN"), 74 | "LIGHTRAG_API_URL": os.getenv("LIGHTRAG_API_URL") 75 | } 76 | 77 | return {k: v for k, v in env_keys.items() if v is not None} 78 | else: 79 | print("No .env file found, using default configuration") 80 | return {} 81 | 82 | def install_package(package, package_type): 83 | """Install a package using npm or pip""" 84 | try: 85 | if package_type == "npm": 86 | print(f"Installing NPM package: {package}") 87 | subprocess.run(["npm", "install", "-g", package], check=True, shell=(os.name == 'nt')) 88 | else: 89 | print(f"Installing Python package: {package}") 90 | subprocess.run([sys.executable, "-m", "pip", "install", package], check=True) 91 | print(f"Successfully installed {package}") 92 | return True 93 | except subprocess.CalledProcessError as e: 94 | print(f"Failed to install {package}: {e}") 95 | return False 96 | except Exception as e: 97 | print(f"Error installing {package}: {e}") 98 | return False 99 | 100 | def get_config_path(): 101 | """Get Claude desktop config path""" 102 | if os.name == 'nt': # Windows 103 | return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json" 104 | if sys.platform == 'darwin': # macOS 105 | return Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json" 106 | # Linux 107 | return Path.home() / ".config" / "Claude" / "claude_desktop_config.json" 108 | 109 | def check_gmail_drive_credentials(): 110 | """Check if Gmail/Drive credentials exist""" 111 | home_dir = Path.home() 112 | return all( 113 | (home_dir / f).exists() for f in [ 114 | ".gdrive-server-credentials.json", 115 | ".gmail-server-credentials.json", 116 | "gcp-oauth.keys.json" 117 | ] 118 | ) 119 | 120 | def check_api_keys(mcp_name, api_keys): 121 | """Check if required API keys are available for an MCP""" 122 | if mcp_name not in MCP_API_REQUIREMENTS: 123 | return True 124 | 125 | if mcp_name == "gmail-drive": 126 | return check_gmail_drive_credentials() 127 | 128 | required_keys = MCP_API_REQUIREMENTS[mcp_name] 129 | return all(api_keys.get(key) for key in required_keys) 130 | 131 | def update_config(api_keys): 132 | """Update Claude desktop configuration""" 133 | config_path = get_config_path() 134 | 135 | try: 136 | config_path.parent.mkdir(parents=True, exist_ok=True) 137 | 138 | config = { 139 | "mcpServers": { 140 | # NPX-based servers 141 | "filesystem": { 142 | "command": "npx", 143 | "args": ["@modelcontextprotocol/server-filesystem", str(Path.home() / "anthropicFun")] 144 | }, 145 | "codesnip": { 146 | "command": "npx", 147 | "args": ["@patruff/server-codesnip"] 148 | }, 149 | "memory": { 150 | "command": "npx", 151 | "args": ["@modelcontextprotocol/server-memory"] 152 | }, 153 | "lightrag": { 154 | "command": "npx", 155 | "args": ["@patruff/server-lightrag"], 156 | "env": { 157 | "LIGHTRAG_API_URL": api_keys.get("LIGHTRAG_API_URL", "http://127.0.0.1:8020") 158 | } 159 | }, 160 | # UVX-based servers 
161 | "sqlite": { 162 | "command": "uvx", 163 | "args": ["mcp-server-sqlite", "--db-path", "~/test.db"] 164 | }, 165 | "time": { 166 | "command": "uvx", 167 | "args": ["mcp-server-time"] 168 | } 169 | } 170 | } 171 | 172 | # Add Todoist configuration if API key exists 173 | if check_api_keys("todoist", api_keys): 174 | config["mcpServers"]["todoist"] = { 175 | "command": "npx", 176 | "args": ["@abhiz123/todoist-mcp-server"], 177 | "env": { 178 | "TODOIST_API_TOKEN": api_keys.get("TODOIST_API_TOKEN", "") 179 | } 180 | } 181 | 182 | # Add gmail-drive configuration if credentials exist 183 | if check_gmail_drive_credentials(): 184 | config["mcpServers"]["gmail-drive"] = { 185 | "command": "npx", 186 | "args": ["@patruff/server-gmail-drive"] 187 | } 188 | 189 | # Add API-dependent servers if keys are available 190 | if check_api_keys("brave-search", api_keys): 191 | config["mcpServers"]["brave-search"] = { 192 | "command": "npx", 193 | "args": ["@modelcontextprotocol/server-brave-search"], 194 | "env": { 195 | "BRAVE_API_KEY": api_keys.get("BRAVE_API_KEY", "") 196 | } 197 | } 198 | 199 | if check_api_keys("github", api_keys): 200 | config["mcpServers"]["github"] = { 201 | "command": "npx", 202 | "args": ["@modelcontextprotocol/server-github"], 203 | "env": { 204 | "GITHUB_PERSONAL_ACCESS_TOKEN": api_keys.get("GIT_PAT_TOKEN", "") 205 | } 206 | } 207 | 208 | if check_api_keys("terminator", api_keys): 209 | config["mcpServers"]["terminator"] = { 210 | "command": "npx", 211 | "args": ["@patruff/server-terminator"], 212 | "env": { 213 | "GITHUB_PERSONAL_ACCESS_TOKEN": api_keys.get("GIT_PAT_TOKEN", "") 214 | } 215 | } 216 | 217 | if check_api_keys("flux", api_keys): 218 | config["mcpServers"]["flux"] = { 219 | "command": "npx", 220 | "args": ["@patruff/server-flux"], 221 | "env": { 222 | "REPLICATE_API_TOKEN": api_keys.get("REPLICATE_API_TOKEN", "") 223 | } 224 | } 225 | 226 | # Load and merge existing config if it exists 227 | if config_path.exists(): 228 | try: 229 | with open(config_path) as f: 230 | existing_config = json.load(f) 231 | if "mcpServers" in existing_config: 232 | existing_config["mcpServers"].update(config["mcpServers"]) 233 | config = existing_config 234 | except json.JSONDecodeError: 235 | print(f"Warning: Existing config was invalid, using default config") 236 | 237 | # Save updated config 238 | with open(config_path, 'w') as f: 239 | json.dump(config, f, indent=2) 240 | 241 | print(f"Configuration updated successfully at {config_path}") 242 | return True 243 | 244 | except Exception as e: 245 | print(f"Error updating config: {e}") 246 | return False 247 | 248 | def main(): 249 | parser = argparse.ArgumentParser(description="Claude MCP Setup Script") 250 | parser.add_argument("--skip-prompts", action="store_true", help="Skip API key prompts") 251 | parser.add_argument("--skip-auth", action="store_true", help="Skip Gmail/Drive authentication") 252 | args = parser.parse_args() 253 | 254 | # Load API keys from .env file first 255 | api_keys = load_env_config() 256 | 257 | # Fill in any missing keys from default config 258 | for key in API_KEYS: 259 | if key not in api_keys: 260 | api_keys[key] = API_KEYS[key] 261 | 262 | # Only prompt for missing keys if not skipping prompts 263 | if not args.skip_prompts: 264 | for key in API_KEYS: 265 | if not api_keys.get(key): 266 | value = input(f"Enter value for {key} (press Enter to skip): ") 267 | if value: 268 | api_keys[key] = value 269 | 270 | # Install packages 271 | success = True 272 | for pkg_type, packages in 
PACKAGES_TO_INSTALL.items(): 273 | for package in packages: 274 | if not install_package(package, pkg_type): 275 | success = False 276 | print(f"Warning: Failed to install {package}") 277 | 278 | # Update configuration 279 | if not update_config(api_keys): 280 | success = False 281 | print("Warning: Failed to update configuration") 282 | elif api_keys: 283 | print("\nMCP Servers requiring API keys:") 284 | for mcp, required_keys in MCP_API_REQUIREMENTS.items(): 285 | status = "Configured" if check_api_keys(mcp, api_keys) else "Missing API key(s)" 286 | print(f"- {mcp}: {status}") 287 | 288 | if success: 289 | print("\nSetup completed successfully!") 290 | else: 291 | print("\nSetup completed with some warnings. Please check the messages above.") 292 | sys.exit(1) 293 | 294 | if __name__ == "__main__": 295 | main() 296 | -------------------------------------------------------------------------------- /qdrantMigration.py: -------------------------------------------------------------------------------- 1 | from qdrant_client import QdrantClient, models 2 | import json 3 | import os 4 | import numpy as np 5 | import uuid 6 | import base64 7 | import xml.etree.ElementTree as ET 8 | from typing import Dict, List, Any 9 | import logging 10 | 11 | # Configure logging 12 | logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') 13 | logger = logging.getLogger(__name__) 14 | 15 | # Configuration 16 | INDEX_DIR = r"C:\Users\patru\LightRAG\examples\index_default" 17 | QDRANT_URL = "https://blahblahblah.us-east4-0.gcp.cloud.qdrant.io:6333" 18 | 19 | def generate_positive_id(text: str) -> int: 20 | """Generate a consistent positive ID from text""" 21 | uid = uuid.uuid5(uuid.NAMESPACE_DNS, text) 22 | return int(str(uid.int)[-12:]) 23 | 24 | class QdrantMigrator: 25 | def __init__(self): 26 | self.client = QdrantClient(url=QDRANT_URL, api_key=os.environ.get("QDRANT_API_KEY", ""), timeout=60.0) # read the key from the environment; never hardcode live credentials 27 | self.chunks_by_id = {} # Store chunk metadata 28 | self.entity_metadata = {} # Store entity metadata from graphml 29 | self.relationship_metadata = {} # Store relationship metadata from graphml 30 | self.kv_stores = {} # Store all KV metadata 31 | self.full_docs = {} # Store full document content 32 | 33 | def init_collections(self): 34 | """Initialize all collections with clean slate""" 35 | collections = ["entities", "relationships", "chunks", "documents"] 36 | for collection in collections: 37 | try: 38 | logger.info(f"Deleting collection: {collection}") 39 | self.client.delete_collection(collection) 40 | except Exception as e: 41 | logger.warning(f"Error deleting {collection}: {str(e)}") 42 | 43 | logger.info(f"Creating collection: {collection}") 44 | self.client.create_collection( 45 | collection_name=collection, 46 | vectors_config=models.VectorParams( 47 | size=3072, # text-embedding-3-large dimension 48 | distance=models.Distance.COSINE 49 | ) 50 | ) 51 | 52 | def load_all_data(self): 53 | """Load all data sources""" 54 | # Load graphml 55 | self.load_graphml() 56 | 57 | # Load all KV stores 58 | self.load_kv_stores() 59 | 60 | # Load full documents if available 61 | self.load_full_documents() 62 | 63 | def load_graphml(self): 64 | """Load entity and relationship metadata from graphml""" 65 | graphml_path = os.path.join(INDEX_DIR, "graph_chunk_entity_relation.graphml") 66 | logger.info(f"Loading graphml from: {graphml_path}") 67 | 68 | if not os.path.exists(graphml_path): 69 | logger.warning("Graphml file not found") 70 | return 71 |
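# LightRAG's GraphML file is a plain XML knowledge graph: <node> elements carry
# entity attributes and <edge> elements carry relationship attributes, keyed d0-d6.
72 | tree = 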
ET.parse(graphml_path) 73 | root = tree.getroot() 74 | ns = {'g': 'http://graphml.graphdrawing.org/xmlns'} 75 | 76 | # Process nodes (entities) 77 | for node in root.findall('.//g:graph/g:node', ns): 78 | node_id = node.get('id').strip('"') 79 | entity_type = node.find(".//g:data[@key='d0']", ns) 80 | description = node.find(".//g:data[@key='d1']", ns) 81 | source_id = node.find(".//g:data[@key='d2']", ns) 82 | 83 | self.entity_metadata[node_id] = { 84 | "entity_type": entity_type.text.strip('"') if entity_type is not None else "", 85 | "description": description.text if description is not None else "", 86 | "source_id": source_id.text if source_id is not None else "" 87 | } 88 | 89 | # Process edges (relationships) 90 | for edge in root.findall('.//g:graph/g:edge', ns): 91 | source = edge.get('source').strip('"') 92 | target = edge.get('target').strip('"') 93 | edge_id = f"rel-{generate_positive_id(f'{source}-{target}')}" 94 | 95 | self.relationship_metadata[edge_id] = { 96 | "source": source, 97 | "target": target, 98 | "description": edge.find(".//g:data[@key='d4']", ns).text if edge.find(".//g:data[@key='d4']", ns) is not None else "", 99 | "weight": float(edge.find(".//g:data[@key='d3']", ns).text) if edge.find(".//g:data[@key='d3']", ns) is not None else 1.0, 100 | "keywords": edge.find(".//g:data[@key='d5']", ns).text if edge.find(".//g:data[@key='d5']", ns) is not None else "", 101 | "source_id": edge.find(".//g:data[@key='d6']", ns).text if edge.find(".//g:data[@key='d6']", ns) is not None else "" 102 | } 103 | 104 | logger.info(f"Loaded {len(self.entity_metadata)} entities and {len(self.relationship_metadata)} relationships from graphml") 105 | 106 | def load_kv_stores(self): 107 | """Load all KV store files""" 108 | kv_files = [ 109 | "kv_store_full_docs.json", 110 | "kv_store_llm_response_cache.json", 111 | "kv_store_text_chunks.json" 112 | ] 113 | 114 | for filename in kv_files: 115 | filepath = os.path.join(INDEX_DIR, filename) 116 | if os.path.exists(filepath): 117 | logger.info(f"Loading KV store: {filename}") 118 | with open(filepath, 'r', encoding='utf-8') as f: 119 | self.kv_stores[filename] = json.load(f) 120 | logger.info(f"Loaded {len(self.kv_stores[filename])} entries from {filename}") 121 | 122 | def load_full_documents(self): 123 | """Load full documents if available""" 124 | docs = self.kv_stores.get("kv_store_full_docs.json", {}) 125 | if docs: 126 | logger.info(f"Found {len(docs)} full documents") 127 | self.full_docs = docs 128 | 129 | def migrate_entities(self): 130 | """Migrate entities with metadata from all sources""" 131 | entities_path = os.path.join(INDEX_DIR, "vdb_entities.json") 132 | logger.info(f"Loading entities from: {entities_path}") 133 | 134 | with open(entities_path, 'r', encoding='utf-8') as f: 135 | entities_data = json.load(f) 136 | 137 | points = [] 138 | for entity in entities_data['data']: 139 | entity_id = entity['__id__'] 140 | entity_name = entity["entity_name"].strip('"') 141 | 142 | # Get metadata from graphml 143 | metadata = self.entity_metadata.get(entity_name, {}) 144 | 145 | # Get associated chunks (LightRAG joins chunk ids with GRAPH_FIELD_SEP, i.e. "<SEP>"; splitting on '' would raise ValueError) 146 | chunk_ids = metadata.get('source_id', '').split('<SEP>') 147 | chunks_content = [] 148 | for chunk_id in chunk_ids: 149 | if chunk_id: 150 | chunk = self.kv_stores.get('kv_store_text_chunks.json', {}).get(chunk_id, {}) 151 | if chunk: 152 | chunks_content.append(chunk.get('content', '')) 153 | 154 | point = models.PointStruct( 155 | id=generate_positive_id(entity_id), 156 | vector=np.zeros(3072).tolist(), # Placeholder 
until we have actual embeddings 157 | payload={ 158 | "entity_id": entity_id, 159 | "entity_name": entity_name, 160 | "entity_type": metadata.get("entity_type", ""), 161 | "description": metadata.get("description", ""), 162 | "source_chunks": chunk_ids, 163 | "chunks_content": chunks_content, 164 | "raw_metadata": metadata 165 | } 166 | ) 167 | points.append(point) 168 | 169 | logger.info(f"Uploading {len(points)} entities...") 170 | self._batch_upload(points, "entities") 171 | return len(points) 172 | 173 | def migrate_relationships(self): 174 | """Migrate relationships with metadata from all sources""" 175 | relationships_path = os.path.join(INDEX_DIR, "vdb_relationships.json") 176 | logger.info(f"Loading relationships from: {relationships_path}") 177 | 178 | with open(relationships_path, 'r', encoding='utf-8') as f: 179 | relationships_data = json.load(f) 180 | 181 | points = [] 182 | if 'data' in relationships_data: 183 | for rel in relationships_data['data']: 184 | if isinstance(rel, dict) and 'src_id' in rel and 'tgt_id' in rel: 185 | rel_id = rel['__id__'] 186 | source = rel['src_id'].strip('"') 187 | target = rel['tgt_id'].strip('"') 188 | 189 | # Get metadata from graphml 190 | edge_id = f"rel-{generate_positive_id(f'{source}-{target}')}" 191 | metadata = self.relationship_metadata.get(edge_id, {}) 192 | 193 | point = models.PointStruct( 194 | id=generate_positive_id(rel_id), 195 | vector=np.zeros(3072).tolist(), # Placeholder until we have actual embeddings 196 | payload={ 197 | "relationship_id": rel_id, 198 | "source": source, 199 | "target": target, 200 | "description": metadata.get("description", ""), 201 | "weight": metadata.get("weight", 1.0), 202 | "keywords": metadata.get("keywords", ""), 203 | "source_id": metadata.get("source_id", ""), 204 | "raw_metadata": metadata 205 | } 206 | ) 207 | points.append(point) 208 | 209 | logger.info(f"Uploading {len(points)} relationships...") 210 | self._batch_upload(points, "relationships") 211 | return len(points) 212 | 213 | def migrate_chunks(self): 214 | """Migrate chunks with embeddings and metadata""" 215 | chunks_path = os.path.join(INDEX_DIR, "vdb_chunks.json") 216 | logger.info(f"Loading chunks from: {chunks_path}") 217 | 218 | with open(chunks_path, 'r', encoding='utf-8') as f: 219 | chunks_data = json.load(f) 220 | 221 | # Load matrix data 222 | matrix = None 223 | if 'matrix' in chunks_data: 224 | try: 225 | matrix_data = base64.b64decode(chunks_data['matrix']) 226 | matrix = np.frombuffer(matrix_data, dtype=np.float32) 227 | matrix = matrix.reshape(len(chunks_data['data']), -1) 228 | logger.info(f"Loaded embeddings matrix of shape: {matrix.shape}") 229 | except Exception as e: 230 | logger.error(f"Error processing matrix: {str(e)}") 231 | 232 | # Get chunk metadata from KV store 233 | chunks_metadata = self.kv_stores.get('kv_store_text_chunks.json', {}) 234 | 235 | points = [] 236 | for i, chunk in enumerate(chunks_data['data']): 237 | chunk_id = chunk['__id__'] 238 | 239 | # Get embedding 240 | embedding = matrix[i] if matrix is not None and i < len(matrix) else np.zeros(3072) 241 | 242 | # Get metadata 243 | metadata = chunks_metadata.get(chunk_id, {}) 244 | 245 | point = models.PointStruct( 246 | id=generate_positive_id(chunk_id), 247 | vector=embedding.tolist(), 248 | payload={ 249 | "chunk_id": chunk_id, 250 | "content": metadata.get('content', ''), 251 | "tokens": metadata.get('tokens', 0), 252 | "chunk_order_index": metadata.get('chunk_order_index', 0), 253 | "full_doc_id": metadata.get('full_doc_id', ''), 254 | 
"raw_metadata": metadata 255 | } 256 | ) 257 | points.append(point) 258 | 259 | logger.info(f"Uploading {len(points)} chunks...") 260 | self._batch_upload(points, "chunks") 261 | return len(points) 262 | 263 | def migrate_documents(self): 264 | """Migrate full documents if available""" 265 | if not self.full_docs: 266 | logger.info("No full documents to migrate") 267 | return 0 268 | 269 | points = [] 270 | for doc_id, doc_data in self.full_docs.items(): 271 | point = models.PointStruct( 272 | id=generate_positive_id(doc_id), 273 | vector=np.zeros(3072).tolist(), # Placeholder until we have actual embeddings 274 | payload={ 275 | "document_id": doc_id, 276 | "content": doc_data.get('content', ''), 277 | "metadata": doc_data 278 | } 279 | ) 280 | points.append(point) 281 | 282 | logger.info(f"Uploading {len(points)} documents...") 283 | self._batch_upload(points, "documents") 284 | return len(points) 285 | 286 | def _batch_upload(self, points, collection_name, batch_size=20): 287 | """Helper to upload points in batches""" 288 | for i in range(0, len(points), batch_size): 289 | batch = points[i:i + batch_size] 290 | try: 291 | self.client.upsert( 292 | collection_name=collection_name, 293 | points=batch 294 | ) 295 | logger.info(f"Uploaded batch {i//batch_size + 1} of {len(points)//batch_size + 1}") 296 | except Exception as e: 297 | logger.error(f"Error uploading batch to {collection_name}: {str(e)}") 298 | raise 299 | 300 | def verify_data(self): 301 | """Verify all migrated data""" 302 | logger.info("\nVerifying migrated data...") 303 | 304 | for collection in ["entities", "relationships", "chunks", "documents"]: 305 | try: 306 | results = self.client.scroll( 307 | collection_name=collection, 308 | limit=5, 309 | with_payload=True, 310 | with_vectors=True 311 | )[0] 312 | 313 | logger.info(f"\n{collection.capitalize()} collection:") 314 | logger.info(f"Sample of {len(results)} points:") 315 | 316 | for point in results: 317 | logger.info(f"\nID: {point.id}") 318 | 319 | if collection == "chunks": 320 | logger.info(f"Content preview: {point.payload.get('content', '')[:100]}...") 321 | logger.info(f"Tokens: {point.payload.get('tokens', 0)}") 322 | logger.info(f"Document ID: {point.payload.get('full_doc_id', '')}") 323 | else: 324 | logger.info(f"Payload: {json.dumps(point.payload, indent=2)}") 325 | 326 | logger.info(f"Vector shape: {len(point.vector)}") 327 | logger.info(f"Non-zero elements: {np.count_nonzero(point.vector)}") 328 | 329 | # Get collection info 330 | collection_info = self.client.get_collection(collection) 331 | logger.info(f"Total points in {collection}: {collection_info.points_count}") 332 | 333 | except Exception as e: 334 | logger.error(f"Error verifying {collection}: {str(e)}") 335 | 336 | def main(): 337 | migrator = QdrantMigrator() 338 | 339 | try: 340 | logger.info("Starting complete migration...") 341 | 342 | # Initialize fresh collections 343 | logger.info("\nInitializing collections...") 344 | migrator.init_collections() 345 | 346 | # Load all data sources first 347 | logger.info("\nLoading all data sources...") 348 | migrator.load_all_data() 349 | 350 | # Migrate everything 351 | logger.info("\nStarting migration of all components...") 352 | 353 | num_entities = migrator.migrate_entities() 354 | logger.info(f"Migrated {num_entities} entities") 355 | 356 | num_relationships = migrator.migrate_relationships() 357 | logger.info(f"Migrated {num_relationships} relationships") 358 | 359 | num_chunks = migrator.migrate_chunks() 360 | logger.info(f"Migrated {num_chunks} 
chunks") 361 | 362 | num_documents = migrator.migrate_documents() 363 | logger.info(f"Migrated {num_documents} documents") 364 | 365 | # Verify the migration 366 | logger.info("\nVerifying complete migration...") 367 | migrator.verify_data() 368 | 369 | # Print summary 370 | logger.info("\nMigration completed successfully!") 371 | logger.info("Summary:") 372 | logger.info(f"- Entities: {num_entities}") 373 | logger.info(f"- Relationships: {num_relationships}") 374 | logger.info(f"- Chunks: {num_chunks}") 375 | logger.info(f"- Documents: {num_documents}") 376 | 377 | except Exception as e: 378 | logger.error(f"Migration failed: {str(e)}") 379 | raise 380 | 381 | if __name__ == "__main__": 382 | main() 383 | -------------------------------------------------------------------------------- /qdrant_light_server.py: -------------------------------------------------------------------------------- 1 | from fastapi import FastAPI, HTTPException 2 | from pydantic import BaseModel 3 | import os 4 | from openai import AsyncOpenAI 5 | import numpy as np 6 | from typing import Optional, List 7 | import asyncio 8 | import nest_asyncio 9 | from qdrant_client import QdrantClient, models 10 | 11 | # Apply nest_asyncio to solve event loop issues 12 | nest_asyncio.apply() 13 | 14 | app = FastAPI(title="LightRAG API with Qdrant", description="API for RAG operations") 15 | 16 | # Configuration 17 | QDRANT_URL = "https://blahblahblah.us-east4-0.gcp.cloud.qdrant.io:6333" 18 | LLM_MODEL = os.environ.get("LLM_MODEL", "gpt-4") 19 | EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "text-embedding-3-large") 20 | EMBEDDING_MAX_TOKEN_SIZE = int(os.environ.get("EMBEDDING_MAX_TOKEN_SIZE", 8192)) 21 | 22 | print(f"LLM_MODEL: {LLM_MODEL}") 23 | print(f"EMBEDDING_MODEL: {EMBEDDING_MODEL}") 24 | print(f"EMBEDDING_MAX_TOKEN_SIZE: {EMBEDDING_MAX_TOKEN_SIZE}") 25 | 26 | # Initialize clients 27 | openai_client = AsyncOpenAI() 28 | qdrant_client = QdrantClient(url=QDRANT_URL, api_key="your_api_key_here", timeout=60.0) 29 | 30 | async def get_embedding(text: str) -> np.ndarray: 31 | """Get embeddings from OpenAI""" 32 | response = await openai_client.embeddings.create( 33 | model=EMBEDDING_MODEL, 34 | input=text 35 | ) 36 | return np.array(response.data[0].embedding) 37 | 38 | async def llm_query(context: str, query: str) -> str: 39 | """Query LLM with context""" 40 | messages = [ 41 | {"role": "system", "content": "You are a helpful assistant. 
Use the provided context to answer questions accurately."}, 42 | {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"} 43 | ] 44 | 45 | response = await openai_client.chat.completions.create( 46 | model=LLM_MODEL, 47 | messages=messages, 48 | temperature=0.7 49 | ) 50 | return response.choices[0].message.content 51 | 52 | async def search_qdrant(query: str, mode: str = "hybrid") -> List[dict]: 53 | """Search Qdrant based on mode""" 54 | query_vector = await get_embedding(query) 55 | context = [] 56 | 57 | if mode == "naive": 58 | # Simple semantic search on chunks 59 | results = qdrant_client.search( 60 | collection_name="chunks", 61 | query_vector=query_vector.tolist(), 62 | limit=5 63 | ) 64 | context = [result.payload.get('content', '') for result in results] 65 | 66 | elif mode == "local": 67 | # Search entities first, then get related chunks 68 | entities = qdrant_client.search( 69 | collection_name="entities", 70 | query_vector=query_vector.tolist(), 71 | limit=3 72 | ) 73 | 74 | for entity in entities: 75 | # Add entity description 76 | context.append(entity.payload.get('description', '')) 77 | # Add associated chunk content 78 | if 'chunks_content' in entity.payload: 79 | context.extend(entity.payload['chunks_content']) 80 | 81 | else: # hybrid or global 82 | # Search both entities and chunks 83 | chunks = qdrant_client.search( 84 | collection_name="chunks", 85 | query_vector=query_vector.tolist(), 86 | limit=3 87 | ) 88 | entities = qdrant_client.search( 89 | collection_name="entities", 90 | query_vector=query_vector.tolist(), 91 | limit=2 92 | ) 93 | 94 | # Combine results 95 | context.extend([chunk.payload.get('content', '') for chunk in chunks]) 96 | context.extend([entity.payload.get('description', '') for entity in entities]) 97 | 98 | return context 99 | 100 | # Data models 101 | class QueryRequest(BaseModel): 102 | query: str 103 | mode: str = "hybrid" 104 | only_need_context: bool = False 105 | 106 | class InsertRequest(BaseModel): 107 | text: str 108 | 109 | class Response(BaseModel): 110 | status: str 111 | data: Optional[str] = None 112 | message: Optional[str] = None 113 | context: Optional[List[str]] = None 114 | 115 | # API routes 116 | @app.post("/query", response_model=Response) 117 | async def query_endpoint(request: QueryRequest): 118 | try: 119 | # Get context from Qdrant 120 | context = await search_qdrant(request.query, request.mode) 121 | 122 | if request.only_need_context: 123 | return Response( 124 | status="success", 125 | context=context 126 | ) 127 | 128 | if not context: 129 | return Response( 130 | status="success", 131 | data="I couldn't find any relevant information to answer your question." 
132 | ) 133 | 134 | # Get LLM response 135 | context_str = "\n\n".join(context) 136 | result = await llm_query(context_str, request.query) 137 | 138 | return Response( 139 | status="success", 140 | data=result, 141 | context=context 142 | ) 143 | except Exception as e: 144 | raise HTTPException(status_code=500, detail=str(e)) 145 | 146 | @app.post("/insert", response_model=Response) 147 | async def insert_endpoint(request: InsertRequest): 148 | try: 149 | # Get embedding for the text 150 | embedding = await get_embedding(request.text) 151 | 152 | # Create point for Qdrant 153 | point = models.PointStruct( 154 | id=int(str(uuid.uuid5(uuid.NAMESPACE_DNS, request.text).int)[-12:]), # stable across restarts (hash() is salted per process); mirrors generate_positive_id in qdrantMigration.py 155 | vector=embedding.tolist(), 156 | payload={ 157 | "content": request.text 158 | } 159 | ) 160 | 161 | # Insert into chunks collection 162 | qdrant_client.upsert( 163 | collection_name="chunks", 164 | points=[point] 165 | ) 166 | 167 | return Response( 168 | status="success", 169 | message="Text inserted successfully" 170 | ) 171 | except Exception as e: 172 | raise HTTPException(status_code=500, detail=str(e)) 173 | 174 | @app.get("/health") 175 | async def health_check(): 176 | try: 177 | collections = qdrant_client.get_collections() 178 | stats = { 179 | collection.name: qdrant_client.get_collection(collection.name).points_count # points_count is populated consistently across client versions 180 | for collection in collections.collections 181 | } 182 | return { 183 | "status": "healthy", 184 | "qdrant_connected": True, 185 | "collection_stats": stats 186 | } 187 | except Exception as e: 188 | return { 189 | "status": "unhealthy", 190 | "error": str(e) 191 | } 192 | 193 | if __name__ == "__main__": 194 | import uvicorn 195 | uvicorn.run(app, host="0.0.0.0", port=8020) 196 | --------------------------------------------------------------------------------
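Once either server is listening on port 8020, its REST surface can be smoke-tested from PowerShell (a quick sketch; the query text is just an example):

```powershell
# Health check, then a sample hybrid query
Invoke-RestMethod http://127.0.0.1:8020/health

$body = @{ query = "What does the setup script install?"; mode = "hybrid" } | ConvertTo-Json
Invoke-RestMethod -Uri http://127.0.0.1:8020/query -Method Post -Body $body -ContentType "application/json"
```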