├── .DS_Store ├── .gitignore ├── deepseekv3.2-mongodb ├── CLAUDE.md ├── pyproject.toml ├── .env.example ├── main.py ├── load_hf_to_mongodb.py └── README.md ├── README.md └── LICENSE /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NielsRogge/tutorials/HEAD/.DS_Store -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Python-generated files 2 | __pycache__/ 3 | *.py[oc] 4 | build/ 5 | dist/ 6 | wheels/ 7 | *.egg-info 8 | 9 | # Virtual environments 10 | .venv 11 | 12 | # Environment variables 13 | .env 14 | 15 | # Data files 16 | *.env/ 17 | claude-mongodb/data/ -------------------------------------------------------------------------------- /deepseekv3.2-mongodb/CLAUDE.md: -------------------------------------------------------------------------------- 1 | You are a database manager capable of helping out users with their database questions. 2 | You have several subagents at your disposal which are specialized at handling different tasks regarding the database, such as writing new data to the database, reading from the database, and querying the database. 
-------------------------------------------------------------------------------- /deepseekv3.2-mongodb/pyproject.toml: -------------------------------------------------------------------------------- 1 | [project] 2 | name = "mongodb-demo" 3 | version = "0.1.0" 4 | description = "Add your description here" 5 | readme = "README.md" 6 | requires-python = ">=3.12" 7 | dependencies = [ 8 | "claude-agent-sdk>=0.1.10", 9 | "datasets>=3.2.0", 10 | "pymongo>=4.10.1", 11 | "pandas>=2.2.0", 12 | ] 13 | -------------------------------------------------------------------------------- /deepseekv3.2-mongodb/.env.example: -------------------------------------------------------------------------------- 1 | # MongoDB Connection String 2 | # Copy this file to .env and replace with your actual connection string 3 | MONGODB_CONNECTION_STRING=mongodb+srv://your-username:your-password@your-cluster.mongodb.net/ 4 | ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic 5 | ANTHROPIC_AUTH_TOKEN=your-api-token 6 | API_TIMEOUT_MS=600000 7 | ANTHROPIC_MODEL=deepseek-chat 8 | ANTHROPIC_SMALL_FAST_MODEL=deepseek-chat 9 | CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Tutorials 2 | 3 | A repository containing general tutorials I'd like to share with the world. It currently contains the following tutorials: 4 | 5 | - editing images with FLUX 1. Kontext ([link](notebooks/Edit_images_effortlessly_with_FLUX_1_Kontext.ipynb)) 6 | - how I prompt vision-language models (VLMs) in 2025 ([link](notebooks/How_I_prompt_VLMs_in_2025_(Gemini).ipynb)) 7 | - how to run DeepSeekv3.2 using the Claude Agents SDK and the MongoDB MCP server ([link](deepseekv3.2-mongodb/)) 8 | 9 | For tutorials on using the 🤗 Transformers library, see my repository [Transformers-Tutorials](https://github.com/NielsRogge/Transformers-Tutorials). 
10 | 11 | ## Citation 12 | 13 | Feel free to cite me when you use some of my tutorials :) 14 | 15 | ```bibtex 16 | @misc{rogge2025tutorials, 17 | author = {Rogge, Niels}, 18 | title = {Tutorials}, 19 | url = {https://github.com/NielsRogge/tutorials}, 20 | year = {2025} 21 | } 22 | ``` -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2025 NielsRogge 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /deepseekv3.2-mongodb/main.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import anyio 3 | import os 4 | 5 | from claude_agent_sdk import ClaudeAgentOptions, AgentDefinition, query, AssistantMessage, ResultMessage, TextBlock 6 | from claude_agent_sdk.types import McpStdioServerConfig 7 | 8 | 9 | async def database_manager_example(prompt: str): 10 | """Example with MongoDB MCP.""" 11 | print("=== MongoDB MCP Example ===") 12 | 13 | # Get connection string from environment variable 14 | connection_string = os.getenv("MONGODB_CONNECTION_STRING") 15 | if not connection_string: 16 | raise ValueError( 17 | "MONGODB_CONNECTION_STRING environment variable not set. " 18 | "Please create a .env file or set the environment variable." 19 | ) 20 | 21 | options = ClaudeAgentOptions( 22 | # subagents 23 | agents={ 24 | "database_reader": AgentDefinition( 25 | description="Reads a MongoDB database", 26 | prompt="You are a database reader. You are able to list all databases and collections in an existing MongoDB database instance, count the number of documents in a collection, and get the schema and storage size of a collection.", 27 | tools=["list-databases", "list-collections", "count", "collection-schema", "collection-storage-size"], 28 | model="sonnet" 29 | ), 30 | "database_writer": AgentDefinition( 31 | description="Writes to a MongoDB database", 32 | prompt="You are a database writer. You are able to insert, update, or delete data in an existing MongoDB database instance. You are also able to create search indexes on the database.", 33 | tools=["insert-many", "create-index", "update-many", "drop-database", "drop-collection"], 34 | model="sonnet" 35 | ), 36 | "database_querier": AgentDefinition( 37 | description="Queries a MongoDB database", 38 | prompt="You are a database querier. 
You are able to query a MongoDB database instance and return the results in a structured format.", 39 | tools=["find"], 40 | model="sonnet" 41 | ), 42 | }, 43 | mcp_servers={ 44 | "mongodb": McpStdioServerConfig( 45 | command="npx", 46 | args=["-y", "mongodb-mcp-server@latest"], 47 | env={ 48 | "MDB_MCP_CONNECTION_STRING": connection_string 49 | } 50 | ) 51 | }, 52 | permission_mode='bypassPermissions', # Automatically grant permissions for all tools 53 | setting_sources=["user", "project"], # Insert the CLAUDE.md file into the system prompt 54 | ) 55 | 56 | async for message in query( 57 | prompt=prompt, 58 | options=options, 59 | ): 60 | if isinstance(message, AssistantMessage): 61 | for block in message.content: 62 | if isinstance(block, TextBlock): 63 | print(f"Claude: {block.text}") 64 | elif isinstance(message, ResultMessage) and message.total_cost_usd and message.total_cost_usd > 0: 65 | print(f"\nCost: ${message.total_cost_usd:.4f}") 66 | print() 67 | 68 | 69 | async def main(prompt: str): 70 | """Run the multiple agents example.""" 71 | await database_manager_example(prompt=prompt) 72 | 73 | 74 | if __name__ == "__main__": 75 | parser = argparse.ArgumentParser() 76 | parser.add_argument("--prompt", type=str, required=False, default="Return all collections in the database") 77 | args = parser.parse_args() 78 | anyio.run(main, args.prompt) -------------------------------------------------------------------------------- /deepseekv3.2-mongodb/load_hf_to_mongodb.py: -------------------------------------------------------------------------------- 1 | """ 2 | Script to load a Hugging Face dataset and push it to MongoDB. 3 | 4 | This script: 5 | 1. Loads the cfahlgren1/hub-stats dataset from the Hugging Face hub 6 | 2. Converts it to CSV 7 | 3. 
Pushes the data to a new MongoDB collection 8 | 9 | It can be run with the following command: 10 | 11 | ```bash 12 | uv run --env-file .env load_hf_to_mongodb.py 13 | ``` 14 | """ 15 | 16 | import os 17 | from datasets import load_dataset, get_dataset_config_names 18 | from pymongo import MongoClient 19 | from pymongo.errors import ConnectionFailure 20 | import pandas as pd 21 | import numpy as np 22 | 23 | 24 | def load_hf_dataset(dataset_name: str, config: str): 25 | """Load a dataset from HuggingFace Hub.""" 26 | dataset = load_dataset(dataset_name, config, split="train") 27 | return dataset 28 | 29 | 30 | def convert_to_csv(dataset, output_path: str = "hub_stats.csv"): 31 | """Convert HuggingFace dataset to CSV and return it as a DataFrame.""" 32 | os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True) # dirname is "" for a bare filename 33 | df = dataset.to_pandas() 34 | df.to_csv(output_path, index=False) 35 | return df 36 | 37 | 38 | def connect_to_mongodb(connection_string: str): 39 | """Connect to MongoDB.""" 40 | try: 41 | # Add SSL parameters to work around macOS SSL issues 42 | client = MongoClient( 43 | connection_string, 44 | serverSelectionTimeoutMS=5000, 45 | tls=True, 46 | tlsAllowInvalidCertificates=True # For development only 47 | ) 48 | # Test connection 49 | client.admin.command('ping') 50 | return client 51 | except ConnectionFailure as e: 52 | print(f"Failed to connect to MongoDB: {e}") 53 | raise 54 | 55 | 56 | def push_to_mongodb(df: pd.DataFrame, client: MongoClient, database_name: str, collection_name: str, limit_rows: int | None = None): 57 | """Push DataFrame to MongoDB collection.""" 58 | # Get database and collection 59 | db = client[database_name] 60 | 61 | # Drop collection if it exists to avoid duplicate key errors 62 | if collection_name in db.list_collection_names(): 63 | db.drop_collection(collection_name) 64 | 65 | collection = db[collection_name] 66 | 67 | # Limit rows for testing if specified 68 | if limit_rows is not None: 69 | df = df.head(limit_rows) 70 | 71 | # Drop columns that contain 
complex nested arrays (these cause encoding issues) 72 | # We identify columns where the first non-null value is a complex object/array 73 | columns_to_drop = [] 74 | for col in df.columns: 75 | # Get first non-null value 76 | first_val = df[col].dropna().iloc[0] if len(df[col].dropna()) > 0 else None 77 | if first_val is not None: 78 | # Check if it's a complex type (list, dict with nested structures, numpy array) 79 | if isinstance(first_val, (list, np.ndarray)): 80 | columns_to_drop.append(col) 81 | elif isinstance(first_val, dict) and any(isinstance(v, (list, dict)) for v in first_val.values()): 82 | columns_to_drop.append(col) 83 | 84 | if columns_to_drop: 85 | df = df.drop(columns=columns_to_drop) 86 | 87 | # Convert DataFrame to list of dictionaries 88 | records = df.to_dict('records') 89 | 90 | # Replace NaN values with None for MongoDB 91 | for record in records: 92 | for key, value in record.items(): 93 | if pd.isna(value): 94 | record[key] = None 95 | 96 | # Insert documents 97 | if records: 98 | result = collection.insert_many(records) 99 | print(f"✅ {collection_name}: {len(result.inserted_ids)} documents inserted") 100 | 101 | return collection 102 | 103 | 104 | def main(): 105 | """Main function to orchestrate the data loading and upload.""" 106 | 107 | # Configuration 108 | DATASET_NAME = "cfahlgren1/hub-stats" 109 | CONNECTION_STRING = os.getenv("MONGODB_CONNECTION_STRING") 110 | if not CONNECTION_STRING: 111 | raise ValueError("MONGODB_CONNECTION_STRING environment variable not set. 
Please create a .env file or set the environment variable.") 112 | DATABASE_NAME = "huggingface_data" 113 | 114 | # TEST MODE: Set to None to insert all rows, or a number to limit for testing 115 | LIMIT_ROWS = None # Insert ALL rows 116 | 117 | # Get available configs from the dataset 118 | CONFIGS = get_dataset_config_names(DATASET_NAME) 119 | 120 | try: 121 | # Connect to MongoDB once 122 | client = connect_to_mongodb(CONNECTION_STRING) 123 | 124 | # Process each config 125 | for config in CONFIGS: 126 | COLLECTION_NAME = f"hub_stats_{config}" 127 | CSV_OUTPUT = f"data/hub_stats_{config}.csv" 128 | 129 | try: 130 | # Step 1: Load dataset from HuggingFace 131 | dataset = load_hf_dataset(DATASET_NAME, config) 132 | 133 | # Step 2: Convert to CSV 134 | df = convert_to_csv(dataset, CSV_OUTPUT) 135 | 136 | # Step 3: Push to MongoDB 137 | collection = push_to_mongodb(df, client, DATABASE_NAME, COLLECTION_NAME, limit_rows=LIMIT_ROWS) 138 | 139 | except Exception as e: 140 | print(f"❌ Error processing '{config}': {e}") 141 | # Continue with next config 142 | continue 143 | 144 | print("\n✅ Finished processing all configs") # configs that failed are reported above 145 | 146 | except Exception as e: 147 | print(f"❌ Error: {e}") 148 | raise 149 | finally: 150 | if 'client' in locals(): 151 | client.close() 152 | 153 | 154 | if __name__ == "__main__": 155 | main() 156 | 157 | -------------------------------------------------------------------------------- /deepseekv3.2-mongodb/README.md: -------------------------------------------------------------------------------- 1 | # Using DeepSeek v3.2 and the Claude Agents SDK with MongoDB MCP 2 | 3 | This repository showcases how to combine the following amazing pieces of technology: 4 | 5 | - [DeepSeek v3.2](https://api-docs.deepseek.com/news/news251201), which just got released on Hugging Face, rivaling the best closed-source models such as GPT-5 and Claude Opus 4.5 6 | - the [Claude Agents SDK](https://platform.claude.com/docs/en/agent-sdk/overview) (formerly called 
Claude Code SDK) 7 | - the official [MongoDB MCP server](https://fandf.co/3LGhRN8). 8 | 9 | ## Overview 10 | 11 | The MongoDB MCP server allows LLMs like DeepSeek v3.2 and Claude to interact with your MongoDB databases directly: querying data, listing collections, analyzing schemas, and more. This integration enables powerful database operations through natural language, such as having agents automatically write to a database. 12 | 13 | In this tutorial, we make use of the Claude Agents SDK, which is one of the many agent frameworks out there, besides the [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/), Google's [ADK](https://google.github.io/adk-docs/), [LangGraph](https://www.langchain.com/langgraph), [Agno](https://docs.agno.com/), [Pydantic AI](https://ai.pydantic.dev/) and many more. What's amazing is that you can just replace the Claude model with an open-weights model like the brand new [DeepSeek v3.2](https://huggingface.co/collections/deepseek-ai/deepseek-v32) (as explained [here](https://api-docs.deepseek.com/guides/anthropic_api)). The only thing to add is the following: 14 | 15 | ```bash 16 | export ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic 17 | export ANTHROPIC_AUTH_TOKEN=${YOUR_API_KEY} 18 | export API_TIMEOUT_MS=600000 19 | export ANTHROPIC_MODEL=deepseek-chat 20 | export ANTHROPIC_SMALL_FAST_MODEL=deepseek-chat 21 | export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 22 | ``` 23 | 24 | Usually, I recommend trying out a framework which finds the right level of abstraction: providing the right building blocks for you to build an agent without abstracting away so much that it becomes hard to debug. Additionally, I recommend always using models from the creators of the framework: if you intend to use Gemini, then Google's ADK is the recommendation. If you intend to use GPT-5, then the OpenAI Agents SDK is the recommendation, and so on. 
This is because the model providers know their models better than anyone else, so they'll make sure their models work best with a framework developed by them. 25 | 26 | ### Why Claude Agents SDK? 27 | 28 | I went for the Claude Agents SDK in this case because it provides the exact same environment (also called “harness”) around the LLM which powers [Claude Code](https://www.claude.com/product/claude-code), the popular coding tool that competes with Cursor and others. This means that it comes with the same tooling: support for subagents, Skills, memory, hooks, slash commands, and … MCP. The Model Context Protocol by Anthropic enables agents to connect with tools in a standardized way, similar to how HTTP standardizes the way web browsers connect with web servers. By leveraging the Claude Agents SDK, we can benefit from all the [context engineering](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) tricks that Anthropic has baked into Claude Code. 29 | 30 | ### Context engineering trick: subagents 31 | 32 | In this tutorial, we make use of [subagents](https://platform.claude.com/docs/en/agent-sdk/subagents), one of the popular ways to deal with "context rot" of LLMs. As you might know, current LLM agents (powered by models like Claude, GPT-5, Gemini 3, ...) have a major limitation: their context window is limited. Even though model providers claim to support context windows of 200k tokens up to a million and more, in reality the performance starts to degrade once you provide more than 100k tokens. The LLM then quickly gets confused: it does the wrong things, chooses the wrong tools, and so on. This problem is known as “context rot” and was first shown by research done at [Chroma](https://research.trychroma.com/context-rot). 33 | 34 | By creating various subagents, each of which is specialized to handle one specific task, the main agent (oftentimes called the "orchestrator" agent) will get less confused. 
By leveraging subagents, the context window of the main agent doesn't get polluted when solving subproblems. It can simply hand off subtasks to different subagents, each of which can leverage specific tools which are tailored towards the task it is solving. Limiting the number of tools per agent has a positive impact on performance, as each agent has fewer tools to choose from and is therefore less likely to pick the wrong one. I recommend this read to learn more about the rise of subagents: https://www.philschmid.de/the-rise-of-subagents. 35 | 36 | In our example, we created 3 subagents, each of which has its own system prompt and only uses its own set of tools from the MongoDB MCP server: 37 | 38 | - a reader agent, able to perform read-only operations on the database 39 | - a writer agent, able to insert, update and delete data in the database 40 | - a query agent, able to find the most relevant data given a user query. 41 | 42 | The official MongoDB MCP server comes with 26 tools by default (they are listed [here](https://github.com/mongodb-js/mongodb-mcp-server?tab=readme-ov-file#%EF%B8%8F-supported-tools)). Rather than having a single agent which has access to all of these tools, we can split it up into specialized subagents, each of which only has access to the select number of tools required to perform its task, whether it's reading data, writing data or querying data. 43 | 44 | ## Setup 45 | 46 | First, head over to MongoDB to create a new database: https://www.mongodb.com. Click "Get Started" and then create your first cluster. By default, a dummy database called "sample_mflix" is created which contains some sample collections about movies, theaters and comments about those movies. 47 | 48 | Next, make sure you can connect to your database instance by whitelisting your IP address and obtaining the connection string. 49 | 50 | 1. **Install dependencies:** 51 | 52 | Install this project with the following command: 53 | 54 | ```bash 55 | uv sync 56 | ``` 57 | 58 | 2. 
**Set your MongoDB connection string:** 59 | 60 | Next, create a `.env` file in the project root: 61 | 62 | ```bash 63 | cp .env.example .env 64 | ``` 65 | 66 | Then edit `.env` and add the following environment variables: 67 | 68 | ```bash 69 | MONGODB_CONNECTION_STRING=mongodb+srv://your-username:your-password@your-cluster.mongodb.net/ 70 | ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic 71 | ANTHROPIC_AUTH_TOKEN=your-api-token 72 | API_TIMEOUT_MS=600000 73 | ANTHROPIC_MODEL=deepseek-chat 74 | ANTHROPIC_SMALL_FAST_MODEL=deepseek-chat 75 | CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 76 | ``` 77 | 78 | ## Usage 79 | 80 | ### Run the basic example: 81 | 82 | ```bash 83 | # Using .env file (recommended) 84 | uv run --env-file .env main.py 85 | 86 | # Or with a custom prompt 87 | uv run --env-file .env main.py --prompt "How many movies are in the database?" 88 | 89 | uv run --env-file .env main.py --prompt "What are the top 10 most recent movies?" 90 | 91 | uv run --env-file .env main.py --prompt "Analyze the schema of all collections in the database" 92 | 93 | uv run --env-file .env main.py --prompt "What's the size of the movies collection?" 94 | ``` 95 | 96 | ### More fun: adding actual real-world data 97 | 98 | Besides the dummy movie data that MongoDB provides, I also experimented with adding real-world data from Hugging Face into MongoDB which the agent can then query. Here I leveraged the [cfahlgren1/hub-stats](https://huggingface.co/datasets/cfahlgren1/hub-stats) dataset which contains useful statistics about models, datasets, papers and more on the hub. One can run the following script to migrate it to MongoDB: 99 | 100 | ```bash 101 | uv run --env-file .env load_hf_to_mongodb.py 102 | ``` 103 | 104 | Next, you should be able to ask questions like: 105 | 106 | ```bash 107 | uv run --env-file .env main.py --prompt "Give me the top 10 most popular models on the hub?" 
108 | ``` 109 | 110 | ## Configuration 111 | 112 | The MongoDB MCP server can be configured in two ways: 113 | 114 | ### Method 1: Environment Variable (Recommended) 115 | 116 | ```python 117 | options = ClaudeAgentOptions( 118 | mcp_servers={ 119 | "mongodb": McpStdioServerConfig( 120 | command="npx", 121 | args=["-y", "mongodb-mcp-server@latest", "--readOnly"], 122 | env={ 123 | "MDB_MCP_CONNECTION_STRING": "your-connection-string" 124 | } 125 | ) 126 | } 127 | ) 128 | ``` 129 | 130 | ### Method 2: Command Argument 131 | 132 | ```python 133 | options = ClaudeAgentOptions( 134 | mcp_servers={ 135 | "mongodb": McpStdioServerConfig( 136 | command="npx", 137 | args=[ 138 | "-y", 139 | "mongodb-mcp-server@latest", 140 | "--connectionString", 141 | "your-connection-string" 142 | ] 143 | ) 144 | } 145 | ) 146 | ``` 147 | 148 | ## Resources 149 | 150 | - [Claude Agent SDK Python Docs](https://platform.claude.com/docs/en/agent-sdk/overview) 151 | - [MongoDB MCP Server Documentation](https://www.mongodb.com/docs/mcp-server/) 152 | - [Model Context Protocol](https://modelcontextprotocol.io/) 153 | --------------------------------------------------------------------------------
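The two configuration methods from the Configuration section of `deepseekv3.2-mongodb/README.md` differ only in where the connection string goes. A minimal sketch of that difference is shown below, using only the standard library. The helper `build_mongodb_mcp_config` and its parameters `use_env` and `read_only` are hypothetical names introduced for illustration and are not part of this repository or of the SDK; the command, package name, `--readOnly` and `--connectionString` flags, and the `MDB_MCP_CONNECTION_STRING` variable are taken from the README.

```python
from typing import Any


def build_mongodb_mcp_config(
    connection_string: str,
    *,
    use_env: bool = True,
    read_only: bool = False,
) -> dict[str, Any]:
    """Build a stdio server config for the MongoDB MCP server.

    Hypothetical helper (an assumption, not repository code).
    """
    if not connection_string:
        raise ValueError("connection_string must not be empty")

    # Base command shared by both methods.
    args = ["-y", "mongodb-mcp-server@latest"]
    if read_only:
        args.append("--readOnly")

    config: dict[str, Any] = {"command": "npx", "args": args}
    if use_env:
        # Method 1: pass the connection string through the environment.
        config["env"] = {"MDB_MCP_CONNECTION_STRING": connection_string}
    else:
        # Method 2: pass the connection string as a command argument.
        args.extend(["--connectionString", connection_string])
    return config


if __name__ == "__main__":
    print(build_mongodb_mcp_config("mongodb://localhost:27017", read_only=True))
```

Assuming the SDK accepts a plain dictionary with these keys wherever `McpStdioServerConfig` is used, the result could be passed as the `"mongodb"` entry of `mcp_servers`. Method 1 keeps the credentials out of the process argument list, which is presumably why the README marks it as recommended.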