├── README.md └── Gemini API quickstart.txt /README.md: -------------------------------------------------------------------------------- 1 | # Complete Gemini API Documentation for Offline LLM 2 | > ⚠️ **Disclaimer** 3 | > This documentation is an enhanced, summarized, and reorganized version of the [official Gemini API documentation](https://ai.google.dev). 4 | > It is shared for educational and reference purposes only. Always consult the official docs for the latest updates. 5 | 6 | 7 | This file contains the complete, single-file documentation for the Google Gemini API. 8 | 9 | ## Purpose 10 | 11 | The content herein is structured as a comprehensive knowledge base, specifically formatted to be ingested by an offline Large Language Model (LLM). By providing this entire document as context, an offline LLM can: 12 | 13 | * Answer questions about the Gemini API's features and functionalities. 14 | * Generate code snippets for various API use cases. 15 | * Explain concepts related to the different Gemini models, rate limits, and supported features. 16 | 17 | ## Content 18 | 19 | This document includes detailed information on: 20 | 21 | * **Quickstart**: Initial setup and first API requests. 22 | * **Gemini Models**: An overview of all available model variants and their specific use cases. 23 | * **Core Features**: In-depth explanations and code examples for: 24 | * Text Generation 25 | * Image Generation & Understanding 26 | * Video & Audio Understanding 27 | * Function Calling 28 | * Structured Output 29 | * and much more. 30 | * **Advanced Topics**: Guides on context caching, batch processing, and safety settings. 31 | 32 | ## Usage 33 | 34 | To use this documentation with an offline LLM, simply load the entire content of this `Gemini API quickstart.txt` file as a single text input or document. The LLM will then be able to use the information contained within to respond to your queries about the Gemini API. 35 | -------------------------------------------------------------------------------- /Gemini API quickstart.txt: -------------------------------------------------------------------------------- 1 | Gemini API quickstart 2 | 3 | This quickstart shows you how to install our libraries and make your first Gemini API request. 4 | 5 | Before you begin 6 | You need a Gemini API key. If you don't already have one, you can get it for free in Google AI Studio. 7 | 8 | Install the Google GenAI SDK 9 | 10 | pip install -q -U google-genai 11 | 12 | 13 | 14 | Make your first request 15 | Here is an example that uses the generateContent method to send a request to the Gemini API using the Gemini 2.5 Flash model. 16 | 17 | If you set your API key as the environment variable GEMINI_API_KEY, it will be picked up automatically by the client when using the Gemini API libraries. Otherwise you will need to pass your API key as an argument when initializing the client. 18 | 19 | Note that all code samples in the Gemini API docs assume that you have set the environment variable GEMINI_API_KEY. 20 | 21 | 22 | 23 | from google import genai 24 | 25 | # The client gets the API key from the environment variable `GEMINI_API_KEY`. 
26 | client = genai.Client() 27 | 28 | response = client.models.generate_content( 29 | model="gemini-2.5-flash", contents="Explain how AI works in a few words" 30 | ) 31 | print(response.text) 32 | 33 | 34 | 35 | 36 | "Thinking" is on by default on many of our code samples 37 | Many code samples on this site use the Gemini 2.5 Flash model, which has the "thinking" feature enabled by default to enhance response quality. You should be aware that this may increase response time and token usage. If you prioritize speed or wish to minimize costs, you can disable this feature by setting the thinking budget to zero, as shown in the examples below. For more details, see the thinking guide. 38 | 39 | Note: Thinking is only available on Gemini 2.5 series models and can't be disabled on Gemini 2.5 Pro. 40 | 41 | 42 | 43 | from google import genai 44 | from google.genai import types 45 | 46 | client = genai.Client() 47 | 48 | response = client.models.generate_content( 49 | model="gemini-2.5-flash", 50 | contents="Explain how AI works in a few words", 51 | config=types.GenerateContentConfig( 52 | thinking_config=types.ThinkingConfig(thinking_budget=0) # Disables thinking 53 | ), 54 | ) 55 | print(response.text) 56 | 57 | 58 | 59 | 60 | 61 | Providing API key explicitly 62 | In some cases, you may want to explicitly provide an API key. For example: 63 | 64 | You're doing a simple API call and prefer hard coding the API key. 65 | You want explicit control without having to rely on automatic discovery of environment variables by the Gemini API libraries 66 | You're using an environment where environment variables are not supported (e.g web) or you are making REST calls. 67 | Below are examples for how you can provide an API key explicitly: 68 | 69 | 70 | 71 | from google import genai 72 | 73 | client = genai.Client(api_key="YOUR_API_KEY") 74 | 75 | response = client.models.generate_content( 76 | model="gemini-2.5-flash", contents="Explain how AI works in a few words" 77 | ) 78 | print(response.text) 79 | 80 | 81 | 82 | Gemini models 83 | 84 | 85 | 86 | Model variants 87 | The Gemini API offers different models that are optimized for specific use cases. Here's a brief overview of Gemini variants that are available: 88 | 89 | Model variant Input(s) Output Optimized for 90 | Gemini 2.5 Pro 91 | gemini-2.5-pro Audio, images, videos, text, and PDF Text Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more 92 | Gemini 2.5 Flash 93 | gemini-2.5-flash Audio, images, videos, and text Text Adaptive thinking, cost efficiency 94 | Gemini 2.5 Flash-Lite Preview 95 | gemini-2.5-flash-lite-preview-06-17 Text, image, video, audio Text Most cost-efficient model supporting high throughput 96 | Gemini 2.5 Flash Native Audio 97 | gemini-2.5-flash-preview-native-audio-dialog & 98 | gemini-2.5-flash-exp-native-audio-thinking-dialog Audio, videos, and text Text and audio, interleaved High quality, natural conversational audio outputs, with or without thinking 99 | Gemini 2.5 Flash Preview TTS 100 | gemini-2.5-flash-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation 101 | Gemini 2.5 Pro Preview TTS 102 | gemini-2.5-pro-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation 103 | Gemini 2.0 Flash 104 | gemini-2.0-flash Audio, images, videos, and text Text Next generation features, speed, and realtime streaming. 
105 | Gemini 2.0 Flash Preview Image Generation 106 | gemini-2.0-flash-preview-image-generation Audio, images, videos, and text Text, images Conversational image generation and editing 107 | Gemini 2.0 Flash-Lite 108 | gemini-2.0-flash-lite Audio, images, videos, and text Text Cost efficiency and low latency 109 | Gemini 1.5 Flash 110 | gemini-1.5-flash Audio, images, videos, and text Text Fast and versatile performance across a diverse variety of tasks 111 | Deprecated 112 | Gemini 1.5 Flash-8B 113 | gemini-1.5-flash-8b Audio, images, videos, and text Text High volume and lower intelligence tasks 114 | Deprecated 115 | Gemini 1.5 Pro 116 | gemini-1.5-pro Audio, images, videos, and text Text Complex reasoning tasks requiring more intelligence 117 | Deprecated 118 | Gemini Embedding 119 | gemini-embedding-001 Text Text embeddings Measuring the relatedness of text strings 120 | Imagen 4 121 | imagen-4.0-generate-preview-06-06 122 | imagen-4.0-ultra-generate-preview-06-06 Text Images Our most up-to-date image generation model 123 | Imagen 3 124 | imagen-3.0-generate-002 Text Images High quality image generation model 125 | Veo 2 126 | veo-2.0-generate-001 Text, images Video High quality video generation 127 | Gemini 2.5 Flash Live 128 | gemini-live-2.5-flash-preview Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions 129 | Gemini 2.0 Flash Live 130 | gemini-2.0-flash-live-001 Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions 131 | 132 | 133 | 134 | 135 | 136 | Rate limits 137 | 138 | Rate limits regulate the number of requests you can make to the Gemini API within a given timeframe. These limits help maintain fair usage, protect against abuse, and help maintain system performance for all users. 139 | 140 | How rate limits work 141 | Rate limits are usually measured across three dimensions: 142 | 143 | Requests per minute (RPM) 144 | Requests per day (RPD) 145 | Tokens per minute (input) (TPM) 146 | Your usage is evaluated against each limit, and exceeding any of them will trigger a rate limit error. For example, if your RPM limit is 20, making 21 requests within a minute will result in an error, even if you haven't exceeded your TPM or other limits. 147 | 148 | Rate limits are applied per project, not per API key. 149 | 150 | Limits vary depending on the specific model being used, and some limits only apply to specific models. For example, Images per minute, or IPM, is only calculated for models capable of generating images (Imagen 3), but is conceptually similar to TPM. Other models might have a token per day limit (TPD). 151 | 152 | Rate limits are more restricted for experimental and preview models. 153 | 154 | Usage tiers 155 | Rate limits are tied to the project's usage tier. As your API usage and spending increase, you'll have an option to upgrade to a higher tier with increased rate limits. 
156 | 157 | 158 | Free Tier 159 | Tier 1 160 | Tier 2 161 | Tier 3 162 | Model RPM TPM RPD 163 | Text-out models 164 | Gemini 2.5 Pro 5 250,000 100 165 | Gemini 2.5 Flash 10 250,000 250 166 | Gemini 2.5 Flash-Lite Preview 06-17 15 250,000 1,000 167 | Gemini 2.0 Flash 15 1,000,000 200 168 | Gemini 2.0 Flash-Lite 30 1,000,000 200 169 | Live API 170 | Gemini 2.5 Flash Live 3 sessions 1,000,000 171 | Gemini 2.5 Flash Preview Native Audio Dialog 1 session 25,000 5 172 | Gemini 2.5 Flash Experimental Native Audio Thinking Dialog 1 session 10,000 5 173 | Gemini 2.0 Flash Live 3 sessions 1,000,000 174 | Multi-modal generation models 175 | Gemini 2.5 Flash Preview TTS 3 10,000 15 176 | Gemini 2.5 Pro Preview TTS 177 | Gemini 2.0 Flash Preview Image Generation 10 200,000 100 178 | Imagen 3 179 | Veo 2 180 | Other models 181 | Gemma 3 & 3n 30 15,000 14,400 182 | Gemini Embedding 100 30,000 1,000 183 | Deprecated models 184 | Gemini 1.5 Flash (Deprecated) 15 250,000 50 185 | Gemini 1.5 Flash-8B (Deprecated) 15 250,000 50 186 | Gemini 1.5 Pro (Deprecated) 187 | 188 | 189 | 190 | 191 | 192 | Batch Mode rate limits 193 | Batch Mode requests are subject to their own rate limits, separate from the non-batch mode API calls. 194 | 195 | Concurrent batch requests: 100 196 | Input file size limit: 2GB 197 | File storage limit: 20GB 198 | Enqueued tokens per model: The following table outlines the maximum number of tokens that can be enqueued for batch processing across all your active batch jobs for a given model. 199 | How to upgrade to the next tier 200 | The Gemini API uses Cloud Billing for all billing services. To transition from the Free tier to a paid tier, you must first enable Cloud Billing for your Google Cloud project. 201 | 202 | Once your project meets the specified criteria, it becomes eligible for an upgrade to the next tier. To request an upgrade, follow these steps: 203 | 204 | Navigate to the API keys page in AI Studio. 205 | Locate the project you want to upgrade and click "Upgrade". The "Upgrade" option will only show up for projects that meet next tier qualifications. 206 | After a quick validation, the project will be upgraded to the next tier. 207 | 208 | 209 | 210 | 211 | 212 | 213 | 214 | 215 | 216 | Text generation 217 | 218 | The Gemini API can generate text output from various inputs, including text, images, video, and audio, leveraging Gemini models. 219 | 220 | Here's a basic example that takes a single text input: 221 | 222 | from google import genai 223 | 224 | client = genai.Client() 225 | 226 | response = client.models.generate_content( 227 | model="gemini-2.5-flash", 228 | contents="How does AI work?" 229 | ) 230 | print(response.text) 231 | 232 | 233 | 234 | 235 | Thinking with Gemini 2.5 236 | 2.5 Flash and Pro models have "thinking" enabled by default to enhance quality, which may take longer to run and increase token usage. 
237 | 238 | When using 2.5 Flash, you can disable thinking by setting the thinking budget to zero 239 | 240 | 241 | from google import genai 242 | from google.genai import types 243 | 244 | client = genai.Client() 245 | 246 | response = client.models.generate_content( 247 | model="gemini-2.5-flash", 248 | contents="How does AI work?", 249 | config=types.GenerateContentConfig( 250 | thinking_config=types.ThinkingConfig(thinking_budget=0) # Disables thinking 251 | ), 252 | ) 253 | print(response.text) 254 | 255 | 256 | 257 | 258 | 259 | System instructions and other configurations 260 | You can guide the behavior of Gemini models with system instructions. To do so, pass a GenerateContentConfig object. 261 | 262 | 263 | from google import genai 264 | from google.genai import types 265 | 266 | client = genai.Client() 267 | 268 | response = client.models.generate_content( 269 | model="gemini-2.5-flash", 270 | config=types.GenerateContentConfig( 271 | system_instruction="You are a cat. Your name is Neko."), 272 | contents="Hello there" 273 | ) 274 | 275 | print(response.text) 276 | 277 | 278 | The GenerateContentConfig object also lets you override default generation parameters, such as temperature. 279 | 280 | 281 | from google import genai 282 | from google.genai import types 283 | 284 | client = genai.Client() 285 | 286 | response = client.models.generate_content( 287 | model="gemini-2.5-flash", 288 | contents=["Explain how AI works"], 289 | config=types.GenerateContentConfig( 290 | temperature=0.1 291 | ) 292 | ) 293 | print(response.text) 294 | 295 | 296 | 297 | Multimodal inputs 298 | The Gemini API supports multimodal inputs, allowing you to combine text with media files. The following example demonstrates providing an image: 299 | 300 | from PIL import Image 301 | from google import genai 302 | 303 | client = genai.Client() 304 | 305 | image = Image.open("/path/to/organ.png") 306 | response = client.models.generate_content( 307 | model="gemini-2.5-flash", 308 | contents=[image, "Tell me about this instrument"] 309 | ) 310 | print(response.text) 311 | 312 | 313 | 314 | Streaming responses 315 | By default, the model returns a response only after the entire generation process is complete. 316 | 317 | For more fluid interactions, use streaming to receive GenerateContentResponse instances incrementally as they're generated. 318 | 319 | from google import genai 320 | 321 | client = genai.Client() 322 | 323 | response = client.models.generate_content_stream( 324 | model="gemini-2.5-flash", 325 | contents=["Explain how AI works"] 326 | ) 327 | for chunk in response: 328 | print(chunk.text, end="") 329 | 330 | 331 | 332 | Multi-turn conversations (Chat) 333 | Our SDKs provide functionality to collect multiple rounds of prompts and responses into a chat, giving you an easy way to keep track of the conversation history. 334 | 335 | Note: Chat functionality is only implemented as part of the SDKs. Behind the scenes, it still uses the generateContent API. For multi-turn conversations, the full conversation history is sent to the model with each follow-up turn. 
336 | 337 | 338 | from google import genai 339 | 340 | client = genai.Client() 341 | chat = client.chats.create(model="gemini-2.5-flash") 342 | 343 | response = chat.send_message("I have 2 dogs in my house.") 344 | print(response.text) 345 | 346 | response = chat.send_message("How many paws are in my house?") 347 | print(response.text) 348 | 349 | for message in chat.get_history(): 350 | print(f'role - {message.role}',end=": ") 351 | print(message.parts[0].text) 352 | 353 | 354 | 355 | Streaming can also be used for multi-turn conversations. 356 | 357 | 358 | from google import genai 359 | 360 | client = genai.Client() 361 | chat = client.chats.create(model="gemini-2.5-flash") 362 | 363 | response = chat.send_message_stream("I have 2 dogs in my house.") 364 | for chunk in response: 365 | print(chunk.text, end="") 366 | 367 | response = chat.send_message_stream("How many paws are in my house?") 368 | for chunk in response: 369 | print(chunk.text, end="") 370 | 371 | for message in chat.get_history(): 372 | print(f'role - {message.role}', end=": ") 373 | print(message.parts[0].text) 374 | 375 | 376 | 377 | 378 | 379 | Image generation 380 | 381 | You can generate images using the Gemini API with either Gemini's built-in multimodal capabilities or Imagen, Google's specialized image generation models. For most use cases, start with Gemini. Choose Imagen for specialized tasks where image quality is critical. See Choosing the right model section for more guidance. 382 | 383 | All generated images include a SynthID watermark. 384 | 385 | Before you begin 386 | Ensure you use a supported model and version for image generation: 387 | 388 | For Gemini, use Gemini 2.0 Flash Preview Image Generation. 389 | 390 | For Imagen, use one of the Imagen models (Imagen 3, Imagen 4 or Imagen 4 Ultra). 391 | 392 | Note that those models are only available on the Paid tier. 393 | You can access both Gemini and Imagen models using the same libraries. 394 | 395 | Note: Image generation may not be available in all regions and countries, review our Models page for more information. 396 | Generate images using Gemini 397 | Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both to achieve various image-related tasks, such as image generation and editing. 398 | 399 | You must include responseModalities: ["TEXT", "IMAGE"] in your configuration. Image-only output is not supported with these models. 
400 | 401 | Image generation (text-to-image) 402 | The following code demonstrates how to generate an image based on a descriptive prompt: 403 | 404 | 405 | from google import genai 406 | from google.genai import types 407 | from PIL import Image 408 | from io import BytesIO 409 | import base64 410 | 411 | client = genai.Client() 412 | 413 | contents = ('Hi, can you create a 3d rendered image of a pig ' 414 | 'with wings and a top hat flying over a happy ' 415 | 'futuristic scifi city with lots of greenery?') 416 | 417 | response = client.models.generate_content( 418 | model="gemini-2.0-flash-preview-image-generation", 419 | contents=contents, 420 | config=types.GenerateContentConfig( 421 | response_modalities=['TEXT', 'IMAGE'] 422 | ) 423 | ) 424 | 425 | for part in response.candidates[0].content.parts: 426 | if part.text is not None: 427 | print(part.text) 428 | elif part.inline_data is not None: 429 | image = Image.open(BytesIO((part.inline_data.data))) 430 | image.save('gemini-native-image.png') 431 | image.show() 432 | 433 | 434 | 435 | 436 | Speech generation (text-to-speech) 437 | 438 | The Gemini API can transform text input into single speaker or multi-speaker audio using native text-to-speech (TTS) generation capabilities. Text-to-speech (TTS) generation is controllable, meaning you can use natural language to structure interactions and guide the style, accent, pace, and tone of the audio. 439 | 440 | The TTS capability differs from speech generation provided through the Live API, which is designed for interactive, unstructured audio, and multimodal inputs and outputs. While the Live API excels in dynamic conversational contexts, TTS through the Gemini API is tailored for scenarios that require exact text recitation with fine-grained control over style and sound, such as podcast or audiobook generation. 441 | 442 | This guide shows you how to generate single-speaker and multi-speaker audio from text. 443 | 444 | Preview: Native text-to-speech (TTS) is in Preview. 445 | Before you begin 446 | Ensure you use a Gemini 2.5 model variant with native text-to-speech (TTS) capabilities, as listed in the Supported models section. For optimal results, consider which model best fits your specific use case. 447 | 448 | You may find it useful to test the Gemini 2.5 TTS models in AI Studio before you start building. 449 | 450 | Note: TTS models accept text-only inputs and produce audio-only outputs. For a complete list of restrictions specific to TTS models, review the Limitations section. 451 | Single-speaker text-to-speech 452 | To convert text to single-speaker audio, set the response modality to "audio", and pass a SpeechConfig object with VoiceConfig set. You'll need to choose a voice name from the prebuilt output voices. 
453 | 454 | This example saves the output audio from the model in a wave file: 455 | 456 | from google import genai 457 | from google.genai import types 458 | import wave 459 | 460 | # Set up the wave file to save the output: 461 | def wave_file(filename, pcm, channels=1, rate=24000, sample_width=2): 462 | with wave.open(filename, "wb") as wf: 463 | wf.setnchannels(channels) 464 | wf.setsampwidth(sample_width) 465 | wf.setframerate(rate) 466 | wf.writeframes(pcm) 467 | 468 | client = genai.Client() 469 | 470 | response = client.models.generate_content( 471 | model="gemini-2.5-flash-preview-tts", 472 | contents="Say cheerfully: Have a wonderful day!", 473 | config=types.GenerateContentConfig( 474 | response_modalities=["AUDIO"], 475 | speech_config=types.SpeechConfig( 476 | voice_config=types.VoiceConfig( 477 | prebuilt_voice_config=types.PrebuiltVoiceConfig( 478 | voice_name='Kore', 479 | ) 480 | ) 481 | ), 482 | ) 483 | ) 484 | 485 | data = response.candidates[0].content.parts[0].inline_data.data 486 | 487 | file_name='out.wav' 488 | wave_file(file_name, data) # Saves the file to current directory 489 | 490 | 491 | 492 | Multi-speaker text-to-speech 493 | For multi-speaker audio, you'll need a MultiSpeakerVoiceConfig object with each speaker (up to 2) configured as a SpeakerVoiceConfig. You'll need to define each speaker with the same names used in the prompt: 494 | 495 | 496 | from google import genai 497 | from google.genai import types 498 | import wave 499 | 500 | # Set up the wave file to save the output: 501 | def wave_file(filename, pcm, channels=1, rate=24000, sample_width=2): 502 | with wave.open(filename, "wb") as wf: 503 | wf.setnchannels(channels) 504 | wf.setsampwidth(sample_width) 505 | wf.setframerate(rate) 506 | wf.writeframes(pcm) 507 | 508 | client = genai.Client() 509 | 510 | prompt = """TTS the following conversation between Joe and Jane: 511 | Joe: How's it going today Jane? 512 | Jane: Not too bad, how about you?""" 513 | 514 | response = client.models.generate_content( 515 | model="gemini-2.5-flash-preview-tts", 516 | contents=prompt, 517 | config=types.GenerateContentConfig( 518 | response_modalities=["AUDIO"], 519 | speech_config=types.SpeechConfig( 520 | multi_speaker_voice_config=types.MultiSpeakerVoiceConfig( 521 | speaker_voice_configs=[ 522 | types.SpeakerVoiceConfig( 523 | speaker='Joe', 524 | voice_config=types.VoiceConfig( 525 | prebuilt_voice_config=types.PrebuiltVoiceConfig( 526 | voice_name='Kore', 527 | ) 528 | ) 529 | ), 530 | types.SpeakerVoiceConfig( 531 | speaker='Jane', 532 | voice_config=types.VoiceConfig( 533 | prebuilt_voice_config=types.PrebuiltVoiceConfig( 534 | voice_name='Puck', 535 | ) 536 | ) 537 | ), 538 | ] 539 | ) 540 | ) 541 | ) 542 | ) 543 | 544 | data = response.candidates[0].content.parts[0].inline_data.data 545 | 546 | file_name='out.wav' 547 | wave_file(file_name, data) # Saves the file to current directory 548 | 549 | 550 | 551 | 552 | Controlling speech style with prompts 553 | You can control style, tone, accent, and pace using natural language prompts for both single- and multi-speaker TTS. For example, in a single-speaker prompt, you can say: 554 | 555 | 556 | Say in an spooky whisper: 557 | "By the pricking of my thumbs... 558 | Something wicked this way comes" 559 | In a multi-speaker prompt, provide the model with each speaker's name and corresponding transcript. 
You can also provide guidance for each speaker individually: 560 | 561 | 562 | Make Speaker1 sound tired and bored, and Speaker2 sound excited and happy: 563 | 564 | Speaker1: So... what's on the agenda today? 565 | Speaker2: You're never going to guess! 566 | Try using a voice option that corresponds to the style or emotion you want to convey, to emphasize it even more. In the previous prompt, for example, Enceladus's breathiness might emphasize "tired" and "bored", while Puck's upbeat tone could complement "excited" and "happy". 567 | 568 | Generating a prompt to convert to audio 569 | The TTS models only output audio, but you can use other models to generate a transcript first, then pass that transcript to the TTS model to read aloud. 570 | 571 | 572 | from google import genai 573 | from google.genai import types 574 | 575 | client = genai.Client() 576 | 577 | transcript = client.models.generate_content( 578 | model="gemini-2.0-flash", 579 | contents="""Generate a short transcript around 100 words that reads 580 | like it was clipped from a podcast by excited herpetologists. 581 | The hosts names are Dr. Anya and Liam.""").text 582 | 583 | response = client.models.generate_content( 584 | model="gemini-2.5-flash-preview-tts", 585 | contents=transcript, 586 | config=types.GenerateContentConfig( 587 | response_modalities=["AUDIO"], 588 | speech_config=types.SpeechConfig( 589 | multi_speaker_voice_config=types.MultiSpeakerVoiceConfig( 590 | speaker_voice_configs=[ 591 | types.SpeakerVoiceConfig( 592 | speaker='Dr. Anya', 593 | voice_config=types.VoiceConfig( 594 | prebuilt_voice_config=types.PrebuiltVoiceConfig( 595 | voice_name='Kore', 596 | ) 597 | ) 598 | ), 599 | types.SpeakerVoiceConfig( 600 | speaker='Liam', 601 | voice_config=types.VoiceConfig( 602 | prebuilt_voice_config=types.PrebuiltVoiceConfig( 603 | voice_name='Puck', 604 | ) 605 | ) 606 | ), 607 | ] 608 | ) 609 | ) 610 | ) 611 | ) 612 | 613 | # ...Code to stream or save the output 614 | 615 | 616 | Voice options 617 | TTS models support the following 30 voice options in the voice_name field: 618 | 619 | Zephyr -- Bright Puck -- Upbeat Charon -- Informative 620 | Kore -- Firm Fenrir -- Excitable Leda -- Youthful 621 | Orus -- Firm Aoede -- Breezy Callirrhoe -- Easy-going 622 | Autonoe -- Bright Enceladus -- Breathy Iapetus -- Clear 623 | Umbriel -- Easy-going Algieba -- Smooth Despina -- Smooth 624 | Erinome -- Clear Algenib -- Gravelly Rasalgethi -- Informative 625 | Laomedeia -- Upbeat Achernar -- Soft Alnilam -- Firm 626 | Schedar -- Even Gacrux -- Mature Pulcherrima -- Forward 627 | Achird -- Friendly Zubenelgenubi -- Casual Vindemiatrix -- Gentle 628 | Sadachbia -- Lively Sadaltager -- Knowledgeable Sulafat -- Warm 629 | You can hear all the voice options in AI Studio. 630 | 631 | Supported languages 632 | The TTS models detect the input language automatically. 
They support the following 24 languages: 633 | 634 | Language BCP-47 Code Language BCP-47 Code 635 | Arabic (Egyptian) ar-EG German (Germany) de-DE 636 | English (US) en-US Spanish (US) es-US 637 | French (France) fr-FR Hindi (India) hi-IN 638 | Indonesian (Indonesia) id-ID Italian (Italy) it-IT 639 | Japanese (Japan) ja-JP Korean (Korea) ko-KR 640 | Portuguese (Brazil) pt-BR Russian (Russia) ru-RU 641 | Dutch (Netherlands) nl-NL Polish (Poland) pl-PL 642 | Thai (Thailand) th-TH Turkish (Turkey) tr-TR 643 | Vietnamese (Vietnam) vi-VN Romanian (Romania) ro-RO 644 | Ukrainian (Ukraine) uk-UA Bengali (Bangladesh) bn-BD 645 | English (India) en-IN & hi-IN bundle Marathi (India) mr-IN 646 | Tamil (India) ta-IN Telugu (India) te-IN 647 | Supported models 648 | Model Single speaker Multispeaker 649 | Gemini 2.5 Flash Preview TTS✔️ ✔️ 650 | Gemini 2.5 Pro Preview TTS ✔️ ✔️ 651 | Limitations 652 | TTS models can only receive text inputs and generate audio outputs. 653 | A TTS session has a context window limit of 32k tokens. 654 | Review Languages section for language support. 655 | 656 | 657 | 658 | 659 | Getting started with long context 660 | Earlier versions of generative models were only able to process 8,000 tokens at a time. Newer models pushed this further by accepting 32,000 or even 128,000 tokens. Gemini is the first model capable of accepting 1 million tokens. 661 | 662 | In practice, 1 million tokens would look like: 663 | 664 | 50,000 lines of code (with the standard 80 characters per line) 665 | All the text messages you have sent in the last 5 years 666 | 8 average length English novels 667 | Transcripts of over 200 average length podcast episodes 668 | 669 | 670 | Structured output 671 | 672 | You can configure Gemini for structured output instead of unstructured text, allowing precise extraction and standardization of information for further processing. For example, you can use structured output to extract information from resumes, standardize them to build a structured database. 673 | 674 | Gemini can generate either JSON or enum values as structured output. 675 | 676 | Generating JSON 677 | There are two ways to generate JSON using the Gemini API: 678 | 679 | Configure a schema on the model 680 | Provide a schema in a text prompt 681 | Configuring a schema on the model is the recommended way to generate JSON, because it constrains the model to output JSON. 682 | 683 | Configuring a schema (recommended) 684 | To constrain the model to generate JSON, configure a responseSchema. The model will then respond to any prompt with JSON-formatted output. 685 | 686 | 687 | from google import genai 688 | from pydantic import BaseModel 689 | 690 | class Recipe(BaseModel): 691 | recipe_name: str 692 | ingredients: list[str] 693 | 694 | client = genai.Client() 695 | response = client.models.generate_content( 696 | model="gemini-2.5-flash", 697 | contents="List a few popular cookie recipes, and include the amounts of ingredients.", 698 | config={ 699 | "response_mime_type": "application/json", 700 | "response_schema": list[Recipe], 701 | }, 702 | ) 703 | # Use the response as a JSON string. 704 | print(response.text) 705 | 706 | # Use instantiated objects. 
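# response.parsed deserializes the JSON response into instances of the Pydantic
# model given in response_schema (here, a list of Recipe objects).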
707 | my_recipes: list[Recipe] = response.parsed
708 | 
709 | The output might look like this:
710 | 
711 | [
712 |   {
713 |     "recipeName": "Chocolate Chip Cookies",
714 |     "ingredients": [
715 |       "1 cup (2 sticks) unsalted butter, softened",
716 |       "3/4 cup granulated sugar",
717 |       "3/4 cup packed brown sugar",
718 |       "1 teaspoon vanilla extract",
719 |       "2 large eggs",
720 |       "2 1/4 cups all-purpose flour",
721 |       "1 teaspoon baking soda",
722 |       "1 teaspoon salt",
723 |       "2 cups chocolate chips"
724 |     ]
725 |   },
726 |   ...
727 | ]
728 | 
729 | 
730 | 
731 | 
732 | Providing a schema in a text prompt
733 | Instead of configuring a schema, you can supply a schema as natural language or pseudo-code in a text prompt. This method is not recommended, because it might produce lower quality output, and because the model is not constrained to follow the schema.
734 | 
735 | Warning: Don't provide a schema in a text prompt if you're configuring a responseSchema. This can produce unexpected or low quality results.
736 | Here's a generic example of a schema provided in a text prompt:
737 | 
738 | 
739 | List a few popular cookie recipes, and include the amounts of ingredients.
740 | 
741 | Produce JSON matching this specification:
742 | 
743 | Recipe = { "recipeName": string, "ingredients": array }
744 | Return: array
745 | Since the model gets the schema from text in the prompt, you might have some flexibility in how you represent the schema. But when you supply a schema inline like this, the model is not actually constrained to return JSON. For a more deterministic, higher quality response, configure a schema on the model, and don't duplicate the schema in the text prompt.
746 | 
747 | Generating enum values
748 | In some cases you might want the model to choose a single option from a list of options. To implement this behavior, you can pass an enum in your schema. You can use an enum option anywhere you could use a string in the responseSchema, because an enum is an array of strings. Like a JSON schema, an enum lets you constrain model output to meet the requirements of your application.
749 | 
750 | For example, assume that you're developing an application to classify musical instruments into one of five categories: "Percussion", "String", "Woodwind", "Brass", or "Keyboard". You could create an enum to help with this task.
751 | 
752 | In the following example, you pass an enum as the responseSchema, constraining the model to choose the most appropriate option.
753 | 754 | from google import genai 755 | import enum 756 | 757 | class Instrument(enum.Enum): 758 | PERCUSSION = "Percussion" 759 | STRING = "String" 760 | WOODWIND = "Woodwind" 761 | BRASS = "Brass" 762 | KEYBOARD = "Keyboard" 763 | 764 | client = genai.Client() 765 | response = client.models.generate_content( 766 | model='gemini-2.5-flash', 767 | contents='What type of instrument is an oboe?', 768 | config={ 769 | 'response_mime_type': 'text/x.enum', 770 | 'response_schema': Instrument, 771 | }, 772 | ) 773 | 774 | print(response.text) 775 | # Woodwind 776 | 777 | 778 | And you can also pass the schema as JSON: 779 | 780 | from google import genai 781 | 782 | client = genai.Client() 783 | response = client.models.generate_content( 784 | model='gemini-2.5-flash', 785 | contents='What type of instrument is an oboe?', 786 | config={ 787 | 'response_mime_type': 'text/x.enum', 788 | 'response_schema': { 789 | "type": "STRING", 790 | "enum": ["Percussion", "String", "Woodwind", "Brass", "Keyboard"], 791 | }, 792 | }, 793 | ) 794 | 795 | print(response.text) 796 | # Woodwind 797 | 798 | 799 | 800 | 801 | Gemini thinking 802 | 803 | The Gemini 2.5 series models use an internal "thinking process" that significantly improves their reasoning and multi-step planning abilities, making them highly effective for complex tasks such as coding, advanced mathematics, and data analysis. 804 | 805 | This guide shows you how to work with Gemini's thinking capabilities using the Gemini API. 806 | 807 | Generating content with thinking 808 | Initiating a request with a thinking model is similar to any other content generation request. The key difference lies in specifying one of the models with thinking support in the model field, as demonstrated in the following text generation example: 809 | 810 | from google import genai 811 | 812 | client = genai.Client() 813 | prompt = "Explain the concept of Occam's Razor and provide a simple, everyday example." 814 | response = client.models.generate_content( 815 | model="gemini-2.5-pro", 816 | contents=prompt 817 | ) 818 | 819 | print(response.text) 820 | 821 | 822 | Thinking budgets 823 | The thinkingBudget parameter guides the model on the number of thinking tokens to use when generating a response. A higher token count generally allows for more detailed reasoning, which can be beneficial for tackling more complex tasks. If latency is more important, use a lower budget or disable thinking by setting thinkingBudget to 0. Setting the thinkingBudget to -1 turns on dynamic thinking, meaning the model will adjust the budget based on the complexity of the request. 824 | 825 | The thinkingBudget is only supported in Gemini 2.5 Flash, 2.5 Pro, and 2.5 Flash-Lite. Depending on the prompt, the model might overflow or underflow the token budget. 826 | 827 | The following are thinkingBudget configuration details for each model type. 
828 | 829 | 830 | Model Default setting 831 | (Thinking budget is not set) Range Disable thinking Turn on dynamic thinking 832 | 2.5 Pro Dynamic thinking: Model decides when and how much to think 128 to 32768 N/A: Cannot disable thinking thinkingBudget = -1 833 | 2.5 Flash Dynamic thinking: Model decides when and how much to think 0 to 24576 thinkingBudget = 0 thinkingBudget = -1 834 | 2.5 Flash Lite Model does not think 512 to 24576 thinkingBudget = 0 thinkingBudget = -1 835 | 836 | 837 | 838 | 839 | from google import genai 840 | from google.genai import types 841 | 842 | client = genai.Client() 843 | 844 | response = client.models.generate_content( 845 | model="gemini-2.5-pro", 846 | contents="Provide a list of 3 famous physicists and their key contributions", 847 | config=types.GenerateContentConfig( 848 | thinking_config=types.ThinkingConfig(thinking_budget=1024) 849 | # Turn off thinking: 850 | # thinking_config=types.ThinkingConfig(thinking_budget=0) 851 | # Turn on dynamic thinking: 852 | # thinking_config=types.ThinkingConfig(thinking_budget=-1) 853 | ), 854 | ) 855 | 856 | print(response.text) 857 | 858 | 859 | 860 | Thought summaries 861 | Thought summaries are synthesized versions of the model's raw thoughts and offer insights into the model's internal reasoning process. Note that thinking budgets apply to the model's raw thoughts and not to thought summaries. 862 | 863 | You can enable thought summaries by setting includeThoughts to true in your request configuration. You can then access the summary by iterating through the response parameter's parts, and checking the thought boolean. 864 | 865 | Here's an example demonstrating how to enable and retrieve thought summaries without streaming, which returns a single, final thought summary with the response: 866 | 867 | 868 | from google import genai 869 | from google.genai import types 870 | 871 | client = genai.Client() 872 | prompt = "What is the sum of the first 50 prime numbers?" 873 | response = client.models.generate_content( 874 | model="gemini-2.5-pro", 875 | contents=prompt, 876 | config=types.GenerateContentConfig( 877 | thinking_config=types.ThinkingConfig( 878 | include_thoughts=True 879 | ) 880 | ) 881 | ) 882 | 883 | for part in response.candidates[0].content.parts: 884 | if not part.text: 885 | continue 886 | if part.thought: 887 | print("Thought summary:") 888 | print(part.text) 889 | print() 890 | else: 891 | print("Answer:") 892 | print(part.text) 893 | print() 894 | 895 | 896 | 897 | 898 | Function calling with the Gemini API 899 | 900 | Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases: 901 | 902 | Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases. 903 | Extend Capabilities: Use external tools to perform computations and extend the limitations of the model, such as using a calculator or creating charts. 904 | Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices. 
905 | 906 | 907 | from google import genai 908 | from google.genai import types 909 | 910 | # Define the function declaration for the model 911 | schedule_meeting_function = { 912 | "name": "schedule_meeting", 913 | "description": "Schedules a meeting with specified attendees at a given time and date.", 914 | "parameters": { 915 | "type": "object", 916 | "properties": { 917 | "attendees": { 918 | "type": "array", 919 | "items": {"type": "string"}, 920 | "description": "List of people attending the meeting.", 921 | }, 922 | "date": { 923 | "type": "string", 924 | "description": "Date of the meeting (e.g., '2024-07-29')", 925 | }, 926 | "time": { 927 | "type": "string", 928 | "description": "Time of the meeting (e.g., '15:00')", 929 | }, 930 | "topic": { 931 | "type": "string", 932 | "description": "The subject or topic of the meeting.", 933 | }, 934 | }, 935 | "required": ["attendees", "date", "time", "topic"], 936 | }, 937 | } 938 | 939 | # Configure the client and tools 940 | client = genai.Client() 941 | tools = types.Tool(function_declarations=[schedule_meeting_function]) 942 | config = types.GenerateContentConfig(tools=[tools]) 943 | 944 | # Send request with function declarations 945 | response = client.models.generate_content( 946 | model="gemini-2.5-flash", 947 | contents="Schedule a meeting with Bob and Alice for 03/14/2025 at 10:00 AM about the Q3 planning.", 948 | config=config, 949 | ) 950 | 951 | # Check for a function call 952 | if response.candidates[0].content.parts[0].function_call: 953 | function_call = response.candidates[0].content.parts[0].function_call 954 | print(f"Function to call: {function_call.name}") 955 | print(f"Arguments: {function_call.args}") 956 | # In a real app, you would call your function here: 957 | # result = schedule_meeting(**function_call.args) 958 | else: 959 | print("No function call found in the response.") 960 | print(response.text) 961 | 962 | 963 | Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process: 964 | 965 | Define Function Declaration: Define the function declaration in your application code. Function Declarations describe the function's name, parameters, and purpose to the model. 966 | Call LLM with function declarations: Send user prompt along with the function declaration(s) to the model. It analyzes the request and determines if a function call would be helpful. If so, it responds with a structured JSON object. 967 | Execute Function Code (Your Responsibility): The Model does not execute the function itself. It's your application's responsibility to process the response and check for Function Call, if 968 | Yes: Extract the name and args of the function and execute the corresponding function in your application. 969 | No: The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome). 970 | Create User friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. It will use the result to generate a final, user-friendly response that incorporates the information from the function call. 971 | This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn (parallel function calling) and in sequence (compositional function calling). 
972 | 
973 | Step 1: Define a function declaration
974 | Within your application code, define a function that allows users to set light values, along with its declaration. This function could call external services or APIs.
975 | 
976 | 
977 | # Define a function that the model can call to control smart lights
978 | set_light_values_declaration = {
979 |     "name": "set_light_values",
980 |     "description": "Sets the brightness and color temperature of a light.",
981 |     "parameters": {
982 |         "type": "object",
983 |         "properties": {
984 |             "brightness": {
985 |                 "type": "integer",
986 |                 "description": "Light level from 0 to 100. Zero is off and 100 is full brightness",
987 |             },
988 |             "color_temp": {
989 |                 "type": "string",
990 |                 "enum": ["daylight", "cool", "warm"],
991 |                 "description": "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.",
992 |             },
993 |         },
994 |         "required": ["brightness", "color_temp"],
995 |     },
996 | }
997 | 
998 | # This is the actual function that would be called based on the model's suggestion
999 | def set_light_values(brightness: int, color_temp: str) -> dict[str, int | str]:
1000 |     """Set the brightness and color temperature of a room light. (mock API).
1001 | 
1002 |     Args:
1003 |         brightness: Light level from 0 to 100. Zero is off and 100 is full brightness
1004 |         color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.
1005 | 
1006 |     Returns:
1007 |         A dictionary containing the set brightness and color temperature.
1008 |     """
1009 |     return {"brightness": brightness, "colorTemperature": color_temp}
1010 | 
1011 | 
1012 | Step 2: Call the model with function declarations
1013 | Once you have defined your function declarations, you can prompt the model to use them. It analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion.
1014 | 
1015 | from google import genai
1016 | from google.genai import types
1017 | 
1018 | # Configure the client and tools
1019 | client = genai.Client()
1020 | tools = types.Tool(function_declarations=[set_light_values_declaration])
1021 | config = types.GenerateContentConfig(tools=[tools])
1022 | 
1023 | # Define user prompt
1024 | contents = [
1025 |     types.Content(
1026 |         role="user", parts=[types.Part(text="Turn the lights down to a romantic level")]
1027 |     )
1028 | ]
1029 | 
1030 | # Send request with function declarations
1031 | response = client.models.generate_content(
1032 |     model="gemini-2.5-flash",
1033 |     contents=contents,
1034 |     config=config,
1035 | )
1036 | 
1037 | print(response.candidates[0].content.parts[0].function_call)
1038 | 
1039 | 
1040 | The model then returns a functionCall object in an OpenAPI compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question.
1041 | 
1042 | 
1043 | id=None args={'color_temp': 'warm', 'brightness': 25} name='set_light_values'
1044 | 
1045 | 
1046 | Step 3: Execute set_light_values function code
1047 | Extract the function call details from the model's response, parse the arguments, and execute the set_light_values function.
1048 | 
1049 | # Extract tool call details, it may not be in the first part.
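# If the function call is not in the first part, scan
# response.candidates[0].content.parts for a part whose function_call is set,
# or use the response.function_calls convenience list shown later in this guide.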
1050 | tool_call = response.candidates[0].content.parts[0].function_call 1051 | 1052 | if tool_call.name == "set_light_values": 1053 | result = set_light_values(**tool_call.args) 1054 | print(f"Function execution result: {result}") 1055 | 1056 | 1057 | Step 4: Create user friendly response with function result and call the model again 1058 | Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user. 1059 | 1060 | # Create a function response part 1061 | function_response_part = types.Part.from_function_response( 1062 | name=tool_call.name, 1063 | response={"result": result}, 1064 | ) 1065 | 1066 | # Append function call and result of the function execution to contents 1067 | contents.append(response.candidates[0].content) # Append the content from the model's response. 1068 | contents.append(types.Content(role="user", parts=[function_response_part])) # Append the function response 1069 | 1070 | final_response = client.models.generate_content( 1071 | model="gemini-2.5-flash", 1072 | config=config, 1073 | contents=contents, 1074 | ) 1075 | 1076 | print(final_response.text) 1077 | 1078 | 1079 | 1080 | This completes the function calling flow. The model successfully used the set_light_values function to perform the request action of the user. 1081 | 1082 | Function declarations 1083 | When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations. You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: 1084 | 1085 | name (string): A unique name for the function (get_weather_forecast, send_email). Use descriptive names without spaces or special characters (use underscores or camelCase). 1086 | description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function. Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). 1087 | parameters (object): Defines the input parameters the function expects. 1088 | type (string): Specifies the overall data type, such as object. 1089 | properties (object): Lists individual parameters, each with: 1090 | type (string): The data type of the parameter, such as string, integer, boolean, array. 1091 | description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). 1092 | enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). 1093 | required (array): An array of strings listing the parameter names that are mandatory for the function to operate. 1094 | Function calling with thinking 1095 | Enabling "thinking" can improve function call performance by allowing the model to reason through a request before suggesting function calls. 1096 | 1097 | However, because the Gemini API is stateless, this reasoning context is lost between turns, which can reduce the quality of function calls as they require multiple turn requests. 1098 | 1099 | To preserve this context you can use thought signatures. 
A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. 1100 | 1101 | To use thought signatures: 1102 | 1103 | Receive the signature: When thinking is enabled, the API response will include a thought_signature field containing an encrypted representation of the model's reasoning. 1104 | Return the signature: When you send the function's execution result back to the server, include the thought_signature you received. 1105 | This allows the model to restore its previous thinking context and will likely result in better function calling performance. 1106 | 1107 | Receiving signatures from the server 1108 | 1109 | Signatures are returned in the part after the model's thinking phase, which typically is a text or function call. 1110 | 1111 | 1112 | Parallel function calling 1113 | In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. 1114 | 1115 | 1116 | power_disco_ball = { 1117 | "name": "power_disco_ball", 1118 | "description": "Powers the spinning disco ball.", 1119 | "parameters": { 1120 | "type": "object", 1121 | "properties": { 1122 | "power": { 1123 | "type": "boolean", 1124 | "description": "Whether to turn the disco ball on or off.", 1125 | } 1126 | }, 1127 | "required": ["power"], 1128 | }, 1129 | } 1130 | 1131 | start_music = { 1132 | "name": "start_music", 1133 | "description": "Play some music matching the specified parameters.", 1134 | "parameters": { 1135 | "type": "object", 1136 | "properties": { 1137 | "energetic": { 1138 | "type": "boolean", 1139 | "description": "Whether the music is energetic or not.", 1140 | }, 1141 | "loud": { 1142 | "type": "boolean", 1143 | "description": "Whether the music is loud or not.", 1144 | }, 1145 | }, 1146 | "required": ["energetic", "loud"], 1147 | }, 1148 | } 1149 | 1150 | dim_lights = { 1151 | "name": "dim_lights", 1152 | "description": "Dim the lights.", 1153 | "parameters": { 1154 | "type": "object", 1155 | "properties": { 1156 | "brightness": { 1157 | "type": "number", 1158 | "description": "The brightness of the lights, 0.0 is off, 1.0 is full.", 1159 | } 1160 | }, 1161 | "required": ["brightness"], 1162 | }, 1163 | } 1164 | 1165 | 1166 | Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling. 1167 | 1168 | from google import genai 1169 | from google.genai import types 1170 | 1171 | # Configure the client and tools 1172 | client = genai.Client() 1173 | house_tools = [ 1174 | types.Tool(function_declarations=[power_disco_ball, start_music, dim_lights]) 1175 | ] 1176 | config = types.GenerateContentConfig( 1177 | tools=house_tools, 1178 | automatic_function_calling=types.AutomaticFunctionCallingConfig( 1179 | disable=True 1180 | ), 1181 | # Force the model to call 'any' function, instead of chatting. 
1182 | tool_config=types.ToolConfig( 1183 | function_calling_config=types.FunctionCallingConfig(mode='ANY') 1184 | ), 1185 | ) 1186 | 1187 | chat = client.chats.create(model="gemini-2.5-flash", config=config) 1188 | response = chat.send_message("Turn this place into a party!") 1189 | 1190 | # Print out each of the function calls requested from this single call 1191 | print("Example 1: Forced function calling") 1192 | for fn in response.function_calls: 1193 | args = ", ".join(f"{key}={val}" for key, val in fn.args.items()) 1194 | print(f"{fn.name}({args})") 1195 | 1196 | 1197 | Compositional function calling 1198 | Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter. 1199 | 1200 | The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. 1201 | 1202 | 1203 | 1204 | import os 1205 | from google import genai 1206 | from google.genai import types 1207 | 1208 | # Example Functions 1209 | def get_weather_forecast(location: str) -> dict: 1210 | """Gets the current weather temperature for a given location.""" 1211 | print(f"Tool Call: get_weather_forecast(location={location})") 1212 | # TODO: Make API call 1213 | print("Tool Response: {'temperature': 25, 'unit': 'celsius'}") 1214 | return {"temperature": 25, "unit": "celsius"} # Dummy response 1215 | 1216 | def set_thermostat_temperature(temperature: int) -> dict: 1217 | """Sets the thermostat to a desired temperature.""" 1218 | print(f"Tool Call: set_thermostat_temperature(temperature={temperature})") 1219 | # TODO: Interact with a thermostat API 1220 | print("Tool Response: {'status': 'success'}") 1221 | return {"status": "success"} 1222 | 1223 | # Configure the client and model 1224 | client = genai.Client() 1225 | config = types.GenerateContentConfig( 1226 | tools=[get_weather_forecast, set_thermostat_temperature] 1227 | ) 1228 | 1229 | # Make the request 1230 | response = client.models.generate_content( 1231 | model="gemini-2.5-flash", 1232 | contents="If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C.", 1233 | config=config, 1234 | ) 1235 | 1236 | # Print the final, user-facing response 1237 | print(response.text) 1238 | 1239 | 1240 | Function calling modes 1241 | The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within the.function_calling_config. 1242 | 1243 | AUTO (Default): The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. 1244 | ANY: The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations. If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). 1245 | NONE: The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. 
Use this to temporarily disable function calling without removing your tool definitions. 1246 | 1247 | 1248 | from google.genai import types 1249 | 1250 | # Configure function calling mode 1251 | tool_config = types.ToolConfig( 1252 | function_calling_config=types.FunctionCallingConfig( 1253 | mode="ANY", allowed_function_names=["get_current_temperature"] 1254 | ) 1255 | ) 1256 | 1257 | # Create the generation config 1258 | config = types.GenerateContentConfig( 1259 | tools=[tools], # not defined here. 1260 | tool_config=tool_config, 1261 | ) 1262 | 1263 | 1264 | Automatic function calling (Python only) 1265 | When using the Python SDK, you can provide Python functions directly as tools. The SDK automatically converts the Python function to declarations, handles the function call execution and the response cycle for you. The Python SDK then automatically: 1266 | 1267 | Detects function call responses from the model. 1268 | Call the corresponding Python function in your code. 1269 | Sends the function response back to the model. 1270 | Returns the model's final text response. 1271 | To use this, define your function with type hints and a docstring, and then pass the function itself (not a JSON declaration) as a tool: 1272 | 1273 | 1274 | from google import genai 1275 | from google.genai import types 1276 | 1277 | # Define the function with type hints and docstring 1278 | def get_current_temperature(location: str) -> dict: 1279 | """Gets the current temperature for a given location. 1280 | 1281 | Args: 1282 | location: The city and state, e.g. San Francisco, CA 1283 | 1284 | Returns: 1285 | A dictionary containing the temperature and unit. 1286 | """ 1287 | # ... (implementation) ... 1288 | return {"temperature": 25, "unit": "Celsius"} 1289 | 1290 | # Configure the client 1291 | client = genai.Client() 1292 | config = types.GenerateContentConfig( 1293 | tools=[get_current_temperature] 1294 | ) # Pass the function itself 1295 | 1296 | # Make the request 1297 | response = client.models.generate_content( 1298 | model="gemini-2.5-flash", 1299 | contents="What's the temperature in Boston?", 1300 | config=config, 1301 | ) 1302 | 1303 | print(response.text) # The SDK handles the function call and returns the final text 1304 | 1305 | 1306 | 1307 | Model context protocol (MCP) 1308 | Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts. 1309 | 1310 | The Gemini SDKs have built-in support for the MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDK can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until no more tool calls are made by the model. 1311 | 1312 | Here, you can find an example of how to use a local MCP server with Gemini and mcp SDK. 
1313 | 1314 | 1315 | pip install mcp 1316 | 1317 | import os 1318 | import asyncio 1319 | from datetime import datetime 1320 | from mcp import ClientSession, StdioServerParameters 1321 | from mcp.client.stdio import stdio_client 1322 | from google import genai 1323 | 1324 | client = genai.Client() 1325 | 1326 | # Create server parameters for stdio connection 1327 | server_params = StdioServerParameters( 1328 | command="npx", # Executable 1329 | args=["-y", "@philschmid/weather-mcp"], # MCP Server 1330 | env=None, # Optional environment variables 1331 | ) 1332 | 1333 | async def run(): 1334 | async with stdio_client(server_params) as (read, write): 1335 | async with ClientSession(read, write) as session: 1336 | # Prompt to get the weather for the current day in London. 1337 | prompt = f"What is the weather in London in {datetime.now().strftime('%Y-%m-%d')}?" 1338 | 1339 | # Initialize the connection between client and server 1340 | await session.initialize() 1341 | 1342 | # Send request to the model with MCP function declarations 1343 | response = await client.aio.models.generate_content( 1344 | model="gemini-2.5-flash", 1345 | contents=prompt, 1346 | config=genai.types.GenerateContentConfig( 1347 | temperature=0, 1348 | tools=[session], # uses the session, will automatically call the tool 1349 | # Uncomment if you **don't** want the SDK to automatically call the tool 1350 | # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig( 1351 | # disable=True 1352 | # ), 1353 | ), 1354 | ) 1355 | print(response.text) 1356 | 1357 | # Start the asyncio event loop and run the main function 1358 | asyncio.run(run()) 1359 | 1360 | 1361 | 1362 | Limitations with built-in MCP support 1363 | Built-in MCP support is an experimental feature in our SDKs and has the following limitations: 1364 | 1365 | Only tools are supported, not resources or prompts. 1366 | It is available for the Python and JavaScript/TypeScript SDKs. 1367 | Breaking changes might occur in future releases. 1368 | Manual integration of MCP servers is always an option if these limitations affect what you're building. 1369 | 1370 | Supported models 1371 | This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page. 1372 | 1373 | Model Function Calling Parallel Function Calling Compositional Function Calling 1374 | Gemini 2.5 Pro ✔️ ✔️ ✔️ 1375 | Gemini 2.5 Flash ✔️ ✔️ ✔️ 1376 | Gemini 2.5 Flash-Lite ✔️ ✔️ ✔️ 1377 | Gemini 2.0 Flash ✔️ ✔️ ✔️ 1378 | Gemini 2.0 Flash-Lite X X X 1379 | Best practices 1380 | Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments. 1381 | Naming: Use descriptive function names (without spaces, periods, or dashes). 1382 | Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum. 1383 | Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools.
1384 | Prompt Engineering: 1385 | Provide context: Tell the model its role (e.g., "You are a helpful weather assistant."). 1386 | Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts."). 1387 | Encourage clarification: Instruct the model to ask clarifying questions if needed. 1388 | Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls. 1389 | Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it. 1390 | Error Handling: Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. Return informative error messages that the model can use to generate helpful responses to the user. 1391 | Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls. 1392 | Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, break down complex tasks into smaller, more focused function sets. 1393 | 1394 | 1395 | 1396 | 1397 | 1398 | 1399 | 1400 | 1401 | Document understanding 1402 | 1403 | Gemini models can process documents in PDF format, using native vision to understand entire document contexts. This goes beyond simple text extraction, allowing Gemini to: 1404 | 1405 | Analyze and interpret content, including text, images, diagrams, charts, and tables, even in long documents up to 1000 pages. 1406 | Extract information into structured output formats. 1407 | Summarize and answer questions based on both the visual and textual elements in a document. 1408 | Transcribe document content (e.g. to HTML), preserving layouts and formatting, for use in downstream applications. 1409 | Passing inline PDF data 1410 | You can pass inline PDF data in the request to generateContent. For PDF payloads under 20MB, you can choose between uploading base64 encoded documents or directly uploading locally stored files. 
1411 | 1412 | The following example shows you how to fetch a PDF from a URL and convert it to bytes for processing: 1413 | 1414 | 1415 | from google import genai 1416 | from google.genai import types 1417 | import httpx 1418 | 1419 | client = genai.Client() 1420 | 1421 | doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf" 1422 | 1423 | # Retrieve and encode the PDF byte 1424 | doc_data = httpx.get(doc_url).content 1425 | 1426 | prompt = "Summarize this document" 1427 | response = client.models.generate_content( 1428 | model="gemini-2.5-flash", 1429 | contents=[ 1430 | types.Part.from_bytes( 1431 | data=doc_data, 1432 | mime_type='application/pdf', 1433 | ), 1434 | prompt]) 1435 | print(response.text) 1436 | 1437 | You can also read a PDF from a local file for processing: 1438 | 1439 | from google import genai 1440 | from google.genai import types 1441 | import pathlib 1442 | 1443 | client = genai.Client() 1444 | 1445 | # Retrieve and encode the PDF byte 1446 | filepath = pathlib.Path('file.pdf') 1447 | 1448 | prompt = "Summarize this document" 1449 | response = client.models.generate_content( 1450 | model="gemini-2.5-flash", 1451 | contents=[ 1452 | types.Part.from_bytes( 1453 | data=filepath.read_bytes(), 1454 | mime_type='application/pdf', 1455 | ), 1456 | prompt]) 1457 | print(response.text) 1458 | 1459 | Uploading PDFs using the File API 1460 | You can use the File API to upload larger documents. Always use the File API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20MB. 1461 | 1462 | Note: The File API lets you store up to 50MB of PDF files. Files are stored for 48 hours. You can access them in that period with your API key, but you can't download them from the API. The File API is available at no cost in all regions where the Gemini API is available. 1463 | Call media.upload to upload a file using the File API. The following code uploads a document file and then uses the file in a call to models.generateContent. 
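For reference, here is a minimal sketch of that basic flow (the local file name example.pdf is only a placeholder):

from google import genai

client = genai.Client()

# Upload the document with the File API (placeholder file name)
sample_doc = client.files.upload(file="example.pdf")

# Reference the uploaded file in a generateContent call
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[sample_doc, "Summarize this document"],
)
print(response.text)

The following subsections apply the same pattern to large PDFs fetched from URLs and to large PDFs stored locally.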
1464 | 1465 | 1466 | Large PDFs from URLs 1467 | Use the File API to simplify uploading and processing large PDF files from URLs: 1468 | 1469 | from google import genai 1470 | from google.genai import types 1471 | import io 1472 | import httpx 1473 | 1474 | client = genai.Client() 1475 | 1476 | long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" 1477 | 1478 | # Retrieve and upload the PDF using the File API 1479 | doc_io = io.BytesIO(httpx.get(long_context_pdf_path).content) 1480 | 1481 | sample_doc = client.files.upload( 1482 | # You can pass a path or a file-like object here 1483 | file=doc_io, 1484 | config=dict( 1485 | mime_type='application/pdf') 1486 | ) 1487 | 1488 | prompt = "Summarize this document" 1489 | 1490 | response = client.models.generate_content( 1491 | model="gemini-2.5-flash", 1492 | contents=[sample_doc, prompt]) 1493 | print(response.text) 1494 | 1495 | 1496 | Large PDFs stored locally 1497 | 1498 | from google import genai 1499 | from google.genai import types 1500 | import pathlib 1501 | import httpx 1502 | 1503 | client = genai.Client() 1504 | 1505 | # Retrieve and encode the PDF byte 1506 | file_path = pathlib.Path('large_file.pdf') 1507 | 1508 | # Upload the PDF using the File API 1509 | sample_file = client.files.upload( 1510 | file=file_path, 1511 | ) 1512 | 1513 | prompt="Summarize this document" 1514 | 1515 | response = client.models.generate_content( 1516 | model="gemini-2.5-flash", 1517 | contents=[sample_file, "Summarize this document"]) 1518 | print(response.text) 1519 | 1520 | 1521 | Passing multiple PDFs 1522 | The Gemini API is capable of processing multiple PDF documents (up to 1000 pages) in a single request, as long as the combined size of the documents and the text prompt stays within the model's context window. 1523 | 1524 | 1525 | from google import genai 1526 | import io 1527 | import httpx 1528 | 1529 | client = genai.Client() 1530 | 1531 | doc_url_1 = "https://arxiv.org/pdf/2312.11805" 1532 | doc_url_2 = "https://arxiv.org/pdf/2403.05530" 1533 | 1534 | # Retrieve and upload both PDFs using the File API 1535 | doc_data_1 = io.BytesIO(httpx.get(doc_url_1).content) 1536 | doc_data_2 = io.BytesIO(httpx.get(doc_url_2).content) 1537 | 1538 | sample_pdf_1 = client.files.upload( 1539 | file=doc_data_1, 1540 | config=dict(mime_type='application/pdf') 1541 | ) 1542 | sample_pdf_2 = client.files.upload( 1543 | file=doc_data_2, 1544 | config=dict(mime_type='application/pdf') 1545 | ) 1546 | 1547 | prompt = "What is the difference between each of the main benchmarks between these two papers? Output these in a table." 1548 | 1549 | response = client.models.generate_content( 1550 | model="gemini-2.5-flash", 1551 | contents=[sample_pdf_1, sample_pdf_2, prompt]) 1552 | print(response.text) 1553 | 1554 | 1555 | Technical details 1556 | Gemini supports a maximum of 1,000 document pages. Each document page is equivalent to 258 tokens. 1557 | 1558 | While there are no specific limits to the number of pixels in a document besides the model's context window, larger pages are scaled down to a maximum resolution of 3072x3072 while preserving their original aspect ratio, while smaller pages are scaled up to 768x768 pixels. There is no cost reduction for pages at lower sizes, other than bandwidth, or performance improvement for pages at higher resolution. 1559 | 1560 | Document types 1561 | Technically, you can pass other MIME types for document understanding, like TXT, Markdown, HTML, XML, etc. 
However, document vision only meaningfully understands PDFs. Other types will be extracted as pure text, and the model won't be able to interpret what we see in the rendering of those files. Any file-type specifics like charts, diagrams, HTML tags, Markdown formatting, etc., will be lost. 1562 | 1563 | Best practices 1564 | For best results: 1565 | 1566 | Rotate pages to the correct orientation before uploading. 1567 | Avoid blurry pages. 1568 | If using a single page, place the text prompt after the page. 1569 | 1570 | 1571 | 1572 | 1573 | Image understanding 1574 | 1575 | Gemini models are built to be multimodal from the ground up, unlocking a wide range of image processing and computer vision tasks including but not limited to image captioning, classification, and visual question answering without having to train specialized ML models. 1576 | 1577 | Tip: In addition to their general multimodal capabilities, Gemini models (2.0 and newer) offer improved accuracy for specific use cases like object detection and segmentation, through additional training. See the Capabilities section for more details. 1578 | Passing images to Gemini 1579 | You can provide images as input to Gemini using two methods: 1580 | 1581 | Passing inline image data: Ideal for smaller files (total request size less than 20MB, including prompts). 1582 | Uploading images using the File API: Recommended for larger files or for reusing images across multiple requests. 1583 | Passing inline image data 1584 | You can pass inline image data in the request to generateContent. You can provide image data as Base64 encoded strings or by reading local files directly (depending on the language). 1585 | 1586 | The following example shows how to read an image from a local file and pass it to generateContent API for processing. 1587 | 1588 | 1589 | from google.genai import types 1590 | 1591 | with open('path/to/small-sample.jpg', 'rb') as f: 1592 | image_bytes = f.read() 1593 | 1594 | response = client.models.generate_content( 1595 | model='gemini-2.5-flash', 1596 | contents=[ 1597 | types.Part.from_bytes( 1598 | data=image_bytes, 1599 | mime_type='image/jpeg', 1600 | ), 1601 | 'Caption this image.' 1602 | ] 1603 | ) 1604 | 1605 | print(response.text) 1606 | 1607 | You can also fetch an image from a URL, convert it to bytes, and pass it to generateContent as shown in the following examples. 1608 | 1609 | from google import genai 1610 | from google.genai import types 1611 | 1612 | import requests 1613 | 1614 | image_path = "https://goo.gle/instrument-img" 1615 | image_bytes = requests.get(image_path).content 1616 | image = types.Part.from_bytes( 1617 | data=image_bytes, mime_type="image/jpeg" 1618 | ) 1619 | 1620 | client = genai.Client() 1621 | 1622 | response = client.models.generate_content( 1623 | model="gemini-2.5-flash", 1624 | contents=["What is this image?", image], 1625 | ) 1626 | 1627 | print(response.text) 1628 | 1629 | 1630 | Note: Inline image data limits your total request size (text prompts, system instructions, and inline bytes) to 20MB. For larger requests, upload image files using the File API. Files API is also more efficient for scenarios that use the same image repeatedly. 1631 | Uploading images using the File API 1632 | For large files or to be able to use the same image file repeatedly, use the Files API. The following code uploads an image file and then uses the file in a call to generateContent. See the Files API guide for more information and examples. 
1633 | 1634 | from google import genai 1635 | 1636 | client = genai.Client() 1637 | 1638 | my_file = client.files.upload(file="path/to/sample.jpg") 1639 | 1640 | response = client.models.generate_content( 1641 | model="gemini-2.5-flash", 1642 | contents=[my_file, "Caption this image."], 1643 | ) 1644 | 1645 | print(response.text) 1646 | 1647 | 1648 | Prompting with multiple images 1649 | You can provide multiple images in a single prompt by including multiple image Part objects in the contents array. These can be a mix of inline data (local files or URLs) and File API references. 1650 | 1651 | 1652 | from google import genai 1653 | from google.genai import types 1654 | 1655 | client = genai.Client() 1656 | 1657 | # Upload the first image 1658 | image1_path = "path/to/image1.jpg" 1659 | uploaded_file = client.files.upload(file=image1_path) 1660 | 1661 | # Prepare the second image as inline data 1662 | image2_path = "path/to/image2.png" 1663 | with open(image2_path, 'rb') as f: 1664 | img2_bytes = f.read() 1665 | 1666 | # Create the prompt with text and multiple images 1667 | response = client.models.generate_content( 1668 | 1669 | model="gemini-2.5-flash", 1670 | contents=[ 1671 | "What is different between these two images?", 1672 | uploaded_file, # Use the uploaded file reference 1673 | types.Part.from_bytes( 1674 | data=img2_bytes, 1675 | mime_type='image/png' 1676 | ) 1677 | ] 1678 | ) 1679 | 1680 | print(response.text) 1681 | 1682 | 1683 | Object detection 1684 | From Gemini 2.0 onwards, models are further trained to detect objects in an image and get their bounding box coordinates. The coordinates, relative to image dimensions, scale to [0, 1000]. You need to descale these coordinates based on your original image size. 1685 | 1686 | 1687 | from google import genai 1688 | from google.genai import types 1689 | from PIL import Image 1690 | import json 1691 | 1692 | client = genai.Client() 1693 | prompt = "Detect the all of the prominent items in the image. The box_2d should be [ymin, xmin, ymax, xmax] normalized to 0-1000." 1694 | 1695 | image = Image.open("/path/to/image.png") 1696 | 1697 | config = types.GenerateContentConfig( 1698 | response_mime_type="application/json" 1699 | ) 1700 | 1701 | response = client.models.generate_content(model="gemini-2.5-flash", 1702 | contents=[image, prompt], 1703 | config=config 1704 | ) 1705 | 1706 | width, height = image.size 1707 | bounding_boxes = json.loads(response.text) 1708 | 1709 | converted_bounding_boxes = [] 1710 | for bounding_box in bounding_boxes: 1711 | abs_y1 = int(bounding_box["box_2d"][0]/1000 * height) 1712 | abs_x1 = int(bounding_box["box_2d"][1]/1000 * width) 1713 | abs_y2 = int(bounding_box["box_2d"][2]/1000 * height) 1714 | abs_x2 = int(bounding_box["box_2d"][3]/1000 * width) 1715 | converted_bounding_boxes.append([abs_x1, abs_y1, abs_x2, abs_y2]) 1716 | 1717 | print("Image size: ", width, height) 1718 | print("Bounding boxes:", converted_bounding_boxes) 1719 | 1720 | 1721 | 1722 | 1723 | Segmentation 1724 | Starting with Gemini 2.5, models not only detect items but also segment them and provide their contour masks. 1725 | 1726 | The model predicts a JSON list, where each item represents a segmentation mask. Each item has a bounding box ("box_2d") in the format [y0, x0, y1, x1] with normalized coordinates between 0 and 1000, a label ("label") that identifies the object, and finally the segmentation mask inside the bounding box, as base64 encoded png that is a probability map with values between 0 and 255. 
The mask needs to be resized to match the bounding box dimensions, then binarized at your confidence threshold (127 for the midpoint). 1727 | 1728 | Note: For better results, disable thinking by setting the thinking budget to 0. See code sample below for an example 1729 | 1730 | 1731 | from google import genai 1732 | from google.genai import types 1733 | from PIL import Image, ImageDraw 1734 | import io 1735 | import base64 1736 | import json 1737 | import numpy as np 1738 | import os 1739 | 1740 | client = genai.Client() 1741 | 1742 | def parse_json(json_output: str): 1743 | # Parsing out the markdown fencing 1744 | lines = json_output.splitlines() 1745 | for i, line in enumerate(lines): 1746 | if line == "```json": 1747 | json_output = "\n".join(lines[i+1:]) # Remove everything before "```json" 1748 | output = json_output.split("```")[0] # Remove everything after the closing "```" 1749 | break # Exit the loop once "```json" is found 1750 | return json_output 1751 | 1752 | def extract_segmentation_masks(image_path: str, output_dir: str = "segmentation_outputs"): 1753 | # Load and resize image 1754 | im = Image.open(image_path) 1755 | im.thumbnail([1024, 1024], Image.Resampling.LANCZOS) 1756 | 1757 | prompt = """ 1758 | Give the segmentation masks for the wooden and glass items. 1759 | Output a JSON list of segmentation masks where each entry contains the 2D 1760 | bounding box in the key "box_2d", the segmentation mask in key "mask", and 1761 | the text label in the key "label". Use descriptive labels. 1762 | """ 1763 | 1764 | config = types.GenerateContentConfig( 1765 | thinking_config=types.ThinkingConfig(thinking_budget=0) # set thinking_budget to 0 for better results in object detection 1766 | ) 1767 | 1768 | response = client.models.generate_content( 1769 | model="gemini-2.5-flash", 1770 | contents=[prompt, im], # Pillow images can be directly passed as inputs (which will be converted by the SDK) 1771 | config=config 1772 | ) 1773 | 1774 | # Parse JSON response 1775 | items = json.loads(parse_json(response.text)) 1776 | 1777 | # Create output directory 1778 | os.makedirs(output_dir, exist_ok=True) 1779 | 1780 | # Process each mask 1781 | for i, item in enumerate(items): 1782 | # Get bounding box coordinates 1783 | box = item["box_2d"] 1784 | y0 = int(box[0] / 1000 * im.size[1]) 1785 | x0 = int(box[1] / 1000 * im.size[0]) 1786 | y1 = int(box[2] / 1000 * im.size[1]) 1787 | x1 = int(box[3] / 1000 * im.size[0]) 1788 | 1789 | # Skip invalid boxes 1790 | if y0 >= y1 or x0 >= x1: 1791 | continue 1792 | 1793 | # Process mask 1794 | png_str = item["mask"] 1795 | if not png_str.startswith("data:image/png;base64,"): 1796 | continue 1797 | 1798 | # Remove prefix 1799 | png_str = png_str.removeprefix("data:image/png;base64,") 1800 | mask_data = base64.b64decode(png_str) 1801 | mask = Image.open(io.BytesIO(mask_data)) 1802 | 1803 | # Resize mask to match bounding box 1804 | mask = mask.resize((x1 - x0, y1 - y0), Image.Resampling.BILINEAR) 1805 | 1806 | # Convert mask to numpy array for processing 1807 | mask_array = np.array(mask) 1808 | 1809 | # Create overlay for this mask 1810 | overlay = Image.new('RGBA', im.size, (0, 0, 0, 0)) 1811 | overlay_draw = ImageDraw.Draw(overlay) 1812 | 1813 | # Create overlay for the mask 1814 | color = (255, 255, 255, 200) 1815 | for y in range(y0, y1): 1816 | for x in range(x0, x1): 1817 | if mask_array[y - y0, x - x0] > 128: # Threshold for mask 1818 | overlay_draw.point((x, y), fill=color) 1819 | 1820 | # Save individual mask and its overlay 1821 | mask_filename 
= f"{item['label']}_{i}_mask.png" 1822 | overlay_filename = f"{item['label']}_{i}_overlay.png" 1823 | 1824 | mask.save(os.path.join(output_dir, mask_filename)) 1825 | 1826 | # Create and save overlay 1827 | composite = Image.alpha_composite(im.convert('RGBA'), overlay) 1828 | composite.save(os.path.join(output_dir, overlay_filename)) 1829 | print(f"Saved mask and overlay for {item['label']} to {output_dir}") 1830 | 1831 | # Example usage 1832 | if __name__ == "__main__": 1833 | extract_segmentation_masks("path/to/image.png") 1834 | 1835 | 1836 | 1837 | 1838 | Video understanding 1839 | 1840 | Gemini models can process videos, enabling many frontier developer use cases that would have historically required domain specific models. Some of Gemini's vision capabilities include the ability to: 1841 | 1842 | Describe, segment, and extract information from videos 1843 | Answer questions about video content 1844 | Refer to specific timestamps within a video 1845 | Gemini was built to be multimodal from the ground up and we continue to push the frontier of what is possible. This guide shows how to use the Gemini API to generate text responses based on video inputs. 1846 | 1847 | Video input 1848 | You can provide videos as input to Gemini in the following ways: 1849 | 1850 | Upload a video file using the File API before making a request to generateContent. Use this method for files larger than 20MB, videos longer than approximately 1 minute, or when you want to reuse the file across multiple requests. 1851 | Pass inline video data with the request to generateContent. Use this method for smaller files (<20MB) and shorter durations. 1852 | Include a YouTube URL directly in the prompt. 1853 | Upload a video file 1854 | You can use the Files API to upload a video file. Always use the Files API when the total request size (including the file, text prompt, system instructions, etc.) is larger than 20 MB, the video duration is significant, or if you intend to use the same video in multiple prompts. The File API accepts video file formats directly. 1855 | 1856 | The following code downloads the sample video, uploads it using the File API, waits for it to be processed, and then uses the file reference in a generateContent request. 1857 | 1858 | 1859 | from google import genai 1860 | 1861 | client = genai.Client() 1862 | 1863 | myfile = client.files.upload(file="path/to/sample.mp4") 1864 | 1865 | response = client.models.generate_content( 1866 | model="gemini-2.5-flash", contents=[myfile, "Summarize this video. Then create a quiz with an answer key based on the information in this video."] 1867 | ) 1868 | 1869 | print(response.text) 1870 | 1871 | 1872 | Pass video data inline 1873 | Instead of uploading a video file using the File API, you can pass smaller videos directly in the request to generateContent. This is suitable for shorter videos under 20MB total request size. 
1874 | 1875 | Here's an example of providing inline video data: 1876 | 1877 | 1878 | # Only for videos of size <20Mb 1879 | video_file_name = "/path/to/your/video.mp4" 1880 | video_bytes = open(video_file_name, 'rb').read() 1881 | 1882 | response = client.models.generate_content( 1883 | model='models/gemini-2.5-flash', 1884 | contents=types.Content( 1885 | parts=[ 1886 | types.Part( 1887 | inline_data=types.Blob(data=video_bytes, mime_type='video/mp4') 1888 | ), 1889 | types.Part(text='Please summarize the video in 3 sentences.') 1890 | ] 1891 | ) 1892 | ) 1893 | 1894 | 1895 | Include a YouTube URL 1896 | Preview: The YouTube URL feature is in preview and is available at no charge. Pricing and rate limits are likely to change. 1897 | The Gemini API and AI Studio support YouTube URLs as a file data Part. You can include a YouTube URL with a prompt asking the model to summarize, translate, or otherwise interact with the video content. 1898 | 1899 | Limitations: 1900 | 1901 | For the free tier, you can't upload more than 8 hours of YouTube video per day. 1902 | For the paid tier, there is no limit based on video length. 1903 | For models before 2.5, you can upload only 1 video per request. For models after 2.5, you can upload a maximum of 10 videos per request. 1904 | You can only upload public videos (not private or unlisted videos). 1905 | The following example shows how to include a YouTube URL with a prompt: 1906 | 1907 | 1908 | response = client.models.generate_content( 1909 | model='models/gemini-2.5-flash', 1910 | contents=types.Content( 1911 | parts=[ 1912 | types.Part( 1913 | file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=9hE5-98ZeCg') 1914 | ), 1915 | types.Part(text='Please summarize the video in 3 sentences.') 1916 | ] 1917 | ) 1918 | ) 1919 | 1920 | Refer to timestamps in the content 1921 | You can ask questions about specific points in time within the video using timestamps of the form MM:SS. 1922 | 1923 | prompt = "What are the examples given at 00:05 and 00:10 supposed to show us?" # Adjusted timestamps for the NASA video 1924 | 1925 | 1926 | Transcribe video and provide visual descriptions 1927 | The Gemini models can transcribe and provide visual descriptions of video content by processing both the audio track and visual frames. For visual descriptions, the model samples the video at a rate of 1 frame per second. This sampling rate may affect the level of detail in the descriptions, particularly for videos with rapidly changing visuals. 1928 | 1929 | 1930 | Customize video processing 1931 | You can customize video processing in the Gemini API by setting clipping intervals or providing custom frame rate sampling. 1932 | 1933 | Tip: Video clipping and frames per second (FPS) are supported by all models, but the quality is significantly higher from 2.5 series models. 1934 | Set clipping intervals 1935 | You can clip video by specifying videoMetadata with start and end offsets. 
1936 | 1937 | 1938 | response = client.models.generate_content( 1939 | model='models/gemini-2.5-flash', 1940 | contents=types.Content( 1941 | parts=[ 1942 | types.Part( 1943 | file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=XEzRZ35urlk'), 1944 | video_metadata=types.VideoMetadata( 1945 | start_offset='1250s', 1946 | end_offset='1570s' 1947 | ) 1948 | ), 1949 | types.Part(text='Please summarize the video in 3 sentences.') 1950 | ] 1951 | ) 1952 | ) 1953 | 1954 | 1955 | Set a custom frame rate 1956 | You can set custom frame rate sampling by passing an fps argument to videoMetadata. 1957 | 1958 | # Only for videos of size <20Mb 1959 | video_file_name = "/path/to/your/video.mp4" 1960 | video_bytes = open(video_file_name, 'rb').read() 1961 | 1962 | response = client.models.generate_content( 1963 | model='models/gemini-2.5-flash', 1964 | contents=types.Content( 1965 | parts=[ 1966 | types.Part( 1967 | inline_data=types.Blob( 1968 | data=video_bytes, 1969 | mime_type='video/mp4'), 1970 | video_metadata=types.VideoMetadata(fps=5) 1971 | ), 1972 | types.Part(text='Please summarize the video in 3 sentences.') 1973 | ] 1974 | ) 1975 | ) 1976 | 1977 | 1978 | Audio understanding 1979 | 1980 | Gemini can analyze and understand audio input, enabling use cases like the following: 1981 | 1982 | Describe, summarize, or answer questions about audio content. 1983 | Provide a transcription of the audio. 1984 | Analyze specific segments of the audio. 1985 | This guide shows you how to use the Gemini API to generate a text response to audio input. 1986 | 1987 | Before you begin 1988 | Before calling the Gemini API, ensure you have your SDK of choice installed, and a Gemini API key configured and ready to use. 1989 | 1990 | Input audio 1991 | You can provide audio data to Gemini in the following ways: 1992 | 1993 | Upload an audio file before making a request to generateContent. 1994 | Pass inline audio data with the request to generateContent. 1995 | Upload an audio file 1996 | You can use the Files API to upload an audio file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB. 1997 | 1998 | The following code uploads an audio file and then uses the file in a call to generateContent. 1999 | 2000 | 2001 | from google import genai 2002 | 2003 | client = genai.Client() 2004 | 2005 | myfile = client.files.upload(file="path/to/sample.mp3") 2006 | 2007 | response = client.models.generate_content( 2008 | model="gemini-2.5-flash", contents=["Describe this audio clip", myfile] 2009 | ) 2010 | 2011 | print(response.text) 2012 | 2013 | 2014 | 2015 | Code execution 2016 | 2017 | The Gemini API provides a code execution tool that enables the model to generate and run Python code. The model can then learn iteratively from the code execution results until it arrives at a final output. You can use code execution to build applications that benefit from code-based reasoning. For example, you can use code execution to solve equations or process text. You can also use the libraries included in the code execution environment to perform more specialized tasks. 2018 | 2019 | Gemini is only able to execute code in Python. You can still ask Gemini to generate code in another language, but the model can't use the code execution tool to run it. 2020 | 2021 | Enable code execution 2022 | To enable code execution, configure the code execution tool on the model. This allows the model to generate and run code. 
2023 | 2024 | from google import genai 2025 | from google.genai import types 2026 | 2027 | client = genai.Client() 2028 | 2029 | response = client.models.generate_content( 2030 | model="gemini-2.5-flash", 2031 | contents="What is the sum of the first 50 prime numbers? " 2032 | "Generate and run code for the calculation, and make sure you get all 50.", 2033 | config=types.GenerateContentConfig( 2034 | tools=[types.Tool(code_execution=types.ToolCodeExecution)] 2035 | ), 2036 | ) 2037 | 2038 | for part in response.candidates[0].content.parts: 2039 | if part.text is not None: 2040 | print(part.text) 2041 | if part.executable_code is not None: 2042 | print(part.executable_code.code) 2043 | if part.code_execution_result is not None: 2044 | print(part.code_execution_result.output) 2045 | 2046 | 2047 | Use code execution in chat 2048 | You can also use code execution as part of a chat. 2049 | 2050 | 2051 | from google import genai 2052 | from google.genai import types 2053 | 2054 | client = genai.Client() 2055 | 2056 | chat = client.chats.create( 2057 | model="gemini-2.5-flash", 2058 | config=types.GenerateContentConfig( 2059 | tools=[types.Tool(code_execution=types.ToolCodeExecution)] 2060 | ), 2061 | ) 2062 | 2063 | response = chat.send_message("I have a math question for you.") 2064 | print(response.text) 2065 | 2066 | response = chat.send_message( 2067 | "What is the sum of the first 50 prime numbers? " 2068 | "Generate and run code for the calculation, and make sure you get all 50." 2069 | ) 2070 | 2071 | for part in response.candidates[0].content.parts: 2072 | if part.text is not None: 2073 | print(part.text) 2074 | if part.executable_code is not None: 2075 | print(part.executable_code.code) 2076 | if part.code_execution_result is not None: 2077 | print(part.code_execution_result.output) 2078 | 2079 | 2080 | 2081 | 2082 | Supported libraries 2083 | The code execution environment includes the following libraries: 2084 | 2085 | attrs 2086 | chess 2087 | contourpy 2088 | fpdf 2089 | geopandas 2090 | imageio 2091 | jinja2 2092 | joblib 2093 | jsonschema 2094 | jsonschema-specifications 2095 | lxml 2096 | matplotlib 2097 | mpmath 2098 | numpy 2099 | opencv-python 2100 | openpyxl 2101 | packaging 2102 | pandas 2103 | pillow 2104 | protobuf 2105 | pylatex 2106 | pyparsing 2107 | PyPDF2 2108 | python-dateutil 2109 | python-docx 2110 | python-pptx 2111 | reportlab 2112 | scikit-learn 2113 | scipy 2114 | seaborn 2115 | six 2116 | striprtf 2117 | sympy 2118 | tabulate 2119 | tensorflow 2120 | toolz 2121 | xlrd 2122 | 2123 | 2124 | 2125 | URL context 2126 | 2127 | Experimental: The URL context tool is an experimental feature. 2128 | Using the URL context tool, you can provide Gemini with URLs as additional context for your prompt. The model can then retrieve content from the URLs and use that content to inform and shape its response. 2129 | 2130 | This tool is useful for tasks like the following: 2131 | 2132 | Extracting key data points or talking points from articles 2133 | Comparing information across multiple links 2134 | Synthesizing data from several sources 2135 | Answering questions based on the content of a specific page or pages 2136 | Analyzing content for specific purposes (like writing a job description or creating test questions) 2137 | This guide explains how to use the URL context tool in the Gemini API. 2138 | 2139 | Use URL context 2140 | You can use the URL context tool in two main ways, by itself or in conjunction with Grounding with Google Search. 
2141 | 2142 | URL Context Only 2143 | 2144 | You provide specific URLs that you want the model to analyze directly in your prompt. 2145 | 2146 | Example prompts: 2147 | 2148 | 2149 | Summarize this document: YOUR_URLs 2150 | 2151 | Extract the key features from the product description on this page: YOUR_URLs 2152 | Grounding with Google Search + URL Context 2153 | 2154 | You can also enable both URL context and Grounding with Google Search together. You can enter a prompt with or without URLs. The model may first search for relevant information and then use the URL context tool to read the content of the search results for a more in-depth understanding. 2155 | 2156 | Example prompts: 2157 | 2158 | 2159 | Give me a three-day event schedule based on YOUR_URL. Also let me know what needs to be taken care of considering weather and commute. 2160 | 2161 | Recommend 3 books for beginners to read to learn more about the latest YOUR_subject. 2162 | Code examples with URL context only 2163 | 2164 | 2165 | from google import genai 2166 | from google.genai.types import Tool, GenerateContentConfig, UrlContext 2167 | 2168 | client = genai.Client() 2169 | model_id = "gemini-2.5-flash" 2170 | 2171 | url_context_tool = Tool( 2172 | url_context=UrlContext() 2173 | ) 2174 | 2175 | response = client.models.generate_content( 2176 | model=model_id, 2177 | contents="Compare recipes from YOUR_URL1 and YOUR_URL2", 2178 | config=GenerateContentConfig( 2179 | tools=[url_context_tool], 2180 | response_modalities=["TEXT"], 2181 | ) 2182 | ) 2183 | 2184 | for each in response.candidates[0].content.parts: 2185 | print(each.text) 2186 | # get URLs retrieved for context 2187 | print(response.candidates[0].url_context_metadata) 2188 | 2189 | 2190 | Code examples with Grounding with Google Search 2191 | 2192 | 2193 | from google import genai 2194 | from google.genai.types import Tool, GenerateContentConfig, GoogleSearch, UrlContext 2195 | 2196 | client = genai.Client() 2197 | model_id = "gemini-2.5-flash" 2198 | 2199 | tools = [] 2200 | tools.append(Tool(url_context=UrlContext())) 2201 | tools.append(Tool(google_search=GoogleSearch())) 2202 | 2203 | response = client.models.generate_content( 2204 | model=model_id, 2205 | contents="Give me a three-day event schedule based on YOUR_URL. Also let me know what needs to be taken care of considering weather and commute.", 2206 | config=GenerateContentConfig( 2207 | tools=tools, 2208 | response_modalities=["TEXT"], 2209 | ) 2210 | ) 2211 | 2212 | for each in response.candidates[0].content.parts: 2213 | print(each.text) 2214 | # get URLs retrieved for context 2215 | print(response.candidates[0].url_context_metadata) 2216 | 2217 | 2218 | 2219 | Limitations 2220 | The tool will consume up to 20 URLs per request for analysis. 2221 | For best results during the experimental phase, use the tool on standard web pages rather than multimedia content such as YouTube videos. 2222 | During the experimental phase, the tool is free to use. Billing will come later. 2223 | The experimental release has the following quotas: 2224 | 2225 | 1500 queries per day per project for requests made through the Gemini API 2226 | 100 queries per day per user in Google AI Studio 2227 | 2228 | 2229 | 2230 | 2231 | 2232 | Grounding with Google Search 2233 | 2234 | Grounding with Google Search connects the Gemini model to real-time web content and works with all available languages. This allows Gemini to provide more accurate answers and cite verifiable sources beyond its knowledge cutoff.
2235 | 2236 | Grounding helps you build applications that can: 2237 | 2238 | Increase factual accuracy: Reduce model hallucinations by basing responses on real-world information. 2239 | Access real-time information: Answer questions about recent events and topics. 2240 | Provide citations: Build user trust by showing the sources for the model's claims. 2241 | 2242 | 2243 | 2244 | from google import genai 2245 | from google.genai import types 2246 | 2247 | # Configure the client 2248 | client = genai.Client() 2249 | 2250 | # Define the grounding tool 2251 | grounding_tool = types.Tool( 2252 | google_search=types.GoogleSearch() 2253 | ) 2254 | 2255 | # Configure generation settings 2256 | config = types.GenerateContentConfig( 2257 | tools=[grounding_tool] 2258 | ) 2259 | 2260 | # Make the request 2261 | response = client.models.generate_content( 2262 | model="gemini-2.5-flash", 2263 | contents="Who won the euro 2024?", 2264 | config=config, 2265 | ) 2266 | 2267 | # Print the grounded response 2268 | print(response.text) 2269 | 2270 | 2271 | 2272 | How grounding with Google Search works 2273 | When you enable the google_search tool, the model handles the entire workflow of searching, processing, and citing information automatically. 2274 | 2275 | grounding-overview 2276 | 2277 | User Prompt: Your application sends a user's prompt to the Gemini API with the google_search tool enabled. 2278 | Prompt Analysis: The model analyzes the prompt and determines if a Google Search can improve the answer. 2279 | Google Search: If needed, the model automatically generates one or multiple search queries and executes them. 2280 | Search Results Processing: The model processes the search results, synthesizes the information, and formulates a response. 2281 | Grounded Response: The API returns a final, user-friendly response that is grounded in the search results. This response includes the model's text answer and groundingMetadata with the search queries, web results, and citations. 2282 | Understanding the Grounding Response 2283 | When a response is successfully grounded, the response includes a groundingMetadata field. This structured data is essential for verifying claims and building a rich citation experience in your application. 2284 | 2285 | 2286 | { 2287 | "candidates": [ 2288 | { 2289 | "content": { 2290 | "parts": [ 2291 | { 2292 | "text": "Spain won Euro 2024, defeating England 2-1 in the final. This victory marks Spain's record fourth European Championship title." 2293 | } 2294 | ], 2295 | "role": "model" 2296 | }, 2297 | "groundingMetadata": { 2298 | "webSearchQueries": [ 2299 | "UEFA Euro 2024 winner", 2300 | "who won euro 2024" 2301 | ], 2302 | "searchEntryPoint": { 2303 | "renderedContent": "" 2304 | }, 2305 | "groundingChunks": [ 2306 | {"web": {"uri": "https://vertexaisearch.cloud.google.com.....", "title": "aljazeera.com"}}, 2307 | {"web": {"uri": "https://vertexaisearch.cloud.google.com.....", "title": "uefa.com"}} 2308 | ], 2309 | "groundingSupports": [ 2310 | { 2311 | "segment": {"startIndex": 0, "endIndex": 85, "text": "Spain won Euro 2024, defeatin..."}, 2312 | "groundingChunkIndices": [0] 2313 | }, 2314 | { 2315 | "segment": {"startIndex": 86, "endIndex": 210, "text": "This victory marks Spain's..."}, 2316 | "groundingChunkIndices": [0, 1] 2317 | } 2318 | ] 2319 | } 2320 | } 2321 | ] 2322 | } 2323 | 2324 | 2325 | 2326 | The Gemini API returns the following information with the groundingMetadata: 2327 | 2328 | webSearchQueries : Array of the search queries used. 
This is useful for debugging and understanding the model's reasoning process. 2329 | searchEntryPoint : Contains the HTML and CSS to render the required Search Suggestions. Full usage requirements are detailed in the Terms of Service. 2330 | groundingChunks : Array of objects containing the web sources (uri and title). 2331 | groundingSupports : Array of chunks to connect model response text to the sources in groundingChunks. Each chunk links a text segment (defined by startIndex and endIndex) to one or more groundingChunkIndices. This is the key to building inline citations. 2332 | Grounding with Google Search can also be used in combination with the URL context tool to ground responses in both public web data and the specific URLs you provide. 2333 | 2334 | Attributing Sources with inline Citations 2335 | The API returns structured citation data, giving you complete control over how you display sources in your user interface. You can use the groundingSupports and groundingChunks fields to link the model's statements directly to their sources. Here is a common pattern for processing the metadata to create a response with inline, clickable citations. 2336 | 2337 | 2338 | def add_citations(response): 2339 | text = response.text 2340 | supports = response.candidates[0].grounding_metadata.grounding_supports 2341 | chunks = response.candidates[0].grounding_metadata.grounding_chunks 2342 | 2343 | # Sort supports by end_index in descending order to avoid shifting issues when inserting. 2344 | sorted_supports = sorted(supports, key=lambda s: s.segment.end_index, reverse=True) 2345 | 2346 | for support in sorted_supports: 2347 | end_index = support.segment.end_index 2348 | if support.grounding_chunk_indices: 2349 | # Create citation string like [1](link1)[2](link2) 2350 | citation_links = [] 2351 | for i in support.grounding_chunk_indices: 2352 | if i < len(chunks): 2353 | uri = chunks[i].web.uri 2354 | citation_links.append(f"[{i + 1}]({uri})") 2355 | 2356 | citation_string = ", ".join(citation_links) 2357 | text = text[:end_index] + citation_string + text[end_index:] 2358 | 2359 | return text 2360 | 2361 | # Assuming response with grounding metadata 2362 | text_with_citations = add_citations(response) 2363 | print(text_with_citations) 2364 | 2365 | 2366 | 2367 | 2368 | 2369 | 2370 | 2371 | Get started with Live API 2372 | 2373 | Preview: The Live API is in preview. 2374 | The Live API enables low-latency, real-time voice and video interactions with Gemini. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses, creating a natural conversational experience for your users. 2375 | 2376 | Live API Overview 2377 | 2378 | Live API offers a comprehensive set of features such as Voice Activity Detection, tool use and function calling, session management (for managing long running conversations) and ephemeral tokens (for secure client-sided authentication). 2379 | 2380 | This page gets you up and running with examples and basic code samples. 2381 | 2382 | Example applications 2383 | Check out the following example applications that illustrate how to use Live API for end-to-end use cases: 2384 | 2385 | Live audio starter app on AI Studio, using JavaScript libraries to connect to Live API and stream bidirectional audio through your microphone and speakers. 2386 | Live API Python cookbook using Pyaudio that connects to Live API. 2387 | Partner integrations 2388 | If you prefer a simpler development process, you can use Daily or LiveKit. 
These are third-party partner platforms that have already integrated the Gemini Live API over the WebRTC protocol to streamline the development of real-time audio and video applications. 2389 | 2390 | Before you begin building 2391 | There are two important decisions to make before you begin building with the Live API: choosing a model and choosing an implementation approach. 2392 | 2393 | Choose an audio generation architecture 2394 | If you're building an audio-based use case, your choice of model determines the audio generation architecture used to create the audio response: 2395 | 2396 | Native audio: This option provides the most natural and realistic-sounding speech and better multilingual performance. It also enables advanced features like affective (emotion-aware) dialogue, proactive audio (where the model can decide to ignore or respond to certain inputs), and "thinking". Native audio is supported by the following native audio models: 2397 | gemini-2.5-flash-preview-native-audio-dialog 2398 | gemini-2.5-flash-exp-native-audio-thinking-dialog 2399 | Half-cascade audio: This option uses a cascaded model architecture (native audio input and text-to-speech output). It offers better performance and reliability in production environments, especially with tool use. Half-cascaded audio is supported by the following models: 2400 | gemini-live-2.5-flash-preview 2401 | gemini-2.0-flash-live-001 2402 | Choose an implementation approach 2403 | When integrating with Live API, you'll need to choose one of the following implementation approaches: 2404 | 2405 | Server-to-server: Your backend connects to the Live API using WebSockets. Typically, your client sends stream data (audio, video, text) to your server, which then forwards it to the Live API. 2406 | Client-to-server: Your frontend code connects directly to the Live API using WebSockets to stream data, bypassing your backend. 2407 | Note: Client-to-server generally offers better performance for streaming audio and video, since it bypasses the need to send the stream to your backend first. It's also easier to set up since you don't need to implement a proxy that sends data from your client to your server and then your server to the API. However, for production environments, in order to mitigate security risks, we recommend using ephemeral tokens instead of standard API keys. 2408 | Get started 2409 | This example reads a WAV file, sends it in the correct format, and saves the received data as WAV file. 2410 | 2411 | You can send audio by converting it to 16-bit PCM, 16kHz, mono format, and you can receive audio by setting AUDIO as response modality. The output uses a sample rate of 24kHz. 
2412 | 2413 | 2414 | 2415 | # Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav 2416 | # Install helpers for converting files: pip install librosa soundfile 2417 | import asyncio 2418 | import io 2419 | from pathlib import Path 2420 | import wave 2421 | from google import genai 2422 | from google.genai import types 2423 | import soundfile as sf 2424 | import librosa 2425 | 2426 | client = genai.Client() 2427 | 2428 | # Half cascade model: 2429 | # model = "gemini-live-2.5-flash-preview" 2430 | 2431 | # Native audio output model: 2432 | model = "gemini-2.5-flash-preview-native-audio-dialog" 2433 | 2434 | config = { 2435 | "response_modalities": ["AUDIO"], 2436 | "system_instruction": "You are a helpful assistant and answer in a friendly tone.", 2437 | } 2438 | 2439 | async def main(): 2440 | async with client.aio.live.connect(model=model, config=config) as session: 2441 | 2442 | buffer = io.BytesIO() 2443 | y, sr = librosa.load("sample.wav", sr=16000) 2444 | sf.write(buffer, y, sr, format='RAW', subtype='PCM_16') 2445 | buffer.seek(0) 2446 | audio_bytes = buffer.read() 2447 | 2448 | # If already in correct format, you can use this: 2449 | # audio_bytes = Path("sample.pcm").read_bytes() 2450 | 2451 | await session.send_realtime_input( 2452 | audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000") 2453 | ) 2454 | 2455 | wf = wave.open("audio.wav", "wb") 2456 | wf.setnchannels(1) 2457 | wf.setsampwidth(2) 2458 | wf.setframerate(24000) # Output is 24kHz 2459 | 2460 | async for response in session.receive(): 2461 | if response.data is not None: 2462 | wf.writeframes(response.data) 2463 | 2464 | # Un-comment this code to print audio data info 2465 | # if response.server_content.model_turn is not None: 2466 | # print(response.server_content.model_turn.parts[0].inline_data.mime_type) 2467 | 2468 | wf.close() 2469 | 2470 | if __name__ == "__main__": 2471 | asyncio.run(main()) 2472 | 2473 | 2474 | 2475 | Establishing a connection 2476 | The following example shows how to create a connection with an API key: 2477 | 2478 | 2479 | import asyncio 2480 | from google import genai 2481 | 2482 | client = genai.Client() 2483 | 2484 | model = "gemini-live-2.5-flash-preview" 2485 | config = {"response_modalities": ["TEXT"]} 2486 | 2487 | async def main(): 2488 | async with client.aio.live.connect(model=model, config=config) as session: 2489 | print("Session started") 2490 | 2491 | if __name__ == "__main__": 2492 | asyncio.run(main()) 2493 | 2494 | 2495 | 2496 | 2497 | Sending and receiving audio 2498 | The most common audio example, audio-to-audio, is covered in the Getting started guide. 
2499 | 2500 | Here's an audio-to-text example that reads a WAV file, sends it in the correct format and receives text output: 2501 | 2502 | 2503 | # Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav 2504 | # Install helpers for converting files: pip install librosa soundfile 2505 | import asyncio 2506 | import io 2507 | from pathlib import Path 2508 | from google import genai 2509 | from google.genai import types 2510 | import soundfile as sf 2511 | import librosa 2512 | 2513 | client = genai.Client() 2514 | model = "gemini-live-2.5-flash-preview" 2515 | 2516 | config = {"response_modalities": ["TEXT"]} 2517 | 2518 | async def main(): 2519 | async with client.aio.live.connect(model=model, config=config) as session: 2520 | 2521 | buffer = io.BytesIO() 2522 | y, sr = librosa.load("sample.wav", sr=16000) 2523 | sf.write(buffer, y, sr, format='RAW', subtype='PCM_16') 2524 | buffer.seek(0) 2525 | audio_bytes = buffer.read() 2526 | 2527 | # If already in correct format, you can use this: 2528 | # audio_bytes = Path("sample.pcm").read_bytes() 2529 | 2530 | await session.send_realtime_input( 2531 | audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000") 2532 | ) 2533 | 2534 | async for response in session.receive(): 2535 | if response.text is not None: 2536 | print(response.text) 2537 | 2538 | if __name__ == "__main__": 2539 | asyncio.run(main()) 2540 | 2541 | 2542 | And here is a text-to-audio example. You can receive audio by setting AUDIO as response modality. This example saves the received data as WAV file: 2543 | 2544 | 2545 | 2546 | import asyncio 2547 | import wave 2548 | from google import genai 2549 | 2550 | client = genai.Client() 2551 | model = "gemini-live-2.5-flash-preview" 2552 | 2553 | config = {"response_modalities": ["AUDIO"]} 2554 | 2555 | async def main(): 2556 | async with client.aio.live.connect(model=model, config=config) as session: 2557 | wf = wave.open("audio.wav", "wb") 2558 | wf.setnchannels(1) 2559 | wf.setsampwidth(2) 2560 | wf.setframerate(24000) 2561 | 2562 | message = "Hello how are you?" 2563 | await session.send_client_content( 2564 | turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True 2565 | ) 2566 | 2567 | async for response in session.receive(): 2568 | if response.data is not None: 2569 | wf.writeframes(response.data) 2570 | 2571 | # Un-comment this code to print audio data info 2572 | # if response.server_content.model_turn is not None: 2573 | # print(response.server_content.model_turn.parts[0].inline_data.mime_type) 2574 | 2575 | wf.close() 2576 | 2577 | if __name__ == "__main__": 2578 | asyncio.run(main()) 2579 | 2580 | 2581 | 2582 | 2583 | 2584 | Context caching 2585 | 2586 | In a typical AI workflow, you might pass the same input tokens over and over to a model. The Gemini API offers two different caching mechanisms: 2587 | 2588 | Implicit caching (automatically enabled on Gemini 2.5 models, no cost saving guarantee) 2589 | Explicit caching (can be manually enabled on most models, cost saving guarantee) 2590 | Explicit caching is useful in cases where you want to guarantee cost savings, but with some added developer work. 2591 | 2592 | Implicit caching 2593 | Implicit caching is enabled by default for all Gemini 2.5 models. We automatically pass on cost savings if your request hits caches. There is nothing you need to do in order to enable this. It is effective as of May 8th, 2025. The minimum input token count for context caching is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro. 
2594 | 2595 | To increase the chance of an implicit cache hit: 2596 | 2597 | Try putting large and common contents at the beginning of your prompt 2598 | Try to send requests with similar prefix in a short amount of time 2599 | You can see the number of tokens which were cache hits in the response object's usage_metadata field. 2600 | 2601 | Explicit caching 2602 | Using the Gemini API explicit caching feature, you can pass some content to the model once, cache the input tokens, and then refer to the cached tokens for subsequent requests. At certain volumes, using cached tokens is lower cost than passing in the same corpus of tokens repeatedly. 2603 | 2604 | When you cache a set of tokens, you can choose how long you want the cache to exist before the tokens are automatically deleted. This caching duration is called the time to live (TTL). If not set, the TTL defaults to 1 hour. The cost for caching depends on the input token size and how long you want the tokens to persist. 2605 | 2606 | This section assumes that you've installed a Gemini SDK (or have curl installed) and that you've configured an API key, as shown in the quickstart. 2607 | 2608 | Generate content using a cache 2609 | The following example shows how to generate content using a cached system instruction and video file. 2610 | 2611 | 2612 | import os 2613 | import pathlib 2614 | import requests 2615 | import time 2616 | 2617 | from google import genai 2618 | from google.genai import types 2619 | 2620 | client = genai.Client() 2621 | 2622 | # Download video file 2623 | url = 'https://storage.googleapis.com/generativeai-downloads/data/SherlockJr._10min.mp4' 2624 | path_to_video_file = pathlib.Path('SherlockJr._10min.mp4') 2625 | if not path_to_video_file.exists(): 2626 | with path_to_video_file.open('wb') as wf: 2627 | response = requests.get(url, stream=True) 2628 | for chunk in response.iter_content(chunk_size=32768): 2629 | wf.write(chunk) 2630 | 2631 | # Upload the video using the Files API 2632 | video_file = client.files.upload(file=path_to_video_file) 2633 | 2634 | # Wait for the file to finish processing 2635 | while video_file.state.name == 'PROCESSING': 2636 | print('Waiting for video to be processed.') 2637 | time.sleep(2) 2638 | video_file = client.files.get(name=video_file.name) 2639 | 2640 | print(f'Video processing complete: {video_file.uri}') 2641 | 2642 | # You must use an explicit version suffix: "-flash-001", not just "-flash". 2643 | model='models/gemini-2.0-flash-001' 2644 | 2645 | # Create a cache with a 5 minute TTL 2646 | cache = client.caches.create( 2647 | model=model, 2648 | config=types.CreateCachedContentConfig( 2649 | display_name='sherlock jr movie', # used to identify the cache 2650 | system_instruction=( 2651 | 'You are an expert video analyzer, and your job is to answer ' 2652 | 'the user\'s query based on the video file you have access to.' 2653 | ), 2654 | contents=[video_file], 2655 | ttl="300s", 2656 | ) 2657 | ) 2658 | 2659 | # Construct a GenerativeModel which uses the created cache. 2660 | response = client.models.generate_content( 2661 | model = model, 2662 | contents= ( 2663 | 'Introduce different characters in the movie by describing ' 2664 | 'their personality, looks, and names. 
Also list the timestamps '
        'they were introduced for the first time.'),
    config=types.GenerateContentConfig(cached_content=cache.name)
)

print(response.usage_metadata)

# The output should look something like this:
#
# prompt_token_count: 696219
# cached_content_token_count: 696190
# candidates_token_count: 214
# total_token_count: 696433

print(response.text)



List caches
It's not possible to retrieve or view cached content, but you can retrieve cache metadata (name, model, display_name, usage_metadata, create_time, update_time, and expire_time).

To list metadata for all uploaded caches, use client.caches.list():


for cache in client.caches.list():
    print(cache)

To fetch the metadata for one cache object, if you know its name, use get:


client.caches.get(name=name)

Update a cache
You can set a new ttl or expire_time for a cache. Changing anything else about the cache isn't supported.

The following example shows how to update the ttl of a cache using client.caches.update().


from google import genai
from google.genai import types

client.caches.update(
    name = cache.name,
    config = types.UpdateCachedContentConfig(
        ttl='300s'
    )
)

To set the expiry time, pass expire_time, which accepts either a datetime object or an ISO-formatted datetime string (dt.isoformat(), like 2025-01-27T16:02:36.473528+00:00). Your time must include a time zone (datetime.utcnow() doesn't attach a time zone, while datetime.now(datetime.timezone.utc) does).


from google import genai
from google.genai import types
import datetime

# You must use a time zone-aware time.
in10min = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(minutes=10)

client.caches.update(
    name = cache.name,
    config = types.UpdateCachedContentConfig(
        expire_time=in10min
    )
)

Delete a cache
The caching service provides a delete operation for manually removing content from the cache. The following example shows how to delete a cache:


client.caches.delete(cache.name)

Explicit caching using the OpenAI library
If you're using an OpenAI library, you can enable explicit caching using the cached_content property on extra_body.

When to use explicit caching
Context caching is particularly well suited to scenarios where a substantial initial context is referenced repeatedly by shorter requests. Consider using context caching for use cases such as:

Chatbots with extensive system instructions
Repetitive analysis of lengthy video files
Recurring queries against large document sets
Frequent code repository analysis or bug fixing





Files API

The Gemini family of artificial intelligence (AI) models is built to handle various types of input data, including text, images, and audio. Since these models can handle more than one type or mode of data, the Gemini models are called multimodal models or explained as having multimodal capabilities.

This guide shows you how to work with media files using the Files API.
The basic operations are the same for audio files, images, videos, documents, and other supported file types.

For file prompting guidance, check out the File prompt guide section.

Upload a file
You can use the Files API to upload a media file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB.

The following code uploads a file and then uses the file in a call to generateContent.


from google import genai

client = genai.Client()

myfile = client.files.upload(file="path/to/sample.mp3")

response = client.models.generate_content(
    model="gemini-2.5-flash", contents=["Describe this audio clip", myfile]
)

print(response.text)


Get metadata for a file
You can verify that the API successfully stored the uploaded file and get its metadata by calling files.get (for example, client.files.get(name=myfile.name)).




Batch Mode

The Gemini API's Batch Mode is designed to process large volumes of requests asynchronously at 50% of the standard cost. The target turnaround time is 24 hours, but in the majority of cases it is much quicker.

Use Batch Mode for large-scale, non-urgent tasks such as data pre-processing or running evaluations where an immediate response is not required.

Note: You can access Batch Mode via our Python libraries or the REST API. Support for other languages (including JavaScript/TypeScript) is coming soon.
Creating a batch job
You have two ways to submit your requests in Batch Mode:

Inline Requests: A list of GenerateContentRequest objects directly included in your batch creation request. This is suitable for smaller batches that keep the total request size under 20 MB. The output returned from the model is a list of inlineResponse objects.
Input File: A JSON Lines (JSONL) file where each line contains a complete GenerateContentRequest object. This method is recommended for larger requests. The output returned from the model is a JSONL file where each line is either a GenerateContentResponse or a status object.
Inline requests
For a small number of requests, you can directly embed the GenerateContentRequest objects within your BatchGenerateContentRequest. The following example calls the BatchGenerateContent method with inline requests:


from google import genai
from google.genai import types

client = genai.Client()

# A list of dictionaries, where each is a GenerateContentRequest
inline_requests = [
    {
        'contents': [{
            'parts': [{'text': 'Tell me a one-sentence joke.'}],
            'role': 'user'
        }]
    },
    {
        'contents': [{
            'parts': [{'text': 'Why is the sky blue?'}],
            'role': 'user'
        }]
    }
]

inline_batch_job = client.batches.create(
    model="models/gemini-2.5-flash",
    src=inline_requests,
    config={
        'display_name': "inlined-requests-job-1",
    },
)

print(f"Created batch job: {inline_batch_job.name}")


Input file
For larger batches, prepare a JSON Lines (JSONL) file in which each line holds a user-defined key and a complete request object. You can upload this file using the File API as shown in the following example. If you are working with multimodal input, you can reference other uploaded files within your JSONL file.
import json

from google import genai
from google.genai import types

client = genai.Client()

# Create a sample JSONL file
with open("my-batch-requests.jsonl", "w") as f:
    requests = [
        {"key": "request-1", "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}},
        {"key": "request-2", "request": {"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}}
    ]
    for req in requests:
        f.write(json.dumps(req) + "\n")

# Upload the file to the File API
uploaded_file = client.files.upload(
    file='my-batch-requests.jsonl',
    config=types.UploadFileConfig(display_name='my-batch-requests', mime_type='jsonl')
)

print(f"Uploaded file: {uploaded_file.name}")



The following example calls the BatchGenerateContent method with the input file uploaded using the File API:


# Assumes `uploaded_file` is the file object from the previous step
file_batch_job = client.batches.create(
    model="gemini-2.5-flash",
    src=uploaded_file.name,
    config={
        'display_name': "file-upload-job-1",
    },
)

print(f"Created batch job: {file_batch_job.name}")


Monitoring job status
Use the operation name obtained when creating the batch job to poll its status. The state field of the batch job will indicate its current status. A batch job can be in one of the following states:

JOB_STATE_PENDING: The job has been created and is waiting to be processed by the service.
JOB_STATE_SUCCEEDED: The job completed successfully. You can now retrieve the results.
JOB_STATE_FAILED: The job failed. Check the error details for more information.
JOB_STATE_CANCELLED: The job was cancelled by the user.
You can poll the job status periodically to check for completion.


import time

# Use the name of the job you want to check
# e.g., inline_batch_job.name from the previous step
job_name = "YOUR_BATCH_JOB_NAME"  # (e.g. 'batches/your-batch-id')
batch_job = client.batches.get(name=job_name)

completed_states = set([
    'JOB_STATE_SUCCEEDED',
    'JOB_STATE_FAILED',
    'JOB_STATE_CANCELLED',
])

print(f"Polling status for job: {job_name}")
batch_job = client.batches.get(name=job_name)  # Initial get
while batch_job.state.name not in completed_states:
    print(f"Current state: {batch_job.state.name}")
    time.sleep(30)  # Wait for 30 seconds before polling again
    batch_job = client.batches.get(name=job_name)

print(f"Job finished with state: {batch_job.state.name}")
if batch_job.state.name == 'JOB_STATE_FAILED':
    print(f"Error: {batch_job.error}")


Retrieving results
Once the job status indicates your batch job has succeeded, the results are available in the response field.
2910 | 2911 | 2912 | import json 2913 | 2914 | # Use the name of the job you want to check 2915 | # e.g., inline_batch_job.name from the previous step 2916 | job_name = "YOUR_BATCH_JOB_NAME" 2917 | batch_job = client.batches.get(name=job_name) 2918 | 2919 | if batch_job.state.name == 'JOB_STATE_SUCCEEDED': 2920 | 2921 | # If batch job was created with a file 2922 | if batch_job.dest and batch_job.dest.file_name: 2923 | # Results are in a file 2924 | result_file_name = batch_job.dest.file_name 2925 | print(f"Results are in file: {result_file_name}") 2926 | 2927 | print("Downloading result file content...") 2928 | file_content = client.files.download(file=result_file_name) 2929 | # Process file_content (bytes) as needed 2930 | print(file_content.decode('utf-8')) 2931 | 2932 | # If batch job was created with inline request 2933 | elif batch_job.dest and batch_job.dest.inlined_responses: 2934 | # Results are inline 2935 | print("Results are inline:") 2936 | for i, inline_response in enumerate(batch_job.dest.inlined_responses): 2937 | print(f"Response {i+1}:") 2938 | if inline_response.response: 2939 | # Accessing response, structure may vary. 2940 | try: 2941 | print(inline_response.response.text) 2942 | except AttributeError: 2943 | print(inline_response.response) # Fallback 2944 | elif inline_response.error: 2945 | print(f"Error: {inline_response.error}") 2946 | else: 2947 | print("No results found (neither file nor inline).") 2948 | else: 2949 | print(f"Job did not succeed. Final state: {batch_job.state.name}") 2950 | if batch_job.error: 2951 | print(f"Error: {batch_job.error}") 2952 | 2953 | 2954 | 2955 | 2956 | Embeddings 2957 | 2958 | Note: gemini-embedding-001 is our newest text embedding model available in the Gemini API. 2959 | The Gemini API offers text embedding models to generate embeddings for words, phrases, sentences, and code. These foundational embeddings power advanced NLP tasks such as semantic search, classification, and clustering, providing more accurate, context-aware results than keyword-based approaches. 2960 | 2961 | Building Retrieval Augmented Generation (RAG) systems is a common use case for embeddings. Embeddings plays a key role in significantly enhancing model outputs with improved factual accuracy, coherence, and contextual richness. They efficiently retrieve relevant information from knowledge bases, represented by embeddings, which are then passed as additional context in the input prompt to language models, guiding it to generate more informed and accurate responses. 2962 | 2963 | Generating embeddings 2964 | Use the embedContent method to generate text embeddings: 2965 | 2966 | 2967 | from google import genai 2968 | 2969 | client = genai.Client() 2970 | 2971 | result = client.models.embed_content( 2972 | model="gemini-embedding-001", 2973 | contents="What is the meaning of life?") 2974 | 2975 | print(result.embeddings) 2976 | 2977 | 2978 | You can also generate embeddings for multiple chunks at once by passing them in as a list of strings. 2979 | 2980 | 2981 | from google import genai 2982 | 2983 | client = genai.Client() 2984 | 2985 | result = client.models.embed_content( 2986 | model="gemini-embedding-001", 2987 | contents= [ 2988 | "What is the meaning of life?", 2989 | "What is the purpose of existence?", 2990 | "How do I bake a cake?" 
2991 | ]) 2992 | 2993 | for embedding in result.embeddings: 2994 | print(embedding) 2995 | 2996 | 2997 | 2998 | Specify task type to improve performance 2999 | You can use embeddings for a wide range of tasks from classification to document search. Specifying the right task type helps optimize the embeddings for the intended relationships, maximizing accuracy and efficiency. For a complete list of supported task types, see the Supported task types table. 3000 | 3001 | The following example shows how you can use SEMANTIC_SIMILARITY to check how similar in meaning strings of texts are. 3002 | 3003 | 3004 | from google import genai 3005 | from google.genai import types 3006 | import numpy as np 3007 | from sklearn.metrics.pairwise import cosine_similarity 3008 | 3009 | client = genai.Client() 3010 | 3011 | texts = [ 3012 | "What is the meaning of life?", 3013 | "What is the purpose of existence?", 3014 | "How do I bake a cake?"] 3015 | 3016 | result = [ 3017 | np.array(e.values) for e in client.models.embed_content( 3018 | model="gemini-embedding-001", 3019 | contents=texts, 3020 | config=types.EmbedContentConfig(task_type="SEMANTIC_SIMILARITY")).embeddings 3021 | ] 3022 | 3023 | # Calculate cosine similarity. Higher scores = greater semantic similarity. 3024 | 3025 | embeddings_matrix = np.array(result) 3026 | similarity_matrix = cosine_similarity(embeddings_matrix) 3027 | 3028 | for i, text1 in enumerate(texts): 3029 | for j in range(i + 1, len(texts)): 3030 | text2 = texts[j] 3031 | similarity = similarity_matrix[i, j] 3032 | print(f"Similarity between '{text1}' and '{text2}': {similarity:.4f}") 3033 | 3034 | 3035 | 3036 | 3037 | Supported task types 3038 | Task type Description Examples 3039 | SEMANTIC_SIMILARITY Embeddings optimized to assess text similarity. Recommendation systems, duplicate detection 3040 | CLASSIFICATION Embeddings optimized to classify texts according to preset labels. Sentiment analysis, spam detection 3041 | CLUSTERING Embeddings optimized to cluster texts based on their similarities. Document organization, market research, anomaly detection 3042 | RETRIEVAL_DOCUMENT Embeddings optimized for document search. Indexing articles, books, or web pages for search. 3043 | RETRIEVAL_QUERY Embeddings optimized for general search queries. Use RETRIEVAL_QUERY for queries; RETRIEVAL_DOCUMENT for documents to be retrieved. Custom search 3044 | CODE_RETRIEVAL_QUERY Embeddings optimized for retrieval of code blocks based on natural language queries. Use CODE_RETRIEVAL_QUERY for queries; RETRIEVAL_DOCUMENT for code blocks to be retrieved. Code suggestions and search 3045 | QUESTION_ANSWERING Embeddings for questions in a question-answering system, optimized for finding documents that answer the question. Use QUESTION_ANSWERING for questions; RETRIEVAL_DOCUMENT for documents to be retrieved. Chatbox 3046 | FACT_VERIFICATION Embeddings for statements that need to be verified, optimized for retrieving documents that contain evidence supporting or refuting the statement. Use FACT_VERIFICATION for the target text; RETRIEVAL_DOCUMENT for documents to be retrieved Automated fact-checking systems 3047 | Controlling Embedding Size 3048 | The Gemini embedding model, gemini-embedding-001, is trained using the Matryoshka Representation Learning (MRL) technique which teaches a model to learn high-dimensional embeddings that have initial segments (or prefixes) which are also useful, simpler versions of the same data. 
You can choose to use the full 3072-dimensional embedding, or you can truncate it to a smaller size without losing quality to save storage space. For best quality, we recommend using the first 768 and 1536. 3049 | 3050 | By using the output_dimensionality parameter, users can control the size of the output embedding vector. Selecting a smaller output dimensionality can save storage space and increase computational efficiency for downstream applications, while sacrificing little in terms of quality. 3051 | 3052 | 3053 | from google import genai 3054 | from google.genai import types 3055 | 3056 | client = genai.Client() 3057 | 3058 | result = client.models.embed_content( 3059 | model="gemini-embedding-001", 3060 | contents="What is the meaning of life?", 3061 | config=types.EmbedContentConfig(output_dimensionality=768) 3062 | ) 3063 | 3064 | [embedding_obj] = result.embeddings 3065 | embedding_length = len(embedding_obj.values) 3066 | 3067 | print(f"Length of embedding: {embedding_length}") 3068 | 3069 | 3070 | Safety settings 3071 | 3072 | The Gemini API provides safety settings that you can adjust during the prototyping stage to determine if your application requires more or less restrictive safety configuration. You can adjust these settings across five filter categories to restrict or allow certain types of content. 3073 | 3074 | This guide covers how the Gemini API handles safety settings and filtering and how you can change the safety settings for your application. 3075 | 3076 | Note: Applications that use less restrictive safety settings may be subject to review. See the Terms of Service for more information. 3077 | Safety filters 3078 | The Gemini API's adjustable safety filters cover the following categories: 3079 | 3080 | Category Description 3081 | Harassment Negative or harmful comments targeting identity and/or protected attributes. 3082 | Hate speech Content that is rude, disrespectful, or profane. 3083 | Sexually explicit Contains references to sexual acts or other lewd content. 3084 | Dangerous Promotes, facilitates, or encourages harmful acts. 3085 | Civic integrity Election-related queries. 3086 | These categories are defined in HarmCategory. The Gemini models only support HARM_CATEGORY_HARASSMENT, HARM_CATEGORY_HATE_SPEECH, HARM_CATEGORY_SEXUALLY_EXPLICIT, HARM_CATEGORY_DANGEROUS_CONTENT, and HARM_CATEGORY_CIVIC_INTEGRITY. All other categories are used only by PaLM 2 (Legacy) models. 3087 | You can use these filters to adjust what's appropriate for your use case. For example, if you're building video game dialogue, you may deem it acceptable to allow more content that's rated as Dangerous due to the nature of the game. 3088 | 3089 | In addition to the adjustable safety filters, the Gemini API has built-in protections against core harms, such as content that endangers child safety. These types of harm are always blocked and cannot be adjusted. 3090 | 3091 | Content safety filtering level 3092 | The Gemini API categorizes the probability level of content being unsafe as HIGH, MEDIUM, LOW, or NEGLIGIBLE. 3093 | 3094 | The Gemini API blocks content based on the probability of content being unsafe and not the severity. This is important to consider because some content can have low probability of being unsafe even though the severity of harm could still be high. For example, comparing the sentences: 3095 | 3096 | The robot punched me. 3097 | The robot slashed me up. 
The first sentence might result in a higher probability of being unsafe, but you might consider the second sentence to be a higher severity in terms of violence. Given this, it is important that you carefully test and consider what level of blocking is appropriate to support your key use cases while minimizing harm to end users.

Safety filtering per request
You can adjust the safety settings for each request you make to the API. When you make a request, the content is analyzed and assigned a safety rating. The safety rating includes the category and the probability of the harm classification. For example, if the content was blocked due to the harassment category having a high probability, the safety rating returned would have category equal to HARASSMENT and harm probability set to HIGH.

By default, safety settings block content (including prompts) with medium or higher probability of being unsafe across any filter. This baseline safety is designed to work for most use cases, so you should only adjust your safety settings if it's consistently required for your application.

The following table describes the block settings you can adjust for each category. For example, if you set the block setting to Block few for the Hate speech category, everything that has a high probability of being hate speech content is blocked, but anything with a lower probability is allowed.

Threshold (Google AI Studio) Threshold (API) Description
Block none BLOCK_NONE Always show regardless of probability of unsafe content
Block few BLOCK_ONLY_HIGH Block when high probability of unsafe content
Block some BLOCK_MEDIUM_AND_ABOVE Block when medium or high probability of unsafe content
Block most BLOCK_LOW_AND_ABOVE Block when low, medium or high probability of unsafe content
N/A HARM_BLOCK_THRESHOLD_UNSPECIFIED Threshold is unspecified, block using default threshold
If the threshold is not set, the default block threshold is Block none (for gemini-1.5-pro-002 and gemini-1.5-flash-002 and all newer stable GA models) or Block some (in all other models) for all categories except the Civic integrity category.

The default block threshold for the Civic integrity category is Block none (for gemini-2.0-flash-001 aliased as gemini-2.0-flash, gemini-2.0-pro-exp-02-05, and gemini-2.0-flash-lite) both for Google AI Studio and the Gemini API, and Block most for all other models in Google AI Studio only.

You can set these settings for each request you make to the generative service; a short code sketch follows at the end of this section. See the HarmBlockThreshold API reference for details.

Safety feedback
generateContent returns a GenerateContentResponse which includes safety feedback.

Prompt feedback is included in promptFeedback. If promptFeedback.blockReason is set, then the content of the prompt was blocked.

Response candidate feedback is included in Candidate.finishReason and Candidate.safetyRatings. If response content was blocked and the finishReason was SAFETY, you can inspect safetyRatings for more details. The content that was blocked is not returned.
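The original guide does not include a code sample here, so the following is a minimal, assumed sketch of how these per-request settings can be passed with the Python SDK used throughout this document. It relaxes the Hate speech filter to BLOCK_ONLY_HIGH, sets Dangerous content to BLOCK_MEDIUM_AND_ABOVE, and then reads the safety feedback fields described above; the prompt text is only a placeholder.


from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Write a short villain monologue for a video game.",  # placeholder prompt
    config=types.GenerateContentConfig(
        safety_settings=[
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
                threshold=types.HarmBlockThreshold.BLOCK_ONLY_HIGH,
            ),
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
                threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
            ),
        ]
    ),
)

# If the prompt itself was blocked, block_reason is set and there is no text.
if response.prompt_feedback and response.prompt_feedback.block_reason:
    print("Prompt blocked:", response.prompt_feedback.block_reason)
else:
    print(response.text)
    # Per-category safety ratings for the first response candidate.
    for rating in (response.candidates[0].safety_ratings or []):
        print(rating.category, rating.probability)





ReAct agent from scratch with Gemini 2.5 and LangGraph

LangGraph is a framework for building stateful LLM applications, making it a good choice for constructing ReAct (Reasoning and Acting) Agents.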
3135 | 3136 | ReAct agents combine LLM reasoning with action execution. They iteratively think, use tools, and act on observations to achieve user goals, dynamically adapting their approach. Introduced in "ReAct: Synergizing Reasoning and Acting in Language Models" (2023), this pattern tries to mirror human-like, flexible problem-solving over rigid workflows. 3137 | 3138 | While LangGraph offers a prebuilt ReAct agent (create_react_agent), it shines when you need more control and customization for your ReAct implementations. 3139 | 3140 | LangGraph models agents as graphs using three key components: 3141 | 3142 | State: Shared data structure (typically TypedDict or Pydantic BaseModel) representing the application's current snapshot. 3143 | Nodes: Encodes logic of your agents. They receive the current State as input, perform some computation or side-effect, and return an updated State, such as LLM calls or tool calls. 3144 | Edges: Define the next Node to execute based on the current State, allowing for conditional logic and fixed transitions. 3145 | If you don't have an API Key yet, you can get one for free at the Google AI Studio. 3146 | 3147 | 3148 | pip install langgraph langchain-google-genai geopy requests 3149 | Set your API key in the environment variable GEMINI_API_KEY. 3150 | 3151 | 3152 | import os 3153 | 3154 | # Read your API key from the environment variable or set it manually 3155 | api_key = os.getenv("GEMINI_API_KEY") 3156 | To better understand how to implement a ReAct agent using LangGraph, let's walk through a practical example. You will create a simple agent whose goal is to use a tool to find the current weather for a specified location. 3157 | 3158 | For this weather agent, its State will need to maintain the ongoing conversation history (as a list of messages) and a counter for the number of steps taken to further illustrate state management. 3159 | 3160 | LangGraph provides a convenient helper, add_messages, for updating message lists in the state. It functions as a reducer, meaning it takes the current list and new messages, then returns a combined list. It smartly handles updates by message ID and defaults to an "append-only" behavior for new, unique messages. 3161 | 3162 | Note: Since having a list of messages in the state is so common, there exists a prebuilt state called MessagesState which makes it easy to use messages. 3163 | 3164 | from typing import Annotated,Sequence, TypedDict 3165 | 3166 | from langchain_core.messages import BaseMessage 3167 | from langgraph.graph.message import add_messages # helper function to add messages to the state 3168 | 3169 | 3170 | class AgentState(TypedDict): 3171 | """The state of the agent.""" 3172 | messages: Annotated[Sequence[BaseMessage], add_messages] 3173 | number_of_steps: int 3174 | Next, you define your weather tool. 
3175 | 3176 | 3177 | from langchain_core.tools import tool 3178 | from geopy.geocoders import Nominatim 3179 | from pydantic import BaseModel, Field 3180 | import requests 3181 | 3182 | geolocator = Nominatim(user_agent="weather-app") 3183 | 3184 | class SearchInput(BaseModel): 3185 | location:str = Field(description="The city and state, e.g., San Francisco") 3186 | date:str = Field(description="the forecasting date for when to get the weather format (yyyy-mm-dd)") 3187 | 3188 | @tool("get_weather_forecast", args_schema=SearchInput, return_direct=True) 3189 | def get_weather_forecast(location: str, date: str): 3190 | """Retrieves the weather using Open-Meteo API for a given location (city) and a date (yyyy-mm-dd). Returns a list dictionary with the time and temperature for each hour.""" 3191 | location = geolocator.geocode(location) 3192 | if location: 3193 | try: 3194 | response = requests.get(f"https://api.open-meteo.com/v1/forecast?latitude={location.latitude}&longitude={location.longitude}&hourly=temperature_2m&start_date={date}&end_date={date}") 3195 | data = response.json() 3196 | return {time: temp for time, temp in zip(data["hourly"]["time"], data["hourly"]["temperature_2m"])} 3197 | except Exception as e: 3198 | return {"error": str(e)} 3199 | else: 3200 | return {"error": "Location not found"} 3201 | 3202 | tools = [get_weather_forecast] 3203 | Next, you initialize your model and bind the tools to the model. 3204 | 3205 | 3206 | from datetime import datetime 3207 | from langchain_google_genai import ChatGoogleGenerativeAI 3208 | 3209 | # Create LLM class 3210 | llm = ChatGoogleGenerativeAI( 3211 | model= "gemini-2.5-pro", 3212 | temperature=1.0, 3213 | max_retries=2, 3214 | google_api_key=api_key, 3215 | ) 3216 | 3217 | # Bind tools to the model 3218 | model = llm.bind_tools([get_weather_forecast]) 3219 | 3220 | # Test the model with tools 3221 | res=model.invoke(f"What is the weather in Berlin on {datetime.today()}?") 3222 | 3223 | print(res) 3224 | The last step before you can run your agent is to define your nodes and edges. In this example, you have two nodes and one edge. - call_tool node that executes your tool method. LangGraph has a prebuilt node for this called ToolNode. - call_model node that uses the model_with_tools to call the model. - should_continue edge that decides whether to call the tool or the model. 3225 | 3226 | The number of nodes and edges is not fixed. You can add as many nodes and edges as you want to your graph. For example, you could add a node for adding structured output or a self-verification/reflection node to check the model output before calling the tool or the model. 
3227 | 3228 | 3229 | from langchain_core.messages import ToolMessage 3230 | from langchain_core.runnables import RunnableConfig 3231 | 3232 | tools_by_name = {tool.name: tool for tool in tools} 3233 | 3234 | # Define our tool node 3235 | def call_tool(state: AgentState): 3236 | outputs = [] 3237 | # Iterate over the tool calls in the last message 3238 | for tool_call in state["messages"][-1].tool_calls: 3239 | # Get the tool by name 3240 | tool_result = tools_by_name[tool_call["name"]].invoke(tool_call["args"]) 3241 | outputs.append( 3242 | ToolMessage( 3243 | content=tool_result, 3244 | name=tool_call["name"], 3245 | tool_call_id=tool_call["id"], 3246 | ) 3247 | ) 3248 | return {"messages": outputs} 3249 | 3250 | def call_model( 3251 | state: AgentState, 3252 | config: RunnableConfig, 3253 | ): 3254 | # Invoke the model with the system prompt and the messages 3255 | response = model.invoke(state["messages"], config) 3256 | # We return a list, because this will get added to the existing messages state using the add_messages reducer 3257 | return {"messages": [response]} 3258 | 3259 | 3260 | # Define the conditional edge that determines whether to continue or not 3261 | def should_continue(state: AgentState): 3262 | messages = state["messages"] 3263 | # If the last message is not a tool call, then we finish 3264 | if not messages[-1].tool_calls: 3265 | return "end" 3266 | # default to continue 3267 | return "continue" 3268 | Now you have all the components to build your agent. Let's put them together. 3269 | 3270 | 3271 | from langgraph.graph import StateGraph, END 3272 | 3273 | # Define a new graph with our state 3274 | workflow = StateGraph(AgentState) 3275 | 3276 | # 1. Add our nodes 3277 | workflow.add_node("llm", call_model) 3278 | workflow.add_node("tools", call_tool) 3279 | # 2. Set the entrypoint as `agent`, this is the first node called 3280 | workflow.set_entry_point("llm") 3281 | # 3. Add a conditional edge after the `llm` node is called. 3282 | workflow.add_conditional_edges( 3283 | # Edge is used after the `llm` node is called. 3284 | "llm", 3285 | # The function that will determine which node is called next. 3286 | should_continue, 3287 | # Mapping for where to go next, keys are strings from the function return, and the values are other nodes. 3288 | # END is a special node marking that the graph is finish. 3289 | { 3290 | # If `tools`, then we call the tool node. 3291 | "continue": "tools", 3292 | # Otherwise we finish. 3293 | "end": END, 3294 | }, 3295 | ) 3296 | # 4. Add a normal edge after `tools` is called, `llm` node is called next. 3297 | workflow.add_edge("tools", "llm") 3298 | 3299 | # Now we can compile and visualize our graph 3300 | graph = workflow.compile() 3301 | You can visualize your graph using the draw_mermaid_png method. 3302 | 3303 | 3304 | from IPython.display import Image, display 3305 | 3306 | display(Image(graph.get_graph().draw_mermaid_png())) 3307 | png 3308 | 3309 | Now let's run the agent. 3310 | 3311 | 3312 | from datetime import datetime 3313 | # Create our initial message dictionary 3314 | inputs = {"messages": [("user", f"What is the weather in Berlin on {datetime.today()}?")]} 3315 | 3316 | # call our graph with streaming to see the steps 3317 | for state in graph.stream(inputs, stream_mode="values"): 3318 | last_message = state["messages"][-1] 3319 | last_message.pretty_print() 3320 | You can now continue with your conversation and for example ask for the weather in another city or let it compare it. 
3321 | 3322 | 3323 | state["messages"].append(("user", "Would it be in Munich warmer?")) 3324 | 3325 | for state in graph.stream(state, stream_mode="values"): 3326 | last_message = state["messages"][-1] 3327 | last_message.pretty_print() 3328 | 3329 | 3330 | 3331 | Customer Support Analysis with Gemini 2.5 Pro and CrewAI 3332 | 3333 | CrewAI is a framework for orchestrating autonomous AI agents that collaborate to achieve complex goals. It lets you define agents by specifying roles, goals, and backstories, and then define tasks for them. 3334 | 3335 | This example demonstrates how to build a multi-agent system for analyzing customer support data to identify issues and propose process improvements using Gemini 2.5 Pro, generating a report intended to be read by a Chief Operating Officer (COO). 3336 | 3337 | The guide will show you how to create a "crew" of AI agents that can do the following tasks: 3338 | 3339 | Fetch and analyze customer support data (simulated in this example). 3340 | Identify recurring problems and process bottlenecks. 3341 | Suggest actionable improvements. 3342 | Compile the findings into a concise report suitable for a COO. 3343 | You need a Gemini API key. If you don't already have one, you can get one in Google AI Studio. 3344 | 3345 | 3346 | pip install "crewai[tools]" 3347 | Set your Gemini API key as an environment variable named GEMINI_API_KEY, then configure CrewAI to use the Gemini 2.5 Pro model. 3348 | 3349 | 3350 | import os 3351 | from crewai import LLM 3352 | 3353 | # Read your API key from the environment variable 3354 | gemini_api_key = os.getenv("GEMINI_API_KEY") 3355 | 3356 | # Use Gemini 2.5 Pro Experimental model 3357 | gemini_llm = LLM( 3358 | model='gemini/gemini-2.5-pro', 3359 | api_key=gemini_api_key, 3360 | temperature=0.0 # Lower temperature for more consistent results. 3361 | ) 3362 | Define components 3363 | CrewAI applications are built using Tools, Agents, Tasks, and the Crew itself. Each of these is explained in the following sections. 3364 | 3365 | Tools 3366 | Tools are capabilities that agents can use to interact with the outside world or perform specific actions. Here, you define a placeholder tool to simulate fetching customer support data. In a real application, you would connect to a database, API or file system. For more information on tools, see the CrewAI tools guide. 3367 | 3368 | 3369 | from crewai.tools import BaseTool 3370 | 3371 | # Placeholder tool for fetching customer support data 3372 | class CustomerSupportDataTool(BaseTool): 3373 | name: str = "Customer Support Data Fetcher" 3374 | description: str = ( 3375 | "Fetches recent customer support interactions, tickets, and feedback. " 3376 | "Returns a summary string.") 3377 | 3378 | def _run(self, argument: str) -> str: 3379 | # In a real scenario, this would query a database or API. 3380 | # For this example, return simulated data. 3381 | print(f"--- Fetching data for query: {argument} ---") 3382 | return ( 3383 | """Recent Support Data Summary: 3384 | - 50 tickets related to 'login issues'. High resolution time (avg 48h). 3385 | - 30 tickets about 'billing discrepancies'. Mostly resolved within 12h. 3386 | - 20 tickets on 'feature requests'. Often closed without resolution. 3387 | - Frequent feedback mentions 'confusing user interface' for password reset. 3388 | - High volume of calls related to 'account verification process'. 3389 | - Sentiment analysis shows growing frustration with 'login issues' resolution time. 
3390 | - Support agent notes indicate difficulty reproducing 'login issues'.""" 3391 | ) 3392 | 3393 | support_data_tool = CustomerSupportDataTool() 3394 | Agents 3395 | Agents are the individual AI workers in your crew. Each agent has a specific role, goal, backstory, assigned llm, and optional tools. For more information on agents, see the CrewAI agents guide. 3396 | 3397 | 3398 | from crewai import Agent 3399 | 3400 | # Agent 1: Data analyst 3401 | data_analyst = Agent( 3402 | role='Customer Support Data Analyst', 3403 | goal='Analyze customer support data to identify trends, recurring issues, and key pain points.', 3404 | backstory=( 3405 | """You are an expert data analyst specializing in customer support operations. 3406 | Your strength lies in identifying patterns and quantifying problems from raw support data.""" 3407 | ), 3408 | verbose=True, 3409 | allow_delegation=False, # This agent focuses on its specific task 3410 | tools=[support_data_tool], # Assign the data fetching tool 3411 | llm=gemini_llm # Use the configured Gemini LLM 3412 | ) 3413 | 3414 | # Agent 2: Process optimizer 3415 | process_optimizer = Agent( 3416 | role='Process Optimization Specialist', 3417 | goal='Identify bottlenecks and inefficiencies in current support processes based on the data analysis. Propose actionable improvements.', 3418 | backstory=( 3419 | """You are a specialist in optimizing business processes, particularly in customer support. 3420 | You excel at pinpointing root causes of delays and inefficiencies and suggesting concrete solutions.""" 3421 | ), 3422 | verbose=True, 3423 | allow_delegation=False, 3424 | # No tools needed, this agent relies on the context provided by data_analyst. 3425 | llm=gemini_llm 3426 | ) 3427 | 3428 | # Agent 3: Report writer 3429 | report_writer = Agent( 3430 | role='Executive Report Writer', 3431 | goal='Compile the analysis and improvement suggestions into a concise, clear, and actionable report for the COO.', 3432 | backstory=( 3433 | """You are a skilled writer adept at creating executive summaries and reports. 3434 | You focus on clarity, conciseness, and highlighting the most critical information and recommendations for senior leadership.""" 3435 | ), 3436 | verbose=True, 3437 | allow_delegation=False, 3438 | llm=gemini_llm 3439 | ) 3440 | Tasks 3441 | Tasks define the specific assignments for the agents. Each task has a description, expected_output, and is assigned to an agent. Tasks are run sequentially by default and include the context of the previous task. For more information on tasks, see the CrewAI tasks guide. 3442 | 3443 | 3444 | from crewai import Task 3445 | 3446 | # Task 1: Analyze data 3447 | analysis_task = Task( 3448 | description=( 3449 | """Fetch and analyze the latest customer support interaction data (tickets, feedback, call logs) 3450 | focusing on the last quarter. Identify the top 3-5 recurring issues, quantify their frequency 3451 | and impact (e.g., resolution time, customer sentiment). Use the Customer Support Data Fetcher tool.""" 3452 | ), 3453 | expected_output=( 3454 | """A summary report detailing the key findings from the customer support data analysis, including: 3455 | - Top 3-5 recurring issues with frequency. 3456 | - Average resolution times for these issues. 3457 | - Key customer pain points mentioned in feedback. 
3458 | - Any notable trends in sentiment or support agent observations.""" 3459 | ), 3460 | agent=data_analyst # Assign task to the data_analyst agent 3461 | ) 3462 | 3463 | # Task 2: Identify bottlenecks and suggest improvements 3464 | optimization_task = Task( 3465 | description=( 3466 | """Based on the data analysis report provided by the Data Analyst, identify the primary bottlenecks 3467 | in the support processes contributing to the identified issues (especially the top recurring ones). 3468 | Propose 2-3 concrete, actionable process improvements to address these bottlenecks. 3469 | Consider potential impact and ease of implementation.""" 3470 | ), 3471 | expected_output=( 3472 | """A concise list identifying the main process bottlenecks (e.g., lack of documentation for agents, 3473 | complex escalation path, UI issues) linked to the key problems. 3474 | A list of 2-3 specific, actionable recommendations for process improvement 3475 | (e.g., update agent knowledge base, simplify password reset UI, implement proactive monitoring).""" 3476 | ), 3477 | agent=process_optimizer # Assign task to the process_optimizer agent 3478 | # This task implicitly uses the output of analysis_task as context 3479 | ) 3480 | 3481 | # Task 3: Compile COO report 3482 | report_task = Task( 3483 | description=( 3484 | """Compile the findings from the Data Analyst and the recommendations from the Process Optimization Specialist 3485 | into a single, concise executive report for the COO. The report should clearly state: 3486 | 1. The most critical customer support issues identified (with brief data points). 3487 | 2. The key process bottlenecks causing these issues. 3488 | 3. The recommended process improvements. 3489 | Ensure the report is easy to understand, focuses on actionable insights, and is formatted professionally.""" 3490 | ), 3491 | expected_output=( 3492 | """A well-structured executive report (max 1 page) summarizing the critical support issues, 3493 | underlying process bottlenecks, and clear, actionable recommendations for the COO. 3494 | Use clear headings and bullet points.""" 3495 | ), 3496 | agent=report_writer # Assign task to the report_writer agent 3497 | ) 3498 | Crew 3499 | The Crew brings the agents and tasks together, defining the workflow process (such as "sequential"). 3500 | 3501 | 3502 | from crewai import Crew, Process 3503 | 3504 | # Define the crew with agents, tasks, and process 3505 | support_analysis_crew = Crew( 3506 | agents=[data_analyst, process_optimizer, report_writer], 3507 | tasks=[analysis_task, optimization_task, report_task], 3508 | process=Process.sequential, # Tasks will run sequentially in the order defined 3509 | verbose=True 3510 | ) 3511 | Run the Crew 3512 | Finally, kick off the crew execution with any necessary inputs. 3513 | 3514 | 3515 | # Start the crew's work 3516 | print("--- Starting Customer Support Analysis Crew ---") 3517 | # The 'inputs' dictionary provides initial context if needed by the first task. 3518 | # In this case, the tool simulates data fetching regardless of the input. 3519 | result = support_analysis_crew.kickoff(inputs={'data_query': 'last quarter support data'}) 3520 | 3521 | print("--- Crew Execution Finished ---") 3522 | print("--- Final Report for COO ---") 3523 | print(result) 3524 | The script will now execute. The Data Analyst will use the tool, the Process Optimizer will analyze the findings, and the Report Writer will compile the final report, which is then printed to the console. 
The verbose=True setting will show the detailed thought process and actions of each agent.




Research Agent with Gemini 2.5 Pro and LlamaIndex

LlamaIndex is a framework for building knowledge agents using LLMs connected to your data. This example shows you how to build a multi-agent workflow for a Research Agent. In LlamaIndex, Workflows are the building blocks of agent or multi-agent systems.

You need a Gemini API key. If you don't already have one, you can get one in Google AI Studio. First, install all required LlamaIndex libraries. LlamaIndex uses the google-genai package under the hood.


pip install llama-index llama-index-utils-workflow llama-index-llms-google-genai llama-index-tools-google

Set up Gemini 2.5 Pro in LlamaIndex
The engine of any LlamaIndex agent is an LLM that handles reasoning and text processing. This example uses Gemini 2.5 Pro. Make sure you set your API key as an environment variable.


from llama_index.llms.google_genai import GoogleGenAI

llm = GoogleGenAI(model="gemini-2.5-pro")

Build tools
Agents use tools to interact with the outside world, like searching the web or storing information. Tools in LlamaIndex can be regular Python functions, or imported from pre-existing ToolSpecs. Gemini comes with a built-in tool for using Google Search, which is used here.


from google.genai import types

google_search_tool = types.Tool(
    google_search=types.GoogleSearch()
)

llm_with_search = GoogleGenAI(
    model="gemini-2.5-pro",
    generation_config=types.GenerateContentConfig(tools=[google_search_tool])
)

Now test the LLM instance with a query that requires search:


response = llm_with_search.complete("What's the weather like today in Biarritz?")
print(response)

The Research Agent will use Python functions as tools. There are a lot of ways you could go about building a system to perform this task. In this example, you will use the following:

search_web uses Gemini with Google Search to search the web for information on the given topic.
record_notes saves research found on the web to the state so that the other tools can use it.
write_report writes the report using the information found by the ResearchAgent.
review_report reviews the report and provides feedback.
The Context class passes the state between agents/tools, and each agent will have access to the current state of the system.


from llama_index.core.workflow import Context

async def search_web(ctx: Context, query: str) -> str:
    """Useful for searching the web about a specific query or topic"""
    response = await llm_with_search.acomplete(f"""Please research given this query or topic,
    and return the result\n{query}""")
    return response

async def record_notes(ctx: Context, notes: str, notes_title: str) -> str:
    """Useful for recording notes on a given topic."""
    current_state = await ctx.store.get("state")
    if "research_notes" not in current_state:
        current_state["research_notes"] = {}
    current_state["research_notes"][notes_title] = notes
    await ctx.store.set("state", current_state)
    return "Notes recorded."
3590 | 3591 | async def write_report(ctx: Context, report_content: str) -> str: 3592 | """Useful for writing a report on a given topic.""" 3593 | current_state = await ctx.store.get("state") 3594 | current_state["report_content"] = report_content 3595 | await ctx.store.set("state", current_state) 3596 | return "Report written." 3597 | 3598 | async def review_report(ctx: Context, review: str) -> str: 3599 | """Useful for reviewing a report and providing feedback.""" 3600 | current_state = await ctx.store.get("state") 3601 | current_state["review"] = review 3602 | await ctx.store.set("state", current_state) 3603 | return "Report reviewed." 3604 | Build a multi-agent assistant 3605 | To build a multi-agent system, you define the agents and their interactions. Your system will have three agents: 3606 | 3607 | A ResearchAgent searches the web for information on the given topic. 3608 | A WriteAgent writes the report using the information found by the ResearchAgent. 3609 | A ReviewAgent reviews the report and provides feedback. 3610 | This example uses the AgentWorkflow class to create a multi-agent system that will execute these agents in order. Each agent takes a system_prompt that tells it what it should do, and suggests how to work with the other agents. 3611 | 3612 | Optionally, you can help your multi-agent system by specifying which other agents it can talk to using can_handoff_to (if not, it will try to figure this out on its own). 3613 | 3614 | 3615 | from llama_index.core.agent.workflow import ( 3616 | AgentInput, 3617 | AgentOutput, 3618 | ToolCall, 3619 | ToolCallResult, 3620 | AgentStream, 3621 | ) 3622 | from llama_index.core.agent.workflow import FunctionAgent, ReActAgent 3623 | 3624 | research_agent = FunctionAgent( 3625 | name="ResearchAgent", 3626 | description="Useful for searching the web for information on a given topic and recording notes on the topic.", 3627 | system_prompt=( 3628 | "You are the ResearchAgent that can search the web for information on a given topic and record notes on the topic. " 3629 | "Once notes are recorded and you are satisfied, you should hand off control to the WriteAgent to write a report on the topic." 3630 | ), 3631 | llm=llm, 3632 | tools=[search_web, record_notes], 3633 | can_handoff_to=["WriteAgent"], 3634 | ) 3635 | 3636 | write_agent = FunctionAgent( 3637 | name="WriteAgent", 3638 | description="Useful for writing a report on a given topic.", 3639 | system_prompt=( 3640 | "You are the WriteAgent that can write a report on a given topic. " 3641 | "Your report should be in a markdown format. The content should be grounded in the research notes. " 3642 | "Once the report is written, you should get feedback at least once from the ReviewAgent." 3643 | ), 3644 | llm=llm, 3645 | tools=[write_report], 3646 | can_handoff_to=["ReviewAgent", "ResearchAgent"], 3647 | ) 3648 | 3649 | review_agent = FunctionAgent( 3650 | name="ReviewAgent", 3651 | description="Useful for reviewing a report and providing feedback.", 3652 | system_prompt=( 3653 | "You are the ReviewAgent that can review a report and provide feedback. " 3654 | "Your feedback should either approve the current report or request changes for the WriteAgent to implement." 3655 | ), 3656 | llm=llm, 3657 | tools=[review_report], 3658 | can_handoff_to=["ResearchAgent","WriteAgent"], 3659 | ) 3660 | The Agents are defined, now you can create the AgentWorkflow and run it. 
from llama_index.core.agent.workflow import AgentWorkflow

agent_workflow = AgentWorkflow(
    agents=[research_agent, write_agent, review_agent],
    root_agent=research_agent.name,
    initial_state={
        "research_notes": {},
        "report_content": "Not written yet.",
        "review": "Review required.",
    },
)

During execution of the workflow, you can stream events, tool calls and updates to the console.


from llama_index.core.agent.workflow import (
    AgentInput,
    AgentOutput,
    ToolCall,
    ToolCallResult,
    AgentStream,
)

research_topic = """Write me a report on the history of the web.
Briefly describe the history of the world wide web, including
the development of the internet and the development of the web,
including 21st century developments"""

handler = agent_workflow.run(
    user_msg=research_topic
)

current_agent = None
current_tool_calls = ""
async for event in handler.stream_events():
    if (
        hasattr(event, "current_agent_name")
        and event.current_agent_name != current_agent
    ):
        current_agent = event.current_agent_name
        print(f"\n{'='*50}")
        print(f"🤖 Agent: {current_agent}")
        print(f"{'='*50}\n")
    elif isinstance(event, AgentOutput):
        if event.response.content:
            print("📤 Output:", event.response.content)
        if event.tool_calls:
            print(
                "🛠️ Planning to use tools:",
                [call.tool_name for call in event.tool_calls],
            )
    elif isinstance(event, ToolCallResult):
        print(f"🔧 Tool Result ({event.tool_name}):")
        print(f"  Arguments: {event.tool_kwargs}")
        print(f"  Output: {event.tool_output}")
    elif isinstance(event, ToolCall):
        print(f"🔨 Calling Tool: {event.tool_name}")
        print(f"  With arguments: {event.tool_kwargs}")

After the workflow is complete, you can print the final output of the report, as well as the final review state from the review agent.


state = await handler.ctx.store.get("state")
print("Report Content:\n", state["report_content"])
print("\n------------\nFinal Review:\n", state["review"])

Go further with custom workflows
The AgentWorkflow is a great way to get started with multi-agent systems. But what if you need more control? You can build a workflow from scratch (a minimal sketch follows the list below). Here are some reasons why you might want to build your own workflow:

More control over the process: You can decide the exact path your agents take. This includes creating loops, making decisions at certain points, or having agents work in parallel on different tasks.
Use complex data: Go beyond simple text. Custom workflows let you use more structured data, like JSON objects or custom classes, for your inputs and outputs.
Work with different media: Build agents that can understand and process not just text, but also images, audio, and video.
Smarter planning: You can design a workflow that first creates a detailed plan before the agents start working. This is useful for complex tasks that require multiple steps.
Enable self-correction: Create agents that can review their own work. If the output isn't good enough, the agent can try again, creating a loop of improvement until the result is perfect.
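As a rough starting point (an assumed sketch built on LlamaIndex's workflow primitives, not code from the original guide; the event and step names are illustrative), a from-scratch workflow subclasses Workflow and connects typed steps with events, which is where the extra control over loops, branching, and state comes from:


from llama_index.core.workflow import (
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)
from llama_index.llms.google_genai import GoogleGenAI

llm = GoogleGenAI(model="gemini-2.5-pro")

# Custom event carrying intermediate data between steps.
class DraftEvent(Event):
    draft: str

class ReportWorkflow(Workflow):
    @step
    async def write(self, ev: StartEvent) -> DraftEvent:
        # First step: draft a report on the requested topic.
        response = await llm.acomplete(f"Write a short report about: {ev.topic}")
        return DraftEvent(draft=str(response))

    @step
    async def review(self, ev: DraftEvent) -> StopEvent:
        # Second step: review the draft. Emitting another custom event here
        # instead of StopEvent is how you would build a revision loop.
        review = await llm.acomplete(f"Briefly review this report:\n{ev.draft}")
        return StopEvent(result={"report": ev.draft, "review": str(review)})

workflow = ReportWorkflow(timeout=120)
result = await workflow.run(topic="history of the web")
print(result["review"])

--------------------------------------------------------------------------------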