├── LICENSE ├── README.md ├── data ├── audio.mp3 └── rewe_invoice.pdf ├── notebooks ├── part-1-text-prompting.ipynb ├── part-2-multimodal-understanding.ipynb └── part-3-thinking-and-tools.ipynb └── solutions ├── solution-part-1-text-prompting.ipynb ├── solution-part-2-multimodal-understanding.ipynb └── solution-part-3-thinking-and-tools.ipynb /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 
34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. 
Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Workshop: Build with Gemini 2 | 3 | This workshop teaches how to build with Gemini using the Gemini API and Python SDK. 4 | 5 | > [!NOTE] 6 | > I recommend first going through the notebooks and exercises in the [notebooks](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/) folder. You'll find the same notebooks but with the solutions in [solutions](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/solutions/). 7 | 8 | **Prerequisites**: You need an API key from [Google AI Studio](https://aistudio.google.com/apikey). Everything can be done on the free tier. 
9 | 10 | Course outline: 11 | 12 | - [Part 1: Quickstart + Text prompting](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-1-text-prompting.ipynb) 13 | - Text understanding 14 | - Streaming response 15 | - Chats 16 | - System prompts 17 | - Config options 18 | - Long context 19 | - Token usage 20 | - Final exercise: Chat with book 21 | 22 | - [Part 2: Multimodal understanding (image, video, audio, docs, code)](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-2-multimodal-understanding.ipynb) 23 | - Image 24 | - Video 25 | - Audio 26 | - Documents (PDFs) 27 | - Code 28 | - Final exercise: Analyze supermarket invoice 29 | 30 | - [Part 3: Thinking models + agentic capabilities (tool usage)](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-3-thinking-and-tools.ipynb) 31 | - Thinking models 32 | - Structured outputs 33 | - Code execution 34 | - Grounding with Google Search 35 | - Function calling 36 | - Final exercise: Give Gemini access to the PokéAPI to answer Pokémon questions 37 | 38 | **Next steps**: There's even more you can do with Gemini: 39 | 40 | - [Image creation and editing with Gemini 2.0](https://github.com/patrickloeber/genai-tutorials/blob/main/notebooks/gemini-image-editing.ipynb) 41 | - [Live API: Talk to Gemini and share your camera](https://aistudio.google.com/live) & [Live API cookbook](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Get_started_LiveAPI.ipynb) 42 | -------------------------------------------------------------------------------- /data/audio.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/patrickloeber/workshop-build-with-gemini/15058fb734b97a7ec1b41fc9a259f58c339cd9ed/data/audio.mp3 -------------------------------------------------------------------------------- /data/rewe_invoice.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/patrickloeber/workshop-build-with-gemini/15058fb734b97a7ec1b41fc9a259f58c339cd9ed/data/rewe_invoice.pdf -------------------------------------------------------------------------------- /notebooks/part-1-text-prompting.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "XSc7AU66mJSC" 7 | }, 8 | "source": [ 9 | "##### Copyright 2025 Patrick Loeber, Google LLC" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": null, 15 | "metadata": { 16 | "cellView": "form", 17 | "id": "tc6tjo9vmJSE" 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "\n", 22 | "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", 23 | "# you may not use this file except in compliance with the License.\n", 24 | "# You may obtain a copy of the License at\n", 25 | "#\n", 26 | "# https://www.apache.org/licenses/LICENSE-2.0\n", 27 | "#\n", 28 | "# Unless required by applicable law or agreed to in writing, software\n", 29 | "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", 30 | "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", 31 | "# See the License for the specific language governing permissions and\n", 32 | "# limitations under the License." 
33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": { 38 | "id": "CuC_VSKMcEt6" 39 | }, 40 | "source": [ 41 | "# Workshop: Build with Gemini (Part 1)\n", 42 | "\n", 43 | "\"Open\n", 44 | "\n", 45 | "This workshop teaches how to build with Gemini using the Gemini API and Python SDK.\n", 46 | "\n", 47 | "Course outline:\n", 48 | "\n", 49 | "- **Part 1 (this notebook): Quickstart + Text prompting**\n", 50 | " - Text understanding\n", 51 | " - Streaming response\n", 52 | " - Chats\n", 53 | " - System prompts\n", 54 | " - Config options\n", 55 | " - Long context\n", 56 | " - Token usage\n", 57 | " - Final exercise: Chat with book\n", 58 | "\n", 59 | "- **[Part 2: Multimodal understanding (image, video, audio, docs, code)](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-2-multimodal-understanding.ipynb)**\n", 60 | "\n", 61 | "- **[Part 3: Thinking models + agentic capabilities (tool usage)](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-3-thinking-and-tools.ipynb)**" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": { 67 | "id": "avRVsnMMJvof" 68 | }, 69 | "source": [ 70 | "## 0. Use the Google AI Studio as playground\n", 71 | "\n", 72 | "Explore and play with all models in the [Google AI Studio](https://aistudio.google.com/apikey).\n" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": { 78 | "id": "jnl6q8tMcpwU" 79 | }, 80 | "source": [ 81 | "## 1. 
Setup\n" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": { 87 | "id": "DD1kaBP4dnZG" 88 | }, 89 | "source": [ 90 | "Get a free API key in the [Google AI Studio](https://aistudio.google.com/apikey)" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": null, 96 | "metadata": { 97 | "id": "j6raUs82eYfk" 98 | }, 99 | "outputs": [], 100 | "source": [ 101 | "from google.colab import userdata\n", 102 | "\n", 103 | "GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')" 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": { 109 | "id": "yKjUEGGzdp87" 110 | }, 111 | "source": [ 112 | "Install the [Google Gen AI Python SDK](https://github.com/googleapis/python-genai)" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": { 119 | "id": "Y4d9NjqNeAXx" 120 | }, 121 | "outputs": [], 122 | "source": [ 123 | "%pip install -q -U google-genai" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": { 129 | "id": "d6b7d1FleDuz" 130 | }, 131 | "source": [ 132 | "Configure Client" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": null, 138 | "metadata": { 139 | "id": "o6Uort3heUqT" 140 | }, 141 | "outputs": [], 142 | "source": [ 143 | "from google import genai\n", 144 | "from google.genai import types\n", 145 | "\n", 146 | "client = genai.Client(api_key=GOOGLE_API_KEY)" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": { 152 | "id": "1P2KmoPSgRxO" 153 | }, 154 | "source": [ 155 | "Configure model. See all [models](https://ai.google.dev/gemini-api/docs/models)" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": { 162 | "id": "0qcgiiP7gO-6" 163 | }, 164 | "outputs": [], 165 | "source": [ 166 | "MODEL = ... 
# TODO: add model name, \n", 167 | "# info: you'll find the solutions in the `solutions` folder" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": { 173 | "id": "LLsGbeGec8iF" 174 | }, 175 | "source": [ 176 | "## 2. Send your first prompt" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": null, 182 | "metadata": { 183 | "id": "e57RFdZ6dRro" 184 | }, 185 | "outputs": [], 186 | "source": [ 187 | "# TODO: send your first prompt and print it" 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "metadata": { 193 | "id": "-rfjqevtmRBO" 194 | }, 195 | "source": [ 196 | "#### **!! Exercise !!**\n", 197 | "- Send a few more prompts\n", 198 | " - Tell Gemini to write a blog post about the transformers architecture\n", 199 | " - Ask Gemini to explain list comprehension in Python\n", 200 | "- Experiment with models:\n", 201 | " - Try Gemini 2.0 Flash-Lite\n", 202 | " - Try Gemini 2.5 Pro Exp" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": null, 208 | "metadata": { 209 | "id": "l4Zj8kiIoRqn" 210 | }, 211 | "outputs": [], 212 | "source": [ 213 | "# TODO: complete exercise" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": { 219 | "id": "vHqnTYJFdSlG" 220 | }, 221 | "source": [ 222 | "## 3. Text understanding" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": { 228 | "id": "WHRVaK0-tCE_" 229 | }, 230 | "source": [ 231 | "The simplest way to generate text is to provide the model with a text-only prompt. `contents` can be a single prompt, a list of prompts, or a combination of multimodal inputs." 
232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": null, 237 | "metadata": { 238 | "id": "A_HqjSiFsUQ2" 239 | }, 240 | "outputs": [], 241 | "source": [ 242 | "# TODO: send a prompt and provide multiple strings in `contents`" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": { 248 | "id": "itCzXz1BiG5g" 249 | }, 250 | "source": [ 251 | "#### Streaming response\n", 252 | "\n", 253 | "By default, the model returns a response after completing the entire text generation process. You can achieve faster interactions by using streaming to return outputs as they're generated." 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": null, 259 | "metadata": { 260 | "id": "7d6HzwfZdWbt" 261 | }, 262 | "outputs": [], 263 | "source": [ 264 | "# TODO: generate a streaming response" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": { 270 | "id": "LZjfCkzSdcEc" 271 | }, 272 | "source": [ 273 | "#### Chat\n", 274 | "\n", 275 | "The SDK chat class provides an interface to keep track of conversation history. Behind the scenes it uses the same `generate_content` method." 276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": null, 281 | "metadata": { 282 | "id": "BCI8O9Ldjn6q" 283 | }, 284 | "outputs": [], 285 | "source": [ 286 | "# TODO: create a chat" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": null, 292 | "metadata": { 293 | "id": "mmfMuI44Kev2" 294 | }, 295 | "outputs": [], 296 | "source": [ 297 | "# TODO: send messages" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": { 303 | "id": "E_MkOG6uLs75" 304 | }, 305 | "source": [ 306 | "#### Parameters\n", 307 | "\n", 308 | "Every prompt you send to the model includes parameters that control how the model generates responses. You can configure these parameters, or let the model use the default options." 
309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": null, 314 | "metadata": { 315 | "id": "J_jk93Z-Lum-" 316 | }, 317 | "outputs": [], 318 | "source": [ 319 | "# TODO: send a prompt using the parameters below" 320 | ] 321 | }, 322 | { 323 | "cell_type": "markdown", 324 | "metadata": { 325 | "id": "DPyrJ9ul7yuv" 326 | }, 327 | "source": [ 328 | "- `max_output_tokens`: Sets the maximum number of tokens to include in a candidate.\n", 329 | "- `temperature`: Controls the randomness of the output. Use higher values for more creative responses, and lower values for more deterministic responses. Values can range from [0.0, 2.0].\n", 330 | "- `top_p`: Changes how the model selects tokens for output. Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value.\n", 331 | "- `top_k`: Changes how the model selects tokens for output. A top_k of 1 means the selected token is the most probable among all the tokens in the model's vocabulary, while a top_k of 3 means that the next token is selected from among the 3 most probable using the temperature. Tokens are further filtered based on top_p with the final token selected using temperature sampling.\n", 332 | "- `stop_sequences`: List of strings (up to 5) that tells the model to stop generating text if one of the strings is encountered in the response. If specified, the API will stop at the first appearance of a stop sequence.\n", 333 | "- `seed`: If specified, the model makes a best effort to provide the same response for repeated requests. By default, a random number is used." 334 | ] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "metadata": { 339 | "id": "sG9JgfKF8nvr" 340 | }, 341 | "source": [ 342 | "#### System instructions\n", 343 | "\n", 344 | "System instructions let you steer the behavior of a model based on your specific use case. 
When you provide system instructions, you give the model additional context to help it understand the task and generate more customized responses. The model should adhere to the system instructions over the full interaction with the user, enabling you to specify product-level behavior separate from the prompts provided by end users." 345 | ] 346 | }, 347 | { 348 | "cell_type": "code", 349 | "execution_count": null, 350 | "metadata": { 351 | "id": "CayVOonC8st5" 352 | }, 353 | "outputs": [], 354 | "source": [ 355 | "# TODO: send a prompt with a system instruction" 356 | ] 357 | }, 358 | { 359 | "cell_type": "markdown", 360 | "metadata": { 361 | "id": "kjdRzLbN-ANo" 362 | }, 363 | "source": [ 364 | "#### Long context and token counting\n", 365 | "\n", 366 | "Gemini 2.0 Flash and 2.5 Pro have a 1M token context window.\n", 367 | "\n", 368 | "In practice, 1 million tokens could look like:\n", 369 | "\n", 370 | "- 50,000 lines of code (with the standard 80 characters per line)\n", 371 | "- All the text messages you have sent in the last 5 years\n", 372 | "- 8 average-length English novels\n", 373 | "- 1 hour of video data\n", 374 | "\n", 375 | "Let's feed in an entire book and ask questions:\n", 376 | "\n" 377 | ] 378 | }, 379 | { 380 | "cell_type": "code", 381 | "execution_count": null, 382 | "metadata": { 383 | "id": "b6pGhOkj-CFS" 384 | }, 385 | "outputs": [], 386 | "source": [ 387 | "import requests\n", 388 | "res = requests.get(\"https://gutenberg.org/cache/epub/16317/pg16317.txt\")\n", 389 | "book = res.text" 390 | ] 391 | }, 392 | { 393 | "cell_type": "code", 394 | "execution_count": null, 395 | "metadata": { 396 | "id": "C0nnKaKC-NMu" 397 | }, 398 | "outputs": [], 399 | "source": [ 400 | "print(book[:100])" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": null, 406 | "metadata": { 407 | "id": "Ves9N2m-_k-V" 408 | }, 409 | "outputs": [], 410 | "source": [ 411 | "print(f\"# characters {len(book)}\")\n", 412 | "print(f\"# words 
{len(book.split())}\")\n", 413 | "print(f\"# tokens: ~{int(len(book.split()) * 4/3)}\") # rule of thumb: 100 tokens ≈ 75 words" 414 | ] 415 | }, 416 | { 417 | "cell_type": "code", 418 | "execution_count": null, 419 | "metadata": { 420 | "id": "6hmtD77wMXdF" 421 | }, 422 | "outputs": [], 423 | "source": [ 424 | "# TODO: send a prompt to summarize the book" 425 | ] 426 | }, 427 | { 428 | "cell_type": "markdown", 429 | "metadata": { 430 | "id": "jt9NUCaexPqy" 431 | }, 432 | "source": [ 433 | "To understand the token usage, you can check `usage_metadata`:" 434 | ] 435 | }, 436 | { 437 | "cell_type": "code", 438 | "execution_count": null, 439 | "metadata": { 440 | "id": "6LAoNQ3Ys-CB" 441 | }, 442 | "outputs": [], 443 | "source": [ 444 | "# TODO: print token usage" 445 | ] 446 | }, 447 | { 448 | "cell_type": "markdown", 449 | "metadata": { 450 | "id": "9jzrjfNDxUhZ" 451 | }, 452 | "source": [ 453 | "You can also use `count_tokens` to check the size of your input prompt(s):" 454 | ] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": null, 459 | "metadata": { 460 | "id": "EIrVpB-Htc3y" 461 | }, 462 | "outputs": [], 463 | "source": [ 464 | "# TODO: use count_tokens" 465 | ] 466 | }, 467 | { 468 | "cell_type": "markdown", 469 | "metadata": { 470 | "id": "pE7MEKBI18K0" 471 | }, 472 | "source": [ 473 | "## !! 
Exercise: Chat with a book !!\n", 474 | "\n", 475 | "Task:\n", 476 | "- Create a chat\n", 477 | "- Use a system prompt: `\"You are an expert book reviewer with a witty tone.\"`\n", 478 | "- Use a temperature of `1.5`\n", 479 | "- Ask it to summarize the book\n", 480 | "- Ask 1 question to explain a certain topic from the book in more detail\n", 481 | "- Ask it to create a social media post based on the book\n", 482 | "- Print the total number of tokens used during the chat" 483 | ] 484 | }, 485 | { 486 | "cell_type": "code", 487 | "execution_count": null, 488 | "metadata": { 489 | "id": "sKL0JNbCzY0P" 490 | }, 491 | "outputs": [], 492 | "source": [ 493 | "# TODO: complete exercise" 494 | ] 495 | }, 496 | { 497 | "cell_type": "markdown", 498 | "metadata": { 499 | "id": "muzBsZi5Fmgs" 500 | }, 501 | "source": [ 502 | "## Recap & Next steps\n", 503 | "\n", 504 | "Nice work! You learned:\n", 505 | "- Python SDK quickstart\n", 506 | "- Text prompting\n", 507 | "- Streaming and chats\n", 508 | "- System prompts and config options\n", 509 | "- Long context and token counting\n", 510 | "\n", 511 | "\n", 512 | "More helpful resources:\n", 513 | "- [API docs quickstart](https://ai.google.dev/gemini-api/docs/quickstart?lang=python)\n", 514 | "- [Text generation docs](https://ai.google.dev/gemini-api/docs/text-generation)\n", 515 | "- [Long context docs](https://ai.google.dev/gemini-api/docs/long-context)\n", 516 | "\n", 517 | "Next steps:\n", 518 | "- [Part 2: Multimodal understanding (image, video, audio, docs, code)](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-2-multimodal-understanding.ipynb)" 519 | ] 520 | }, 521 | { 522 | "cell_type": "markdown", 523 | "metadata": {}, 524 | "source": [] 525 | } 526 | ], 527 | "metadata": { 528 | "colab": { 529 | "provenance": [] 530 | }, 531 | "kernelspec": { 532 | "display_name": "venv", 533 | "language": "python", 534 | "name": "python3" 535 | }, 536 | "language_info": { 537 | "codemirror_mode": { 
538 | "name": "ipython", 539 | "version": 3 540 | }, 541 | "file_extension": ".py", 542 | "mimetype": "text/x-python", 543 | "name": "python", 544 | "nbconvert_exporter": "python", 545 | "pygments_lexer": "ipython3", 546 | "version": "3.13.2" 547 | } 548 | }, 549 | "nbformat": 4, 550 | "nbformat_minor": 0 551 | } 552 | -------------------------------------------------------------------------------- /notebooks/part-2-multimodal-understanding.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "XkAAfCT2pezK" 7 | }, 8 | "source": [ 9 | "##### Copyright 2025 Patrick Loeber, Google LLC" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": null, 15 | "metadata": { 16 | "cellView": "form", 17 | "id": "HUwz5T0qpezL" 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "\n", 22 | "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", 23 | "# you may not use this file except in compliance with the License.\n", 24 | "# You may obtain a copy of the License at\n", 25 | "#\n", 26 | "# https://www.apache.org/licenses/LICENSE-2.0\n", 27 | "#\n", 28 | "# Unless required by applicable law or agreed to in writing, software\n", 29 | "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", 30 | "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", 31 | "# See the License for the specific language governing permissions and\n", 32 | "# limitations under the License." 
33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": { 38 | "id": "-4sy2g5g5h50" 39 | }, 40 | "source": [ 41 | "# Workshop: Build with Gemini (Part 2)\n", 42 | "\n", 43 | "\"Open\n", 44 | "\n", 45 | "This workshop teaches how to build with Gemini using the Gemini API and Python SDK.\n", 46 | "\n", 47 | "Course outline:\n", 48 | "\n", 49 | "- **[Part 1: Quickstart + Text prompting](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-1-text-prompting.ipynb)**\n", 50 | "\n", 51 | "- **Part 2 (this notebook): Multimodal understanding (image, video, audio, docs, code)**\n", 52 | " - Image\n", 53 | " - Video\n", 54 | " - Audio\n", 55 | " - Documents (PDFs)\n", 56 | " - Code\n", 57 | " - Final exercise: Analyze supermarket invoice\n", 58 | "\n", 59 | "- **[Part 3: Thinking models + agentic capabilities (tool usage)](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-3-thinking-and-tools.ipynb)**" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": { 65 | "id": "enN0SFUq5_mo" 66 | }, 67 | "source": [ 68 | "## 0. Use the Google AI Studio as playground\n", 69 | "\n", 70 | "Explore and play with all models in the [Google AI Studio](https://aistudio.google.com/apikey)." 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": { 76 | "id": "fJjFsaSg6EoF" 77 | }, 78 | "source": [ 79 | "## 1. 
Setup\n", 80 | "\n", 81 | "Get a free API key in the [Google AI Studio](https://aistudio.google.com/apikey) and set up the [Google Gen AI Python SDK](https://github.com/googleapis/python-genai)" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": null, 87 | "metadata": { 88 | "id": "7SzjZdf7mwD_" 89 | }, 90 | "outputs": [], 91 | "source": [ 92 | "%pip install -U -q google-genai" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": { 99 | "id": "BF3gXZyFm3Pf" 100 | }, 101 | "outputs": [], 102 | "source": [ 103 | "from google.colab import userdata\n", 104 | "\n", 105 | "GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": { 112 | "id": "0lajO_7dnFya" 113 | }, 114 | "outputs": [], 115 | "source": [ 116 | "from google import genai\n", 117 | "from google.genai import types\n", 118 | "\n", 119 | "client = genai.Client(api_key=GOOGLE_API_KEY)" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": { 126 | "id": "hsLIkbX1dK-v" 127 | }, 128 | "outputs": [], 129 | "source": [ 130 | "MODEL = \"gemini-2.0-flash\"" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": { 136 | "id": "P-qkPEabTURX" 137 | }, 138 | "source": [ 139 | "## Image understanding\n", 140 | "\n", 141 | "Gemini models are able to process and understand images, e.g., you can use Gemini to describe, caption, and answer questions about images, and you can even use it for object detection." 
142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": null, 147 | "metadata": { 148 | "id": "RXRNfCtATTNG" 149 | }, 150 | "outputs": [], 151 | "source": [ 152 | "!curl -o image.jpg \"https://storage.googleapis.com/generativeai-downloads/images/Cupcakes.jpg\"" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "metadata": { 159 | "id": "ZsnYkEF2Tcm8" 160 | }, 161 | "outputs": [], 162 | "source": [ 163 | "from PIL import Image\n", 164 | "image = Image.open(\"image.jpg\")\n", 165 | "print(image.size)\n", 166 | "image" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": { 172 | "id": "FEVFI9_N7wGJ" 173 | }, 174 | "source": [ 175 | "For total image payload size less than 20MB, we recommend either uploading base64 encoded images or directly uploading locally stored image files.\n", 176 | "\n", 177 | "You can use a Pillow image in your prompt:" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": null, 183 | "metadata": { 184 | "id": "A5d73cjXTmen" 185 | }, 186 | "outputs": [], 187 | "source": [ 188 | "# TODO: ask a question about the image\n", 189 | "# info: you'll find the solutions in the `solutions` folder" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": { 195 | "id": "1UqzcAnEb31e" 196 | }, 197 | "source": [ 198 | "Or you can use base64 encoded images" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": null, 204 | "metadata": { 205 | "id": "de7qYAgWbMUQ" 206 | }, 207 | "outputs": [], 208 | "source": [ 209 | "import requests\n", 210 | "\n", 211 | "res = requests.get(\"https://storage.googleapis.com/generativeai-downloads/images/Cupcakes.jpg\")\n", 212 | "\n", 213 | "# TODO: use the base64 image and ask a question" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": { 219 | "id": "5KaTPCZ0c4QN" 220 | }, 221 | "source": [ 222 | "You can use the File API for large payloads (>20MB).\n", 223 | 
"\n", 224 | " The File API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. They can be accessed in that period with your API key, but cannot be downloaded from the API. It is available at no cost in all regions where the Gemini API is available." 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": null, 230 | "metadata": { 231 | "id": "uqXFR3a4cgV7" 232 | }, 233 | "outputs": [], 234 | "source": [ 235 | "# TODO: upload the file and ask a question about it" 236 | ] 237 | }, 238 | { 239 | "cell_type": "markdown", 240 | "metadata": { 241 | "id": "YCgdLPwZp_5D" 242 | }, 243 | "source": [ 244 | "#### **!! Exercise !!**\n", 245 | "\n", 246 | "- Use the following image: https://storage.googleapis.com/generativeai-downloads/images/croissant.jpg\n", 247 | "- Tell Gemini to describe the image\n", 248 | "- Then ask Gemini for a recipe to bake this item. Include item names and quantities for the recipe." 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": null, 254 | "metadata": { 255 | "id": "GS9-4esnqg4O" 256 | }, 257 | "outputs": [], 258 | "source": [ 259 | "# TODO" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": { 265 | "id": "oQMyViVxe9Tg" 266 | }, 267 | "source": [ 268 | "#### Bounding box\n", 269 | "\n", 270 | "Gemini models are trained to return bounding box coordinates.\n", 271 | "\n", 272 | "**Important**: Gemini returns bounding box coordinates in this format:\n", 273 | "\n", 274 | "- `[y_min, x_min, y_max, x_max]`\n", 275 | "- and normalized to `[0,1000]`\n", 276 | "\n", 277 | "**Tip**: Ask Gemini to return JSON format and configure `config={'response_mime_type': 'application/json'}`:" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": null, 283 | "metadata": { 284 | "id": "I-8OzBbNe8k-" 285 | }, 286 | "outputs": [], 287 | "source": [ 288 | "# TODO: ask to return bounding boxes \n", 289 | "\n", 
290 | "bboxes = ..." 291 | ] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": { 296 | "id": "F5qAcdB08pCG" 297 | }, 298 | "source": [ 299 | "Create a helper function to denormalize and draw the bounding boxes:\n", 300 | "\n" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": null, 306 | "metadata": { 307 | "id": "rAnDqBtugriS" 308 | }, 309 | "outputs": [], 310 | "source": [ 311 | "from PIL import ImageDraw, ImageFont\n", 312 | "\n", 313 | "line_width = 4\n", 314 | "font = ImageFont.load_default(size=16)\n", 315 | "\n", 316 | "labels = list(set(box['label'] for box in bboxes))\n", 317 | "\n", 318 | "def draw_bounding_boxes(image, bounding_boxes):\n", 319 | " img = image.copy()\n", 320 | " width, height = img.size\n", 321 | "\n", 322 | " draw = ImageDraw.Draw(img)\n", 323 | "\n", 324 | " colors = ['blue','red','green','yellow','orange','pink','purple']\n", 325 | "\n", 326 | " for box in bounding_boxes:\n", 327 | " y_min, x_min, y_max, x_max = box['box_2d']\n", 328 | " label = box['label']\n", 329 | "\n", 330 | " # Convert normalized coordinates to absolute coordinates\n", 331 | " y_min = int(y_min/1000 * height)\n", 332 | " x_min = int(x_min/1000 * width)\n", 333 | " y_max = int(y_max/1000 * height)\n", 334 | " x_max = int(x_max/1000 * width)\n", 335 | "\n", 336 | " color = colors[labels.index(label) % len(colors)]\n", 337 | " draw.rectangle([(x_min, y_min), (x_max, y_max)], outline=color, width=line_width)\n", 338 | "\n", 339 | " draw.text((x_min+line_width, y_min), label, fill=color, font=font)\n", 340 | "\n", 341 | " display(img)\n", 342 | "\n", 343 | "draw_bounding_boxes(image, bboxes)" 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": { 349 | "id": "Cqw5dRKLiWV8" 350 | }, 351 | "source": [ 352 | "## Video\n", 353 | "\n", 354 | "Gemini models are able to process videos. 
The 1M context window supports up to approximately an hour of video data.\n", 355 | "\n", 356 | "For technical details about supported video formats, see [the docs](https://ai.google.dev/gemini-api/docs/vision#technical-details-video)." 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": null, 362 | "metadata": { 363 | "id": "csJLQl-IiX4R" 364 | }, 365 | "outputs": [], 366 | "source": [ 367 | "!wget https://storage.googleapis.com/generativeai-downloads/videos/post_its.mp4 -O Post_its.mp4 -q" 368 | ] 369 | }, 370 | { 371 | "cell_type": "markdown", 372 | "metadata": { 373 | "id": "Lxg-yOOs9uV5" 374 | }, 375 | "source": [ 376 | "Use the File API to upload a video. Here we also check the processing state:" 377 | ] 378 | }, 379 | { 380 | "cell_type": "code", 381 | "execution_count": null, 382 | "metadata": { 383 | "id": "cR8WEJBHieiA" 384 | }, 385 | "outputs": [], 386 | "source": [ 387 | "import time\n", 388 | "\n", 389 | "def upload_video(video_file_name):\n", 390 | " video_file = client.files.upload(file=video_file_name)\n", 391 | "\n", 392 | " while video_file.state == \"PROCESSING\":\n", 393 | " print('Waiting for video to be processed.')\n", 394 | " time.sleep(10)\n", 395 | " video_file = client.files.get(name=video_file.name)\n", 396 | "\n", 397 | " if video_file.state == \"FAILED\":\n", 398 | " raise ValueError(video_file.state)\n", 399 | "\n", 400 | " print(f'Video processing complete: {video_file.uri}')\n", 401 | " return video_file\n", 402 | "\n", 403 | "post_its_video = upload_video('Post_its.mp4')" 404 | ] 405 | }, 406 | { 407 | "cell_type": "markdown", 408 | "metadata": { 409 | "id": "t5B3xCns93gL" 410 | }, 411 | "source": [ 412 | "Now you can use the uploaded file in your prompt:" 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": null, 418 | "metadata": { 419 | "id": "Yx_TCe2Oih0n" 420 | }, 421 | "outputs": [], 422 | "source": [ 423 | "# TODO: ask to list all post-its from the video" 424 | ] 425 | }, 426 
| { 427 | "cell_type": "markdown", 428 | "metadata": { 429 | "id": "o-DPmMBdlQdl" 430 | }, 431 | "source": [ 432 | "#### YouTube video support\n", 433 | "\n", 434 | "The Gemini API and AI Studio support YouTube URLs as a file data Part. You can include a YouTube URL with a prompt asking the model to summarize, translate, or otherwise interact with the video content." 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": null, 440 | "metadata": { 441 | "id": "bDoX9Szrjsc_" 442 | }, 443 | "outputs": [], 444 | "source": [ 445 | "youtube_url = \"https://youtu.be/LlWDx0LSDok\"\n", 446 | "# TODO: ask to summarize the video" 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": { 452 | "id": "jSHibXZSshvS" 453 | }, 454 | "source": [ 455 | "#### **!! Exercise !!**\n", 456 | "\n", 457 | "- Your turn! Use this video (*If I could only cook one dish for a vegan skeptic* from Rainbow Plant Life): https://youtu.be/BHRyfEbhFFU\n", 458 | "- Ask Gemini to describe the video and to get the recipe" 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": null, 464 | "metadata": { 465 | "id": "srY2rlKnuWG_" 466 | }, 467 | "outputs": [], 468 | "source": [ 469 | "# TODO: complete the exercise" 470 | ] 471 | }, 472 | { 473 | "cell_type": "markdown", 474 | "metadata": { 475 | "id": "oPGJ6kxSoL7O" 476 | }, 477 | "source": [ 478 | "## Audio\n", 479 | "\n", 480 | "You can use Gemini to process audio files. For example, you can use it to generate a transcript of an audio file or to summarize the content of an audio file.\n", 481 | "\n", 482 | "Gemini represents each second of audio as 32 tokens; for example, one minute of audio is represented as 1,920 tokens.\n", 483 | "\n", 484 | "For more info about technical details and supported formats, see [the docs](https://ai.google.dev/gemini-api/docs/audio#supported-formats)." 
485 | ] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": null, 490 | "metadata": { 491 | "id": "RFtpKexFnleG" 492 | }, 493 | "outputs": [], 494 | "source": [ 495 | "URL = \"https://storage.googleapis.com/generativeai-downloads/data/jeff-dean-presentation.mp3\"\n", 496 | "!wget -q $URL -O sample.mp3" 497 | ] 498 | }, 499 | { 500 | "cell_type": "code", 501 | "execution_count": null, 502 | "metadata": { 503 | "id": "zHwv_ykGWhRP" 504 | }, 505 | "outputs": [], 506 | "source": [ 507 | "import IPython\n", 508 | "IPython.display.Audio(\"sample.mp3\")" 509 | ] 510 | }, 511 | { 512 | "cell_type": "code", 513 | "execution_count": null, 514 | "metadata": { 515 | "id": "LjH3mI_2lwpm" 516 | }, 517 | "outputs": [], 518 | "source": [ 519 | "# TODO: ask to summarize the audio" 520 | ] 521 | }, 522 | { 523 | "cell_type": "markdown", 524 | "metadata": { 525 | "id": "F6ScOLTfVNx-" 526 | }, 527 | "source": [ 528 | "1 minute audio = ~130 words or ~170 tokens\n", 529 | "8192 / 170 = ~48 min output length.\n", 530 | "\n", 531 | "You can use Gemini for transcribing, but be aware of the output token limit.\n", 532 | "\n", 533 | "We can use `pydub` to split the audio file:" 534 | ] 535 | }, 536 | { 537 | "cell_type": "code", 538 | "execution_count": null, 539 | "metadata": { 540 | "id": "e7kLjGpPWcj7" 541 | }, 542 | "outputs": [], 543 | "source": [ 544 | "%pip install pydub" 545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": null, 550 | "metadata": { 551 | "id": "6BqVJQdZWbW2" 552 | }, 553 | "outputs": [], 554 | "source": [ 555 | "from pydub import AudioSegment\n", 556 | "audio = AudioSegment.from_mp3(\"sample.mp3\")\n", 557 | "duration = 60 * 1000 # pydub works in milliseconds\n", 558 | "audio_clip = audio[:duration]" 559 | ] 560 | }, 561 | { 562 | "cell_type": "code", 563 | "execution_count": null, 564 | "metadata": { 565 | "id": "k3AT-dTRW9NX" 566 | }, 567 | "outputs": [], 568 | "source": [ 569 | "audio_clip" 570 | ] 571 | }, 572 | { 573 | 
"cell_type": "code", 574 | "execution_count": null, 575 | "metadata": { 576 | "id": "3m87bU5vb3pq" 577 | }, 578 | "outputs": [], 579 | "source": [ 580 | "import io\n", 581 | "buffer = io.BytesIO()\n", 582 | "audio_clip.export(buffer, format=\"mp3\")\n", 583 | "\n", 584 | "audio_bytes = buffer.getvalue()  # read() would return b'' since the position is at the end after export" 585 | ] 586 | }, 587 | { 588 | "cell_type": "markdown", 589 | "metadata": { 590 | "id": "wqyVx14lUSOS" 591 | }, 592 | "source": [ 593 | "For files below 20 MB, you can provide the audio file directly as inline data in your request.\n", 594 | "\n", 595 | "To do this, use `types.Part.from_bytes` and add it to the `contents` argument when calling `generate_content()`:" 596 | ] 597 | }, 598 | { 599 | "cell_type": "code", 600 | "execution_count": null, 601 | "metadata": { 602 | "id": "pFAZ0JslS73Q" 603 | }, 604 | "outputs": [], 605 | "source": [ 606 | "# TODO: ask to transcribe the file" 607 | ] 608 | }, 609 | { 610 | "cell_type": "markdown", 611 | "metadata": { 612 | "id": "Z2d9xevDVB1r" 613 | }, 614 | "source": [ 615 | "Let's use a return format that's easier to understand:" 616 | ] 617 | }, 618 | { 619 | "cell_type": "code", 620 | "execution_count": null, 621 | "metadata": { 622 | "id": "_BVR0YxKavVw" 623 | }, 624 | "outputs": [], 625 | "source": [ 626 | "# TODO: ask to transcribe and return an easier format with timestamps" 627 | ] 628 | }, 629 | { 630 | "cell_type": "markdown", 631 | "metadata": { 632 | "id": "78hhvMuLvEkG" 633 | }, 634 | "source": [ 635 | "Another useful prompt you can try with audio files:\n", 636 | "- Refer to timestamps: `Provide a transcript of the speech from 02:30 to 03:29.`" 637 | ] 638 | }, 639 | { 640 | "cell_type": "markdown", 641 | "metadata": { 642 | "id": "curIC8c7qCYh" 643 | }, 644 | "source": [ 645 | "## PDFs\n", 646 | "\n", 647 | "PDFs can also be used in the same way:" 648 | ] 649 | }, 650 | { 651 | "cell_type": "code", 652 | "execution_count": null, 653 | "metadata": { 654 | "id": "CKKjFWWtpkUr" 655 | }, 656 | "outputs": [], 657 | 
"source": [ 658 | "URL = \"https://storage.googleapis.com/generativeai-downloads/data/pdf_structured_outputs/invoice.pdf\"\n", 659 | "!wget -q $URL -O invoice.pdf" 660 | ] 661 | }, 662 | { 663 | "cell_type": "code", 664 | "execution_count": null, 665 | "metadata": { 666 | "id": "PEeoY6nyqfql" 667 | }, 668 | "outputs": [], 669 | "source": [ 670 | "# TODO: upload the PDF" 671 | ] 672 | }, 673 | { 674 | "cell_type": "code", 675 | "execution_count": null, 676 | "metadata": { 677 | "id": "PakfSabcmy0L" 678 | }, 679 | "outputs": [], 680 | "source": [ 681 | "# TODO: count tokens" 682 | ] 683 | }, 684 | { 685 | "cell_type": "markdown", 686 | "metadata": { 687 | "id": "fTCYwaY4uiSA" 688 | }, 689 | "source": [ 690 | "**Next step**: A cool feature I recommend is to combine it with structured outputs using Pydantic." 691 | ] 692 | }, 693 | { 694 | "cell_type": "code", 695 | "execution_count": null, 696 | "metadata": { 697 | "id": "SckDEY7hnbbP" 698 | }, 699 | "outputs": [], 700 | "source": [ 701 | "# TODO: define a schema and extract info" 702 | ] 703 | }, 704 | { 705 | "cell_type": "code", 706 | "execution_count": null, 707 | "metadata": { 708 | "id": "8miXBdbFYBeg" 709 | }, 710 | "outputs": [], 711 | "source": [ 712 | "response.parsed.model_dump()" 713 | ] 714 | }, 715 | { 716 | "cell_type": "markdown", 717 | "metadata": { 718 | "id": "7mBgXG1p-pF1" 719 | }, 720 | "source": [ 721 | "## Code\n", 722 | "\n", 723 | "Gemini is good at understanding and generating code.\n", 724 | "\n", 725 | "Let's use [gitingest](https://github.com/cyclotruc/gitingest) to chat with a GitHub repo:" 726 | ] 727 | }, 728 | { 729 | "cell_type": "code", 730 | "execution_count": null, 731 | "metadata": { 732 | "id": "zA2XH9Jf-qja" 733 | }, 734 | "outputs": [], 735 | "source": [ 736 | "%pip install gitingest" 737 | ] 738 | }, 739 | { 740 | "cell_type": "code", 741 | "execution_count": null, 742 | "metadata": { 743 | "id": "6C1uQK2_71I_" 744 | }, 745 | "outputs": [], 746 | "source": [ 747 | "from 
gitingest import ingest_async\n", 748 | "\n", 749 | "summary, tree, content = await ingest_async(\"https://github.com/patrickloeber/snake-ai-pytorch\")" 750 | ] 751 | }, 752 | { 753 | "cell_type": "code", 754 | "execution_count": null, 755 | "metadata": { 756 | "id": "ubjz-HhZBia6" 757 | }, 758 | "outputs": [], 759 | "source": [ 760 | "print(summary)" 761 | ] 762 | }, 763 | { 764 | "cell_type": "code", 765 | "execution_count": null, 766 | "metadata": { 767 | "id": "v8lNYVU9CPMk" 768 | }, 769 | "outputs": [], 770 | "source": [ 771 | "print(tree)" 772 | ] 773 | }, 774 | { 775 | "cell_type": "code", 776 | "execution_count": null, 777 | "metadata": { 778 | "id": "heedUewH_M3r" 779 | }, 780 | "outputs": [], 781 | "source": [ 782 | "# TODO: create a chat and ask questions about the code" 783 | ] 784 | }, 785 | { 786 | "cell_type": "markdown", 787 | "metadata": { 788 | "id": "0Cd1QIZObj6P" 789 | }, 790 | "source": [ 791 | "## Exercise: Analyze supermarket invoice\n", 792 | "\n", 793 | "Task:\n", 794 | "- Define a schema for a single item that contains `item_name` and `item_cost`\n", 795 | "- Define a schema for the supermarket invoice with `items`, `date`, and `total_cost`\n", 796 | "- Use Gemini to extract all info from the supermarket bill into the defined supermarket invoice schema.\n", 797 | "- Ask Gemini to list a few healthy recipes based on the items. If you have dietary restrictions, tell Gemini about it!" 
798 | ] 799 | }, 800 | { 801 | "cell_type": "code", 802 | "execution_count": null, 803 | "metadata": { 804 | "id": "nto6Tj4wevTt" 805 | }, 806 | "outputs": [], 807 | "source": [ 808 | "import requests\n", 809 | "url = 'https://raw.githubusercontent.com/patrickloeber/workshop-build-with-gemini/main/data/rewe_invoice.pdf'\n", 810 | "res = requests.get(url)\n", 811 | "with open(\"rewe_invoice.pdf\", \"wb\") as f:\n", 812 | " f.write(res.content)" 813 | ] 814 | }, 815 | { 816 | "cell_type": "code", 817 | "execution_count": null, 818 | "metadata": { 819 | "id": "e5ds7SSWo6xp" 820 | }, 821 | "outputs": [], 822 | "source": [ 823 | "rewe_pdf = client.files.upload(file='rewe_invoice.pdf')" 824 | ] 825 | }, 826 | { 827 | "cell_type": "code", 828 | "execution_count": null, 829 | "metadata": { 830 | "id": "d9-Jvx58peg7" 831 | }, 832 | "outputs": [], 833 | "source": [ 834 | "# TODO: complete the exercise" 835 | ] 836 | }, 837 | { 838 | "cell_type": "markdown", 839 | "metadata": { 840 | "id": "EO-TUwbiIHu5" 841 | }, 842 | "source": [ 843 | "## Recap & Next steps\n", 844 | "\n", 845 | "Great job, you're now an expert in working with multimodal data :)\n", 846 | "\n", 847 | "Gemini's multimodal capabilities are powerful, and with the Python SDK you only need a few lines of code to process various media types, including text, audio, images, videos, and PDFs.\n", 848 | "\n", 849 | "For many use cases, it's helpful to constrain Gemini to respond with JSON using structured outputs.\n", 850 | "\n", 851 | "More helpful resources:\n", 852 | "\n", 853 | "- [Audio understanding docs](https://ai.google.dev/gemini-api/docs/audio?lang=python)\n", 854 | "- [Vision understanding docs](https://ai.google.dev/gemini-api/docs/vision?lang=python)\n", 855 | "- [Philschmid blog post: From PDFs to Insights](https://www.philschmid.de/gemini-pdf-to-data)\n", 856 | "- [Structured output docs](https://ai.google.dev/gemini-api/docs/structured-output?lang=python)\n", 857 | "- [Video understanding 
cookbook](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Video_understanding.ipynb)\n", 858 | "\n", 859 | "Next steps:\n", 860 | "\n", 861 | "- **[Part 3: Thinking models + agentic capabilities (tool usage)](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-3-thinking-and-tools.ipynb)**\n" 862 | ] 863 | } 864 | ], 865 | "metadata": { 866 | "colab": { 867 | "provenance": [] 868 | }, 869 | "kernelspec": { 870 | "display_name": "Python 3", 871 | "name": "python3" 872 | }, 873 | "language_info": { 874 | "name": "python" 875 | } 876 | }, 877 | "nbformat": 4, 878 | "nbformat_minor": 0 879 | } 880 | -------------------------------------------------------------------------------- /notebooks/part-3-thinking-and-tools.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "nrD2kWanydfP" 7 | }, 8 | "source": [ 9 | "##### Copyright 2025 Patrick Loeber, Google LLC" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": null, 15 | "metadata": { 16 | "cellView": "form", 17 | "id": "wrgUJetgydfR" 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "\n", 22 | "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", 23 | "# you may not use this file except in compliance with the License.\n", 24 | "# You may obtain a copy of the License at\n", 25 | "#\n", 26 | "# https://www.apache.org/licenses/LICENSE-2.0\n", 27 | "#\n", 28 | "# Unless required by applicable law or agreed to in writing, software\n", 29 | "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", 30 | "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", 31 | "# See the License for the specific language governing permissions and\n", 32 | "# limitations under the License." 
33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": { 38 | "id": "si1uWsxtj0W6" 39 | }, 40 | "source": [ 41 | "# Workshop: Build with Gemini (Part 3)\n", 42 | "\n", 43 | "\"Open\n", 44 | "\n", 45 | "This workshop teaches how to build with Gemini using the Gemini API and Python SDK.\n", 46 | "\n", 47 | "Course outline:\n", 48 | "\n", 49 | "- **[Part 1: Quickstart + Text prompting](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-1-text-prompting.ipynb)**\n", 50 | "\n", 51 | "- **[Part 2: Multimodal understanding (image, video, audio, docs, code)](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-2-multimodal-understanding.ipynb)**\n", 52 | "\n", 53 | "- **Part 3 (this notebook): Thinking models + agentic capabilities (tool usage)**\n", 54 | " - Thinking models\n", 55 | " - Structured outputs\n", 56 | " - Code execution\n", 57 | " - Grounding with Google Search\n", 58 | " - Function calling\n", 59 | " - Final exercise: Give Gemini access to the PokéAPI to answer Pokémon questions\n", 60 | "\n", 61 | "## 0. Use the Google AI Studio as playground\n", 62 | "\n", 63 | "Explore and play with all models in the [Google AI Studio](https://aistudio.google.com/apikey).\n", 64 | "\n", 65 | "## 1. 
Setup\n", 66 | "\n", 67 | "Get a free API key in the [Google AI Studio](https://aistudio.google.com/apikey) and set up the [Google Gen AI Python SDK](https://github.com/googleapis/python-genai)" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": null, 73 | "metadata": { 74 | "id": "7SzjZdf7mwD_" 75 | }, 76 | "outputs": [], 77 | "source": [ 78 | "%pip install -U -q google-genai" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": { 85 | "id": "BF3gXZyFm3Pf" 86 | }, 87 | "outputs": [], 88 | "source": [ 89 | "from google.colab import userdata\n", 90 | "\n", 91 | "GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": null, 97 | "metadata": { 98 | "id": "0lajO_7dnFya" 99 | }, 100 | "outputs": [], 101 | "source": [ 102 | "from google import genai\n", 103 | "from google.genai import types\n", 104 | "\n", 105 | "client = genai.Client(api_key=GOOGLE_API_KEY)" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": { 111 | "id": "drS_GiIih9kw" 112 | }, 113 | "source": [ 114 | "## Thinking models\n", 115 | "\n", 116 | "Starting with Gemini 2.5, all models have thinking capabilities. These models use an internal \"thinking process\" during response generation. This process contributes to their improved reasoning capabilities and allows them to solve complex tasks, particularly complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents.\n", 117 | "\n", 118 | "Thinking models are also great at working with tools to perform actions beyond generating text. 
This allows them to interact with external systems, execute code, or access real-time information, incorporating the results into their reasoning and final response.\n", 119 | "\n", 120 | "(Note: Tools are also available with Gemini 2.0 models)" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": null, 126 | "metadata": { 127 | "id": "uqCNX_91q2YN" 128 | }, 129 | "outputs": [], 130 | "source": [ 131 | "# 2.5 Pro\n", 132 | "MODEL = \"gemini-2.5-pro-exp-03-25\" # with paid tier: gemini-2.5-pro-preview-03-25\n", 133 | "\n", 134 | "# 2.5 Flash\n", 135 | "# MODEL = \"gemini-2.5-flash-preview-04-17\"" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "Note that 2.5 Pro has a 5 RPM [rate limit on the free tier](https://ai.google.dev/gemini-api/docs/rate-limits#current-rate-limits). If you run into errors, wait a minute before sending the next request." 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": null, 148 | "metadata": { 149 | "id": "6iRjV4L-sMxp" 150 | }, 151 | "outputs": [], 152 | "source": [ 153 | "# TODO: send a prompt with Gemini 2.5\n", 154 | "# info: you'll find the solutions in the `solutions` folder" 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": { 160 | "id": "lCenTU9B0RiC" 161 | }, 162 | "source": [ 163 | "## **!! Exercise !!** ##\n", 164 | "\n", 165 | "- Go to [Google AI Studio](https://ai.dev/?model=gemini-2.5-pro-preview-03-25), use Gemini 2.5 Pro, give it a complex task, and observe the thinking process. For example, create a p5js game in one shot:\n", 166 | "\n", 167 | "```\n", 168 | "Make a p5js soccer game simulation. There should be 2 teams and each player on the team should have their path traveled displayed. Add live stats on the right side and score in the top bar. 
no HTML\n", 169 | "```" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": { 175 | "id": "P-qkPEabTURX" 176 | }, 177 | "source": [ 178 | "## Structured output\n", 179 | "\n", 180 | "Gemini generates unstructured text by default, but some applications require structured text. For these use cases, you can constrain Gemini to respond with JSON, a structured data format suitable for automated processing. You can also constrain the model to respond with one of the options specified in an enum." 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": null, 186 | "metadata": { 187 | "id": "ZsnYkEF2Tcm8" 188 | }, 189 | "outputs": [], 190 | "source": [ 191 | "from pydantic import BaseModel\n", 192 | "\n", 193 | "class Recipe(BaseModel):\n", 194 | " recipe_name: str\n", 195 | " ingredients: list[str]\n", 196 | "\n", 197 | "response = client.models.generate_content(\n", 198 | " model=MODEL,\n", 199 | " contents='List three popular cookie recipes. Be sure to include the amounts of ingredients.',\n", 200 | " config={\n", 201 | " 'response_mime_type': 'application/json',\n", 202 | " 'response_schema': list[Recipe],\n", 203 | " },\n", 204 | ")\n", 205 | "# Use the response as a JSON string.\n", 206 | "print(response.text)\n", 207 | "\n", 208 | "# Use instantiated objects.\n", 209 | "my_recipes: list[Recipe] = response.parsed" 210 | ] 211 | }, 212 | { 213 | "cell_type": "markdown", 214 | "metadata": { 215 | "id": "Qp8wv9jstl96" 216 | }, 217 | "source": [ 218 | "Constrain to enums:" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": { 225 | "id": "jQbvMIvuLUUE" 226 | }, 227 | "outputs": [], 228 | "source": [ 229 | "# TODO: constrain output to an enum schema\n" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": { 235 | "id": "jicpPkAntnud" 236 | }, 237 | "source": [ 238 | "Or use the builtin Python enum class:" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | 
"execution_count": null, 244 | "metadata": { 245 | "id": "a0W_6JyZN_ED" 246 | }, 247 | "outputs": [], 248 | "source": [ 249 | "# TODO: constrain output to enum class" 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": { 255 | "id": "16C3AP4YOVlc" 256 | }, 257 | "source": [ 258 | "## Code execution\n", 259 | "\n", 260 | "The code execution feature enables the model to generate and run Python code and learn iteratively from the results until it arrives at a final output. You can use this code execution capability to build applications that benefit from code-based reasoning and that produce text output. For example, you could use code execution in an application that solves equations or processes text." 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "metadata": { 267 | "id": "SUMFt9wqOgVk" 268 | }, 269 | "outputs": [], 270 | "source": [ 271 | "# TODO: tell gemini to use code to answer a math question" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": null, 277 | "metadata": { 278 | "id": "0Bgfv66AOjiA" 279 | }, 280 | "outputs": [], 281 | "source": [ 282 | "response" 283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": null, 288 | "metadata": { 289 | "id": "Yxo7JHSWOspM" 290 | }, 291 | "outputs": [], 292 | "source": [ 293 | "from IPython.display import Image, Markdown, Code, HTML\n", 294 | "\n", 295 | "def display_code_execution_result(response):\n", 296 | " for part in response.candidates[0].content.parts:\n", 297 | " if part.text is not None:\n", 298 | " display(Markdown(part.text))\n", 299 | " if part.executable_code is not None:\n", 300 | " code_html = f'
<pre style=\"color: #1a73e8;\">{part.executable_code.code}</pre>
' # Change code color\n", 301 | " display(HTML(code_html))\n", 302 | " if part.code_execution_result is not None:\n", 303 | " display(Markdown(\"#### Output\"))\n", 304 | " display(Markdown(part.code_execution_result.output))\n", 305 | " if part.inline_data is not None:\n", 306 | " display(Image(data=part.inline_data.data, format=\"png\"))\n", 307 | " display(Markdown(\"---\"))\n", 308 | "\n", 309 | "display_code_execution_result(response)" 310 | ] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "metadata": { 315 | "id": "W0m5rasbQsDa" 316 | }, 317 | "source": [ 318 | "## Grounding with Google Search\n", 319 | "\n", 320 | "If Google Search is configured as a tool, Gemini can decide when to use Google Search to improve the accuracy and recency of responses.\n", 321 | "\n", 322 | "Here's a question about a recent event without Google Search:\n", 323 | "\n" 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": null, 329 | "metadata": { 330 | "id": "582GKc2DQ-N6" 331 | }, 332 | "outputs": [], 333 | "source": [ 334 | "response = client.models.generate_content(\n", 335 | " model=MODEL,\n", 336 | " contents=\"Who won the super bowl this year?\",\n", 337 | ")\n", 338 | "\n", 339 | "print(response.text)" 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": null, 345 | "metadata": { 346 | "id": "SREuxqDSQs1y" 347 | }, 348 | "outputs": [], 349 | "source": [ 350 | "# TODO: enable Search and ask the same question" 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": null, 356 | "metadata": { 357 | "id": "dnCFGS7nQ9WB" 358 | }, 359 | "outputs": [], 360 | "source": [ 361 | "for part in response.candidates[0].content.parts:\n", 362 | " print(part.text)" 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": null, 368 | "metadata": { 369 | "id": "BUgF_qSFQ6KW" 370 | }, 371 | "outputs": [], 372 | "source": [ 373 | "# To get grounding metadata as web content.\n", 374 | 
"HTML(response.candidates[0].grounding_metadata.search_entry_point.rendered_content)" 375 | ] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "metadata": { 380 | "id": "tN2AYpE2yqpQ" 381 | }, 382 | "source": [ 383 | "#### **!! Exercise !!**\n", 384 | "\n", 385 | "Use Gemini with Google Search to get the current weather and the forecast for next weekend in Berlin." 386 | ] 387 | }, 388 | { 389 | "cell_type": "code", 390 | "execution_count": null, 391 | "metadata": { 392 | "id": "vRkMJFA6yoKt" 393 | }, 394 | "outputs": [], 395 | "source": [ 396 | "# TODO: complete the exercise" 397 | ] 398 | }, 399 | { 400 | "cell_type": "markdown", 401 | "metadata": { 402 | "id": "aKRcuZE_Rjl-" 403 | }, 404 | "source": [ 405 | "## Function calling\n", 406 | "\n", 407 | "Function calling lets you connect models to external tools and APIs. Instead of generating a text response, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions." 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": null, 413 | "metadata": { 414 | "id": "iL1FX3euRlQN" 415 | }, 416 | "outputs": [], 417 | "source": [ 418 | "from google.genai import types\n", 419 | "\n", 420 | "# Define the function declaration for the model\n", 421 | "weather_function = {\n", 422 | " \"name\": \"get_current_temperature\",\n", 423 | " \"description\": \"Gets the current temperature for a given location.\",\n", 424 | " \"parameters\": {\n", 425 | " \"type\": \"object\",\n", 426 | " \"properties\": {\n", 427 | " \"location\": {\n", 428 | " \"type\": \"string\",\n", 429 | " \"description\": \"The city name\",\n", 430 | " },\n", 431 | " },\n", 432 | " \"required\": [\"location\"],\n", 433 | " },\n", 434 | "}\n", 435 | "\n", 436 | "# Configure the client and tools\n", 437 | "# TODO\n", 438 | "\n", 439 | "# Send request with function declarations\n", 440 | "# TODO" 441 | ] 442 | }, 443 | { 444 | "cell_type": "markdown", 445 | "metadata": { 446 |
"id": "ZbWYwJ1G7s3_" 447 | }, 448 | "source": [ 449 | "Check for a function call" 450 | ] 451 | }, 452 | { 453 | "cell_type": "code", 454 | "execution_count": null, 455 | "metadata": { 456 | "id": "-MvimiB5U30c" 457 | }, 458 | "outputs": [], 459 | "source": [ 460 | "if response.candidates[0].content.parts[0].function_call:\n", 461 | " function_call = response.candidates[0].content.parts[0].function_call\n", 462 | " print(f\"Function to call: {function_call.name}\")\n", 463 | " print(f\"Arguments: {function_call.args}\")\n", 464 | " # In a real app, you would call your function here:\n", 465 | " # result = get_current_temperature(**function_call.args)\n", 466 | "else:\n", 467 | " print(\"No function call found in the response.\")\n", 468 | " print(response.text)" 469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": { 474 | "id": "qpV5sW9B0oBg" 475 | }, 476 | "source": [ 477 | "### Automatic Function Calling (Python Only)\n", 478 | "\n", 479 | "When using the Python SDK, you can provide Python functions directly as tools.\n", 480 | "\n", 481 | "The SDK handles the function call and returns the final text." 
482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": null, 487 | "metadata": { 488 | "id": "IqL-113f020c" 489 | }, 490 | "outputs": [], 491 | "source": [ 492 | "# Define the function with type hints and docstring\n", 493 | "def get_current_temperature(location: str) -> dict:\n", 494 | " pass # TODO\n", 495 | "\n", 496 | "\n", 497 | "# TODO: configure the function and send a question about the temperature" 498 | ] 499 | }, 500 | { 501 | "cell_type": "markdown", 502 | "metadata": { 503 | "id": "okP9B1cJ7yKi" 504 | }, 505 | "source": [ 506 | "Check the function calling history:" 507 | ] 508 | }, 509 | { 510 | "cell_type": "code", 511 | "execution_count": null, 512 | "metadata": { 513 | "id": "gOQwksiK7z1X" 514 | }, 515 | "outputs": [], 516 | "source": [ 517 | "for content in response.automatic_function_calling_history:\n", 518 | " for part in content.parts:\n", 519 | " if part.function_call:\n", 520 | " print(part.function_call)" 521 | ] 522 | }, 523 | { 524 | "cell_type": "markdown", 525 | "metadata": { 526 | "id": "ZfnxbheQ5GsO" 527 | }, 528 | "source": [ 529 | "## Exercise: Get Pokémon stats\n", 530 | "\n", 531 | "- Define a function that can work with the PokéAPI and get Pokémon stats.\n", 532 | "- Endpoint to use: `GET https://pokeapi.co/api/v2/pokemon/`\n", 533 | "- Call Gemini and give it access to the function, then let it answer questions like: `\"What stats does the Pokemon Squirtle have?\"`\n" 534 | ] 535 | }, 536 | { 537 | "cell_type": "code", 538 | "execution_count": null, 539 | "metadata": { 540 | "id": "bNe1p_BkVOmu" 541 | }, 542 | "outputs": [], 543 | "source": [ 544 | "# TODO: complete the exercise" 545 | ] 546 | }, 547 | { 548 | "cell_type": "markdown", 549 | "metadata": { 550 | "id": "yTWLa3ZZ_yhT" 551 | }, 552 | "source": [ 553 | "## Recap & Next steps\n", 554 | "\n", 555 | "Awesome work! 
You learned about thinking models with advanced reasoning capabilities and how to combine Gemini with tools for agentic use cases.\n", 556 | "\n", 557 | "More helpful resources:\n", 558 | "\n", 559 | "- [Thinking docs](https://ai.google.dev/gemini-api/docs/thinking)\n", 560 | "- [Structured output docs](https://ai.google.dev/gemini-api/docs/structured-output?lang=python)\n", 561 | "- [Code execution docs](https://ai.google.dev/gemini-api/docs/code-execution?lang=python)\n", 562 | "- [Grounding docs](https://ai.google.dev/gemini-api/docs/grounding?lang=python)\n", 563 | "- [Function calling docs](https://ai.google.dev/gemini-api/docs/function-calling?example=weather)\n", 564 | "\n", 565 | "🎉🎉**Congratulations, you completed the workshop!**🎉🎉\n", 566 | "\n", 567 | "**Next steps**: There's even more you can do with Gemini that we didn't cover in this workshop:\n", 568 | "\n", 569 | "- [Image creation and editing with Gemini 2.0](https://github.com/patrickloeber/genai-tutorials/blob/main/notebooks/gemini-image-editing.ipynb)\n", 570 | "- [Live API: Talk to Gemini and share your camera](https://aistudio.google.com/live) & [Live API cookbook](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Get_started_LiveAPI.ipynb)\n" 571 | ] 572 | } 573 | ], 574 | "metadata": { 575 | "colab": { 576 | "provenance": [] 577 | }, 578 | "kernelspec": { 579 | "display_name": "Python 3", 580 | "name": "python3" 581 | }, 582 | "language_info": { 583 | "name": "python" 584 | } 585 | }, 586 | "nbformat": 4, 587 | "nbformat_minor": 0 588 | } 589 | -------------------------------------------------------------------------------- /solutions/solution-part-1-text-prompting.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "XSc7AU66mJSC" 7 | }, 8 | "source": [ 9 | "##### Copyright 2025 Patrick Loeber, Google LLC" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 |
"execution_count": null, 15 | "metadata": { 16 | "cellView": "form", 17 | "id": "tc6tjo9vmJSE" 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "\n", 22 | "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", 23 | "# you may not use this file except in compliance with the License.\n", 24 | "# You may obtain a copy of the License at\n", 25 | "#\n", 26 | "# https://www.apache.org/licenses/LICENSE-2.0\n", 27 | "#\n", 28 | "# Unless required by applicable law or agreed to in writing, software\n", 29 | "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", 30 | "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", 31 | "# See the License for the specific language governing permissions and\n", 32 | "# limitations under the License." 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": { 38 | "id": "CuC_VSKMcEt6" 39 | }, 40 | "source": [ 41 | "# Workshop: Build with Gemini (Part 1)\n", 42 | "\n", 43 | "\"Open\n", 44 | "\n", 45 | "This workshop teaches how to build with Gemini using the Gemini API and Python SDK.\n", 46 | "\n", 47 | "Course outline:\n", 48 | "\n", 49 | "- **Part 1 (this notebook): Quickstart + Text prompting**\n", 50 | " - Text understanding\n", 51 | " - Streaming response\n", 52 | " - Chats\n", 53 | " - System prompts\n", 54 | " - Config options\n", 55 | " - Long context\n", 56 | " - Token usage\n", 57 | " - Final exercise: Chat with book\n", 58 | "\n", 59 | "- **[Part 2: Multimodal understanding (image, video, audio, docs, code)](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-2-multimodal-understanding.ipynb)**\n", 60 | "\n", 61 | "- **[Part 3: Thinking models + agentic capabilities (tool usage)](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-3-thinking-and-tools.ipynb)**" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": { 67 | "id": "avRVsnMMJvof" 68 | }, 69 | "source": [
70 | "## 0. Use Google AI Studio as a playground\n", 71 | "\n", 72 | "Explore and play with all models in [Google AI Studio](https://aistudio.google.com/apikey).\n" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": { 78 | "id": "jnl6q8tMcpwU" 79 | }, 80 | "source": [ 81 | "## 1. Setup\n" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": { 87 | "id": "DD1kaBP4dnZG" 88 | }, 89 | "source": [ 90 | "Get a free API key in [Google AI Studio](https://aistudio.google.com/apikey)" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 1, 96 | "metadata": { 97 | "id": "j6raUs82eYfk" 98 | }, 99 | "outputs": [], 100 | "source": [ 101 | "from google.colab import userdata\n", 102 | "\n", 103 | "GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')" 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": { 109 | "id": "yKjUEGGzdp87" 110 | }, 111 | "source": [ 112 | "Install the [Google Gen AI Python SDK](https://github.com/googleapis/python-genai)" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 2, 118 | "metadata": { 119 | "colab": { 120 | "base_uri": "https://localhost:8080/" 121 | }, 122 | "id": "Y4d9NjqNeAXx", 123 | "outputId": "0970aed7-9f27-4146-aa6d-13e1878d1899" 124 | }, 125 | "outputs": [ 126 | { 127 | "name": "stdout", 128 | "output_type": "stream", 129 | "text": [ 130 | "\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/159.7 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m159.7/159.7 kB\u001b[0m \u001b[31m4.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 131 | "\u001b[?25h" 132 | ] 133 | } 134 | ], 135 | "source": [ 136 | "%pip install -q -U google-genai" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": { 142 | "id": "d6b7d1FleDuz" 143 | }, 144 | "source": [ 145 | "Configure the client" 146 | ] 147 | }, 148 | {
149 | "cell_type": "code", 150 | "execution_count": 3, 151 | "metadata": { 152 | "id": "o6Uort3heUqT" 153 | }, 154 | "outputs": [], 155 | "source": [ 156 | "from google import genai\n", 157 | "from google.genai import types\n", 158 | "\n", 159 | "client = genai.Client(api_key=GOOGLE_API_KEY)" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": { 165 | "id": "1P2KmoPSgRxO" 166 | }, 167 | "source": [ 168 | "Configure model. See all [models](https://ai.google.dev/gemini-api/docs/models)" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": 4, 174 | "metadata": { 175 | "id": "0qcgiiP7gO-6" 176 | }, 177 | "outputs": [], 178 | "source": [ 179 | "MODEL = \"gemini-2.0-flash\"" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": { 185 | "id": "LLsGbeGec8iF" 186 | }, 187 | "source": [ 188 | "## 2. Send your first prompt" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": 5, 194 | "metadata": { 195 | "colab": { 196 | "base_uri": "https://localhost:8080/" 197 | }, 198 | "id": "e57RFdZ6dRro", 199 | "outputId": "c0987493-7979-48d7-8275-47bd55f7f819" 200 | }, 201 | "outputs": [ 202 | { 203 | "name": "stdout", 204 | "output_type": "stream", 205 | "text": [ 206 | "Okay, here are 3 names for a vegan restaurant, with a little explanation of each:\n", 207 | "\n", 208 | "1. **The Rooted Table:** This name evokes a sense of groundedness, naturalness, and connection to the earth, which aligns well with vegan values. \"Table\" suggests a place of community and sharing a meal.\n", 209 | "\n", 210 | "2. **Verdant Spoon:** \"Verdant\" means green and flourishing, representing plant-based ingredients. \"Spoon\" is simple, inviting, and directly related to eating. This name is a bit more sophisticated and suggests fresh, vibrant flavors.\n", 211 | "\n", 212 | "3. **Bloom Eats:** \"Bloom\" suggests growth, life, and the beauty of plants. 
\"Eats\" is casual and approachable, making it feel like a welcoming and accessible restaurant for everyone.\n", 213 | "\n" 214 | ] 215 | } 216 | ], 217 | "source": [ 218 | "response = client.models.generate_content(\n", 219 | " model=MODEL,\n", 220 | " contents=\"Create 3 names for a vegan restaurant\"\n", 221 | ")\n", 222 | "\n", 223 | "print(response.text)" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": { 229 | "id": "-rfjqevtmRBO" 230 | }, 231 | "source": [ 232 | "#### **!! Exercise !!**\n", 233 | "- Send a few more prompts\n", 234 | " - Tell Gemini to write a blog post about the transformers architecture\n", 235 | " - Ask Gemini to explain list comprehension in Python\n", 236 | "- Experiment with models:\n", 237 | " - Try Gemini 2.0 Flash-Lite\n", 238 | " - Try Gemini 2.5 Pro Exp" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": 6, 244 | "metadata": { 245 | "id": "l4Zj8kiIoRqn" 246 | }, 247 | "outputs": [], 248 | "source": [ 249 | "model_id = \"gemini-2.5-pro-exp-03-25\" # paid tier with higher rate limits: gemini-2.5-pro-preview-03-25\n", 250 | "model_id = \"gemini-2.0-flash-lite\"" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": { 256 | "id": "vHqnTYJFdSlG" 257 | }, 258 | "source": [ 259 | "## 3. Text understanding" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": { 265 | "id": "WHRVaK0-tCE_" 266 | }, 267 | "source": [ 268 | "The simplest way to generate text is to provide the model with a text-only prompt. `contents` can be a single prompt, a list of prompts, or a combination of multimodal inputs." 
269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": 7, 274 | "metadata": { 275 | "colab": { 276 | "base_uri": "https://localhost:8080/" 277 | }, 278 | "id": "A_HqjSiFsUQ2", 279 | "outputId": "d17984d4-46ad-4af4-94e1-b326481771d4" 280 | }, 281 | "outputs": [ 282 | { 283 | "name": "stdout", 284 | "output_type": "stream", 285 | "text": [ 286 | "Okay, here are 3 name ideas for a vegan restaurant in Berlin, keeping in mind Berlin's vibe (often described as hip, international, and a bit edgy) and the focus on veganism:\n", 287 | "\n", 288 | "1. **Spree Sprouts:**\n", 289 | " * **Why it works:** Combines the name of Berlin's river (Spree) with the idea of fresh, growing plant-based food. It's catchy, memorable, and suggests a connection to the city.\n", 290 | "2. **Kiez Kitchen:**\n", 291 | " * **Why it works:** Uses the German word \"Kiez,\" referring to the local neighborhoods of Berlin. \"Kiez Kitchen\" evokes a sense of community, a local hangout, and homemade (plant-based) cooking.\n", 292 | "3. **The Green Grips:**\n", 293 | " * **Why it works:** A more modern and playful name. \"Grips\" has a slightly edgy feel, suggesting an interesting and confident take on vegan food. \"Green Grips\" is short, memorable, and hints at wholesome, plant-based ingredients.\n", 294 | "\n", 295 | "I tried to balance Berlin's character with the vegan theme in these suggestions. 
Good luck with your restaurant!\n", 296 | "\n" 297 | ] 298 | } 299 | ], 300 | "source": [ 301 | "response = client.models.generate_content(\n", 302 | " model=MODEL,\n", 303 | " #contents=\"Create 3 names for a vegan restaurant\",\n", 304 | " #contents=[\"Create 3 names for a vegan restaurant\"],\n", 305 | " contents=[\"Create 3 names for a vegan restaurant\", \"city: Berlin\"]\n", 306 | ")\n", 307 | "\n", 308 | "print(response.text)" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": { 314 | "id": "itCzXz1BiG5g" 315 | }, 316 | "source": [ 317 | "#### Streaming response\n", 318 | "\n", 319 | "By default, the model returns a response after completing the entire text generation process. You can achieve faster interactions by using streaming to return outputs as they're generated." 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 8, 325 | "metadata": { 326 | "colab": { 327 | "base_uri": "https://localhost:8080/" 328 | }, 329 | "id": "7d6HzwfZdWbt", 330 | "outputId": "e4ea9112-a647-4a4a-a13d-4121b0d38f29" 331 | }, 332 | "outputs": [ 333 | { 334 | "name": "stdout", 335 | "output_type": "stream", 336 | "text": [ 337 | "Alright, let's break down how AI works, without getting *too* technical. Think of it as teaching a computer to do things that normally require human intelligence. Here's a general overview:\n", 338 | "\n", 339 | "**Core Idea: Pattern Recognition and Prediction**\n", 340 | "\n", 341 | "At its heart, most modern AI is about recognizing patterns in data and using those patterns to make predictions or decisions. It's like how you learned to recognize a dog: you saw many dogs, noticed common features (fur, tail, four legs), and now you can identify a new dog even if you've never seen that particular breed before. AI learns in a similar way, but on a much larger scale and often with more complex patterns.\n", 342 | "\n", 343 | "**Key Components & Concepts:**\n", 344 | "\n", 345 | "1. 
**Data:** This is the fuel that powers AI. The more relevant and high-quality data you have, the better the AI will perform. Data can be anything:\n", 346 | " * **Text:** Articles, books, social media posts, code.\n", 347 | " * **Images:** Photos, videos, scans, drawings.\n", 348 | " * **Audio:** Speech, music, sound effects.\n", 349 | " * **Numerical Data:** Statistics, sensor readings, financial data.\n", 350 | " * **Categorical Data:** Colors, types of products, geographic regions.\n", 351 | "\n", 352 | "2. **Algorithms (The Instructions):** These are the sets of rules and mathematical formulas that tell the AI how to process the data. Different types of algorithms are used for different tasks. Some common examples include:\n", 353 | " * **Machine Learning (ML):** The most common type of AI right now. Instead of being explicitly programmed, ML algorithms *learn* from data. They improve their performance over time as they are exposed to more data.\n", 354 | " * **Supervised Learning:** The algorithm is trained on a *labeled* dataset, meaning the correct answer is provided for each input. Think of it like teaching a child by showing them pictures of cats and dogs and telling them which is which. The algorithm learns to associate features (e.g., pointy ears, wagging tail) with the correct label (cat or dog). Used for tasks like classification (categorizing things) and regression (predicting continuous values).\n", 355 | " * **Unsupervised Learning:** The algorithm is trained on an *unlabeled* dataset. It must find patterns and structures in the data on its own, without any guidance. Think of it like giving a child a pile of blocks and letting them figure out how to group them. Used for tasks like clustering (grouping similar data points) and dimensionality reduction (simplifying data).\n", 356 | " * **Reinforcement Learning:** The algorithm learns by interacting with an environment and receiving rewards or penalties for its actions. 
Think of it like training a dog with treats. The algorithm tries to maximize its rewards by learning the optimal strategy. Used for tasks like game playing and robotics.\n", 357 | " * **Deep Learning (DL):** A subfield of machine learning that uses artificial neural networks with many layers (hence \"deep\"). These networks are inspired by the structure of the human brain and are particularly good at processing complex data like images, audio, and text.\n", 358 | " * **Neural Networks:** Consist of interconnected nodes (neurons) that process information. Each connection has a weight associated with it, which represents the strength of the connection. The network learns by adjusting these weights to improve its performance.\n", 359 | " * **Rule-Based Systems:** These systems use a set of pre-defined rules to make decisions. For example, a simple spam filter might have a rule that says \"If an email contains the word 'viagra' and comes from an unknown sender, mark it as spam.\" While simple to understand, they are often inflexible and require constant manual updates.\n", 360 | " * **Genetic Algorithms:** Inspired by natural selection, these algorithms use a population of candidate solutions and iteratively improve them through processes like mutation and crossover. Useful for optimization problems.\n", 361 | "\n", 362 | "3. **Training:** This is the process of feeding the AI algorithm data and allowing it to learn the underlying patterns. During training, the algorithm adjusts its internal parameters (e.g., weights in a neural network) to minimize errors and improve its accuracy. The training process can be very computationally intensive and time-consuming.\n", 363 | "\n", 364 | "4. **Inference (or Prediction):** Once the AI is trained, it can be used to make predictions or decisions on new, unseen data. This is called inference. 
For example, a trained image recognition AI can be used to identify objects in new photos.\n", 365 | "\n", 366 | "**A Simplified Example: Email Spam Filter**\n", 367 | "\n", 368 | "Let's say we want to build a spam filter using machine learning.\n", 369 | "\n", 370 | "1. **Data:** We collect a large dataset of emails, labeled as either \"spam\" or \"not spam\" (ham). Each email is broken down into features, like the words it contains, the sender's address, and the presence of links.\n", 371 | "2. **Algorithm:** We choose a supervised learning algorithm like a Naive Bayes classifier. This algorithm calculates the probability of an email being spam based on the presence of certain words. For example, it might learn that emails containing the word \"viagra\" have a high probability of being spam.\n", 372 | "3. **Training:** We feed the labeled emails to the algorithm. The algorithm learns the relationship between the features and the spam/not spam label. It adjusts its internal parameters to improve its accuracy in classifying emails.\n", 373 | "4. **Inference:** When a new email arrives, the algorithm analyzes its features (words, sender, etc.) and uses the knowledge it gained during training to predict whether the email is spam or not. It then takes action, such as moving the email to the spam folder.\n", 374 | "\n", 375 | "**Different Types of AI:**\n", 376 | "\n", 377 | "* **Narrow or Weak AI:** Designed for a specific task. Examples: spam filters, recommendation systems, voice assistants like Siri or Alexa, and self-driving cars. This is the type of AI we primarily use today.\n", 378 | "* **General or Strong AI (AGI):** Hypothetical AI with human-level intelligence, capable of performing any intellectual task that a human being can. This does not currently exist.\n", 379 | "* **Super AI:** Hypothetical AI that surpasses human intelligence in all aspects. 
This is purely theoretical and often seen in science fiction.\n", 380 | "\n", 381 | "**Important Considerations:**\n", 382 | "\n", 383 | "* **Bias:** AI algorithms can inherit biases from the data they are trained on. This can lead to unfair or discriminatory outcomes. For example, if an AI is trained on a dataset of faces that is predominantly white, it may perform poorly on faces of other ethnicities.\n", 384 | "* **Explainability:** Some AI algorithms, like deep neural networks, are very complex and difficult to understand. This can make it hard to explain why they made a particular decision, which can be a problem in situations where transparency is important (e.g., loan applications).\n", 385 | "* **Ethics:** AI raises a number of ethical concerns, such as job displacement, privacy, and the potential for misuse.\n", 386 | "\n", 387 | "**In summary:** AI works by using algorithms to analyze data, identify patterns, and make predictions or decisions. The type of AI used depends on the specific task and the available data. While AI has made significant progress in recent years, it is still a rapidly evolving field with many challenges and opportunities.\n", 388 | "\n", 389 | "This explanation provides a good foundation. If you want to delve deeper into specific areas like neural networks, reinforcement learning, or natural language processing, let me know, and I can provide more details.\n" 390 | ] 391 | } 392 | ], 393 | "source": [ 394 | "response = client.models.generate_content_stream(\n", 395 | " model=MODEL,\n", 396 | " contents=[\"Explain how AI works\"]\n", 397 | ")\n", 398 | "\n", 399 | "for chunk in response:\n", 400 | " print(chunk.text, end=\"\")" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": { 406 | "id": "LZjfCkzSdcEc" 407 | }, 408 | "source": [ 409 | "#### Chat\n", 410 | "\n", 411 | "The SDK chat class provides an interface to keep track of conversation history. 
Behind the scenes it uses the same `generate_content` method." 412 | ] 413 | }, 414 | { 415 | "cell_type": "code", 416 | "execution_count": 9, 417 | "metadata": { 418 | "colab": { 419 | "base_uri": "https://localhost:8080/" 420 | }, 421 | "id": "BCI8O9Ldjn6q", 422 | "outputId": "ff023acd-e7db-4cdf-b4e8-e24271525f9b" 423 | }, 424 | "outputs": [ 425 | { 426 | "name": "stdout", 427 | "output_type": "stream", 428 | "text": [ 429 | "Okay, that's nice to know! Tell me more about your dogs. What are their names and breeds? Do they get along well? I'm happy to chat about them!\n", 430 | "\n" 431 | ] 432 | } 433 | ], 434 | "source": [ 435 | "chat = client.chats.create(model=MODEL)\n", 436 | "\n", 437 | "response = chat.send_message(\"I have 2 dogs in my house.\")\n", 438 | "print(response.text)" 439 | ] 440 | }, 441 | { 442 | "cell_type": "code", 443 | "execution_count": 10, 444 | "metadata": { 445 | "colab": { 446 | "base_uri": "https://localhost:8080/" 447 | }, 448 | "id": "mmfMuI44Kev2", 449 | "outputId": "3125acf8-1483-42f3-ee19-1f855adb83dd" 450 | }, 451 | "outputs": [ 452 | { 453 | "name": "stdout", 454 | "output_type": "stream", 455 | "text": [ 456 | "Two poodles! That's wonderful! Poodles are such intelligent and elegant dogs. Are they Standard, Miniature, or Toy Poodles? What are their names and personalities like? I'd love to hear more about them.\n", 457 | "\n" 458 | ] 459 | } 460 | ], 461 | "source": [ 462 | "response = chat.send_message(\"I have 2 poodles\")\n", 463 | "print(response.text)" 464 | ] 465 | }, 466 | { 467 | "cell_type": "markdown", 468 | "metadata": { 469 | "id": "E_MkOG6uLs75" 470 | }, 471 | "source": [ 472 | "#### Parameters\n", 473 | "\n", 474 | "Every prompt you send to the model includes parameters that control how the model generates responses. You can configure these parameters, or let the model use the default options." 
475 | ] 476 | }, 477 | { 478 | "cell_type": "code", 479 | "execution_count": 11, 480 | "metadata": { 481 | "colab": { 482 | "base_uri": "https://localhost:8080/" 483 | }, 484 | "id": "J_jk93Z-Lum-", 485 | "outputId": "db565ae1-941b-4492-c560-92fd9e79a60c" 486 | }, 487 | "outputs": [ 488 | { 489 | "name": "stdout", 490 | "output_type": "stream", 491 | "text": [ 492 | "Explaining how AI works is like explaining how the human brain works – it's incredibly complex and there are many different approaches and levels of understanding.\n" 493 | ] 494 | } 495 | ], 496 | "source": [ 497 | "response = client.models.generate_content(\n", 498 | " model=MODEL,\n", 499 | " contents=[\"Explain how AI works\"],\n", 500 | " config=types.GenerateContentConfig(\n", 501 | " max_output_tokens=30,\n", 502 | " temperature=1.0,\n", 503 | " top_p=0.95,\n", 504 | " top_k=40,\n", 505 | " stop_sequences=None,\n", 506 | " seed=1234,\n", 507 | " )\n", 508 | ")\n", 509 | "print(response.text)" 510 | ] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "metadata": { 515 | "id": "DPyrJ9ul7yuv" 516 | }, 517 | "source": [ 518 | "- `max_output_tokens`: Sets the maximum number of tokens to include in a candidate.\n", 519 | "- `temperature`: Controls the randomness of the output. Use higher values for more creative responses, and lower values for more deterministic responses. Values can range from [0.0, 2.0].\n", 520 | "- `top_p`: Changes how the model selects tokens for output. Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value.\n", 521 | "- `top_k`: Changes how the model selects tokens for output. A top_k of 1 means the selected token is the most probable among all the tokens in the model's vocabulary, while a top_k of 3 means that the next token is selected from among the 3 most probable using the temperature. 
Tokens are further filtered based on top_p with the final token selected using temperature sampling.\n", 522 | "- `stop_sequences`: List of strings (up to 5) that tells the model to stop generating text if one of the strings is encountered in the response. If specified, the API will stop at the first appearance of a stop sequence.\n", 523 | "- `seed`: If specified, the model makes a best effort to provide the same response for repeated requests. By default, a random number is used." 524 | ] 525 | }, 526 | { 527 | "cell_type": "markdown", 528 | "metadata": { 529 | "id": "sG9JgfKF8nvr" 530 | }, 531 | "source": [ 532 | "#### System instructions\n", 533 | "\n", 534 | "System instructions let you steer the behavior of a model based on your specific use case. When you provide system instructions, you give the model additional context to help it understand the task and generate more customized responses. The model should adhere to the system instructions over the full interaction with the user, enabling you to specify product-level behavior separate from the prompts provided by end users." 535 | ] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "execution_count": 12, 540 | "metadata": { 541 | "colab": { 542 | "base_uri": "https://localhost:8080/" 543 | }, 544 | "id": "CayVOonC8st5", 545 | "outputId": "a96fb99c-b8f7-4e41-a82d-127b8a60279e" 546 | }, 547 | "outputs": [ 548 | { 549 | "name": "stdout", 550 | "output_type": "stream", 551 | "text": [ 552 | "Ah, a greeting! And a pleasant one at that. To what do I owe the pleasure of this conversation? Come, sit, sit. Would you care for a lemon drop? They are quite delightful and I find they often help to illuminate a difficult subject. Now, tell me, what troubles you, or perhaps, what wonder brings you here? Don't be shy, speak freely. 
My ears, and indeed, my wisdom, are at your disposal.\n", 553 | "\n" 554 | ] 555 | } 556 | ], 557 | "source": [ 558 | "response = client.models.generate_content(\n", 559 | " model=MODEL,\n", 560 | " config=types.GenerateContentConfig(\n", 561 | " system_instruction=\"You are Dumbledore.\"),\n", 562 | " contents=\"Hello there\"\n", 563 | ")\n", 564 | "\n", 565 | "print(response.text)" 566 | ] 567 | }, 568 | { 569 | "cell_type": "markdown", 570 | "metadata": { 571 | "id": "kjdRzLbN-ANo" 572 | }, 573 | "source": [ 574 | "#### Long context and token counting\n", 575 | "\n", 576 | "Gemini 2.0 Flash and 2.5 Pro have a 1M token context window.\n", 577 | "\n", 578 | "In practice, 1 million tokens could look like:\n", 579 | "\n", 580 | "- 50,000 lines of code (with the standard 80 characters per line)\n", 581 | "- All the text messages you have sent in the last 5 years\n", 582 | "- 8 average-length English novels\n", 583 | "- 1 hour of video data\n", 584 | "\n", 585 | "Let's feed in an entire book and ask questions:\n", 586 | "\n" 587 | ] 588 | }, 589 | { 590 | "cell_type": "code", 591 | "execution_count": 13, 592 | "metadata": { 593 | "id": "b6pGhOkj-CFS" 594 | }, 595 | "outputs": [], 596 | "source": [ 597 | "import requests\n", 598 | "res = requests.get(\"https://gutenberg.org/cache/epub/16317/pg16317.txt\")\n", 599 | "book = res.text" 600 | ] 601 | }, 602 | { 603 | "cell_type": "code", 604 | "execution_count": 14, 605 | "metadata": { 606 | "colab": { 607 | "base_uri": "https://localhost:8080/" 608 | }, 609 | "id": "C0nnKaKC-NMu", 610 | "outputId": "bf3fb659-2b0c-4cbc-8c5e-f295f7fa7998" 611 | }, 612 | "outputs": [ 613 | { 614 | "name": "stdout", 615 | "output_type": "stream", 616 | "text": [ 617 | "The Project Gutenberg eBook of The Art of Public Speaking\r\n", 618 | " \r\n", 619 | "This ebook is for the use of anyon\n" 620 | ] 621 | } 622 | ], 623 | "source": [ 624 | "print(book[:100])" 625 | ] 626 | }, 627 | { 628 | "cell_type": "code", 629 | "execution_count": 15, 
630 | "metadata": { 631 | "colab": { 632 | "base_uri": "https://localhost:8080/" 633 | }, 634 | "id": "Ves9N2m-_k-V", 635 | "outputId": "fd365b44-732b-40c5-b5c1-557424adac03" 636 | }, 637 | "outputs": [ 638 | { 639 | "name": "stdout", 640 | "output_type": "stream", 641 | "text": [ 642 | "# characters 979714\n", 643 | "# words 162461\n", 644 | "# tokens: ~216614\n" 645 | ] 646 | } 647 | ], 648 | "source": [ 649 | "print(f\"# characters {len(book)}\")\n", 650 | "print(f\"# words {len(book.split())}\")\n", 651 | "print(f\"# tokens: ~{int(len(book.split()) * 4/3)}\") # rule of thumb: 100 tokens ≈ 75 words" 652 | ] 653 | }, 654 | { 655 | "cell_type": "code", 656 | "execution_count": 16, 657 | "metadata": { 658 | "colab": { 659 | "base_uri": "https://localhost:8080/" 660 | }, 661 | "id": "6hmtD77wMXdF", 662 | "outputId": "2c654f1f-fd80-4c7a-9c67-5129f0343477" 663 | }, 664 | "outputs": [ 665 | { 666 | "name": "stdout", 667 | "output_type": "stream", 668 | "text": [ 669 | "\"The Art of Public Speaking,\" authored by J. Berg Esenwein and Dale Carnegie, is a guide on developing effective public speaking skills. It emphasizes that public speaking is about expressing oneself authentically and meaningfully, rather than simply following rigid rules or imitating others. The book advocates for self-development and the cultivation of a strong will to master one's thoughts, feelings, and physical abilities. It also highlights the importance of avoiding monotony, using emphasis strategically, varying pitch and pace, employing pauses for impact, and mastering inflection to convey meaning and emotion. The book covers practical aspects like voice control, distinctness of utterance, and the power of gesture. It also delves into the art of influencing audiences through exposition, description, narration, suggestion, argument, and persuasion.
Ultimately, \"The Art of Public Speaking\" aims to help individuals become confident, engaging, and impactful communicators.\n", 670 | "\n" 671 | ] 672 | } 673 | ], 674 | "source": [ 675 | "prompt = f\"\"\"Summarize the book.\n", 676 | "\n", 677 | "Book:\n", 678 | "{book}\n", 679 | "\"\"\"\n", 680 | "\n", 681 | "response = client.models.generate_content(\n", 682 | " model=MODEL,\n", 683 | " contents=prompt\n", 684 | ")\n", 685 | "\n", 686 | "print(response.text)" 687 | ] 688 | }, 689 | { 690 | "cell_type": "markdown", 691 | "metadata": { 692 | "id": "jt9NUCaexPqy" 693 | }, 694 | "source": [ 695 | "To understand the token usage, you can check `usage_metadata`:" 696 | ] 697 | }, 698 | { 699 | "cell_type": "code", 700 | "execution_count": 17, 701 | "metadata": { 702 | "colab": { 703 | "base_uri": "https://localhost:8080/" 704 | }, 705 | "id": "6LAoNQ3Ys-CB", 706 | "outputId": "72a2b614-3b88-4e31-acbc-541e40dbe720" 707 | }, 708 | "outputs": [ 709 | { 710 | "name": "stdout", 711 | "output_type": "stream", 712 | "text": [ 713 | "182\n", 714 | "243973\n", 715 | "244155\n" 716 | ] 717 | } 718 | ], 719 | "source": [ 720 | "print(response.usage_metadata.candidates_token_count) # output\n", 721 | "print(response.usage_metadata.prompt_token_count) # input\n", 722 | "print(response.usage_metadata.total_token_count) # total" 723 | ] 724 | }, 725 | { 726 | "cell_type": "markdown", 727 | "metadata": { 728 | "id": "9jzrjfNDxUhZ" 729 | }, 730 | "source": [ 731 | "You can also use `count_tokens` to check the size of your input prompt(s):" 732 | ] 733 | }, 734 | { 735 | "cell_type": "code", 736 | "execution_count": 18, 737 | "metadata": { 738 | "colab": { 739 | "base_uri": "https://localhost:8080/" 740 | }, 741 | "id": "EIrVpB-Htc3y", 742 | "outputId": "4420023c-b885-4362-a488-d851fde716e7" 743 | }, 744 | "outputs": [ 745 | { 746 | "name": "stdout", 747 | "output_type": "stream", 748 | "text": [ 749 | "total_tokens=250549 cached_content_token_count=None\n" 750 | ] 751 | } 752 | ], 753 
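The `count_tokens` call above returns exact numbers from the API. For a quick offline estimate before spending a request, the 100-tokens-per-75-words rule of thumb used earlier can be wrapped in a small helper. A rough sketch only; the real ratio varies by model, language, and content:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~100 tokens per 75 words heuristic."""
    return int(len(text.split()) * 4 / 3)

# Quick sanity check on a repeated sample sentence (9 words per repeat)
sample = "The quick brown fox jumps over the lazy dog. " * 100
print(f"# words  {len(sample.split())}")   # → 900
print(f"# tokens ~{estimate_tokens(sample)}")  # → 1200
```

When the estimate is close to the model's context limit, fall back to `count_tokens` for the authoritative number.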
| "source": [ 754 | "res = client.models.count_tokens(model=MODEL, contents=prompt)\n", 755 | "print(res)" 756 | ] 757 | }, 758 | { 759 | "cell_type": "markdown", 760 | "metadata": { 761 | "id": "pE7MEKBI18K0" 762 | }, 763 | "source": [ 764 | "## !! Exercise: Chat with a book !!\n", 765 | "\n", 766 | "Task:\n", 767 | "- Create a chat\n", 768 | "- Use a system prompt: `\"You are an expert book reviewer with a witty tone.\"`\n", 769 | "- Use a temperature of `1.5`\n", 770 | "- Ask it to summarize the book\n", 771 | "- Ask 1 question to explain a certain topic from the book in more detail\n", 772 | "- Ask to create a social media post based on the book\n", 773 | "- Print the total number of tokens used during the chat" 774 | ] 775 | }, 776 | { 777 | "cell_type": "code", 778 | "execution_count": 21, 779 | "metadata": { 780 | "colab": { 781 | "base_uri": "https://localhost:8080/" 782 | }, 783 | "id": "sKL0JNbCzY0P", 784 | "outputId": "d50dc228-9a2f-4c6d-9472-3ce381c783b5" 785 | }, 786 | "outputs": [ 787 | { 788 | "name": "stdout", 789 | "output_type": "stream", 790 | "text": [ 791 | "Okay, here's a witty summary of \"The Art of Public Speaking\" in 10 bullet points:\n", 792 | "\n", 793 | "* **Fear No More, Speech Goblins!** Authors J. Berg Esenwein and Dale Carnegie provide a vintage (1915!) cure for stage fright – facing the music (or audience) head-on is key! Reading about bravery doesn't make you a hero.\n", 794 | "* **Ditch the Drone, Dude!** The book argues monotony is a cardinal sin and demonstrates a failure of artistic application! Break free from vocal monotony, or your speech might as well be elevator music.\n", 795 | "* **Not All Words Were Created Equal:** \"The Art of Public Speaking\" preaches strategic emphasis, making mountain-peak words stand out while the grammatical pebbles stay properly subordinated.
Think \"Dog bites Man,\" versus, \"Man Bites Dog.\"\n", 796 | "* **Ride the Pitch Pony:** Vary your vocal altitude or your audience will descend into a snooze. Apparently, keeping your vocal cords on autopilot is as thrilling as watching paint dry.\n", 797 | "* **Tempo Tango:** Learn to quickstep or waltz, not just shuffle! Variation in pace adds dynamism and a conversational vibe, which helps avoid listeners thinking your speech is coming from an antiquated wax cylinder.\n", 798 | "* **Silence is Golden:** Like a well-placed dramatic pause, or, even more wisely, *after* the point, pausing is the power tool no speaker can live without! If even the Creator needed to pause, you need one too.\n", 799 | "* **Inflection is your Soul Shining Through.** It adds an authenticity that the author feels your audience *needs* to experience. You can read dry lines... or imbue life and character into your talking points.\n", 800 | "* **Concentrate, Grasshopper:** Train your brain to ride the line instead of wandering around like a tumbleweed! Avoid starting well and fading off into the sunset -- a speaker needs to deliver sentences just as strongly as they began\n", 801 | "* **Feeling is more than a \"Hallmark\" Card:** Cultivate sincere feeling toward your material (instead of feigning enthusiasm) will draw you near to authority! Force creates connection where none exists otherwise. It's up to you how this influence changes your life.\n", 802 | "* **Become your own \"Speaker's Digest,\" by expanding and perfecting all the lessons here...** In short? Read, analyze, observe, and find your groove! 
As a well-loved book, this collection is there for you, to use as your starting-off point for a wonderful new form of personal expression.\n" 803 | ] 804 | } 805 | ], 806 | "source": [ 807 | "chat = client.chats.create(\n", 808 | " model=MODEL,\n", 809 | " config=types.GenerateContentConfig(\n", 810 | " system_instruction=\"You are an expert book reviewer with a witty tone.\",\n", 811 | " temperature=1.5\n", 812 | " )\n", 813 | ")\n", 814 | "\n", 815 | "prompt = f\"\"\"Summarize the book in 10 bullet points.\n", 816 | "\n", 817 | "Book:\n", 818 | "{book}\n", 819 | "\"\"\"\n", 820 | "\n", 821 | "response = chat.send_message(prompt)\n", 822 | "print(response.text)" 823 | ] 824 | }, 825 | { 826 | "cell_type": "code", 827 | "execution_count": 22, 828 | "metadata": { 829 | "colab": { 830 | "base_uri": "https://localhost:8080/" 831 | }, 832 | "id": "q84VP46KW0Vk", 833 | "outputId": "87e3b936-4959-433b-98c5-7f45de66ba78" 834 | }, 835 | "outputs": [ 836 | { 837 | "name": "stdout", 838 | "output_type": "stream", 839 | "text": [ 840 | "Alright, buckle up, because we're about to delve into the thrilling world of speech delivery methods – it's less about reciting, and more about orchestrating an engaging performance with your voice and your presence!\n", 841 | " \n", 842 | "\n", 843 | "1. **Reading from Manuscript: The Anchor Approach (and Its Perils)**\n", 844 | "\n", 845 | " * **What it is:** This involves writing out your speech word-for-word and reading directly from that manuscript.\n", 846 | " * **Pros:** Perfect for highly technical or legally sensitive situations where every word *absolutely* counts. Ensures precision and no accidental misspeakings.\n", 847 | " * **Cons:** *The dreaded monotone strikes!* Hard to sound natural or engaging if glued to the page. 
Often lacks eye contact, reduces connection with the audience, and tends toward a stiff, formal tone, so it will seem to go on forever!\n", 848 | " * **Witty Analogy:** It's like having a safety net so robust, you're afraid to actually try any daring acrobatics. Good as insurance, but terrible as a performance strategy.\n", 849 | " * **Tip for Success (if you *must* read):** Practice reading aloud extensively beforehand, mark up your manuscript for pauses, emphases, and inflections, and above all, *know* the material intimately even though you're reading it. Imagine giving them the story - instead of reading words on a page!\n", 850 | " \n", 851 | "\n", 852 | "2. **Memorized Speech: The Robotic Recital**\n", 853 | "\n", 854 | " * **What it is:** Writing the entire speech and then committing it to memory verbatim.\n", 855 | " * **Pros:** Allows precise control of language, structure, and timing. Gives the impression that everything's gone smoothly from start to end, without any hiccups of \"umm\" or \"ahh\" inserted.\n", 856 | " * **Cons:** *Major risk of sounding stiff and unnatural*. Prone to memory slips, awkward pauses, or that dreaded moment when you're \"off book\" -- you are basically a CD that has been put on Pause. Requires diligent practice and mastery of content that are prone to failure when unexpected surprises are put upon it. The most popular option to avoid unless you love to sing in the shower.\n", 857 | " * **Witty Analogy:** Like trying to be a jazz musician who's only allowed to play sheet music. Technical proficiency? Maybe. Soul? Dubious.\n", 858 | " * **Tip for Success (if you *dare*):** Know the material backward and forward! Focus on conveying the meaning and emotion behind the words rather than just reciting them, as every stage-actor who doesn't phone it in does! Try memorizing in \"chunks\" of meaning rather than pure rote.\n", 859 | " \n", 860 | "\n", 861 | "3. 
**Speaking from Notes: The Balanced Act**\n", 862 | "\n", 863 | " * **What it is:** Using a structured outline or brief containing key points and supporting information to guide your delivery.\n", 864 | " * **Pros:** Offers flexibility to adapt to the audience while maintaining structure and covering key arguments. Promotes a more conversational style while helping keep you on track.\n", 865 | " * **Cons:** Can become overly reliant on notes and risk losing eye contact. If not well-organized, can lead to rambling or forgetting crucial information. Your handwriting had better be superb\n", 866 | " * **Witty Analogy:** It's like navigating a familiar city with a map – you have a sense of where you're going but can still explore side streets and interact with the locals.\n", 867 | " * **Tip for Success:** Condense notes to the bare minimum! Use keywords, phrases, and visual cues to trigger your memory. Practice using the notes as a springboard, not a crutch. Remember that you prepared what you meant to do!\n", 868 | " \n", 869 | "\n", 870 | "4. **Extemporaneous Speaking: The Seat-of-Your-Pants Performance**\n", 871 | "\n", 872 | " * **What it is:** Delivering a speech with minimal preparation and no written text or notes, relying on a strong knowledge of the subject matter and improvisational skills. This often sounds less prepared for its audience.\n", 873 | " * **Pros:** Authentic, engaging, and highly responsive to the audience's energy and feedback. Gives you the best opportunity to connect with individuals while keeping sight on the task before you.\n", 874 | " * **Cons:** Requires extensive knowledge and public speaking experience. High risk of rambling, omitting important points, or getting lost in a tangent that leads nowhere. Not recommended for people trying to build trust.\n", 875 | " * **Witty Analogy:** Like performing stand-up comedy - hilarious when it lands, but absolutely disastrous if your material bombs or your timing is off. 
It would also make sense, as this is how you get that material going: doing it, and getting responses to do it!\n", 876 | " * **Tip for Success:** Master the subject matter *inside and out.* Practice thinking on your feet and organizing thoughts rapidly. Build a mental \"toolbox\" of anecdotes, examples, and counter-arguments. If this approach turns out better for you, that says all the great that is necessary. And do some homework before going!\n", 877 | "\n", 878 | "Hope this expanded guide helps you decide which method to pick. Break a leg! Or break a silence -- your call!\n", 879 | "\n" 880 | ] 881 | } 882 | ], 883 | "source": [ 884 | "response = chat.send_message(\"Explain the various methods of speech delivery in more detail\")\n", 885 | "print(response.text)\n", 886 | "# response = chat.send_message(\"Create a LinkedIn post with 1 or 2 key insights from the book. Keep the tone casual and make it inspirational\")\n", 887 | "# print(response.text)" 888 | ] 889 | }, 890 | { 891 | "cell_type": "code", 892 | "execution_count": 23, 893 | "metadata": { 894 | "colab": { 895 | "base_uri": "https://localhost:8080/" 896 | }, 897 | "id": "RXAqKzbGW_FK", 898 | "outputId": "bcb1dbcc-b87e-46d7-f72a-08478e6fb408" 899 | }, 900 | "outputs": [ 901 | { 902 | "name": "stdout", 903 | "output_type": "stream", 904 | "text": [ 905 | "245615\n" 906 | ] 907 | } 908 | ], 909 | "source": [ 910 | "print(response.usage_metadata.total_token_count)" 911 | ] 912 | }, 913 | { 914 | "cell_type": "markdown", 915 | "metadata": { 916 | "id": "muzBsZi5Fmgs" 917 | }, 918 | "source": [ 919 | "## Recap & Next steps\n", 920 | "\n", 921 | "Nice work!
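For the last exercise step (total tokens used during the chat), each response carries its own `usage_metadata`, so one option is to accumulate the per-turn counts yourself. A minimal sketch, with plain dataclass stand-ins for `usage_metadata` and hypothetical numbers in place of live `send_message` calls:

```python
from dataclasses import dataclass

@dataclass
class Usage:
    prompt_token_count: int      # input tokens for this turn
    candidates_token_count: int  # output tokens for this turn

# Hypothetical per-turn usage, standing in for response.usage_metadata
turns = [Usage(243973, 182), Usage(244200, 950), Usage(245300, 120)]

total_in = sum(u.prompt_token_count for u in turns)
total_out = sum(u.candidates_token_count for u in turns)
print(f"input:  {total_in}")
print(f"output: {total_out}")
print(f"total:  {total_in + total_out}")
```

Because each `send_message` resends the accumulated history, summing per-turn prompt counts should roughly reflect total token usage across the chat, which grows faster than the last response's `total_token_count` (that figure only covers the final turn plus its history).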
You learned\n", 922 | "- Python SDK quickstart\n", 923 | "- Text prompting\n", 924 | "- Streaming and chats\n", 925 | "- System prompts and config options\n", 926 | "- Long context and token counting\n", 927 | "\n", 928 | "\n", 929 | "More helpful resources:\n", 930 | "- [API docs quickstart](https://ai.google.dev/gemini-api/docs/quickstart?lang=python)\n", 931 | "- [Text generation docs](https://ai.google.dev/gemini-api/docs/text-generation)\n", 932 | "- [Long context docs](https://ai.google.dev/gemini-api/docs/long-context)\n", 933 | "\n", 934 | "Next steps:\n", 935 | "- [Part 2: Multimodal understanding (image, video, audio, docs, code)](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-2-multimodal-understanding.ipynb)" 936 | ] 937 | }, 938 | { 939 | "cell_type": "markdown", 940 | "metadata": { 941 | "id": "Y_FXjhH-VRl9" 942 | }, 943 | "source": [] 944 | } 945 | ], 946 | "metadata": { 947 | "colab": { 948 | "provenance": [] 949 | }, 950 | "kernelspec": { 951 | "display_name": "venv", 952 | "language": "python", 953 | "name": "python3" 954 | }, 955 | "language_info": { 956 | "codemirror_mode": { 957 | "name": "ipython", 958 | "version": 3 959 | }, 960 | "file_extension": ".py", 961 | "mimetype": "text/x-python", 962 | "name": "python", 963 | "nbconvert_exporter": "python", 964 | "pygments_lexer": "ipython3", 965 | "version": "3.13.2" 966 | } 967 | }, 968 | "nbformat": 4, 969 | "nbformat_minor": 0 970 | } 971 | -------------------------------------------------------------------------------- /solutions/solution-part-3-thinking-and-tools.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "nrD2kWanydfP" 7 | }, 8 | "source": [ 9 | "##### Copyright 2025 Patrick Loeber, Google LLC" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": null, 15 | "metadata": { 16 | "cellView": "form", 17 | "id": 
"wrgUJetgydfR" 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "\n", 22 | "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", 23 | "# you may not use this file except in compliance with the License.\n", 24 | "# You may obtain a copy of the License at\n", 25 | "#\n", 26 | "# https://www.apache.org/licenses/LICENSE-2.0\n", 27 | "#\n", 28 | "# Unless required by applicable law or agreed to in writing, software\n", 29 | "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", 30 | "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", 31 | "# See the License for the specific language governing permissions and\n", 32 | "# limitations under the License." 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": { 38 | "id": "si1uWsxtj0W6" 39 | }, 40 | "source": [ 41 | "# Workshop: Build with Gemini (Part 3)\n", 42 | "\n", 43 | "\"Open\n", 44 | "\n", 45 | "This workshop teaches how to build with Gemini using the Gemini API and Python SDK.\n", 46 | "\n", 47 | "Course outline:\n", 48 | "\n", 49 | "- **[Part 1: Quickstart + Text prompting](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-1-text-prompting.ipynb)**\n", 50 | "\n", 51 | "- **[Part 2: Multimodal understanding (image, video, audio, docs, code)](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/notebooks/part-2-multimodal-understanding.ipynb)**\n", 52 | "\n", 53 | "- **Part 3 (this notebook): Thinking models + agentic capabilities (tool usage)**\n", 54 | " - Thinking models\n", 55 | " - Structured outputs\n", 56 | " - Code execution\n", 57 | " - Grounding with Google Search\n", 58 | " - Function calling\n", 59 | " - Final exercise: Give Gemini access to the PokéAPI to answer Pokémon questions\n", 60 | "\n", 61 | "## 0.
Use the Google AI Studio as playground\n", 62 | "\n", 63 | "Explore and play with all models in the [Google AI Studio](https://aistudio.google.com/apikey).\n", 64 | "\n", 65 | "## 1. Setup\n", 66 | "\n", 67 | "Get a free API key in the [Google AI Studio](https://aistudio.google.com/apikey) and set up the [Google Gen AI Python SDK](https://github.com/googleapis/python-genai)" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 1, 73 | "metadata": { 74 | "colab": { 75 | "base_uri": "https://localhost:8080/" 76 | }, 77 | "id": "7SzjZdf7mwD_", 78 | "outputId": "21c88fc8-2f89-40c0-8d82-0e4f2769d2c7" 79 | }, 80 | "outputs": [ 81 | { 82 | "name": "stdout", 83 | "output_type": "stream", 84 | "text": [ 85 | "\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/159.7 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[90m╺\u001b[0m\u001b[90m━\u001b[0m \u001b[32m153.6/159.7 kB\u001b[0m \u001b[31m7.9 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\r\u001b[2K \u001b[91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[90m╺\u001b[0m\u001b[90m━\u001b[0m \u001b[32m153.6/159.7 kB\u001b[0m \u001b[31m7.9 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m159.7/159.7 kB\u001b[0m \u001b[31m1.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 86 | "\u001b[?25h" 87 | ] 88 | } 89 | ], 90 | "source": [ 91 | "%pip install -U -q google-genai" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": 2, 97 | "metadata": { 98 | "id": "BF3gXZyFm3Pf" 99 | }, 100 | "outputs": [], 101 | "source": [ 102 | "from google.colab import userdata\n", 103 | "\n", 104 | "GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 3, 110 | "metadata": { 111 | "id": "0lajO_7dnFya" 112 | }, 113 | "outputs": [], 114 | 
"source": [ 115 | "from google import genai\n", 116 | "from google.genai import types\n", 117 | "\n", 118 | "client = genai.Client(api_key=GOOGLE_API_KEY)" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": { 124 | "id": "drS_GiIih9kw" 125 | }, 126 | "source": [ 127 | "## Thinking models\n", 128 | "\n", 129 | "Starting with Gemini 2.5, all models have thinking capabilities. These models use an internal \"thinking process\" during response generation. This process contributes to their improved reasoning capabilities and allows them to solve complex tasks, particularly complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents.\n", 130 | "\n", 131 | "Thinking models are also great at working with tools to perform actions beyond generating text. This allows them to interact with external systems, execute code, or access real-time information, incorporating the results into their reasoning and final response.\n", 132 | "\n", 133 | "(Note: Tools are also available with Gemini 2.0 models)" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": null, 139 | "metadata": { 140 | "id": "uqCNX_91q2YN" 141 | }, 142 | "outputs": [], 143 | "source": [ 144 | "# 2.5 Pro\n", 145 | "MODEL = \"gemini-2.5-pro-exp-03-25\" # with paid tier: gemini-2.5-pro-preview-03-25\n", 146 | "\n", 147 | "# 2.5 Flash\n", 148 | "# MODEL = \"gemini-2.5-flash-preview-04-17\"" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "Note that 2.5 Pro has a 5 RPM [rate limit on the free tier](https://ai.google.dev/gemini-api/docs/rate-limits#current-rate-limits). If you run into errors, wait a minute before sending the next request." 
156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 6, 161 | "metadata": { 162 | "colab": { 163 | "base_uri": "https://localhost:8080/" 164 | }, 165 | "id": "6iRjV4L-sMxp", 166 | "outputId": "e8d17a05-7371-4e89-996f-a89a6c6367bd" 167 | }, 168 | "outputs": [ 169 | { 170 | "name": "stdout", 171 | "output_type": "stream", 172 | "text": [ 173 | "It still takes **5 minutes** to boil three eggs.\n", 174 | "\n", 175 | "You can boil them all at the same time in the same pot of water. The cooking time for each egg doesn't change based on how many others are in the pot (as long as the pot is big enough and the water stays boiling).\n" 176 | ] 177 | } 178 | ], 179 | "source": [ 180 | "response = client.models.generate_content(\n", 181 | " model=MODEL,\n", 182 | " contents=\"If it takes 5 minutes to boil one egg, how long does it take to boil three eggs?\"\n", 183 | ")\n", 184 | "\n", 185 | "print(response.text)" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": { 191 | "id": "lCenTU9B0RiC" 192 | }, 193 | "source": [ 194 | "## **!! Exercise !!** ##\n", 195 | "\n", 196 | "- Go to [Google AI Studio](https://ai.dev/?model=gemini-2.5-pro-preview-03-25), use Gemini 2.5 Pro, give it a complex task, and observe the thinking process. For example, create a p5js game in one shot:\n", 197 | "\n", 198 | "```\n", 199 | "Make a p5js soccer game simulation. There should be 2 teams and each player on the team should have their path traveled displayed. Add live stats on the right side and score in the top bar. no HTML\n", 200 | "```" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": { 206 | "id": "P-qkPEabTURX" 207 | }, 208 | "source": [ 209 | "## Structured output\n", 210 | "\n", 211 | "Gemini generates unstructured text by default, but some applications require structured text. For these use cases, you can constrain Gemini to respond with JSON, a structured data format suitable for automated processing.
You can also constrain the model to respond with one of the options specified in an enum." 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": null, 217 | "metadata": { 218 | "id": "ZsnYkEF2Tcm8" 219 | }, 220 | "outputs": [], 221 | "source": [ 222 | "from pydantic import BaseModel\n", 223 | "\n", 224 | "class Recipe(BaseModel):\n", 225 | " recipe_name: str\n", 226 | " ingredients: list[str]\n", 227 | "\n", 228 | "response = client.models.generate_content(\n", 229 | " model=MODEL,\n", 230 | " contents='List three popular cookie recipes. Be sure to include the amounts of ingredients.',\n", 231 | " config={\n", 232 | " 'response_mime_type': 'application/json',\n", 233 | " 'response_schema': list[Recipe],\n", 234 | " },\n", 235 | ")\n", 236 | "# Use the response as a JSON string.\n", 237 | "print(response.text)\n", 238 | "\n", 239 | "# Use instantiated objects.\n", 240 | "my_recipes: list[Recipe] = response.parsed" 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": { 246 | "id": "Qp8wv9jstl96" 247 | }, 248 | "source": [ 249 | "Constrain to enums:" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": 7, 255 | "metadata": { 256 | "colab": { 257 | "base_uri": "https://localhost:8080/" 258 | }, 259 | "id": "jQbvMIvuLUUE", 260 | "outputId": "baa6da78-a46c-40c7-a3ae-967a5d2eac09" 261 | }, 262 | "outputs": [ 263 | { 264 | "name": "stdout", 265 | "output_type": "stream", 266 | "text": [ 267 | "froot\n" 268 | ] 269 | } 270 | ], 271 | "source": [ 272 | "response = client.models.generate_content(\n", 273 | " model=MODEL,\n", 274 | " contents='What type of food is a banana?',\n", 275 | " config={\n", 276 | " 'response_mime_type': 'text/x.enum',\n", 277 | " 'response_schema': {\n", 278 | " \"type\": \"STRING\",\n", 279 | " \"enum\": [\"froot\", \"vegetable\", \"grains\", \"protein foods\", \"dairy\"],\n", 280 | " },\n", 281 | " },\n", 282 | ")\n", 283 | "\n", 284 | "print(response.text)" 285 | ] 286 | }, 287 |
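Since `response.text` is guaranteed to be valid JSON when `response_mime_type` is `application/json`, you can also parse and sanity-check it yourself. A small sketch using a hard-coded string in place of a live `response.text`:

```python
import json

# Hypothetical model output standing in for response.text
raw = '[{"recipe_name": "Sugar Cookies", "ingredients": ["2 cups flour", "1 cup sugar"]}]'

recipes = json.loads(raw)
for r in recipes:
    # Light check that the requested response schema was respected
    assert set(r) == {"recipe_name", "ingredients"}

print(recipes[0]["recipe_name"])  # → Sugar Cookies
```

In practice `response.parsed` already gives you validated `Recipe` objects; manual parsing like this is mainly useful when you store the raw JSON or process it in another service.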
{ 288 | "cell_type": "markdown", 289 | "metadata": { 290 | "id": "jicpPkAntnud" 291 | }, 292 | "source": [ 293 | "Or use the builtin Python enum class:" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": 8, 299 | "metadata": { 300 | "colab": { 301 | "base_uri": "https://localhost:8080/" 302 | }, 303 | "id": "a0W_6JyZN_ED", 304 | "outputId": "1af704fc-41f6-4ac8-a078-7efe765ca5d3" 305 | }, 306 | "outputs": [ 307 | { 308 | "name": "stdout", 309 | "output_type": "stream", 310 | "text": [ 311 | "dairy\n" 312 | ] 313 | } 314 | ], 315 | "source": [ 316 | "import enum\n", 317 | "\n", 318 | "class FOOD(enum.Enum):\n", 319 | " FROOT = \"froot\"\n", 320 | " VEGETABLE = \"vegetable\"\n", 321 | " GRAINS = \"grains\"\n", 322 | " PROTEIN_FOODS = \"protein foods\"\n", 323 | " DAIRY = \"dairy\"\n", 324 | "\n", 325 | "response = client.models.generate_content(\n", 326 | " model=MODEL,\n", 327 | " contents='What type of food is cheese?',\n", 328 | " config={\n", 329 | " 'response_mime_type': 'text/x.enum',\n", 330 | " 'response_schema': FOOD,\n", 331 | " },\n", 332 | ")\n", 333 | "\n", 334 | "print(response.text)" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": { 340 | "id": "16C3AP4YOVlc" 341 | }, 342 | "source": [ 343 | "## Code execution\n", 344 | "\n", 345 | "The code execution feature enables the model to generate and run Python code and learn iteratively from the results until it arrives at a final output. You can use this code execution capability to build applications that benefit from code-based reasoning and that produce text output. For example, you could use code execution in an application that solves equations or processes text." 
346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": 10, 351 | "metadata": { 352 | "id": "SUMFt9wqOgVk" 353 | }, 354 | "outputs": [], 355 | "source": [ 356 | "from google.genai import types\n", 357 | "\n", 358 | "# In your prompt, give instruction to use/generate code\n", 359 | "\n", 360 | "response = client.models.generate_content(\n", 361 | " model=MODEL,\n", 362 | " contents='What is the sum of the first 50 prime numbers? '\n", 363 | " 'Generate and run code for the calculation.',\n", 364 | " config=types.GenerateContentConfig(\n", 365 | " tools=[types.Tool(\n", 366 | " code_execution=types.ToolCodeExecution\n", 367 | " )]\n", 368 | " )\n", 369 | ")" 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": 11, 375 | "metadata": { 376 | "colab": { 377 | "base_uri": "https://localhost:8080/" 378 | }, 379 | "id": "0Bgfv66AOjiA", 380 | "outputId": "1abd3a15-94f8-45f7-c843-bae9af7ffda1" 381 | }, 382 | "outputs": [ 383 | { 384 | "data": { 385 | "text/plain": [ 386 | "GenerateContentResponse(candidates=[Candidate(content=Content(parts=[Part(video_metadata=None, thought=None, code_execution_result=None, executable_code=None, file_data=None, function_call=None, function_response=None, inline_data=None, text=\"Okay, I can help with that.\\n\\nHere's the plan:\\n1. Define a function `is_prime(n)` to check if a given number `n` is prime.\\n2. Initialize a counter for primes found and a variable for the sum.\\n3. Iterate through numbers starting from 2.\\n4. If a number is prime, add it to the sum and increment the prime counter.\\n5. Stop when 50 primes have been found.\\n6. 
Print the final sum.\\n\\nHere is the Python code to perform the calculation:\\n\"), Part(video_metadata=None, thought=None, code_execution_result=None, executable_code=ExecutableCode(code='import math\\n\\ndef is_prime(n):\\n \"\"\"Checks if a number n is prime.\"\"\"\\n if n <= 1:\\n return False\\n if n == 2:\\n return True\\n if n % 2 == 0:\\n return False\\n # Check odd divisors from 3 up to sqrt(n)\\n for i in range(3, int(math.sqrt(n)) + 1, 2):\\n if n % i == 0:\\n return False\\n return True\\n\\ncount = 0\\nnum = 2\\nprime_sum = 0\\n# Store the primes found for verification if needed\\nprimes_found = []\\n\\ntarget_count = 50\\n\\nwhile count < target_count:\\n if is_prime(num):\\n prime_sum += num\\n primes_found.append(num)\\n count += 1\\n num += 1\\n\\n# print(f\"The first {target_count} prime numbers are: {primes_found}\")\\nprint(f\"The sum of the first {target_count} prime numbers is: {prime_sum}\")', language=<Language.PYTHON: 'PYTHON'>), file_data=None, function_call=None, function_response=None, inline_data=None, text=None), Part(video_metadata=None, thought=None, code_execution_result=CodeExecutionResult(outcome=<Outcome.OUTCOME_OK: 'OUTCOME_OK'>, output='The sum of the first 50 prime numbers is: 5117\\n'), executable_code=None, file_data=None, function_call=None, function_response=None, inline_data=None, text=None), Part(video_metadata=None, thought=None, code_execution_result=None, executable_code=None, file_data=None, function_call=None, function_response=None, inline_data=None, text='**Findings:**\\n\\nBased on the executed code:\\nThe sum of the first 50 prime numbers is **5117**.')], role='model'), citation_metadata=None, finish_message=None, token_count=None, finish_reason=<FinishReason.STOP: 'STOP'>, avg_logprobs=None, grounding_metadata=None, index=0, logprobs_result=None, safety_ratings=None)], create_time=None, response_id=None, model_version='gemini-2.5-pro-exp-03-25', prompt_feedback=None, usage_metadata=GenerateContentResponseUsageMetadata(cache_tokens_details=None, cached_content_token_count=None,
candidates_token_count=1583, candidates_tokens_details=None, prompt_token_count=22, prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=22)], thoughts_token_count=1437, tool_use_prompt_token_count=417, tool_use_prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=417)], total_token_count=2022, traffic_type=None), automatic_function_calling_history=[], parsed=None)" 387 | ] 388 | }, 389 | "execution_count": 11, 390 | "metadata": {}, 391 | "output_type": "execute_result" 392 | } 393 | ], 394 | "source": [ 395 | "response" 396 | ] 397 | }, 398 | { 399 | "cell_type": "code", 400 | "execution_count": 12, 401 | "metadata": { 402 | "colab": { 403 | "base_uri": "https://localhost:8080/", 404 | "height": 941 405 | }, 406 | "id": "Yxo7JHSWOspM", 407 | "outputId": "9793fac5-c01a-4f65-dc76-d5536972384d" 408 | }, 409 | "outputs": [ 410 | { 411 | "data": { 412 | "text/markdown": [ 413 | "Okay, I can help with that.\n", 414 | "\n", 415 | "Here's the plan:\n", 416 | "1. Define a function `is_prime(n)` to check if a given number `n` is prime.\n", 417 | "2. Initialize a counter for primes found and a variable for the sum.\n", 418 | "3. Iterate through numbers starting from 2.\n", 419 | "4. If a number is prime, add it to the sum and increment the prime counter.\n", 420 | "5. Stop when 50 primes have been found.\n", 421 | "6. Print the final sum.\n", 422 | "\n", 423 | "Here is the Python code to perform the calculation:\n" 424 | ], 425 | "text/plain": [ 426 | "" 427 | ] 428 | }, 429 | "metadata": {}, 430 | "output_type": "display_data" 431 | }, 432 | { 433 | "data": { 434 | "text/markdown": [ 435 | "---" 436 | ], 437 | "text/plain": [ 438 | "" 439 | ] 440 | }, 441 | "metadata": {}, 442 | "output_type": "display_data" 443 | }, 444 | { 445 | "data": { 446 | "text/html": [ 447 | "
<pre style=\"background-color: #BBBBEE;\">import math\n",
 448 |               "\n",
 449 |               "def is_prime(n):\n",
 450 |               "    \"\"\"Checks if a number n is prime.\"\"\"\n",
 451 |               "    if n <= 1:\n",
 452 |               "        return False\n",
 453 |               "    if n == 2:\n",
 454 |               "        return True\n",
 455 |               "    if n % 2 == 0:\n",
 456 |               "        return False\n",
 457 |               "    # Check odd divisors from 3 up to sqrt(n)\n",
 458 |               "    for i in range(3, int(math.sqrt(n)) + 1, 2):\n",
 459 |               "        if n % i == 0:\n",
 460 |               "            return False\n",
 461 |               "    return True\n",
 462 |               "\n",
 463 |               "count = 0\n",
 464 |               "num = 2\n",
 465 |               "prime_sum = 0\n",
 466 |               "# Store the primes found for verification if needed\n",
 467 |               "primes_found = []\n",
 468 |               "\n",
 469 |               "target_count = 50\n",
 470 |               "\n",
 471 |               "while count < target_count:\n",
 472 |               "    if is_prime(num):\n",
 473 |               "        prime_sum += num\n",
 474 |               "        primes_found.append(num)\n",
 475 |               "        count += 1\n",
 476 |               "    num += 1\n",
 477 |               "\n",
 478 |               "# print(f\"The first {target_count} prime numbers are: {primes_found}\")\n",
479 |               "print(f\"The sum of the first {target_count} prime numbers is: {prime_sum}\")</pre>
" 480 | ], 481 | "text/plain": [ 482 | "" 483 | ] 484 | }, 485 | "metadata": {}, 486 | "output_type": "display_data" 487 | }, 488 | { 489 | "data": { 490 | "text/markdown": [ 491 | "---" 492 | ], 493 | "text/plain": [ 494 | "" 495 | ] 496 | }, 497 | "metadata": {}, 498 | "output_type": "display_data" 499 | }, 500 | { 501 | "data": { 502 | "text/markdown": [ 503 | "#### Output" 504 | ], 505 | "text/plain": [ 506 | "" 507 | ] 508 | }, 509 | "metadata": {}, 510 | "output_type": "display_data" 511 | }, 512 | { 513 | "data": { 514 | "text/markdown": [ 515 | "The sum of the first 50 prime numbers is: 5117\n" 516 | ], 517 | "text/plain": [ 518 | "" 519 | ] 520 | }, 521 | "metadata": {}, 522 | "output_type": "display_data" 523 | }, 524 | { 525 | "data": { 526 | "text/markdown": [ 527 | "---" 528 | ], 529 | "text/plain": [ 530 | "" 531 | ] 532 | }, 533 | "metadata": {}, 534 | "output_type": "display_data" 535 | }, 536 | { 537 | "data": { 538 | "text/markdown": [ 539 | "**Findings:**\n", 540 | "\n", 541 | "Based on the executed code:\n", 542 | "The sum of the first 50 prime numbers is **5117**." 543 | ], 544 | "text/plain": [ 545 | "" 546 | ] 547 | }, 548 | "metadata": {}, 549 | "output_type": "display_data" 550 | }, 551 | { 552 | "data": { 553 | "text/markdown": [ 554 | "---" 555 | ], 556 | "text/plain": [ 557 | "" 558 | ] 559 | }, 560 | "metadata": {}, 561 | "output_type": "display_data" 562 | } 563 | ], 564 | "source": [ 565 | "from IPython.display import Image, Markdown, Code, HTML\n", 566 | "\n", 567 | "def display_code_execution_result(response):\n", 568 | " for part in response.candidates[0].content.parts:\n", 569 | " if part.text is not None:\n", 570 | " display(Markdown(part.text))\n", 571 | " if part.executable_code is not None:\n", 572 | " code_html = f'
<pre style=\"background-color: #BBBBEE;\">{part.executable_code.code}</pre>
' # Change code color\n", 573 | " display(HTML(code_html))\n", 574 | " if part.code_execution_result is not None:\n", 575 | " display(Markdown(\"#### Output\"))\n", 576 | " display(Markdown(part.code_execution_result.output))\n", 577 | " if part.inline_data is not None:\n", 578 | " display(Image(data=part.inline_data.data, format=\"png\"))\n", 579 | " display(Markdown(\"---\"))\n", 580 | "\n", 581 | "display_code_execution_result(response)" 582 | ] 583 | }, 584 | { 585 | "cell_type": "markdown", 586 | "metadata": { 587 | "id": "W0m5rasbQsDa" 588 | }, 589 | "source": [ 590 | "## Grounding with Google Search\n", 591 | "\n", 592 | "If Google Search is configured as a tool, Gemini can decide when to use Google Search to improve the accuracy and recency of responses.\n", 593 | "\n", 594 | "Here's a question about a recent event without Google Search:\n", 595 | "\n" 596 | ] 597 | }, 598 | { 599 | "cell_type": "code", 600 | "execution_count": 16, 601 | "metadata": { 602 | "colab": { 603 | "base_uri": "https://localhost:8080/" 604 | }, 605 | "id": "582GKc2DQ-N6", 606 | "outputId": "49dcb0f1-703a-4dc6-ac4d-b6f06d0b21cf" 607 | }, 608 | "outputs": [ 609 | { 610 | "name": "stdout", 611 | "output_type": "stream", 612 | "text": [ 613 | "The Super Bowl in 2025 (Super Bowl LIX) hasn't happened yet!\n", 614 | "\n", 615 | "It is scheduled to be played on **February 9, 2025**, at the Caesars Superdome in New Orleans, Louisiana. 
It will determine the champion of the 2024 NFL season.\n", 616 | "\n", 617 | "We'll have to wait until then to find out who wins!\n" 618 | ] 619 | } 620 | ], 621 | "source": [ 622 | "response = client.models.generate_content(\n", 623 | " model=MODEL,\n", 624 | " contents=\"Who won the super bowl in 2025?\",\n", 625 | ")\n", 626 | "\n", 627 | "print(response.text)" 628 | ] 629 | }, 630 | { 631 | "cell_type": "code", 632 | "execution_count": 17, 633 | "metadata": { 634 | "id": "SREuxqDSQs1y" 635 | }, 636 | "outputs": [], 637 | "source": [ 638 | "from google.genai.types import Tool, GenerateContentConfig, GoogleSearch\n", 639 | "\n", 640 | "google_search_tool = Tool(\n", 641 | " google_search = GoogleSearch()\n", 642 | ")\n", 643 | "\n", 644 | "response = client.models.generate_content(\n", 645 | " model=MODEL,\n", 646 | " contents=\"Who won the super bowl in 2025?\",\n", 647 | " config=GenerateContentConfig(\n", 648 | " tools=[google_search_tool],\n", 649 | " response_modalities=[\"TEXT\"],\n", 650 | " )\n", 651 | ")" 652 | ] 653 | }, 654 | { 655 | "cell_type": "code", 656 | "execution_count": 18, 657 | "metadata": { 658 | "colab": { 659 | "base_uri": "https://localhost:8080/" 660 | }, 661 | "id": "dnCFGS7nQ9WB", 662 | "outputId": "f5e1f2ab-84fd-4925-9100-40281bd09447" 663 | }, 664 | "outputs": [ 665 | { 666 | "name": "stdout", 667 | "output_type": "stream", 668 | "text": [ 669 | "The **Philadelphia Eagles** won Super Bowl LIX in 2025.\n", 670 | "\n", 671 | "Here are some details about the game:\n", 672 | "\n", 673 | "* **Date:** February 9, 2025\n", 674 | "* **Location:** Caesars Superdome, New Orleans, Louisiana\n", 675 | "* **Matchup:** Philadelphia Eagles (NFC Champion) vs. 
Kansas City Chiefs (AFC Champion and two-time defending Super Bowl champion)\n", 676 | "* **Final Score:** Philadelphia Eagles 40, Kansas City Chiefs 22\n", 677 | "* **Outcome:** The Eagles secured their second Super Bowl title in franchise history, preventing the Chiefs from achieving an unprecedented three consecutive Super Bowl wins.\n", 678 | "* **MVP:** Eagles quarterback Jalen Hurts was named Super Bowl MVP. He threw for 221 yards and two touchdowns, and rushed for 72 yards and another touchdown.\n" 679 | ] 680 | } 681 | ], 682 | "source": [ 683 | "for part in response.candidates[0].content.parts:\n", 684 | " print(part.text)" 685 | ] 686 | }, 687 | { 688 | "cell_type": "code", 689 | "execution_count": 19, 690 | "metadata": { 691 | "colab": { 692 | "base_uri": "https://localhost:8080/", 693 | "height": 65 694 | }, 695 | "id": "BUgF_qSFQ6KW", 696 | "outputId": "0cfc43da-0b07-47ce-e46e-9e21839aad8c" 697 | }, 698 | "outputs": [ 699 | { 700 | "data": { 701 | "text/html": [ 702 | "\n", 805 | "
\n", 806 | "
\n", 807 | " \n", 808 | " \n", 809 | " \n", 810 | " \n", 811 | " \n", 812 | " \n", 813 | " \n", 814 | " \n", 815 | " \n", 816 | " \n", 817 | " \n", 818 | " \n", 819 | " \n", 820 | "
\n", 821 | "
\n", 822 | "
\n", 823 | " Super Bowl 59 result\n", 824 | " Who won Super Bowl 2025?\n", 825 | " Super Bowl LIX winner\n", 826 | "
\n", 827 | "
\n" 828 | ], 829 | "text/plain": [ 830 | "" 831 | ] 832 | }, 833 | "execution_count": 19, 834 | "metadata": {}, 835 | "output_type": "execute_result" 836 | } 837 | ], 838 | "source": [ 839 | "# To get grounding metadata as web content.\n", 840 | "HTML(response.candidates[0].grounding_metadata.search_entry_point.rendered_content)" 841 | ] 842 | }, 843 | { 844 | "cell_type": "markdown", 845 | "metadata": { 846 | "id": "tN2AYpE2yqpQ" 847 | }, 848 | "source": [ 849 | "#### **!! Exercise !!**\n", 850 | "\n", 851 | "Use Gemini with Google Search for the current weather and the forecast for the next weekend in Berlin" 852 | ] 853 | }, 854 | { 855 | "cell_type": "code", 856 | "execution_count": 20, 857 | "metadata": { 858 | "colab": { 859 | "base_uri": "https://localhost:8080/" 860 | }, 861 | "id": "vRkMJFA6yoKt", 862 | "outputId": "aae6d370-5040-427c-b9ba-632a0438626e" 863 | }, 864 | "outputs": [ 865 | { 866 | "name": "stdout", 867 | "output_type": "stream", 868 | "text": [ 869 | "Based on the search results, here is the current weather and weekend forecast for Berlin:\n", 870 | "\n", 871 | "**Current Weather in Berlin:**\n", 872 | "\n", 873 | "* The current temperature is around 11-15°C.\n", 874 | "* Conditions are currently cloudy or a mix of sun and clouds. Some sources mention possible showers, but generally dry.\n", 875 | "* The temperature is expected to rise to a high of about 19°C today.\n", 876 | "* The wind is generally weak.\n", 877 | "\n", 878 | "**Weekend Weather Forecast for Berlin:**\n", 879 | "\n", 880 | "* **Saturday:** The weather is expected to be lightly clouded or a mix of sun and clouds. Maximum temperatures are forecasted to be around 13°C to 16°C, with minimums around 3°C to 9°C. There is a low chance of precipitation.\n", 881 | "* **Sunday:** Similar conditions to Saturday are expected, with partly cloudy skies or a mix of sun and clouds. Temperatures might be slightly warmer, with highs around 15°C to 16°C and lows around 4°C to 5°C. 
There is also a low chance of precipitation.\n", 882 | "\n", 883 | "Please note that weather forecasts can change, especially specific details like cloud cover and exact temperatures. The forecasts cited are based on information available around April 20th-21st for the following weekend (April 26th-27th).\n" 884 | ] 885 | } 886 | ], 887 | "source": [ 888 | "from google.genai.types import Tool, GenerateContentConfig, GoogleSearch\n", 889 | "\n", 890 | "google_search_tool = Tool(\n", 891 | " google_search = GoogleSearch()\n", 892 | ")\n", 893 | "\n", 894 | "response = client.models.generate_content(\n", 895 | " model=MODEL,\n", 896 | " contents=\"Get the current weather in Berlin. Also get the forecast for the weekend\",\n", 897 | " config=GenerateContentConfig(\n", 898 | " tools=[google_search_tool],\n", 899 | " response_modalities=[\"TEXT\"],\n", 900 | " )\n", 901 | ")\n", 902 | "\n", 903 | "for part in response.candidates[0].content.parts:\n", 904 | " print(part.text)" 905 | ] 906 | }, 907 | { 908 | "cell_type": "markdown", 909 | "metadata": { 910 | "id": "aKRcuZE_Rjl-" 911 | }, 912 | "source": [ 913 | "## Function calling\n", 914 | "\n", 915 | "Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model understands when to call specific functions and provides the necessary parameters to execute real-world actions." 
916 | ] 917 | }, 918 | { 919 | "cell_type": "code", 920 | "execution_count": 21, 921 | "metadata": { 922 | "id": "iL1FX3euRlQN" 923 | }, 924 | "outputs": [], 925 | "source": [ 926 | "from google.genai import types\n", 927 | "\n", 928 | "# Define the function declaration for the model\n", 929 | "weather_function = {\n", 930 | " \"name\": \"get_current_temperature\",\n", 931 | " \"description\": \"Gets the current temperature for a given location.\",\n", 932 | " \"parameters\": {\n", 933 | " \"type\": \"object\",\n", 934 | " \"properties\": {\n", 935 | " \"location\": {\n", 936 | " \"type\": \"string\",\n", 937 | " \"description\": \"The city name\",\n", 938 | " },\n", 939 | " },\n", 940 | " \"required\": [\"location\"],\n", 941 | " },\n", 942 | "}\n", 943 | "\n", 944 | "# Configure the client and tools\n", 945 | "tools = types.Tool(function_declarations=[weather_function])\n", 946 | "\n", 947 | "# Send request with function declarations\n", 948 | "response = client.models.generate_content(\n", 949 | " model=MODEL,\n", 950 | " contents=\"What's the temperature in London?\",\n", 951 | " config=types.GenerateContentConfig(tools=[tools])\n", 952 | ")" 953 | ] 954 | }, 955 | { 956 | "cell_type": "markdown", 957 | "metadata": { 958 | "id": "ZbWYwJ1G7s3_" 959 | }, 960 | "source": [ 961 | "Check for a function call" 962 | ] 963 | }, 964 | { 965 | "cell_type": "code", 966 | "execution_count": 22, 967 | "metadata": { 968 | "colab": { 969 | "base_uri": "https://localhost:8080/" 970 | }, 971 | "id": "-MvimiB5U30c", 972 | "outputId": "5540a599-6202-4bae-e4ec-0982eebf439c" 973 | }, 974 | "outputs": [ 975 | { 976 | "name": "stdout", 977 | "output_type": "stream", 978 | "text": [ 979 | "Function to call: get_current_temperature\n", 980 | "Arguments: {'location': 'London'}\n" 981 | ] 982 | } 983 | ], 984 | "source": [ 985 | "if response.candidates[0].content.parts[0].function_call:\n", 986 | " function_call = response.candidates[0].content.parts[0].function_call\n", 987 | " 
print(f\"Function to call: {function_call.name}\")\n", 988 | " print(f\"Arguments: {function_call.args}\")\n", 989 | " # In a real app, you would call your function here:\n", 990 | " # result = get_current_temperature(**function_call.args)\n", 991 | "else:\n", 992 | " print(\"No function call found in the response.\")\n", 993 | " print(response.text)" 994 | ] 995 | }, 996 | { 997 | "cell_type": "markdown", 998 | "metadata": { 999 | "id": "qpV5sW9B0oBg" 1000 | }, 1001 | "source": [ 1002 | "### Automatic Function Calling (Python Only)\n", 1003 | "\n", 1004 | "When using the Python SDK, you can provide Python functions directly as tools.\n", 1005 | "\n", 1006 | "The SDK handles the function call and returns the final text." 1007 | ] 1008 | }, 1009 | { 1010 | "cell_type": "code", 1011 | "execution_count": 23, 1012 | "metadata": { 1013 | "colab": { 1014 | "base_uri": "https://localhost:8080/" 1015 | }, 1016 | "id": "IqL-113f020c", 1017 | "outputId": "bc0d8c7c-8272-404b-ab34-9bc8ad1f712d" 1018 | }, 1019 | "outputs": [ 1020 | { 1021 | "name": "stdout", 1022 | "output_type": "stream", 1023 | "text": [ 1024 | "The current temperature in Boston, USA is 25 degrees Celsius.\n" 1025 | ] 1026 | } 1027 | ], 1028 | "source": [ 1029 | "# Define the function with type hints and docstring\n", 1030 | "def get_current_temperature(location: str) -> dict:\n", 1031 | " \"\"\"Gets the current temperature for a given location.\n", 1032 | "\n", 1033 | " Args:\n", 1034 | " location: The city and country, e.g. San Francisco, USA\n", 1035 | "\n", 1036 | " Returns:\n", 1037 | " A dictionary containing the temperature and unit.\n", 1038 | " \"\"\"\n", 1039 | " # ... 
(implementation) ...\n", 1040 | " return {\"temperature\": 25, \"unit\": \"Celsius\"}\n", 1041 | "\n", 1042 | "\n", 1043 | "response = client.models.generate_content(\n", 1044 | " model=MODEL,\n", 1045 | " contents=\"What's the temperature in Boston?\",\n", 1046 | " config=types.GenerateContentConfig(\n", 1047 | " tools=[get_current_temperature],\n", 1048 | " # to disable automatic function calling, you can set this:\n", 1049 | " # automatic_function_calling=types.AutomaticFunctionCallingConfig(disable=True)\n", 1050 | " )\n", 1051 | ")\n", 1052 | "\n", 1053 | "print(response.text)" 1054 | ] 1055 | }, 1056 | { 1057 | "cell_type": "markdown", 1058 | "metadata": { 1059 | "id": "okP9B1cJ7yKi" 1060 | }, 1061 | "source": [ 1062 | "Check the function calling history:" 1063 | ] 1064 | }, 1065 | { 1066 | "cell_type": "code", 1067 | "execution_count": 24, 1068 | "metadata": { 1069 | "colab": { 1070 | "base_uri": "https://localhost:8080/" 1071 | }, 1072 | "id": "gOQwksiK7z1X", 1073 | "outputId": "1df3edab-a3e8-4248-f5de-8807ae2e124f" 1074 | }, 1075 | "outputs": [ 1076 | { 1077 | "name": "stdout", 1078 | "output_type": "stream", 1079 | "text": [ 1080 | "id=None args={'location': 'Boston, USA'} name='get_current_temperature'\n" 1081 | ] 1082 | } 1083 | ], 1084 | "source": [ 1085 | "for content in response.automatic_function_calling_history:\n", 1086 | " for part in content.parts:\n", 1087 | " if part.function_call:\n", 1088 | " print(part.function_call)" 1089 | ] 1090 | }, 1091 | { 1092 | "cell_type": "markdown", 1093 | "metadata": { 1094 | "id": "ZfnxbheQ5GsO" 1095 | }, 1096 | "source": [ 1097 | "## Exercise: Get Pokémon stats\n", 1098 | "\n", 1099 | "- Define a function that can work with the PokéAPI and get Pokémon stats.\n", 1100 | "- Endpoint to use: `GET https://pokeapi.co/api/v2/pokemon/`\n", 1101 | "- Call Gemini and give it access to the function, then answer questions like: `\"What stats does the Pokemon Squirtle have?\"`\n" 1102 | ] 1103 | }, 1104 | { 1105 | 
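One practical note on this exercise: the raw PokéAPI response is large, so a tool that returns only the base stats keeps the prompt small. A minimal sketch of such a filter (the helper name and the abridged sample payload are illustrative; the shape follows the PokéAPI `/pokemon` response, where each entry of `stats` holds a `base_stat` and a nested `stat.name`):

```python
# Illustrative helper (not part of the workshop solution): reduce a
# PokeAPI /pokemon response to just the name and base stats.
def extract_base_stats(pokemon_json: dict) -> dict:
    return {
        "name": pokemon_json["name"],
        "stats": {s["stat"]["name"]: s["base_stat"] for s in pokemon_json["stats"]},
    }

# Abridged sample in the PokeAPI response shape (real payloads are much larger).
sample = {
    "name": "squirtle",
    "stats": [
        {"base_stat": 44, "stat": {"name": "hp"}},
        {"base_stat": 48, "stat": {"name": "attack"}},
    ],
}

print(extract_base_stats(sample))
# → {'name': 'squirtle', 'stats': {'hp': 44, 'attack': 48}}
```

Returning this trimmed dict as the tool result, instead of the full `resp.json()`, cuts token usage without changing the model's answer.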
"cell_type": "code", 1106 | "execution_count": 25, 1107 | "metadata": { 1108 | "colab": { 1109 | "base_uri": "https://localhost:8080/" 1110 | }, 1111 | "id": "bNe1p_BkVOmu", 1112 | "outputId": "b997e60e-213a-4315-f1e9-c10e00c33607" 1113 | }, 1114 | "outputs": [ 1115 | { 1116 | "name": "stdout", 1117 | "output_type": "stream", 1118 | "text": [ 1119 | "Squirtle has the following base stats:\n", 1120 | "* **HP**: 44\n", 1121 | "* **Attack**: 48\n", 1122 | "* **Defense**: 65\n", 1123 | "* **Special Attack**: 50\n", 1124 | "* **Special Defense**: 64\n", 1125 | "* **Speed**: 43\n" 1126 | ] 1127 | } 1128 | ], 1129 | "source": [ 1130 | "import requests\n", 1131 | "\n", 1132 | "def get_pokemon_info(pokemon: str) -> dict:\n", 1133 | " \"\"\"Gets pokemon info for a given pokemon name.\n", 1134 | "\n", 1135 | " Args:\n", 1136 | " pokemon: The name of the pokemon.\n", 1137 | "\n", 1138 | " Returns:\n", 1139 | " A dictionary containing the info.\n", 1140 | " \"\"\"\n", 1141 | " resp = requests.get(f\"https://pokeapi.co/api/v2/pokemon/{pokemon.lower()}\")\n", 1142 | " return resp.json()\n", 1143 | "\n", 1144 | "\n", 1145 | "response = client.models.generate_content(\n", 1146 | " model=MODEL,\n", 1147 | " contents=\"What stats does the Pokemon Squirtle have?\",\n", 1148 | " config=types.GenerateContentConfig(tools=[get_pokemon_info])\n", 1149 | ")\n", 1150 | "\n", 1151 | "print(response.text)" 1152 | ] 1153 | }, 1154 | { 1155 | "cell_type": "code", 1156 | "execution_count": 26, 1157 | "metadata": { 1158 | "colab": { 1159 | "base_uri": "https://localhost:8080/" 1160 | }, 1161 | "id": "Rg1Bu_VgdARK", 1162 | "outputId": "4ce2c6f6-3425-445d-f7ae-3d4c8d5b59c4" 1163 | }, 1164 | "outputs": [ 1165 | { 1166 | "name": "stdout", 1167 | "output_type": "stream", 1168 | "text": [ 1169 | "id=None args={'pokemon': 'Squirtle'} name='get_pokemon_info'\n" 1170 | ] 1171 | } 1172 | ], 1173 | "source": [ 1174 | "for content in response.automatic_function_calling_history:\n", 1175 | " for part in 
content.parts:\n", 1176 | " if part.function_call:\n", 1177 | " print(part.function_call)" 1178 | ] 1179 | }, 1180 | { 1181 | "cell_type": "markdown", 1182 | "metadata": { 1183 | "id": "yTWLa3ZZ_yhT" 1184 | }, 1185 | "source": [ 1186 | "## Recap & Next steps\n", 1187 | "\n", 1188 | "Awesome work! You learned about thinking models with advanced reasoning capabilities and how to combine Gemini with tools for agentic use cases.\n", 1189 | "\n", 1190 | "More helpful resources:\n", 1191 | "\n", 1192 | "- [Thinking docs](https://ai.google.dev/gemini-api/docs/thinking)\n", 1193 | "- [Structured output docs](https://ai.google.dev/gemini-api/docs/structured-output?lang=python)\n", 1194 | "- [Code execution docs](https://ai.google.dev/gemini-api/docs/code-execution?lang=python)\n", 1195 | "- [Grounding docs](https://ai.google.dev/gemini-api/docs/grounding?lang=python)\n", 1196 | "- [Function calling docs](https://ai.google.dev/gemini-api/docs/function-calling?example=weather)\n", 1197 | "\n", 1198 | "🎉🎉**Congratulations, you completed the workshop!**🎉🎉\n", 1199 | "\n", 1200 | "**Next steps**: There's even more you can do with Gemini that we didn't cover in this workshop:\n", 1201 | "\n", 1202 | "- [Image creation and editing with Gemini 2.0](https://github.com/patrickloeber/genai-tutorials/blob/main/notebooks/gemini-image-editing.ipynb)\n", 1203 | "- [Live API: Talk to Gemini and share your camera](https://aistudio.google.com/live) & [Live API cookbook](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Get_started_LiveAPI.ipynb)\n" 1204 | ] 1205 | } 1206 | ], 1207 | "metadata": { 1208 | "colab": { 1209 | "provenance": [] 1210 | }, 1211 | "kernelspec": { 1212 | "display_name": "Python 3", 1213 | "name": "python3" 1214 | }, 1215 | "language_info": { 1216 | "name": "python" 1217 | } 1218 | }, 1219 | "nbformat": 4, 1220 | "nbformat_minor": 0 1221 | } 1222 | --------------------------------------------------------------------------------