├── .gitattributes ├── 9781484299937.jpg ├── Contributing.md ├── README.md ├── LICENSE.txt └── Kulkarni_Ch_11_Generative AI └── code └── Scikit_llm.ipynb /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | -------------------------------------------------------------------------------- /9781484299937.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Apress/Applied-Generative-AI-for-Beginners/HEAD/9781484299937.jpg -------------------------------------------------------------------------------- /Contributing.md: -------------------------------------------------------------------------------- 1 | # Contributing to Apress Source Code 2 | 3 | Copyright for Apress source code belongs to the author(s). However, under fair use you are encouraged to fork and contribute minor corrections and updates for the benefit of the author(s) and other readers. 4 | 5 | ## How to Contribute 6 | 7 | 1. Make sure you have a GitHub account. 8 | 2. Fork the repository for the relevant book. 9 | 3. Create a new branch on which to make your change, e.g. 10 | `git checkout -b my_code_contribution` 11 | 4. Commit your change. Include a commit message describing the correction. Please note that if your commit message is not clear, the correction will not be accepted. 12 | 5. Submit a pull request. 13 | 14 | Thank you for your contribution! 
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Apress Source Code 2 | 3 | This repository accompanies [*Applied Generative AI for Beginners: Practical Knowledge on Diffusion Models, ChatGPT, and Other LLMs*](https://www.link.springer.com/book/10.1007/9781484299937) by Akshay Kulkarni, Adarsha Shivananda, Anoosh Kulkarni, and Dilip Gudivada (Apress, 2023). 4 | 5 | [comment]: #cover 6 | ![Cover image](9781484299937.jpg) 7 | 8 | Download the files as a zip using the green button, or clone the repository to your machine using Git. 9 | 10 | ## Releases 11 | 12 | Release v1.0 corresponds to the code in the published book, without corrections or updates. 13 | 14 | ## Contributions 15 | 16 | See the file Contributing.md for more information on how you can contribute to this repository. -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | Freeware License, some rights reserved 2 | 3 | Copyright (c) 2023 Akshay Kulkarni, Adarsha Shivananda, Anoosh Kulkarni, and Dilip Gudivada 4 | 5 | Permission is hereby granted, free of charge, to anyone obtaining a copy 6 | of this software and associated documentation files (the "Software"), 7 | to work with the Software within the limits of freeware distribution and fair use. 8 | This includes the rights to use, copy, and modify the Software for personal use. 9 | Users are also allowed and encouraged to submit corrections and modifications 10 | to the Software for the benefit of other users. 11 | 12 | It is not allowed to reuse, modify, or redistribute the Software for 13 | commercial use in any way, or for a user’s educational materials such as books 14 | or blog articles without prior permission from the copyright holder. 
15 | 16 | The above copyright notice and this permission notice need to be included 17 | in all copies or substantial portions of the software. 18 | 19 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 20 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 21 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 22 | AUTHORS OR COPYRIGHT HOLDERS OR APRESS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 23 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 24 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 25 | SOFTWARE. 26 | 27 | 28 | -------------------------------------------------------------------------------- /Kulkarni_Ch_11_Generative AI/code/Scikit_llm.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [] 7 | }, 8 | "kernelspec": { 9 | "name": "python3", 10 | "display_name": "Python 3" 11 | }, 12 | "language_info": { 13 | "name": "python" 14 | } 15 | }, 16 | "cells": [ 17 | { 18 | "cell_type": "markdown", 19 | "source": [ 20 | "**Installation and setup**" 21 | ], 22 | "metadata": { 23 | "id": "3VRD_z_jKP0N" 24 | } 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 1, 29 | "metadata": { 30 | "id": "g8nrlWIpYW07" 31 | }, 32 | "outputs": [], 33 | "source": [ 34 | "%%capture\n", 35 | "!pip install scikit-llm watermark" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "source": [ 41 | "\n", 42 | "%load_ext watermark\n", 43 | "%watermark -a \"Anoosh Kulkarni\" -vmp scikit-llm" 44 | ], 45 | "metadata": { 46 | "colab": { 47 | "base_uri": "https://localhost:8080/" 48 | }, 49 | "id": "wgYOCUKIYfE-", 50 | "outputId": "e8a2ee6d-df15-4ba3-860b-3cf936e78eba" 51 | }, 52 | "execution_count": 2, 53 | "outputs": [ 54 | { 55 | "output_type": "stream", 56 | "name": "stdout", 57 | "text": [ 58 | 
"Author: Anoosh Kulkarni\n", 59 | "\n", 60 | "Python implementation: CPython\n", 61 | "Python version : 3.10.12\n", 62 | "IPython version : 7.34.0\n", 63 | "\n", 64 | "scikit-llm: not installed\n", 65 | "\n", 66 | "Compiler : GCC 11.4.0\n", 67 | "OS : Linux\n", 68 | "Release : 5.15.120+\n", 69 | "Machine : x86_64\n", 70 | "Processor : x86_64\n", 71 | "CPU cores : 2\n", 72 | "Architecture: 64bit\n", 73 | "\n" 74 | ] 75 | } 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "source": [ 81 | "# importing SKLLMConfig to configure the OpenAI API (key and organization)\n", 82 | "from skllm.config import SKLLMConfig\n", 83 | "\n", 84 | "OPENAI_API_KEY = \"YOUR_OPENAI_API_KEY\"\n", 85 | "OPENAI_ORG_ID = \"YOUR_OPENAI_ORG_ID\"\n", 86 | "\n", 87 | "# Set your OpenAI API key\n", 88 | "SKLLMConfig.set_openai_key(OPENAI_API_KEY)\n", 89 | "\n", 90 | "# Set your OpenAI organization\n", 91 | "SKLLMConfig.set_openai_org(OPENAI_ORG_ID)" 92 | ], 93 | "metadata": { 94 | "id": "tMreaGi_Ytxk" 95 | }, 96 | "execution_count": 3, 97 | "outputs": [] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "source": [ 102 | "**ZeroShotGPTClassifier**" 103 | ], 104 | "metadata": { 105 | "id": "xos63AFYJEKT" 106 | } 107 | }, 108 | { 109 | "cell_type": "code", 110 | "source": [ 111 | "# importing the ZeroShotGPTClassifier class and the classification dataset loader\n", 112 | "from skllm import ZeroShotGPTClassifier\n", 113 | "from skllm.datasets import get_classification_dataset" 114 | ], 115 | "metadata": { 116 | "id": "1KXOBej3YyrP" 117 | }, 118 | "execution_count": 4, 119 | "outputs": [] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "source": [ 124 | "\n", 125 | "# sentiment analysis dataset\n", 126 | "# labels: positive, negative, neutral\n", 127 | "X, y = get_classification_dataset()" 128 | ], 129 | "metadata": { 130 | "id": "VF_p2cxXZd_Z" 131 | }, 132 | "execution_count": 5, 133 | "outputs": [] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "source": [ 138 | "len(X)" 
139 | ], 140 | "metadata": { 141 | "colab": { 142 | "base_uri": "https://localhost:8080/" 143 | }, 144 | "id": "FmmtO6ZBZhBl", 145 | "outputId": "0f6cde45-2f06-46a4-c189-2407ec5b8d23" 146 | }, 147 | "execution_count": 6, 148 | "outputs": [ 149 | { 150 | "output_type": "execute_result", 151 | "data": { 152 | "text/plain": [ 153 | "30" 154 | ] 155 | }, 156 | "metadata": {}, 157 | "execution_count": 6 158 | } 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "source": [ 164 | "X" 165 | ], 166 | "metadata": { 167 | "colab": { 168 | "base_uri": "https://localhost:8080/" 169 | }, 170 | "id": "RQsFOKNsZj_i", 171 | "outputId": "1ccac778-8efe-4809-a176-5a005abe1cf7" 172 | }, 173 | "execution_count": 7, 174 | "outputs": [ 175 | { 176 | "output_type": "execute_result", 177 | "data": { 178 | "text/plain": [ 179 | "[\"I was absolutely blown away by the performances in 'Summer's End'. The acting was top-notch, and the plot had me gripped from start to finish. A truly captivating cinematic experience that I would highly recommend.\",\n", 180 | " \"The special effects in 'Star Battles: Nebula Conflict' were out of this world. I felt like I was actually in space. The storyline was incredibly engaging and left me wanting more. Excellent film.\",\n", 181 | " \"'The Lost Symphony' was a masterclass in character development and storytelling. The score was hauntingly beautiful and complimented the intense, emotional scenes perfectly. Kudos to the director and cast for creating such a masterpiece.\",\n", 182 | " \"I was pleasantly surprised by 'Love in the Time of Cholera'. The romantic storyline was heartwarming and the characters were incredibly realistic. The cinematography was also top-notch. A must-watch for all romance lovers.\",\n", 183 | " \"I went into 'Marble Street' with low expectations, but I was pleasantly surprised. The suspense was well-maintained throughout, and the twist at the end was something I did not see coming. 
Bravo!\",\n", 184 | " \"'The Great Plains' is a touching portrayal of life in rural America. The performances were heartfelt and the scenery was breathtaking. I was moved to tears by the end. It's a story that will stay with me for a long time.\",\n", 185 | " \"The screenwriting in 'Under the Willow Tree' was superb. The dialogue felt real and the characters were well-rounded. The performances were also fantastic. I haven't enjoyed a movie this much in a while.\",\n", 186 | " \"'Nightshade' is a brilliant take on the superhero genre. The protagonist was relatable and the villain was genuinely scary. The action sequences were thrilling and the storyline was engaging. I can't wait for the sequel.\",\n", 187 | " \"The cinematography in 'Awakening' was nothing short of spectacular. The visuals alone are worth the ticket price. The storyline was unique and the performances were solid. An overall fantastic film.\",\n", 188 | " \"'Eternal Embers' was a cinematic delight. The storytelling was original and the performances were exceptional. The director's vision was truly brought to life on the big screen. A must-see for all movie lovers.\",\n", 189 | " \"I was thoroughly disappointed with 'Silver Shadows'. The plot was confusing and the performances were lackluster. I wouldn't recommend wasting your time on this one.\",\n", 190 | " \"'The Darkened Path' was a disaster. The storyline was unoriginal, the acting was wooden and the special effects were laughably bad. Save your money and skip this one.\",\n", 191 | " \"I had high hopes for 'The Final Frontier', but it failed to deliver. The plot was full of holes and the characters were poorly developed. It was a disappointing experience.\",\n", 192 | " \"'The Fall of the Phoenix' was a letdown. The storyline was confusing and the characters were one-dimensional. I found myself checking my watch multiple times throughout the movie.\",\n", 193 | " \"I regret wasting my time on 'Emerald City'. 
The plot was nonsensical and the performances were uninspired. It was a major disappointment.\",\n", 194 | " \"I found 'Hollow Echoes' to be a complete mess. The plot was non-existent, the performances were overdone, and the pacing was all over the place. Definitely not worth the hype.\",\n", 195 | " \"'Underneath the Stars' was a huge disappointment. The storyline was predictable and the acting was mediocre at best. I was expecting so much more.\",\n", 196 | " \"I was left unimpressed by 'River's Edge'. The plot was convoluted, the characters were uninteresting, and the ending was unsatisfying. It's a pass for me.\",\n", 197 | " \"The acting in 'Desert Mirage' was subpar, and the plot was boring. I found myself yawning multiple times throughout the movie. Save your time and skip this one.\",\n", 198 | " \"'Crimson Dawn' was a major letdown. The plot was cliched and the characters were flat. The special effects were also poorly executed. I wouldn't recommend it.\",\n", 199 | " \"'Remember the Days' was utterly forgettable. The storyline was dull, the performances were bland, and the dialogue was cringeworthy. A big disappointment.\",\n", 200 | " \"'The Last Frontier' was simply okay. The plot was decent and the performances were acceptable. However, it lacked a certain spark to make it truly memorable.\",\n", 201 | " \"'Through the Storm' was not bad, but it wasn't great either. The storyline was somewhat predictable, and the characters were somewhat stereotypical. It was an average movie at best.\",\n", 202 | " \"I found 'After the Rain' to be pretty average. The plot was okay and the performances were decent, but it didn't leave a lasting impression on me.\",\n", 203 | " \"'Beyond the Horizon' was neither good nor bad. The plot was interesting enough, but the characters were not very well developed. It was an okay watch.\",\n", 204 | " \"'The Silent Echo' was a mediocre movie. 
The storyline was passable and the performances were fair, but it didn't stand out in any way.\",\n", 205 | " \"I thought 'The Scent of Roses' was pretty average. The plot was somewhat engaging, and the performances were okay, but it didn't live up to my expectations.\",\n", 206 | " \"'Under the Same Sky' was an okay movie. The plot was decent, and the performances were fine, but it lacked depth and originality. It's not a movie I would watch again.\",\n", 207 | " \"'Chasing Shadows' was fairly average. The plot was not bad, and the performances were passable, but it lacked a certain spark. It was just okay.\",\n", 208 | " \"'Beneath the Surface' was pretty run-of-the-mill. The plot was decent, the performances were okay, but it wasn't particularly memorable. It was an okay movie.\"]" 209 | ] 210 | }, 211 | "metadata": {}, 212 | "execution_count": 7 213 | } 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "source": [ 219 | "y" 220 | ], 221 | "metadata": { 222 | "colab": { 223 | "base_uri": "https://localhost:8080/" 224 | }, 225 | "id": "4wZJf5nLZpR2", 226 | "outputId": "d5a36953-c2bb-4bb9-84ea-8d05926c9ce1" 227 | }, 228 | "execution_count": 8, 229 | "outputs": [ 230 | { 231 | "output_type": "execute_result", 232 | "data": { 233 | "text/plain": [ 234 | "['positive',\n", 235 | " 'positive',\n", 236 | " 'positive',\n", 237 | " 'positive',\n", 238 | " 'positive',\n", 239 | " 'positive',\n", 240 | " 'positive',\n", 241 | " 'positive',\n", 242 | " 'positive',\n", 243 | " 'positive',\n", 244 | " 'negative',\n", 245 | " 'negative',\n", 246 | " 'negative',\n", 247 | " 'negative',\n", 248 | " 'negative',\n", 249 | " 'negative',\n", 250 | " 'negative',\n", 251 | " 'negative',\n", 252 | " 'negative',\n", 253 | " 'negative',\n", 254 | " 'neutral',\n", 255 | " 'neutral',\n", 256 | " 'neutral',\n", 257 | " 'neutral',\n", 258 | " 'neutral',\n", 259 | " 'neutral',\n", 260 | " 'neutral',\n", 261 | " 'neutral',\n", 262 | " 'neutral',\n", 263 | " 'neutral']" 264 | ] 265 | }, 
266 | "metadata": {}, 267 | "execution_count": 8 268 | } 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "source": [ 274 | "# note: indexing starts at 0\n", 275 | "def training_data(data):\n", 276 | " subset_1 = data[:8] # first 8 of items 1-10\n", 277 | " subset_2 = data[10:18] # first 8 of items 11-20\n", 278 | " subset_3 = data[20:28] # first 8 of items 21-30\n", 279 | "\n", 280 | " combined_data = subset_1 + subset_2 + subset_3\n", 281 | " return combined_data" 282 | ], 283 | "metadata": { 284 | "id": "SGDmNEbOZsEy" 285 | }, 286 | "execution_count": 9, 287 | "outputs": [] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "source": [ 292 | "# note: indexing starts at 0\n", 293 | "def testing_data(data):\n", 294 | " subset_1 = data[8:10] # last 2 of items 1-10\n", 295 | " subset_2 = data[18:20] # last 2 of items 11-20\n", 296 | " subset_3 = data[28:30] # last 2 of items 21-30\n", 297 | "\n", 298 | " combined_data = subset_1 + subset_2 + subset_3\n", 299 | " return combined_data" 300 | ], 301 | "metadata": { 302 | "id": "jMbOlHTFZyr0" 303 | }, 304 | "execution_count": 10, 305 | "outputs": [] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "source": [ 310 | "\n", 311 | "X_train = training_data(X)\n", 312 | "print(len(X_train))\n", 313 | "X_train" 314 | ], 315 | "metadata": { 316 | "colab": { 317 | "base_uri": "https://localhost:8080/" 318 | }, 319 | "id": "ni3ZOOHjZ07S", 320 | "outputId": "74b35c38-663a-43e3-b8f2-562c2aa99556" 321 | }, 322 | "execution_count": 11, 323 | "outputs": [ 324 | { 325 | "output_type": "stream", 326 | "name": "stdout", 327 | "text": [ 328 | "24\n" 329 | ] 330 | }, 331 | { 332 | "output_type": "execute_result", 333 | "data": { 334 | "text/plain": [ 335 | "[\"I was absolutely blown away by the performances in 'Summer's End'. The acting was top-notch, and the plot had me gripped from start to finish. 
A truly captivating cinematic experience that I would highly recommend.\",\n", 336 | " \"The special effects in 'Star Battles: Nebula Conflict' were out of this world. I felt like I was actually in space. The storyline was incredibly engaging and left me wanting more. Excellent film.\",\n", 337 | " \"'The Lost Symphony' was a masterclass in character development and storytelling. The score was hauntingly beautiful and complimented the intense, emotional scenes perfectly. Kudos to the director and cast for creating such a masterpiece.\",\n", 338 | " \"I was pleasantly surprised by 'Love in the Time of Cholera'. The romantic storyline was heartwarming and the characters were incredibly realistic. The cinematography was also top-notch. A must-watch for all romance lovers.\",\n", 339 | " \"I went into 'Marble Street' with low expectations, but I was pleasantly surprised. The suspense was well-maintained throughout, and the twist at the end was something I did not see coming. Bravo!\",\n", 340 | " \"'The Great Plains' is a touching portrayal of life in rural America. The performances were heartfelt and the scenery was breathtaking. I was moved to tears by the end. It's a story that will stay with me for a long time.\",\n", 341 | " \"The screenwriting in 'Under the Willow Tree' was superb. The dialogue felt real and the characters were well-rounded. The performances were also fantastic. I haven't enjoyed a movie this much in a while.\",\n", 342 | " \"'Nightshade' is a brilliant take on the superhero genre. The protagonist was relatable and the villain was genuinely scary. The action sequences were thrilling and the storyline was engaging. I can't wait for the sequel.\",\n", 343 | " \"I was thoroughly disappointed with 'Silver Shadows'. The plot was confusing and the performances were lackluster. I wouldn't recommend wasting your time on this one.\",\n", 344 | " \"'The Darkened Path' was a disaster. 
The storyline was unoriginal, the acting was wooden and the special effects were laughably bad. Save your money and skip this one.\",\n", 345 | " \"I had high hopes for 'The Final Frontier', but it failed to deliver. The plot was full of holes and the characters were poorly developed. It was a disappointing experience.\",\n", 346 | " \"'The Fall of the Phoenix' was a letdown. The storyline was confusing and the characters were one-dimensional. I found myself checking my watch multiple times throughout the movie.\",\n", 347 | " \"I regret wasting my time on 'Emerald City'. The plot was nonsensical and the performances were uninspired. It was a major disappointment.\",\n", 348 | " \"I found 'Hollow Echoes' to be a complete mess. The plot was non-existent, the performances were overdone, and the pacing was all over the place. Definitely not worth the hype.\",\n", 349 | " \"'Underneath the Stars' was a huge disappointment. The storyline was predictable and the acting was mediocre at best. I was expecting so much more.\",\n", 350 | " \"I was left unimpressed by 'River's Edge'. The plot was convoluted, the characters were uninteresting, and the ending was unsatisfying. It's a pass for me.\",\n", 351 | " \"'Remember the Days' was utterly forgettable. The storyline was dull, the performances were bland, and the dialogue was cringeworthy. A big disappointment.\",\n", 352 | " \"'The Last Frontier' was simply okay. The plot was decent and the performances were acceptable. However, it lacked a certain spark to make it truly memorable.\",\n", 353 | " \"'Through the Storm' was not bad, but it wasn't great either. The storyline was somewhat predictable, and the characters were somewhat stereotypical. It was an average movie at best.\",\n", 354 | " \"I found 'After the Rain' to be pretty average. The plot was okay and the performances were decent, but it didn't leave a lasting impression on me.\",\n", 355 | " \"'Beyond the Horizon' was neither good nor bad. 
The plot was interesting enough, but the characters were not very well developed. It was an okay watch.\",\n", 356 | " \"'The Silent Echo' was a mediocre movie. The storyline was passable and the performances were fair, but it didn't stand out in any way.\",\n", 357 | " \"I thought 'The Scent of Roses' was pretty average. The plot was somewhat engaging, and the performances were okay, but it didn't live up to my expectations.\",\n", 358 | " \"'Under the Same Sky' was an okay movie. The plot was decent, and the performances were fine, but it lacked depth and originality. It's not a movie I would watch again.\"]" 359 | ] 360 | }, 361 | "metadata": {}, 362 | "execution_count": 11 363 | } 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "source": [ 369 | "\n", 370 | "y_train = training_data(y)\n", 371 | "print(len(y_train))\n", 372 | "y_train" 373 | ], 374 | "metadata": { 375 | "colab": { 376 | "base_uri": "https://localhost:8080/" 377 | }, 378 | "id": "aNfFYDq2Z3EP", 379 | "outputId": "81ecb3c6-d08d-4bd4-832c-19833aace13a" 380 | }, 381 | "execution_count": 12, 382 | "outputs": [ 383 | { 384 | "output_type": "stream", 385 | "name": "stdout", 386 | "text": [ 387 | "24\n" 388 | ] 389 | }, 390 | { 391 | "output_type": "execute_result", 392 | "data": { 393 | "text/plain": [ 394 | "['positive',\n", 395 | " 'positive',\n", 396 | " 'positive',\n", 397 | " 'positive',\n", 398 | " 'positive',\n", 399 | " 'positive',\n", 400 | " 'positive',\n", 401 | " 'positive',\n", 402 | " 'negative',\n", 403 | " 'negative',\n", 404 | " 'negative',\n", 405 | " 'negative',\n", 406 | " 'negative',\n", 407 | " 'negative',\n", 408 | " 'negative',\n", 409 | " 'negative',\n", 410 | " 'neutral',\n", 411 | " 'neutral',\n", 412 | " 'neutral',\n", 413 | " 'neutral',\n", 414 | " 'neutral',\n", 415 | " 'neutral',\n", 416 | " 'neutral',\n", 417 | " 'neutral']" 418 | ] 419 | }, 420 | "metadata": {}, 421 | "execution_count": 12 422 | } 423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | 
"source": [ 428 | "\n", 429 | "X_test = testing_data(X)\n", 430 | "print(len(X_test))\n", 431 | "X_test\n", 432 | "\n" 433 | ], 434 | "metadata": { 435 | "colab": { 436 | "base_uri": "https://localhost:8080/" 437 | }, 438 | "id": "FCBsDcgMZ767", 439 | "outputId": "0a9d9dc3-d388-4882-e46b-9be4d07c2c3d" 440 | }, 441 | "execution_count": 13, 442 | "outputs": [ 443 | { 444 | "output_type": "stream", 445 | "name": "stdout", 446 | "text": [ 447 | "6\n" 448 | ] 449 | }, 450 | { 451 | "output_type": "execute_result", 452 | "data": { 453 | "text/plain": [ 454 | "[\"The cinematography in 'Awakening' was nothing short of spectacular. The visuals alone are worth the ticket price. The storyline was unique and the performances were solid. An overall fantastic film.\",\n", 455 | " \"'Eternal Embers' was a cinematic delight. The storytelling was original and the performances were exceptional. The director's vision was truly brought to life on the big screen. A must-see for all movie lovers.\",\n", 456 | " \"The acting in 'Desert Mirage' was subpar, and the plot was boring. I found myself yawning multiple times throughout the movie. Save your time and skip this one.\",\n", 457 | " \"'Crimson Dawn' was a major letdown. The plot was cliched and the characters were flat. The special effects were also poorly executed. I wouldn't recommend it.\",\n", 458 | " \"'Chasing Shadows' was fairly average. The plot was not bad, and the performances were passable, but it lacked a certain spark. It was just okay.\",\n", 459 | " \"'Beneath the Surface' was pretty run-of-the-mill. The plot was decent, the performances were okay, but it wasn't particularly memorable. 
It was an okay movie.\"]" 460 | ] 461 | }, 462 | "metadata": {}, 463 | "execution_count": 13 464 | } 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "source": [ 470 | "y_test = testing_data(y)\n", 471 | "print(len(y_test))\n", 472 | "y_test" 473 | ], 474 | "metadata": { 475 | "colab": { 476 | "base_uri": "https://localhost:8080/" 477 | }, 478 | "id": "dCZJTxitZ_KS", 479 | "outputId": "93c940ea-0839-4479-e05e-756a4ed55454" 480 | }, 481 | "execution_count": 14, 482 | "outputs": [ 483 | { 484 | "output_type": "stream", 485 | "name": "stdout", 486 | "text": [ 487 | "6\n" 488 | ] 489 | }, 490 | { 491 | "output_type": "execute_result", 492 | "data": { 493 | "text/plain": [ 494 | "['positive', 'positive', 'negative', 'negative', 'neutral', 'neutral']" 495 | ] 496 | }, 497 | "metadata": {}, 498 | "execution_count": 14 499 | } 500 | ] 501 | }, 502 | { 503 | "cell_type": "code", 504 | "source": [ 505 | "\n", 506 | "# defining the openai model to use\n", 507 | "clf = ZeroShotGPTClassifier(openai_model=\"gpt-3.5-turbo\")\n", 508 | "\n", 509 | "# fitting the data\n", 510 | "clf.fit(X_train, y_train)" 511 | ], 512 | "metadata": { 513 | "colab": { 514 | "base_uri": "https://localhost:8080/", 515 | "height": 74 516 | }, 517 | "id": "RkpJsr77aB3O", 518 | "outputId": "4393d9be-1649-49b8-c531-c40053e005d7" 519 | }, 520 | "execution_count": 15, 521 | "outputs": [ 522 | { 523 | "output_type": "execute_result", 524 | "data": { 525 | "text/plain": [ 526 | "ZeroShotGPTClassifier()" 527 | ], 528 | "text/html": [ 529 | "
ZeroShotGPTClassifier()
" 530 | ] 531 | }, 532 | "metadata": {}, 533 | "execution_count": 15 534 | } 535 | ] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "source": [ 540 | "%%time\n", 541 | "# predicting the data\n", 542 | "predicted_labels = clf.predict(X_test)" 543 | ], 544 | "metadata": { 545 | "colab": { 546 | "base_uri": "https://localhost:8080/" 547 | }, 548 | "id": "aDjGQWYDaFpT", 549 | "outputId": "eee8e3a1-4e21-47eb-c499-b8ccf49184a7" 550 | }, 551 | "execution_count": 16, 552 | "outputs": [ 553 | { 554 | "output_type": "stream", 555 | "name": "stderr", 556 | "text": [ 557 | " 17%|█▋ | 1/6 [00:09<00:47, 9.43s/it]" 558 | ] 559 | }, 560 | { 561 | "output_type": "stream", 562 | "name": "stdout", 563 | "text": [ 564 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 565 | "None\n", 566 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 567 | ] 568 | }, 569 | { 570 | "output_type": "stream", 571 | "name": "stderr", 572 | "text": [ 573 | "\r 33%|███▎ | 2/6 [00:18<00:37, 9.31s/it]" 574 | ] 575 | }, 576 | { 577 | "output_type": "stream", 578 | "name": "stdout", 579 | "text": [ 580 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 581 | "None\n", 582 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 583 | ] 584 | }, 585 | { 586 | "output_type": "stream", 587 | "name": "stderr", 588 | "text": [ 589 | "\r 50%|█████ | 3/6 [00:27<00:27, 9.26s/it]" 590 | ] 591 | }, 592 | { 593 | "output_type": "stream", 594 | "name": "stdout", 595 | "text": [ 596 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 597 | "None\n", 598 | "Could not extract the label from the completion: 'NoneType' object is not 
subscriptable\n" 599 | ] 600 | }, 601 | { 602 | "output_type": "stream", 603 | "name": "stderr", 604 | "text": [ 605 | "\r 67%|██████▋ | 4/6 [00:37<00:18, 9.23s/it]" 606 | ] 607 | }, 608 | { 609 | "output_type": "stream", 610 | "name": "stdout", 611 | "text": [ 612 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 613 | "None\n", 614 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 615 | ] 616 | }, 617 | { 618 | "output_type": "stream", 619 | "name": "stderr", 620 | "text": [ 621 | "\r 83%|████████▎ | 5/6 [00:46<00:09, 9.22s/it]" 622 | ] 623 | }, 624 | { 625 | "output_type": "stream", 626 | "name": "stdout", 627 | "text": [ 628 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 629 | "None\n", 630 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 631 | ] 632 | }, 633 | { 634 | "output_type": "stream", 635 | "name": "stderr", 636 | "text": [ 637 | "100%|██████████| 6/6 [00:55<00:00, 9.24s/it]" 638 | ] 639 | }, 640 | { 641 | "output_type": "stream", 642 | "name": "stdout", 643 | "text": [ 644 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 645 | "None\n", 646 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n", 647 | "CPU times: user 461 ms, sys: 42 ms, total: 503 ms\n", 648 | "Wall time: 55.5 s\n" 649 | ] 650 | }, 651 | { 652 | "output_type": "stream", 653 | "name": "stderr", 654 | "text": [ 655 | "\n" 656 | ] 657 | } 658 | ] 659 | }, 660 | { 661 | "cell_type": "code", 662 | "source": [ 663 | "\n", 664 | "for review, sentiment in zip(X_test, predicted_labels):\n", 665 | " print(f\"Review: {review}\\nPredicted Sentiment: 
{sentiment}\\n\\n\")" 666 | ], 667 | "metadata": { 668 | "colab": { 669 | "base_uri": "https://localhost:8080/" 670 | }, 671 | "id": "orkD6gI3aJMP", 672 | "outputId": "85bc1ad5-0af6-41b7-faf3-0ae5d82b33a5" 673 | }, 674 | "execution_count": 17, 675 | "outputs": [ 676 | { 677 | "output_type": "stream", 678 | "name": "stdout", 679 | "text": [ 680 | "Review: The cinematography in 'Awakening' was nothing short of spectacular. The visuals alone are worth the ticket price. The storyline was unique and the performances were solid. An overall fantastic film.\n", 681 | "Predicted Sentiment: positive\n", 682 | "\n", 683 | "\n", 684 | "Review: 'Eternal Embers' was a cinematic delight. The storytelling was original and the performances were exceptional. The director's vision was truly brought to life on the big screen. A must-see for all movie lovers.\n", 685 | "Predicted Sentiment: neutral\n", 686 | "\n", 687 | "\n", 688 | "Review: The acting in 'Desert Mirage' was subpar, and the plot was boring. I found myself yawning multiple times throughout the movie. Save your time and skip this one.\n", 689 | "Predicted Sentiment: positive\n", 690 | "\n", 691 | "\n", 692 | "Review: 'Crimson Dawn' was a major letdown. The plot was cliched and the characters were flat. The special effects were also poorly executed. I wouldn't recommend it.\n", 693 | "Predicted Sentiment: negative\n", 694 | "\n", 695 | "\n", 696 | "Review: 'Chasing Shadows' was fairly average. The plot was not bad, and the performances were passable, but it lacked a certain spark. It was just okay.\n", 697 | "Predicted Sentiment: positive\n", 698 | "\n", 699 | "\n", 700 | "Review: 'Beneath the Surface' was pretty run-of-the-mill. The plot was decent, the performances were okay, but it wasn't particularly memorable. 
It was an okay movie.\n", 701 | "Predicted Sentiment: positive\n", 702 | "\n", 703 | "\n" 704 | ] 705 | } 706 | ] 707 | }, 708 | { 709 | "cell_type": "code", 710 | "source": [ 711 | "\n", 712 | "from sklearn.metrics import accuracy_score" 713 | ], 714 | "metadata": { 715 | "id": "63cTwtomabzp" 716 | }, 717 | "execution_count": 18, 718 | "outputs": [] 719 | }, 720 | { 721 | "cell_type": "code", 722 | "source": [ 723 | "print(f\"Accuracy: {accuracy_score(y_test, predicted_labels):.2f}\")" 724 | ], 725 | "metadata": { 726 | "colab": { 727 | "base_uri": "https://localhost:8080/" 728 | }, 729 | "id": "xrXq5OnnagsJ", 730 | "outputId": "d51d4470-27a2-492c-cd2b-cc8535c67f95" 731 | }, 732 | "execution_count": 19, 733 | "outputs": [ 734 | { 735 | "output_type": "stream", 736 | "name": "stdout", 737 | "text": [ 738 | "Accuracy: 0.33\n" 739 | ] 740 | } 741 | ] 742 | }, 743 | { 744 | "cell_type": "markdown", 745 | "source": [ 746 | "**Without labeled data**" 747 | ], 748 | "metadata": { 749 | "id": "JnHAADUIJNTu" 750 | } 751 | }, 752 | { 753 | "cell_type": "code", 754 | "source": [ 755 | "# defining the model\n", 756 | "clf_no_label = ZeroShotGPTClassifier()\n", 757 | "\n", 758 | "# No training so passing the labels only for prediction\n", 759 | "clf_no_label.fit(None, ['positive', 'negative', 'neutral'])\n", 760 | "\n", 761 | "# predicting the labels\n", 762 | "predicted_labels_without_training_data = clf_no_label.predict(X_test)\n", 763 | "predicted_labels_without_training_data\n" 764 | ], 765 | "metadata": { 766 | "id": "rMIQOqz4alBN", 767 | "colab": { 768 | "base_uri": "https://localhost:8080/" 769 | }, 770 | "outputId": "0e69b1f0-7881-43d6-e5b0-1b0eaa843839" 771 | }, 772 | "execution_count": 20, 773 | "outputs": [ 774 | { 775 | "output_type": "stream", 776 | "name": "stderr", 777 | "text": [ 778 | " 17%|█▋ | 1/6 [00:09<00:45, 9.19s/it]" 779 | ] 780 | }, 781 | { 782 | "output_type": "stream", 783 | "name": "stdout", 784 | "text": [ 785 | "Could not obtain the completion 
after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 786 | "None\n", 787 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 788 | ] 789 | }, 790 | { 791 | "output_type": "stream", 792 | "name": "stderr", 793 | "text": [ 794 | "\r 33%|███▎ | 2/6 [00:18<00:36, 9.20s/it]" 795 | ] 796 | }, 797 | { 798 | "output_type": "stream", 799 | "name": "stdout", 800 | "text": [ 801 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 802 | "None\n", 803 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 804 | ] 805 | }, 806 | { 807 | "output_type": "stream", 808 | "name": "stderr", 809 | "text": [ 810 | "\r 50%|█████ | 3/6 [00:27<00:27, 9.23s/it]" 811 | ] 812 | }, 813 | { 814 | "output_type": "stream", 815 | "name": "stdout", 816 | "text": [ 817 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 818 | "None\n", 819 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 820 | ] 821 | }, 822 | { 823 | "output_type": "stream", 824 | "name": "stderr", 825 | "text": [ 826 | "\r 67%|██████▋ | 4/6 [00:36<00:18, 9.21s/it]" 827 | ] 828 | }, 829 | { 830 | "output_type": "stream", 831 | "name": "stdout", 832 | "text": [ 833 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 834 | "None\n", 835 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 836 | ] 837 | }, 838 | { 839 | "output_type": "stream", 840 | "name": "stderr", 841 | "text": [ 842 | "\r 83%|████████▎ | 5/6 [00:46<00:09, 9.21s/it]" 843 | ] 844 | }, 845 | { 846 | "output_type": "stream", 847 | "name": 
"stdout", 848 | "text": [ 849 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 850 | "None\n", 851 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 852 | ] 853 | }, 854 | { 855 | "output_type": "stream", 856 | "name": "stderr", 857 | "text": [ 858 | "100%|██████████| 6/6 [00:55<00:00, 9.21s/it]" 859 | ] 860 | }, 861 | { 862 | "output_type": "stream", 863 | "name": "stdout", 864 | "text": [ 865 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 866 | "None\n", 867 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 868 | ] 869 | }, 870 | { 871 | "output_type": "stream", 872 | "name": "stderr", 873 | "text": [ 874 | "\n" 875 | ] 876 | }, 877 | { 878 | "output_type": "execute_result", 879 | "data": { 880 | "text/plain": [ 881 | "['neutral', 'neutral', 'positive', 'negative', 'negative', 'negative']" 882 | ] 883 | }, 884 | "metadata": {}, 885 | "execution_count": 20 886 | } 887 | ] 888 | }, 889 | { 890 | "cell_type": "code", 891 | "source": [ 892 | "for review, sentiment in zip(X_test, predicted_labels_without_training_data):\n", 893 | " print(f\"Review: {review}\\nPredicted Sentiment: {sentiment}\\n\\n\")\n" 894 | ], 895 | "metadata": { 896 | "colab": { 897 | "base_uri": "https://localhost:8080/" 898 | }, 899 | "id": "7l7k-PRKgCFq", 900 | "outputId": "6e15450b-85a8-4c87-e519-6063b149a84f" 901 | }, 902 | "execution_count": 21, 903 | "outputs": [ 904 | { 905 | "output_type": "stream", 906 | "name": "stdout", 907 | "text": [ 908 | "Review: The cinematography in 'Awakening' was nothing short of spectacular. The visuals alone are worth the ticket price. The storyline was unique and the performances were solid. 
An overall fantastic film.\n", 909 | "Predicted Sentiment: neutral\n", 910 | "\n", 911 | "\n", 912 | "Review: 'Eternal Embers' was a cinematic delight. The storytelling was original and the performances were exceptional. The director's vision was truly brought to life on the big screen. A must-see for all movie lovers.\n", 913 | "Predicted Sentiment: neutral\n", 914 | "\n", 915 | "\n", 916 | "Review: The acting in 'Desert Mirage' was subpar, and the plot was boring. I found myself yawning multiple times throughout the movie. Save your time and skip this one.\n", 917 | "Predicted Sentiment: positive\n", 918 | "\n", 919 | "\n", 920 | "Review: 'Crimson Dawn' was a major letdown. The plot was cliched and the characters were flat. The special effects were also poorly executed. I wouldn't recommend it.\n", 921 | "Predicted Sentiment: negative\n", 922 | "\n", 923 | "\n", 924 | "Review: 'Chasing Shadows' was fairly average. The plot was not bad, and the performances were passable, but it lacked a certain spark. It was just okay.\n", 925 | "Predicted Sentiment: negative\n", 926 | "\n", 927 | "\n", 928 | "Review: 'Beneath the Surface' was pretty run-of-the-mill. The plot was decent, the performances were okay, but it wasn't particularly memorable. 
It was an okay movie.\n", 929 | "Predicted Sentiment: negative\n", 930 | "\n", 931 | "\n" 932 | ] 933 | } 934 | ] 935 | }, 936 | { 937 | "cell_type": "code", 938 | "source": [ 939 | "print(f\"Accuracy: {accuracy_score(y_test, predicted_labels_without_training_data):.2f}\")" 940 | ], 941 | "metadata": { 942 | "colab": { 943 | "base_uri": "https://localhost:8080/" 944 | }, 945 | "id": "mMqdL4XBgKWP", 946 | "outputId": "ae7baa61-acdc-47b6-fb91-464be7982f8d" 947 | }, 948 | "execution_count": 22, 949 | "outputs": [ 950 | { 951 | "output_type": "stream", 952 | "name": "stdout", 953 | "text": [ 954 | "Accuracy: 0.17\n" 955 | ] 956 | } 957 | ] 958 | }, 959 | { 960 | "cell_type": "markdown", 961 | "source": [ 962 | "**Multi-Label Zero-Shot Text Classification**" 963 | ], 964 | "metadata": { 965 | "id": "8_vGFABMJfhA" 966 | } 967 | }, 968 | { 969 | "cell_type": "code", 970 | "source": [ 971 | "# importing the multi-label zero-shot classifier and the sample dataset loader\n", 972 | "from skllm import MultiLabelZeroShotGPTClassifier\n", 973 | "from skllm.datasets import get_multilabel_classification_dataset\n", 974 | "\n", 975 | "# loading the sample multi-label classification dataset from skllm\n", 976 | "X, y = get_multilabel_classification_dataset()\n", 977 | "\n", 978 | "# defining the model\n", 979 | "clf = MultiLabelZeroShotGPTClassifier(max_labels=3)\n", 980 | "\n", 981 | "# fitting the model\n", 982 | "clf.fit(X, y)\n", 983 | "\n", 984 | "# making predictions\n", 985 | "labels = clf.predict(X)" 986 | ], 987 | "metadata": { 988 | "colab": { 989 | "base_uri": "https://localhost:8080/" 990 | }, 991 | "id": "V3o-hzGTgNyb", 992 | "outputId": "31a65e45-eba2-4057-8d19-fa21845c236c" 993 | }, 994 | "execution_count": 23, 995 | "outputs": [ 996 | { 997 | "output_type": "stream", 998 | "name": "stderr", 999 | "text": [ 1000 | " 10%|█ | 1/10 [00:09<01:22, 9.19s/it]" 1001 | ] 1002 | }, 1003 | { 1004 | "output_type": "stream", 1005 | "name": "stdout", 1006 | "text": [ 1007 | "Could not obtain the completion after 3 
retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1008 | "None\n", 1009 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1010 | ] 1011 | }, 1012 | { 1013 | "output_type": "stream", 1014 | "name": "stderr", 1015 | "text": [ 1016 | "\r 20%|██ | 2/10 [00:18<01:13, 9.24s/it]" 1017 | ] 1018 | }, 1019 | { 1020 | "output_type": "stream", 1021 | "name": "stdout", 1022 | "text": [ 1023 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1024 | "None\n", 1025 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1026 | ] 1027 | }, 1028 | { 1029 | "output_type": "stream", 1030 | "name": "stderr", 1031 | "text": [ 1032 | "\r 30%|███ | 3/10 [00:27<01:04, 9.22s/it]" 1033 | ] 1034 | }, 1035 | { 1036 | "output_type": "stream", 1037 | "name": "stdout", 1038 | "text": [ 1039 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1040 | "None\n", 1041 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1042 | ] 1043 | }, 1044 | { 1045 | "output_type": "stream", 1046 | "name": "stderr", 1047 | "text": [ 1048 | "\r 40%|████ | 4/10 [00:36<00:55, 9.21s/it]" 1049 | ] 1050 | }, 1051 | { 1052 | "output_type": "stream", 1053 | "name": "stdout", 1054 | "text": [ 1055 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1056 | "None\n", 1057 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1058 | ] 1059 | }, 1060 | { 1061 | "output_type": "stream", 1062 | "name": "stderr", 1063 | "text": [ 1064 | "\r 50%|█████ | 5/10 [00:46<00:46, 9.23s/it]" 1065 | ] 1066 | }, 1067 | { 
1068 | "output_type": "stream", 1069 | "name": "stdout", 1070 | "text": [ 1071 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1072 | "None\n", 1073 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1074 | ] 1075 | }, 1076 | { 1077 | "output_type": "stream", 1078 | "name": "stderr", 1079 | "text": [ 1080 | "\r 60%|██████ | 6/10 [00:55<00:36, 9.21s/it]" 1081 | ] 1082 | }, 1083 | { 1084 | "output_type": "stream", 1085 | "name": "stdout", 1086 | "text": [ 1087 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1088 | "None\n", 1089 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1090 | ] 1091 | }, 1092 | { 1093 | "output_type": "stream", 1094 | "name": "stderr", 1095 | "text": [ 1096 | "\r 70%|███████ | 7/10 [01:04<00:27, 9.21s/it]" 1097 | ] 1098 | }, 1099 | { 1100 | "output_type": "stream", 1101 | "name": "stdout", 1102 | "text": [ 1103 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1104 | "None\n", 1105 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1106 | ] 1107 | }, 1108 | { 1109 | "output_type": "stream", 1110 | "name": "stderr", 1111 | "text": [ 1112 | "\r 80%|████████ | 8/10 [01:13<00:18, 9.25s/it]" 1113 | ] 1114 | }, 1115 | { 1116 | "output_type": "stream", 1117 | "name": "stdout", 1118 | "text": [ 1119 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1120 | "None\n", 1121 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1122 | ] 1123 | }, 1124 | { 1125 | "output_type": 
"stream", 1126 | "name": "stderr", 1127 | "text": [ 1128 | "\r 90%|█████████ | 9/10 [01:23<00:09, 9.23s/it]" 1129 | ] 1130 | }, 1131 | { 1132 | "output_type": "stream", 1133 | "name": "stdout", 1134 | "text": [ 1135 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1136 | "None\n", 1137 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1138 | ] 1139 | }, 1140 | { 1141 | "output_type": "stream", 1142 | "name": "stderr", 1143 | "text": [ 1144 | "100%|██████████| 10/10 [01:32<00:00, 9.22s/it]" 1145 | ] 1146 | }, 1147 | { 1148 | "output_type": "stream", 1149 | "name": "stdout", 1150 | "text": [ 1151 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1152 | "None\n", 1153 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1154 | ] 1155 | }, 1156 | { 1157 | "output_type": "stream", 1158 | "name": "stderr", 1159 | "text": [ 1160 | "\n" 1161 | ] 1162 | } 1163 | ] 1164 | }, 1165 | { 1166 | "cell_type": "markdown", 1167 | "source": [ 1168 | "**Without labeled data**" 1169 | ], 1170 | "metadata": { 1171 | "id": "BzWPOGPtJuFy" 1172 | } 1173 | }, 1174 | { 1175 | "cell_type": "code", 1176 | "source": [ 1177 | "# getting the classification dataset for prediction only (labels are discarded)\n", 1178 | "from skllm.datasets import get_multilabel_classification_dataset\n", 1179 | "from skllm import MultiLabelZeroShotGPTClassifier\n", 1180 | "X, _ = get_multilabel_classification_dataset()\n", 1181 | "\n", 1182 | "# Defining all the candidate labels that can be predicted\n", 1183 | "candidate_labels = [\n", 1184 | " \"Quality\",\n", 1185 | " \"Price\",\n", 1186 | " \"Delivery\",\n", 1187 | " \"Service\",\n", 1188 | " \"Product Variety\"\n", 1189 | "]\n", 1190 | "\n", 1191 | "# creating the model\n", 1192 | "clf = 
MultiLabelZeroShotGPTClassifier(max_labels=3)\n", 1193 | "\n", 1194 | "# fitting the labels only\n", 1195 | "clf.fit(None, [candidate_labels])\n", 1196 | "\n", 1197 | "# predicting the data\n", 1198 | "labels = clf.predict(X)" 1199 | ], 1200 | "metadata": { 1201 | "colab": { 1202 | "base_uri": "https://localhost:8080/" 1203 | }, 1204 | "id": "_jtZEuTHiSsx", 1205 | "outputId": "25786457-56eb-4b71-9b87-7c5d4334dd61" 1206 | }, 1207 | "execution_count": 24, 1208 | "outputs": [ 1209 | { 1210 | "output_type": "stream", 1211 | "name": "stderr", 1212 | "text": [ 1213 | " 10%|█ | 1/10 [00:09<01:22, 9.21s/it]" 1214 | ] 1215 | }, 1216 | { 1217 | "output_type": "stream", 1218 | "name": "stdout", 1219 | "text": [ 1220 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1221 | "None\n", 1222 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1223 | ] 1224 | }, 1225 | { 1226 | "output_type": "stream", 1227 | "name": "stderr", 1228 | "text": [ 1229 | "\r 20%|██ | 2/10 [00:18<01:13, 9.22s/it]" 1230 | ] 1231 | }, 1232 | { 1233 | "output_type": "stream", 1234 | "name": "stdout", 1235 | "text": [ 1236 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1237 | "None\n", 1238 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1239 | ] 1240 | }, 1241 | { 1242 | "output_type": "stream", 1243 | "name": "stderr", 1244 | "text": [ 1245 | "\r 30%|███ | 3/10 [00:27<01:04, 9.21s/it]" 1246 | ] 1247 | }, 1248 | { 1249 | "output_type": "stream", 1250 | "name": "stdout", 1251 | "text": [ 1252 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1253 | "None\n", 1254 | "Could not extract the label from the completion: 
'NoneType' object is not subscriptable\n" 1255 | ] 1256 | }, 1257 | { 1258 | "output_type": "stream", 1259 | "name": "stderr", 1260 | "text": [ 1261 | "\r 40%|████ | 4/10 [00:36<00:55, 9.20s/it]" 1262 | ] 1263 | }, 1264 | { 1265 | "output_type": "stream", 1266 | "name": "stdout", 1267 | "text": [ 1268 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1269 | "None\n", 1270 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1271 | ] 1272 | }, 1273 | { 1274 | "output_type": "stream", 1275 | "name": "stderr", 1276 | "text": [ 1277 | "\r 50%|█████ | 5/10 [00:46<00:46, 9.21s/it]" 1278 | ] 1279 | }, 1280 | { 1281 | "output_type": "stream", 1282 | "name": "stdout", 1283 | "text": [ 1284 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1285 | "None\n", 1286 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1287 | ] 1288 | }, 1289 | { 1290 | "output_type": "stream", 1291 | "name": "stderr", 1292 | "text": [ 1293 | "\r 60%|██████ | 6/10 [00:55<00:36, 9.20s/it]" 1294 | ] 1295 | }, 1296 | { 1297 | "output_type": "stream", 1298 | "name": "stdout", 1299 | "text": [ 1300 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1301 | "None\n", 1302 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1303 | ] 1304 | }, 1305 | { 1306 | "output_type": "stream", 1307 | "name": "stderr", 1308 | "text": [ 1309 | "\r 70%|███████ | 7/10 [01:04<00:27, 9.20s/it]" 1310 | ] 1311 | }, 1312 | { 1313 | "output_type": "stream", 1314 | "name": "stdout", 1315 | "text": [ 1316 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current 
quota, please check your plan and billing details.`\n", 1317 | "None\n", 1318 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1319 | ] 1320 | }, 1321 | { 1322 | "output_type": "stream", 1323 | "name": "stderr", 1324 | "text": [ 1325 | "\r 80%|████████ | 8/10 [01:13<00:18, 9.20s/it]" 1326 | ] 1327 | }, 1328 | { 1329 | "output_type": "stream", 1330 | "name": "stdout", 1331 | "text": [ 1332 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1333 | "None\n", 1334 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1335 | ] 1336 | }, 1337 | { 1338 | "output_type": "stream", 1339 | "name": "stderr", 1340 | "text": [ 1341 | "\r 90%|█████████ | 9/10 [01:22<00:09, 9.19s/it]" 1342 | ] 1343 | }, 1344 | { 1345 | "output_type": "stream", 1346 | "name": "stdout", 1347 | "text": [ 1348 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1349 | "None\n", 1350 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1351 | ] 1352 | }, 1353 | { 1354 | "output_type": "stream", 1355 | "name": "stderr", 1356 | "text": [ 1357 | "100%|██████████| 10/10 [01:31<00:00, 9.20s/it]" 1358 | ] 1359 | }, 1360 | { 1361 | "output_type": "stream", 1362 | "name": "stdout", 1363 | "text": [ 1364 | "Could not obtain the completion after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`\n", 1365 | "None\n", 1366 | "Could not extract the label from the completion: 'NoneType' object is not subscriptable\n" 1367 | ] 1368 | }, 1369 | { 1370 | "output_type": "stream", 1371 | "name": "stderr", 1372 | "text": [ 1373 | "\n" 1374 | ] 1375 | } 1376 | ] 1377 | }, 1378 | { 1379 | "cell_type": "markdown", 1380 | "source": [ 1381 | 
"**Text Vectorization**" 1382 | ], 1383 | "metadata": { 1384 | "id": "EYRB6-rTJ241" 1385 | } 1386 | }, 1387 | { 1388 | "cell_type": "code", 1389 | "source": [ 1390 | "# Importing the GPTVectorizer class from the skllm.preprocessing module\n", 1391 | "from skllm.preprocessing import GPTVectorizer\n", 1392 | "\n", 1393 | "# Creating an instance of the GPTVectorizer class and assigning it to the variable 'model'\n", 1394 | "model = GPTVectorizer()\n", 1395 | "\n", 1396 | "# transforming the text data into embedding vectors\n", 1397 | "vectors = model.fit_transform(X)" 1398 | ], 1399 | "metadata": { 1400 | "colab": { 1401 | "base_uri": "https://localhost:8080/", 1402 | "height": 387 1403 | }, 1404 | "id": "xPALZOA1ATPw", 1405 | "outputId": "2daa0bfb-d3bd-4a29-b1bf-d7e4e97184c9" 1406 | }, 1407 | "execution_count": 25, 1408 | "outputs": [ 1409 | { 1410 | "output_type": "stream", 1411 | "name": "stderr", 1412 | "text": [ 1413 | " 0%| | 0/10 [00:09\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0;31m# transorming the\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 8\u001b[0;31m \u001b[0mvectors\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmodel\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfit_transform\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 1424 | "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/sklearn/utils/_set_output.py\u001b[0m in \u001b[0;36mwrapped\u001b[0;34m(self, X, *args, **kwargs)\u001b[0m\n\u001b[1;32m 138\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0mwraps\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 139\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mwrapped\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 140\u001b[0;31m \u001b[0mdata_to_wrap\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 141\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata_to_wrap\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 142\u001b[0m \u001b[0;31m# only wrap the first output for cross decomposition\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1425 | "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/skllm/preprocessing/gpt_vectorizer.py\u001b[0m in \u001b[0;36mfit_transform\u001b[0;34m(self, X, y, **fit_params)\u001b[0m\n\u001b[1;32m 92\u001b[0m \u001b[0membeddings\u001b[0m \u001b[0;34m:\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndarray\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 93\u001b[0m \"\"\"\n\u001b[0;32m---> 94\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtransform\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 1426 | "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/sklearn/utils/_set_output.py\u001b[0m in 
\u001b[0;36mwrapped\u001b[0;34m(self, X, *args, **kwargs)\u001b[0m\n\u001b[1;32m 138\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0mwraps\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 139\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mwrapped\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 140\u001b[0;31m \u001b[0mdata_to_wrap\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 141\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata_to_wrap\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 142\u001b[0m \u001b[0;31m# only wrap the first output for cross decomposition\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1427 | "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/skllm/preprocessing/gpt_vectorizer.py\u001b[0m in \u001b[0;36mtransform\u001b[0;34m(self, X)\u001b[0m\n\u001b[1;32m 72\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mi\u001b[0m \u001b[0;32min\u001b[0m 
\u001b[0mtqdm\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 73\u001b[0m embeddings.append(\n\u001b[0;32m---> 74\u001b[0;31m \u001b[0m_get_embedding\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_openai_key\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_openai_org\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 75\u001b[0m )\n\u001b[1;32m 76\u001b[0m \u001b[0membeddings\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0masarray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0membeddings\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1428 | "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/skllm/openai/embeddings.py\u001b[0m in \u001b[0;36mget_embedding\u001b[0;34m(text, key, org, model, max_retries)\u001b[0m\n\u001b[1;32m 46\u001b[0m \u001b[0merror_type\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 47\u001b[0m \u001b[0msleep\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 48\u001b[0;31m raise RuntimeError(\n\u001b[0m\u001b[1;32m 49\u001b[0m \u001b[0;34mf\"Could not obtain the embedding after {max_retries} retries: `{error_type} :: 
{error_msg}`\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 50\u001b[0m )\n", 1429 | "\u001b[0;31mRuntimeError\u001b[0m: Could not obtain the embedding after 3 retries: `RateLimitError :: You exceeded your current quota, please check your plan and billing details.`" 1430 | ] 1431 | } 1432 | ] 1433 | }, 1434 | { 1435 | "cell_type": "markdown", 1436 | "source": [ 1437 | "**GPTVectorizer with the XGBoost Classifier in a scikit-learn pipeline**" 1438 | ], 1439 | "metadata": { 1440 | "id": "5fabpeG5J7vt" 1441 | } 1442 | }, 1443 | { 1444 | "cell_type": "code", 1445 | "source": [ 1446 | "# Importing the necessary modules and classes\n", 1447 | "from sklearn.pipeline import Pipeline\n", 1448 | "from sklearn.preprocessing import LabelEncoder\n", 1449 | "from xgboost import XGBClassifier\n", 1450 | "\n", 1451 | "# Creating an instance of LabelEncoder class\n", 1452 | "le = LabelEncoder()\n", 1453 | "\n", 1454 | "# Encoding the training labels 'y_train' using LabelEncoder\n", 1455 | "y_train_encoded = le.fit_transform(y_train)\n", 1456 | "\n", 1457 | "# Encoding the test labels 'y_test' using LabelEncoder\n", 1458 | "y_test_encoded = le.transform(y_test)\n", 1459 | "\n", 1460 | "# Defining the steps of the pipeline as a list of tuples\n", 1461 | "steps = [('GPT', GPTVectorizer()), ('Clf', XGBClassifier())]\n", 1462 | "\n", 1463 | "# Creating a pipeline with the defined steps\n", 1464 | "clf = Pipeline(steps)\n", 1465 | "\n", 1466 | "# Fitting the pipeline on the training data 'X_train' and the encoded training labels 'y_train_encoded'\n", 1467 | "clf.fit(X_train, y_train_encoded)\n", 1468 | "\n", 1469 | "# Predicting the labels for the test data 'X_test' using the trained pipeline\n", 1470 | "yh = clf.predict(X_test)" 1471 | ], 1472 | "metadata": { 1473 | "id": "oaRA6mezBvpA" 1474 | }, 1475 | "execution_count": null, 1476 | "outputs": [] 1477 | }, 1478 | { 1479 | "cell_type": "markdown", 1480 | "source": [ 1481 | "**Text Summarization**" 1482 | ], 
1483 | "metadata": { 1484 | "id": "ivlDIAP6KDSQ" 1485 | } 1486 | }, 1487 | { 1488 | "cell_type": "code", 1489 | "source": [ 1490 | "# Importing the GPTSummarizer class from the skllm.preprocessing module\n", 1491 | "from skllm.preprocessing import GPTSummarizer\n", 1492 | "\n", 1493 | "# Importing the get_summarization_dataset function\n", 1494 | "from skllm.datasets import get_summarization_dataset\n", 1495 | "\n", 1496 | "# Calling the get_summarization_dataset function\n", 1497 | "X = get_summarization_dataset()\n", 1498 | "\n", 1499 | "# Creating an instance of the GPTSummarizer\n", 1500 | "s = GPTSummarizer(openai_model='gpt-3.5-turbo', max_words=15)\n", 1501 | "\n", 1502 | "# Applying the fit_transform method of the GPTSummarizer instance to the input data 'X'.\n", 1503 | "# It fits the model to the data and generates the summaries, which are assigned to the variable 'summaries'\n", 1504 | "summaries = s.fit_transform(X)" 1505 | ], 1506 | "metadata": { 1507 | "id": "YLgAjTETCa8p" 1508 | }, 1509 | "execution_count": null, 1510 | "outputs": [] 1511 | }, 1512 | { 1513 | "cell_type": "code", 1514 | "source": [], 1515 | "metadata": { 1516 | "id": "j_gXjBvv6LUw" 1517 | }, 1518 | "execution_count": null, 1519 | "outputs": [] 1520 | } 1521 | ] 1522 | } --------------------------------------------------------------------------------