├── .gitignore ├── .env.example ├── README.md ├── Summarize_With_Loaders.ipynb └── Summarize_Youtube_Transcript.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | .env -------------------------------------------------------------------------------- /.env.example: -------------------------------------------------------------------------------- 1 | OPENAI_API_KEY=1234567890 -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Use LangChain to Summarize YouTube Video Transcripts 2 | 3 | ## Get Started 4 | 5 | Before you run the notebooks, create a .env file in the root directory that sets the following environment variable. 6 | 7 | ``` 8 | OPENAI_API_KEY= 9 | ``` 10 | 11 | See .env.example for an example. 12 | 13 | ## Python notebooks in this repository 14 | 15 | ### [Summarize_Youtube_Transcript.ipynb](./Summarize_Youtube_Transcript.ipynb) 16 | 17 | This example shows how to summarize a YouTube video from its transcript. 18 | 19 | ### [Summarize_With_Loaders.ipynb](./Summarize_With_Loaders.ipynb) 20 | 21 | This example shows how to use document loaders to summarize the following resources: 22 | 1. URL 23 | 2. PowerPoint 24 | 3. ReadTheDocs site 25 | 4.
PDF 26 | -------------------------------------------------------------------------------- /Summarize_With_Loaders.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "view-in-github" 8 | }, 9 | "source": [ 10 | "\"Open" 11 | ] 12 | }, 13 | { 14 | "attachments": {}, 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "# Introduction\n", 19 | "\n", 20 | "This is an example that shows you how to use document loaders to summarize the following resources:\n", 21 | "1. URL\n", 22 | "2. PowerPoint\n", 23 | "3. ReadTheDocs site\n", 24 | "4. PDF" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 66, 30 | "metadata": { 31 | "id": "hZlVnW5zvOM_" 32 | }, 33 | "outputs": [ 34 | { 35 | "name": "stdout", 36 | "output_type": "stream", 37 | "text": [ 38 | "Requirement already satisfied: openai in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (0.27.2)\n", 39 | "Requirement already satisfied: requests>=2.20 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from openai) (2.28.2)\n", 40 | "Requirement already satisfied: tqdm in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from openai) (4.65.0)\n", 41 | "Requirement already satisfied: aiohttp in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from openai) (3.7.4.post0)\n", 42 | "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.20->openai) (1.26.15)\n", 43 | "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.20->openai) (3.1.0)\n", 44 | "Requirement already satisfied: idna<4,>=2.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.20->openai) (3.4)\n", 45 | 
"Requirement already satisfied: certifi>=2017.4.17 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.20->openai) (2022.12.7)\n", 46 | "Requirement already satisfied: chardet<5.0,>=2.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp->openai) (4.0.0)\n", 47 | "Requirement already satisfied: attrs>=17.3.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp->openai) (22.2.0)\n", 48 | "Requirement already satisfied: async-timeout<4.0,>=3.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp->openai) (3.0.1)\n", 49 | "Requirement already satisfied: yarl<2.0,>=1.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp->openai) (1.8.2)\n", 50 | "Requirement already satisfied: typing-extensions>=3.6.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp->openai) (4.5.0)\n", 51 | "Requirement already satisfied: multidict<7.0,>=4.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp->openai) (6.0.4)\n", 52 | "Note: you may need to restart the kernel to use updated packages.\n", 53 | "Requirement already satisfied: langchain==0.0.139 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (0.0.139)\n", 54 | "Requirement already satisfied: SQLAlchemy<2,>=1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain==0.0.139) (1.4.47)\n", 55 | "Requirement already satisfied: pydantic<2,>=1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain==0.0.139) (1.10.7)\n", 56 | "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain==0.0.139) (8.2.2)\n", 57 | "Collecting aiohttp<4.0.0,>=3.8.3\n", 58 | " Using cached aiohttp-3.8.4-cp39-cp39-macosx_10_9_x86_64.whl (360 kB)\n", 59 | "Requirement already satisfied: 
numpy<2,>=1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain==0.0.139) (1.23.5)\n", 60 | "Requirement already satisfied: requests<3,>=2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain==0.0.139) (2.28.2)\n", 61 | "Collecting async-timeout<5.0.0,>=4.0.0\n", 62 | " Using cached async_timeout-4.0.2-py3-none-any.whl (5.8 kB)\n", 63 | "Requirement already satisfied: PyYAML>=5.4.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain==0.0.139) (5.4.1)\n", 64 | "Requirement already satisfied: openapi-schema-pydantic<2.0,>=1.2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain==0.0.139) (1.2.4)\n", 65 | "Requirement already satisfied: dataclasses-json<0.6.0,>=0.5.7 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain==0.0.139) (0.5.7)\n", 66 | "Requirement already satisfied: gptcache>=0.1.7 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain==0.0.139) (0.1.10)\n", 67 | "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain==0.0.139) (3.1.0)\n", 68 | "Requirement already satisfied: frozenlist>=1.1.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain==0.0.139) (1.3.3)\n", 69 | "Requirement already satisfied: attrs>=17.3.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain==0.0.139) (22.2.0)\n", 70 | "Requirement already satisfied: aiosignal>=1.1.2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain==0.0.139) (1.3.1)\n", 71 | "Requirement already satisfied: yarl<2.0,>=1.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain==0.0.139) (1.8.2)\n", 
72 | "Requirement already satisfied: multidict<7.0,>=4.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain==0.0.139) (6.0.4)\n", 73 | "Requirement already satisfied: marshmallow<4.0.0,>=3.3.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain==0.0.139) (3.19.0)\n", 74 | "Requirement already satisfied: typing-inspect>=0.4.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain==0.0.139) (0.8.0)\n", 75 | "Requirement already satisfied: marshmallow-enum<2.0.0,>=1.5.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain==0.0.139) (1.5.1)\n", 76 | "Requirement already satisfied: openai in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from gptcache>=0.1.7->langchain==0.0.139) (0.27.2)\n", 77 | "Requirement already satisfied: cachetools in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from gptcache>=0.1.7->langchain==0.0.139) (5.3.0)\n", 78 | "Requirement already satisfied: typing-extensions>=4.2.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from pydantic<2,>=1->langchain==0.0.139) (4.5.0)\n", 79 | "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests<3,>=2->langchain==0.0.139) (1.26.15)\n", 80 | "Requirement already satisfied: certifi>=2017.4.17 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests<3,>=2->langchain==0.0.139) (2022.12.7)\n", 81 | "Requirement already satisfied: idna<4,>=2.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests<3,>=2->langchain==0.0.139) (3.4)\n", 82 | "Requirement already satisfied: greenlet!=0.4.17 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from 
SQLAlchemy<2,>=1->langchain==0.0.139) (2.0.2)\n", 83 | "Requirement already satisfied: packaging>=17.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from marshmallow<4.0.0,>=3.3.0->dataclasses-json<0.6.0,>=0.5.7->langchain==0.0.139) (23.0)\n", 84 | "Requirement already satisfied: mypy-extensions>=0.3.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from typing-inspect>=0.4.0->dataclasses-json<0.6.0,>=0.5.7->langchain==0.0.139) (1.0.0)\n", 85 | "Requirement already satisfied: tqdm in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from openai->gptcache>=0.1.7->langchain==0.0.139) (4.65.0)\n", 86 | "Installing collected packages: async-timeout, aiohttp\n", 87 | " Attempting uninstall: async-timeout\n", 88 | " Found existing installation: async-timeout 3.0.1\n", 89 | " Uninstalling async-timeout-3.0.1:\n", 90 | " Successfully uninstalled async-timeout-3.0.1\n", 91 | " Attempting uninstall: aiohttp\n", 92 | " Found existing installation: aiohttp 3.7.4.post0\n", 93 | " Uninstalling aiohttp-3.7.4.post0:\n", 94 | " Successfully uninstalled aiohttp-3.7.4.post0\n", 95 | "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", 96 | "bilibili-api 9.1.0 requires aiohttp~=3.7.4.post0, but you have aiohttp 3.8.4 which is incompatible.\u001b[0m\u001b[31m\n", 97 | "\u001b[0mSuccessfully installed aiohttp-3.8.4 async-timeout-4.0.2\n", 98 | "Note: you may need to restart the kernel to use updated packages.\n", 99 | "Requirement already satisfied: unstructured in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (0.5.11)\n", 100 | "Requirement already satisfied: argilla in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (1.5.1)\n", 101 | "Requirement already satisfied: lxml in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (4.6.5)\n", 102 | "Requirement already satisfied: msg_parser in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (1.2.0)\n", 103 | "Requirement already satisfied: nltk in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (3.8.1)\n", 104 | "Requirement already satisfied: openpyxl in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (3.1.2)\n", 105 | "Requirement already satisfied: pandas in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (1.5.3)\n", 106 | "Requirement already satisfied: pillow in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (9.5.0)\n", 107 | "Requirement already satisfied: pypandoc in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (1.11)\n", 108 | "Requirement already satisfied: python-docx in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (0.8.11)\n", 109 | "Requirement already satisfied: python-pptx in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (0.6.21)\n", 110 | "Requirement already satisfied: 
python-magic in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (0.4.27)\n", 111 | "Requirement already satisfied: markdown in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (3.4.3)\n", 112 | "Requirement already satisfied: requests in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (2.28.2)\n", 113 | "Requirement already satisfied: certifi>=2022.12.07 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (2022.12.7)\n", 114 | "Requirement already satisfied: packaging>=20.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (23.0)\n", 115 | "Requirement already satisfied: httpx<0.24,>=0.15 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (0.23.3)\n", 116 | "Requirement already satisfied: deprecated~=1.2.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (1.2.13)\n", 117 | "Requirement already satisfied: wrapt<1.15,>=1.13 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (1.14.1)\n", 118 | "Requirement already satisfied: pydantic>=1.7.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (1.10.7)\n", 119 | "Requirement already satisfied: numpy<1.24.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (1.23.5)\n", 120 | "Requirement already satisfied: monotonic in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (1.6)\n", 121 | "Requirement already satisfied: backoff in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (2.2.1)\n", 122 | "Requirement already satisfied: tqdm>=4.27.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages 
(from argilla->unstructured) (4.65.0)\n", 123 | "Requirement already satisfied: rich<=13.0.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (13.0.1)\n", 124 | "Requirement already satisfied: pytz>=2020.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from pandas->unstructured) (2023.2)\n", 125 | "Requirement already satisfied: python-dateutil>=2.8.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from pandas->unstructured) (2.8.2)\n", 126 | "Requirement already satisfied: importlib-metadata>=4.4 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from markdown->unstructured) (6.1.0)\n", 127 | "Requirement already satisfied: olefile>=0.46 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from msg_parser->unstructured) (0.46)\n", 128 | "Requirement already satisfied: click in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from nltk->unstructured) (8.1.3)\n", 129 | "Requirement already satisfied: regex>=2021.8.3 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from nltk->unstructured) (2023.3.23)\n", 130 | "Requirement already satisfied: joblib in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from nltk->unstructured) (1.2.0)\n", 131 | "Requirement already satisfied: et-xmlfile in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from openpyxl->unstructured) (1.1.0)\n", 132 | "Requirement already satisfied: XlsxWriter>=0.5.7 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from python-pptx->unstructured) (3.0.9)\n", 133 | "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests->unstructured) (1.26.15)\n", 134 | "Requirement already satisfied: idna<4,>=2.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests->unstructured) 
(3.4)\n", 135 | "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests->unstructured) (3.1.0)\n", 136 | "Requirement already satisfied: rfc3986[idna2008]<2,>=1.3 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from httpx<0.24,>=0.15->argilla->unstructured) (1.5.0)\n", 137 | "Requirement already satisfied: httpcore<0.17.0,>=0.15.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from httpx<0.24,>=0.15->argilla->unstructured) (0.16.3)\n", 138 | "Requirement already satisfied: sniffio in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from httpx<0.24,>=0.15->argilla->unstructured) (1.3.0)\n", 139 | "Requirement already satisfied: zipp>=0.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from importlib-metadata>=4.4->markdown->unstructured) (3.15.0)\n", 140 | "Requirement already satisfied: typing-extensions>=4.2.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from pydantic>=1.7.1->argilla->unstructured) (4.5.0)\n", 141 | "Requirement already satisfied: six>=1.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from python-dateutil>=2.8.1->pandas->unstructured) (1.16.0)\n", 142 | "Requirement already satisfied: commonmark<0.10.0,>=0.9.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from rich<=13.0.1->argilla->unstructured) (0.9.1)\n", 143 | "Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from rich<=13.0.1->argilla->unstructured) (2.14.0)\n", 144 | "Requirement already satisfied: anyio<5.0,>=3.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from httpcore<0.17.0,>=0.15.0->httpx<0.24,>=0.15->argilla->unstructured) (3.6.2)\n", 145 | "Requirement already satisfied: h11<0.15,>=0.13 in 
/Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from httpcore<0.17.0,>=0.15.0->httpx<0.24,>=0.15->argilla->unstructured) (0.14.0)\n", 146 | "Note: you may need to restart the kernel to use updated packages.\n", 147 | "Requirement already satisfied: tiktoken in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (0.3.2)\n", 148 | "Requirement already satisfied: requests>=2.26.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from tiktoken) (2.28.2)\n", 149 | "Requirement already satisfied: regex>=2022.1.18 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from tiktoken) (2023.3.23)\n", 150 | "Requirement already satisfied: certifi>=2017.4.17 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (2022.12.7)\n", 151 | "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (3.1.0)\n", 152 | "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (1.26.15)\n", 153 | "Requirement already satisfied: idna<4,>=2.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (3.4)\n", 154 | "Note: you may need to restart the kernel to use updated packages.\n" 155 | ] 156 | } 157 | ], 158 | "source": [ 159 | "%pip install openai\n", 160 | "%pip install langchain==0.0.139\n", 161 | "%pip install unstructured\n", 162 | "%pip install tiktoken" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 67, 168 | "metadata": { 169 | "id": "5lZp3p97vZPy" 170 | }, 171 | "outputs": [], 172 | "source": [ 173 | "import os\n", 174 | "from langchain.document_loaders import UnstructuredURLLoader, UnstructuredPowerPointLoader, ReadTheDocsLoader, PyPDFLoader\n", 175 | "from langchain.llms import OpenAI\n", 
176 | "from langchain.chains.summarize import load_summarize_chain\n", 177 | "from langchain.callbacks import get_openai_callback\n", 178 | "from langchain.text_splitter import RecursiveCharacterTextSplitter" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 68, 184 | "metadata": {}, 185 | "outputs": [], 186 | "source": [ 187 | "def summarize_docs(docs, doc_url):\n", 188 | " print (f'You have {len(docs)} document(s) in your {doc_url} data')\n", 189 | " print (f'There are {len(docs[0].page_content)} characters in your document')\n", 190 | "\n", 191 | " text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", 192 | " split_docs = text_splitter.split_documents(docs)\n", 193 | "\n", 194 | " print (f'You have {len(split_docs)} split document(s)')\n", 195 | "\n", 196 | " OPENAI_API_KEY = os.environ['OPENAI_API_KEY']\n", 197 | " llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY, model_name=\"text-davinci-003\")\n", 198 | " chain = load_summarize_chain(llm, chain_type=\"map_reduce\", verbose=False)\n", 199 | "\n", 200 | " response = \"\"\n", 201 | " with get_openai_callback() as cb:\n", 202 | " response = chain.run(input_documents=split_docs)\n", 203 | " print(f\"Total Tokens: {cb.total_tokens}\")\n", 204 | " print(f\"Prompt Tokens: {cb.prompt_tokens}\")\n", 205 | " print(f\"Completion Tokens: {cb.completion_tokens}\")\n", 206 | " print(f\"Successful Requests: {cb.successful_requests}\")\n", 207 | " print(f\"Total Cost (USD): ${cb.total_cost}\")\n", 208 | "\n", 209 | " return response" 210 | ] 211 | }, 212 | { 213 | "attachments": {}, 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "1. Load a web page by its URL and get its content summarized." 
218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": 69, 223 | "metadata": {}, 224 | "outputs": [ 225 | { 226 | "name": "stdout", 227 | "output_type": "stream", 228 | "text": [ 229 | "You have 1 document(s) in your https://edition.cnn.com/2023/04/13/business/delta-earnings/index.html data\n", 230 | "There are 2780 characters in your document\n", 231 | "You have 4 split document(s)\n", 232 | "Total Tokens: 1416\n", 233 | "Prompt Tokens: 980\n", 234 | "Completion Tokens: 436\n", 235 | "Successful Requests: 2\n", 236 | "Total Cost (USD): $0.02832\n" 237 | ] 238 | }, 239 | { 240 | "data": { 241 | "text/plain": [ 242 | "' Delta Airlines reported record advanced bookings for the summer, indicating a recovery from pandemic-related losses. Despite a one-time charge of $864 million related to a four-year labor deal with pilots, the company reported a net profit when excluding special items. Revenue was up 45% from a year earlier and 14% from the same period in 2019. Additionally, a passenger was taken into custody after opening a door of a Boeing 737 and deploying an emergency exit slide at Los Angeles International Airport. Delta Airlines is expecting earnings per share of between $2 and $2.25, and between $5 and $6 for the full year. Other major US airlines are likely to face rising labor costs due to upcoming negotiations with a majority of their employees.'" 243 | ] 244 | }, 245 | "execution_count": 69, 246 | "metadata": {}, 247 | "output_type": "execute_result" 248 | } 249 | ], 250 | "source": [ 251 | "url = \"https://edition.cnn.com/2023/04/13/business/delta-earnings/index.html\"\n", 252 | "summarize_docs(UnstructuredURLLoader(urls=[url]).load(), url)" 253 | ] 254 | }, 255 | { 256 | "attachments": {}, 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "2. Load a PowerPoint file and get its content summarized."
261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 70, 266 | "metadata": {}, 267 | "outputs": [ 268 | { 269 | "name": "stdout", 270 | "output_type": "stream", 271 | "text": [ 272 | "--2023-04-13 23:38:31-- https://github.com/tomw1808/truffle_eth_class2/blob/master/s08/Web3-intro.pptx?raw=true\n", 273 | "Resolving github.com (github.com)... 140.82.121.4\n", 274 | "Connecting to github.com (github.com)|140.82.121.4|:443... connected.\n", 275 | "HTTP request sent, awaiting response... 302 Found\n", 276 | "Location: https://github.com/tomw1808/truffle_eth_class2/raw/master/s08/Web3-intro.pptx [following]\n", 277 | "--2023-04-13 23:38:31-- https://github.com/tomw1808/truffle_eth_class2/raw/master/s08/Web3-intro.pptx\n", 278 | "Reusing existing connection to github.com:443.\n", 279 | "HTTP request sent, awaiting response... 302 Found\n", 280 | "Location: https://raw.githubusercontent.com/tomw1808/truffle_eth_class2/master/s08/Web3-intro.pptx [following]\n", 281 | "--2023-04-13 23:38:31-- https://raw.githubusercontent.com/tomw1808/truffle_eth_class2/master/s08/Web3-intro.pptx\n", 282 | "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8002::154, 2606:50c0:8001::154, ...\n", 283 | "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.\n", 284 | "HTTP request sent, awaiting response... 
200 OK\n", 285 | "Length: 2598023 (2.5M) [application/octet-stream]\n", 286 | "Saving to: ‘Web3-intro.pptx’\n", 287 | "\n", 288 | "Web3-intro.pptx 100%[===================>] 2.48M 6.63MB/s in 0.4s \n", 289 | "\n", 290 | "2023-04-13 23:38:32 (6.63 MB/s) - ‘Web3-intro.pptx’ saved [2598023/2598023]\n", 291 | "\n" 292 | ] 293 | } 294 | ], 295 | "source": [ 296 | "\n", 297 | "!wget \"https://github.com/tomw1808/truffle_eth_class2/blob/master/s08/Web3-intro.pptx?raw=true\" -O Web3-intro.pptx" 298 | ] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": 71, 303 | "metadata": {}, 304 | "outputs": [ 305 | { 306 | "name": "stdout", 307 | "output_type": "stream", 308 | "text": [ 309 | "You have 1 document(s) in your Web3-intro.pptx data\n", 310 | "There are 864 characters in your document\n", 311 | "You have 1 split document(s)\n", 312 | "Total Tokens: 531\n", 313 | "Prompt Tokens: 408\n", 314 | "Completion Tokens: 123\n", 315 | "Successful Requests: 2\n", 316 | "Total Cost (USD): $0.01062\n", 317 | " Web3 is a Javascript library that enables users to interact with the blockchain via the json-RPC interface. It connects the browser to the blockchain via port 8545 and provides practical examples such as connecting to the Ethereum Wiki and getting the balance of an account.\n" 318 | ] 319 | } 320 | ], 321 | "source": [ 322 | "loader = UnstructuredPowerPointLoader(\"Web3-intro.pptx\")\n", 323 | "response = summarize_docs(loader.load(), \"Web3-intro.pptx\")\n", 324 | "print(response)" 325 | ] 326 | }, 327 | { 328 | "attachments": {}, 329 | "cell_type": "markdown", 330 | "metadata": {}, 331 | "source": [ 332 | "3. Load a Read the Docs project and get its content summarized."
333 | ] 334 | }, 335 | { 336 | "cell_type": "code", 337 | "execution_count": 72, 338 | "metadata": {}, 339 | "outputs": [ 340 | { 341 | "name": "stdout", 342 | "output_type": "stream", 343 | "text": [ 344 | "--2023-04-13 23:42:04-- https://langchain.readthedocs.io/en/latest/\n", 345 | "Resolving langchain.readthedocs.io (langchain.readthedocs.io)... 2606:4700::6811:2152, 2606:4700::6811:2052, 104.17.32.82, ...\n", 346 | "Connecting to langchain.readthedocs.io (langchain.readthedocs.io)|2606:4700::6811:2152|:443... connected.\n", 347 | "HTTP request sent, awaiting response... 302 Found\n", 348 | "Location: https://python.langchain.com/en/latest/ [following]\n", 349 | "--2023-04-13 23:42:04-- https://python.langchain.com/en/latest/\n", 350 | "Resolving python.langchain.com (python.langchain.com)... 2606:4700::6811:2052, 2606:4700::6811:2152, 104.17.32.82, ...\n", 351 | "Connecting to python.langchain.com (python.langchain.com)|2606:4700::6811:2052|:443... connected.\n", 352 | "HTTP request sent, awaiting response... 
200 OK\n", 353 | "Length: unspecified [text/html]\n", 354 | "Saving to: ‘langchain/langchain.readthedocs.io/en/latest/index.html’\n", 355 | "\n", 356 | "langchain.readthedo [ <=> ] 78.21K --.-KB/s in 0.05s \n", 357 | "\n", 358 | "2023-04-13 23:42:04 (1.62 MB/s) - ‘langchain/langchain.readthedocs.io/en/latest/index.html’ saved [80091]\n", 359 | "\n", 360 | "FINISHED --2023-04-13 23:42:04--\n", 361 | "Total wall clock time: 0.5s\n", 362 | "Downloaded: 1 files, 78K in 0.05s (1.62 MB/s)\n" 363 | ] 364 | } 365 | ], 366 | "source": [ 367 | "!wget -r -A.html -P langchain \"https://langchain.readthedocs.io/en/latest/\"" 368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": 73, 373 | "metadata": {}, 374 | "outputs": [ 375 | { 376 | "name": "stdout", 377 | "output_type": "stream", 378 | "text": [ 379 | "You have 1 document(s) in your langchain data\n", 380 | "There are 5350 characters in your document\n", 381 | "You have 6 split document(s)\n", 382 | "Total Tokens: 2123\n", 383 | "Prompt Tokens: 1644\n", 384 | "Completion Tokens: 479\n", 385 | "Successful Requests: 2\n", 386 | "Total Cost (USD): $0.04246\n" 387 | ] 388 | }, 389 | { 390 | "data": { 391 | "text/plain": [ 392 | "' LangChain is a framework for developing applications powered by language models. It provides modules for models, prompts, memory, indexes, and chains, as well as resources such as the LangChainHub, a glossary, a gallery, deployments, and tracing guides. ModelLaboratory is a platform that makes it easy to experiment with different prompts, models, and chains. There is a Discord to discuss LangChain, and production support is available with a dedicated Slack channel. 
The Quickstart Guide provides information on getting started, modules, use cases, reference docs, LangChain Ecosystem, and additional resources.'" 393 | ] 394 | }, 395 | "execution_count": 73, 396 | "metadata": {}, 397 | "output_type": "execute_result" 398 | } 399 | ], 400 | "source": [ 401 | "loader = ReadTheDocsLoader(\"langchain\")\n", 402 | "summarize_docs(loader.load(), \"langchain\")" 403 | ] 404 | }, 405 | { 406 | "attachments": {}, 407 | "cell_type": "markdown", 408 | "metadata": {}, 409 | "source": [ 410 | "4. Load a PDF file by URL and get its content summarized." 411 | ] 412 | }, 413 | { 414 | "cell_type": "code", 415 | "execution_count": 74, 416 | "metadata": {}, 417 | "outputs": [ 418 | { 419 | "name": "stdout", 420 | "output_type": "stream", 421 | "text": [ 422 | "--2023-04-13 23:45:16-- https://ir.tesla.com/_flysystem/s3/sec/000095017023001409/tsla-20221231-gen.pdf\n", 423 | "Resolving ir.tesla.com (ir.tesla.com)... 2a02:26f0:9b00:39d::700, 2a02:26f0:9b00:393::700, 92.122.160.52\n", 424 | "Connecting to ir.tesla.com (ir.tesla.com)|2a02:26f0:9b00:39d::700|:443... connected.\n", 425 | "HTTP request sent, awaiting response...
200 OK\n", 426 | "Length: unspecified [application/pdf]\n", 427 | "Saving to: ‘tsla-20221231-gen.pdf’\n", 428 | "\n", 429 | "tsla-20221231-gen.p [ <=> ] 1.57M 5.48MB/s in 0.3s \n", 430 | "\n", 431 | "2023-04-13 23:45:17 (5.48 MB/s) - ‘tsla-20221231-gen.pdf’ saved [1650825]\n", 432 | "\n" 433 | ] 434 | } 435 | ], 436 | "source": [ 437 | "!wget \"https://ir.tesla.com/_flysystem/s3/sec/000095017023001409/tsla-20221231-gen.pdf\" -O tsla-20221231-gen.pdf" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": 75, 443 | "metadata": {}, 444 | "outputs": [ 445 | { 446 | "name": "stdout", 447 | "output_type": "stream", 448 | "text": [ 449 | "Requirement already satisfied: pypdf in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (3.7.1)\n", 450 | "Requirement already satisfied: typing_extensions>=3.10.0.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from pypdf) (4.5.0)\n", 451 | "Note: you may need to restart the kernel to use updated packages.\n" 452 | ] 453 | } 454 | ], 455 | "source": [ 456 | "%pip install pypdf" 457 | ] 458 | }, 459 | { 460 | "cell_type": "code", 461 | "execution_count": 76, 462 | "metadata": {}, 463 | "outputs": [ 464 | { 465 | "name": "stdout", 466 | "output_type": "stream", 467 | "text": [ 468 | "You have 10 document(s) in your tsla-20221231-gen.pdf data\n", 469 | "There are 3793 characters in your document\n", 470 | "You have 30 split document(s)\n", 471 | "Total Tokens: 14889\n", 472 | "Prompt Tokens: 12541\n", 473 | "Completion Tokens: 2348\n", 474 | "Successful Requests: 2\n", 475 | "Total Cost (USD): $0.29778\n" 476 | ] 477 | }, 478 | { 479 | "data": { 480 | "text/plain": [ 481 | "\" Tesla, Inc. has released its annual report on Form 10-K for the year ended December 31, 2022. 
The report includes information on the company's business, risk factors, unresolved staff comments, properties, legal proceedings, mine safety disclosures, market for the company's common equity, management's discussion and analysis of financial condition and results of operations, quantitative and qualitative disclosures about market risk, financial statements and supplementary data, changes in and disagreements with accountants on accounting and financial disclosure, controls and procedures, other information, and disclosure regarding foreign jurisdictions that prevent inspections. Tesla designs, develops, manufactures, sells, and leases high-performance electric vehicles and energy generation and storage systems, and offers related services. They offer leasing and loan financing arrangements for vehicles in North America, Europe, and Asia, and provide resale value guarantees or buyback guarantees in certain programs. They also offer an extensive network of Supercharger stops for their vehicles, with payment or free access depending on certain sales programs.\"" 482 | ] 483 | }, 484 | "execution_count": 76, 485 | "metadata": {}, 486 | "output_type": "execute_result" 487 | } 488 | ], 489 | "source": [ 490 | "loader = PyPDFLoader(\"tsla-20221231-gen.pdf\")\n", 491 | "pages = loader.load_and_split()\n", 492 | "summarize_docs(pages[:10], \"tsla-20221231-gen.pdf\")" 493 | ] 494 | } 495 | ], 496 | "metadata": { 497 | "colab": { 498 | "authorship_tag": "ABX9TyMmoN24WxC9YPbZeCUtS0+a", 499 | "include_colab_link": true, 500 | "provenance": [] 501 | }, 502 | "kernelspec": { 503 | "display_name": "Python 3", 504 | "name": "python3" 505 | }, 506 | "language_info": { 507 | "codemirror_mode": { 508 | "name": "ipython", 509 | "version": 3 510 | }, 511 | "file_extension": ".py", 512 | "mimetype": "text/x-python", 513 | "name": "python", 514 | "nbconvert_exporter": "python", 515 | "pygments_lexer": "ipython3", 516 | "version": "3.9.16" 517 | } 518 | }, 519 | "nbformat": 4, 520 | 
"nbformat_minor": 0 521 | } 522 | -------------------------------------------------------------------------------- /Summarize_Youtube_Transcript.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "view-in-github" 8 | }, 9 | "source": [ 10 | "\"Open" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 49, 16 | "metadata": { 17 | "id": "hZlVnW5zvOM_" 18 | }, 19 | "outputs": [ 20 | { 21 | "name": "stdout", 22 | "output_type": "stream", 23 | "text": [ 24 | "Requirement already satisfied: youtube_transcript_api in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (0.5.0)\n", 25 | "Requirement already satisfied: requests in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from youtube_transcript_api) (2.28.2)\n", 26 | "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests->youtube_transcript_api) (3.1.0)\n", 27 | "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests->youtube_transcript_api) (1.26.15)\n", 28 | "Requirement already satisfied: idna<4,>=2.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests->youtube_transcript_api) (3.4)\n", 29 | "Requirement already satisfied: certifi>=2017.4.17 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests->youtube_transcript_api) (2022.12.7)\n", 30 | "\u001b[33mWARNING: You are using pip version 22.0.4; however, version 23.0.1 is available.\n", 31 | "You should consider upgrading via the '/Users/wyang14/.pyenv/versions/3.9.16/bin/python -m pip install --upgrade pip' command.\u001b[0m\u001b[33m\n", 32 | "\u001b[0mNote: you may need to restart the kernel to use updated packages.\n", 33 | "Requirement 
already satisfied: openai in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (0.27.2)\n", 34 | "Requirement already satisfied: requests>=2.20 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from openai) (2.28.2)\n", 35 | "Requirement already satisfied: tqdm in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from openai) (4.65.0)\n", 36 | "Requirement already satisfied: aiohttp in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from openai) (3.8.4)\n", 37 | "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.20->openai) (3.1.0)\n", 38 | "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.20->openai) (1.26.15)\n", 39 | "Requirement already satisfied: idna<4,>=2.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.20->openai) (3.4)\n", 40 | "Requirement already satisfied: certifi>=2017.4.17 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.20->openai) (2022.12.7)\n", 41 | "Requirement already satisfied: attrs>=17.3.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp->openai) (22.2.0)\n", 42 | "Requirement already satisfied: multidict<7.0,>=4.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp->openai) (6.0.4)\n", 43 | "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp->openai) (4.0.2)\n", 44 | "Requirement already satisfied: yarl<2.0,>=1.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp->openai) (1.8.2)\n", 45 | "Requirement already satisfied: aiosignal>=1.1.2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp->openai) 
(1.3.1)\n", 46 | "Requirement already satisfied: frozenlist>=1.1.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp->openai) (1.3.3)\n", 47 | "\u001b[33mWARNING: You are using pip version 22.0.4; however, version 23.0.1 is available.\n", 48 | "You should consider upgrading via the '/Users/wyang14/.pyenv/versions/3.9.16/bin/python -m pip install --upgrade pip' command.\u001b[0m\u001b[33m\n", 49 | "\u001b[0mNote: you may need to restart the kernel to use updated packages.\n", 50 | "Requirement already satisfied: langchain in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (0.0.135)\n", 51 | "Requirement already satisfied: SQLAlchemy<2,>=1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (1.4.47)\n", 52 | "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (4.0.2)\n", 53 | "Requirement already satisfied: PyYAML>=5.4.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (6.0)\n", 54 | "Requirement already satisfied: numpy<2,>=1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (1.23.5)\n", 55 | "Requirement already satisfied: openapi-schema-pydantic<2.0,>=1.2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (1.2.4)\n", 56 | "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (3.8.4)\n", 57 | "Requirement already satisfied: requests<3,>=2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (2.28.2)\n", 58 | "Requirement already satisfied: dataclasses-json<0.6.0,>=0.5.7 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (0.5.7)\n", 59 | "Requirement already satisfied: pydantic<2,>=1 in 
/Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (1.10.7)\n", 60 | "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (8.2.2)\n", 61 | "Requirement already satisfied: attrs>=17.3.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (22.2.0)\n", 62 | "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (3.1.0)\n", 63 | "Requirement already satisfied: yarl<2.0,>=1.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.8.2)\n", 64 | "Requirement already satisfied: aiosignal>=1.1.2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.1)\n", 65 | "Requirement already satisfied: multidict<7.0,>=4.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.0.4)\n", 66 | "Requirement already satisfied: frozenlist>=1.1.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.3)\n", 67 | "Requirement already satisfied: typing-inspect>=0.4.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain) (0.8.0)\n", 68 | "Requirement already satisfied: marshmallow-enum<2.0.0,>=1.5.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain) (1.5.1)\n", 69 | "Requirement already satisfied: marshmallow<4.0.0,>=3.3.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain) (3.19.0)\n", 70 | "Requirement already satisfied: typing-extensions>=4.2.0 in 
/Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from pydantic<2,>=1->langchain) (4.5.0)\n", 71 | "Requirement already satisfied: certifi>=2017.4.17 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests<3,>=2->langchain) (2022.12.7)\n", 72 | "Requirement already satisfied: idna<4,>=2.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests<3,>=2->langchain) (3.4)\n", 73 | "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests<3,>=2->langchain) (1.26.15)\n", 74 | "Requirement already satisfied: greenlet!=0.4.17 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from SQLAlchemy<2,>=1->langchain) (2.0.2)\n", 75 | "Requirement already satisfied: packaging>=17.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from marshmallow<4.0.0,>=3.3.0->dataclasses-json<0.6.0,>=0.5.7->langchain) (23.0)\n", 76 | "Requirement already satisfied: mypy-extensions>=0.3.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from typing-inspect>=0.4.0->dataclasses-json<0.6.0,>=0.5.7->langchain) (1.0.0)\n", 77 | "\u001b[33mWARNING: You are using pip version 22.0.4; however, version 23.0.1 is available.\n", 78 | "You should consider upgrading via the '/Users/wyang14/.pyenv/versions/3.9.16/bin/python -m pip install --upgrade pip' command.\u001b[0m\u001b[33m\n", 79 | "\u001b[0mNote: you may need to restart the kernel to use updated packages.\n", 80 | "Requirement already satisfied: unstructured in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (0.5.11)\n", 81 | "Requirement already satisfied: argilla in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (1.5.1)\n", 82 | "Requirement already satisfied: lxml in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (4.9.2)\n", 83 | "Requirement 
already satisfied: msg_parser in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (1.2.0)\n", 84 | "Requirement already satisfied: nltk in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (3.8.1)\n", 85 | "Requirement already satisfied: openpyxl in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (3.1.2)\n", 86 | "Requirement already satisfied: pandas in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (1.5.3)\n", 87 | "Requirement already satisfied: pillow in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (9.5.0)\n", 88 | "Requirement already satisfied: pypandoc in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (1.11)\n", 89 | "Requirement already satisfied: python-docx in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (0.8.11)\n", 90 | "Requirement already satisfied: python-pptx in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (0.6.21)\n", 91 | "Requirement already satisfied: python-magic in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (0.4.27)\n", 92 | "Requirement already satisfied: markdown in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (3.4.3)\n", 93 | "Requirement already satisfied: requests in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (2.28.2)\n", 94 | "Requirement already satisfied: certifi>=2022.12.07 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from unstructured) (2022.12.7)\n", 95 | "Requirement already satisfied: backoff in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (2.2.1)\n", 96 | "Requirement already satisfied: rich<=13.0.1 in 
/Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (13.0.1)\n", 97 | "Requirement already satisfied: monotonic in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (1.6)\n", 98 | "Requirement already satisfied: deprecated~=1.2.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (1.2.13)\n", 99 | "Requirement already satisfied: numpy<1.24.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (1.23.5)\n", 100 | "Requirement already satisfied: httpx<0.24,>=0.15 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (0.23.3)\n", 101 | "Requirement already satisfied: pydantic>=1.7.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (1.10.7)\n", 102 | "Requirement already satisfied: packaging>=20.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (23.0)\n", 103 | "Requirement already satisfied: wrapt<1.15,>=1.13 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (1.14.1)\n", 104 | "Requirement already satisfied: tqdm>=4.27.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from argilla->unstructured) (4.65.0)\n", 105 | "Requirement already satisfied: pytz>=2020.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from pandas->unstructured) (2023.2)\n", 106 | "Requirement already satisfied: python-dateutil>=2.8.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from pandas->unstructured) (2.8.2)\n", 107 | "Requirement already satisfied: importlib-metadata>=4.4 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from markdown->unstructured) (6.1.0)\n", 108 | "Requirement already satisfied: olefile>=0.46 in 
/Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from msg_parser->unstructured) (0.46)\n", 109 | "Requirement already satisfied: joblib in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from nltk->unstructured) (1.2.0)\n", 110 | "Requirement already satisfied: click in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from nltk->unstructured) (8.1.3)\n", 111 | "Requirement already satisfied: regex>=2021.8.3 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from nltk->unstructured) (2023.3.23)\n", 112 | "Requirement already satisfied: et-xmlfile in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from openpyxl->unstructured) (1.1.0)\n", 113 | "Requirement already satisfied: XlsxWriter>=0.5.7 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from python-pptx->unstructured) (3.0.9)\n", 114 | "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests->unstructured) (1.26.15)\n", 115 | "Requirement already satisfied: idna<4,>=2.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests->unstructured) (3.4)\n", 116 | "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests->unstructured) (3.1.0)\n", 117 | "Requirement already satisfied: httpcore<0.17.0,>=0.15.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from httpx<0.24,>=0.15->argilla->unstructured) (0.16.3)\n", 118 | "Requirement already satisfied: sniffio in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from httpx<0.24,>=0.15->argilla->unstructured) (1.3.0)\n", 119 | "Requirement already satisfied: rfc3986[idna2008]<2,>=1.3 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from httpx<0.24,>=0.15->argilla->unstructured) (1.5.0)\n", 120 | "Requirement 
already satisfied: zipp>=0.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from importlib-metadata>=4.4->markdown->unstructured) (3.15.0)\n", 121 | "Requirement already satisfied: typing-extensions>=4.2.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from pydantic>=1.7.1->argilla->unstructured) (4.5.0)\n", 122 | "Requirement already satisfied: six>=1.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from python-dateutil>=2.8.1->pandas->unstructured) (1.16.0)\n", 123 | "Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from rich<=13.0.1->argilla->unstructured) (2.14.0)\n", 124 | "Requirement already satisfied: commonmark<0.10.0,>=0.9.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from rich<=13.0.1->argilla->unstructured) (0.9.1)\n", 125 | "Requirement already satisfied: anyio<5.0,>=3.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from httpcore<0.17.0,>=0.15.0->httpx<0.24,>=0.15->argilla->unstructured) (3.6.2)\n", 126 | "Requirement already satisfied: h11<0.15,>=0.13 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from httpcore<0.17.0,>=0.15.0->httpx<0.24,>=0.15->argilla->unstructured) (0.14.0)\n", 127 | "\u001b[33mWARNING: You are using pip version 22.0.4; however, version 23.0.1 is available.\n", 128 | "You should consider upgrading via the '/Users/wyang14/.pyenv/versions/3.9.16/bin/python -m pip install --upgrade pip' command.\u001b[0m\u001b[33m\n", 129 | "\u001b[0mNote: you may need to restart the kernel to use updated packages.\n", 130 | "Requirement already satisfied: tiktoken in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (0.3.2)\n", 131 | "Requirement already satisfied: regex>=2022.1.18 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from tiktoken) (2023.3.23)\n", 132 | "Requirement already satisfied: 
requests>=2.26.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from tiktoken) (2.28.2)\n", 133 | "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (1.26.15)\n", 134 | "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (3.1.0)\n", 135 | "Requirement already satisfied: idna<4,>=2.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (3.4)\n", 136 | "Requirement already satisfied: certifi>=2017.4.17 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (2022.12.7)\n", 137 | "\u001b[33mWARNING: You are using pip version 22.0.4; however, version 23.0.1 is available.\n", 138 | "You should consider upgrading via the '/Users/wyang14/.pyenv/versions/3.9.16/bin/python -m pip install --upgrade pip' command.\u001b[0m\u001b[33m\n", 139 | "\u001b[0mNote: you may need to restart the kernel to use updated packages.\n" 140 | ] 141 | } 142 | ], 143 | "source": [ 144 | "%pip install youtube_transcript_api\n", 145 | "%pip install openai\n", 146 | "%pip install langchain\n", 147 | "%pip install unstructured\n", 148 | "%pip install tiktoken" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": 50, 154 | "metadata": {}, 155 | "outputs": [ 156 | { 157 | "name": "stdout", 158 | "output_type": "stream", 159 | "text": [ 160 | "Requirement already satisfied: langchain in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (0.0.135)\n", 161 | "Requirement already satisfied: pydantic<2,>=1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (1.10.7)\n", 162 | "Requirement already satisfied: dataclasses-json<0.6.0,>=0.5.7 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages 
(from langchain) (0.5.7)\n", 163 | "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (4.0.2)\n", 164 | "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (8.2.2)\n", 165 | "Requirement already satisfied: SQLAlchemy<2,>=1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (1.4.47)\n", 166 | "Requirement already satisfied: openapi-schema-pydantic<2.0,>=1.2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (1.2.4)\n", 167 | "Requirement already satisfied: PyYAML>=5.4.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (6.0)\n", 168 | "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (3.8.4)\n", 169 | "Requirement already satisfied: requests<3,>=2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (2.28.2)\n", 170 | "Requirement already satisfied: numpy<2,>=1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from langchain) (1.23.5)\n", 171 | "Requirement already satisfied: yarl<2.0,>=1.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.8.2)\n", 172 | "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (3.1.0)\n", 173 | "Requirement already satisfied: aiosignal>=1.1.2 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.1)\n", 174 | "Requirement already satisfied: attrs>=17.3.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (22.2.0)\n", 175 
| "Requirement already satisfied: frozenlist>=1.1.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.3)\n", 176 | "Requirement already satisfied: multidict<7.0,>=4.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.0.4)\n", 177 | "Requirement already satisfied: marshmallow<4.0.0,>=3.3.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain) (3.19.0)\n", 178 | "Requirement already satisfied: marshmallow-enum<2.0.0,>=1.5.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain) (1.5.1)\n", 179 | "Requirement already satisfied: typing-inspect>=0.4.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain) (0.8.0)\n", 180 | "Requirement already satisfied: typing-extensions>=4.2.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from pydantic<2,>=1->langchain) (4.5.0)\n", 181 | "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests<3,>=2->langchain) (1.26.15)\n", 182 | "Requirement already satisfied: idna<4,>=2.5 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests<3,>=2->langchain) (3.4)\n", 183 | "Requirement already satisfied: certifi>=2017.4.17 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from requests<3,>=2->langchain) (2022.12.7)\n", 184 | "Requirement already satisfied: greenlet!=0.4.17 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from SQLAlchemy<2,>=1->langchain) (2.0.2)\n", 185 | "Requirement already satisfied: packaging>=17.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from marshmallow<4.0.0,>=3.3.0->dataclasses-json<0.6.0,>=0.5.7->langchain) 
(23.0)\n", 186 | "Requirement already satisfied: mypy-extensions>=0.3.0 in /Users/wyang14/.pyenv/versions/3.9.16/lib/python3.9/site-packages (from typing-inspect>=0.4.0->dataclasses-json<0.6.0,>=0.5.7->langchain) (1.0.0)\n", 187 | "\u001b[33mWARNING: You are using pip version 22.0.4; however, version 23.0.1 is available.\n", 188 | "You should consider upgrading via the '/Users/wyang14/.pyenv/versions/3.9.16/bin/python -m pip install --upgrade pip' command.\u001b[0m\u001b[33m\n", 189 | "\u001b[0mNote: you may need to restart the kernel to use updated packages.\n" 190 | ] 191 | } 192 | ], 193 | "source": [ 194 | "%pip install langchain -U" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": 51, 200 | "metadata": { 201 | "id": "5lZp3p97vZPy" 202 | }, 203 | "outputs": [], 204 | "source": [ 205 | "from youtube_transcript_api import YouTubeTranscriptApi\n", 206 | "from youtube_transcript_api.formatters import TextFormatter" 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": 52, 212 | "metadata": {}, 213 | "outputs": [ 214 | { 215 | "name": "stdout", 216 | "output_type": "stream", 217 | "text": [ 218 | "For this video (UF9Iqmg94tk) transcripts are available in the following languages:\n", 219 | "\n", 220 | "(MANUALLY CREATED)\n", 221 | " - en-US (\"English (United States)\")[TRANSLATABLE]\n", 222 | "\n", 223 | "(GENERATED)\n", 224 | "None\n", 225 | "\n", 226 | "(TRANSLATION LANGUAGES)\n", 227 | " - af (\"Afrikaans\")\n", 228 | " - ak (\"Akan\")\n", 229 | " - sq (\"Albanian\")\n", 230 | " - am (\"Amharic\")\n", 231 | " - ar (\"Arabic\")\n", 232 | " - hy (\"Armenian\")\n", 233 | " - as (\"Assamese\")\n", 234 | " - ay (\"Aymara\")\n", 235 | " - az (\"Azerbaijani\")\n", 236 | " - bn (\"Bangla\")\n", 237 | " - eu (\"Basque\")\n", 238 | " - be (\"Belarusian\")\n", 239 | " - bho (\"Bhojpuri\")\n", 240 | " - bs (\"Bosnian\")\n", 241 | " - bg (\"Bulgarian\")\n", 242 | " - my (\"Burmese\")\n", 243 | " - ca (\"Catalan\")\n", 244 | " 
- ceb (\"Cebuano\")\n", 245 | " - zh-Hans (\"Chinese (Simplified)\")\n", 246 | " - zh-Hant (\"Chinese (Traditional)\")\n", 247 | " - co (\"Corsican\")\n", 248 | " - hr (\"Croatian\")\n", 249 | " - cs (\"Czech\")\n", 250 | " - da (\"Danish\")\n", 251 | " - dv (\"Divehi\")\n", 252 | " - nl (\"Dutch\")\n", 253 | " - en (\"English\")\n", 254 | " - eo (\"Esperanto\")\n", 255 | " - et (\"Estonian\")\n", 256 | " - ee (\"Ewe\")\n", 257 | " - fil (\"Filipino\")\n", 258 | " - fi (\"Finnish\")\n", 259 | " - fr (\"French\")\n", 260 | " - gl (\"Galician\")\n", 261 | " - lg (\"Ganda\")\n", 262 | " - ka (\"Georgian\")\n", 263 | " - de (\"German\")\n", 264 | " - el (\"Greek\")\n", 265 | " - gn (\"Guarani\")\n", 266 | " - gu (\"Gujarati\")\n", 267 | " - ht (\"Haitian Creole\")\n", 268 | " - ha (\"Hausa\")\n", 269 | " - haw (\"Hawaiian\")\n", 270 | " - iw (\"Hebrew\")\n", 271 | " - hi (\"Hindi\")\n", 272 | " - hmn (\"Hmong\")\n", 273 | " - hu (\"Hungarian\")\n", 274 | " - is (\"Icelandic\")\n", 275 | " - ig (\"Igbo\")\n", 276 | " - id (\"Indonesian\")\n", 277 | " - ga (\"Irish\")\n", 278 | " - it (\"Italian\")\n", 279 | " - ja (\"Japanese\")\n", 280 | " - jv (\"Javanese\")\n", 281 | " - kn (\"Kannada\")\n", 282 | " - kk (\"Kazakh\")\n", 283 | " - km (\"Khmer\")\n", 284 | " - rw (\"Kinyarwanda\")\n", 285 | " - ko (\"Korean\")\n", 286 | " - kri (\"Krio\")\n", 287 | " - ku (\"Kurdish\")\n", 288 | " - ky (\"Kyrgyz\")\n", 289 | " - lo (\"Lao\")\n", 290 | " - la (\"Latin\")\n", 291 | " - lv (\"Latvian\")\n", 292 | " - ln (\"Lingala\")\n", 293 | " - lt (\"Lithuanian\")\n", 294 | " - lb (\"Luxembourgish\")\n", 295 | " - mk (\"Macedonian\")\n", 296 | " - mg (\"Malagasy\")\n", 297 | " - ms (\"Malay\")\n", 298 | " - ml (\"Malayalam\")\n", 299 | " - mt (\"Maltese\")\n", 300 | " - mi (\"Māori\")\n", 301 | " - mr (\"Marathi\")\n", 302 | " - mn (\"Mongolian\")\n", 303 | " - ne (\"Nepali\")\n", 304 | " - nso (\"Northern Sotho\")\n", 305 | " - no (\"Norwegian\")\n", 306 | " - ny (\"Nyanja\")\n", 307 
| " - or (\"Odia\")\n", 308 | " - om (\"Oromo\")\n", 309 | " - ps (\"Pashto\")\n", 310 | " - fa (\"Persian\")\n", 311 | " - pl (\"Polish\")\n", 312 | " - pt (\"Portuguese\")\n", 313 | " - pa (\"Punjabi\")\n", 314 | " - qu (\"Quechua\")\n", 315 | " - ro (\"Romanian\")\n", 316 | " - ru (\"Russian\")\n", 317 | " - sm (\"Samoan\")\n", 318 | " - sa (\"Sanskrit\")\n", 319 | " - gd (\"Scottish Gaelic\")\n", 320 | " - sr (\"Serbian\")\n", 321 | " - sn (\"Shona\")\n", 322 | " - sd (\"Sindhi\")\n", 323 | " - si (\"Sinhala\")\n", 324 | " - sk (\"Slovak\")\n", 325 | " - sl (\"Slovenian\")\n", 326 | " - so (\"Somali\")\n", 327 | " - st (\"Southern Sotho\")\n", 328 | " - es (\"Spanish\")\n", 329 | " - su (\"Sundanese\")\n", 330 | " - sw (\"Swahili\")\n", 331 | " - sv (\"Swedish\")\n", 332 | " - tg (\"Tajik\")\n", 333 | " - ta (\"Tamil\")\n", 334 | " - tt (\"Tatar\")\n", 335 | " - te (\"Telugu\")\n", 336 | " - th (\"Thai\")\n", 337 | " - ti (\"Tigrinya\")\n", 338 | " - ts (\"Tsonga\")\n", 339 | " - tr (\"Turkish\")\n", 340 | " - tk (\"Turkmen\")\n", 341 | " - uk (\"Ukrainian\")\n", 342 | " - ur (\"Urdu\")\n", 343 | " - ug (\"Uyghur\")\n", 344 | " - uz (\"Uzbek\")\n", 345 | " - vi (\"Vietnamese\")\n", 346 | " - cy (\"Welsh\")\n", 347 | " - fy (\"Western Frisian\")\n", 348 | " - xh (\"Xhosa\")\n", 349 | " - yi (\"Yiddish\")\n", 350 | " - yo (\"Yoruba\")\n", 351 | " - zu (\"Zulu\")\n" 352 | ] 353 | } 354 | ], 355 | "source": [ 356 | "transcript_list = YouTubeTranscriptApi.list_transcripts('UF9Iqmg94tk')\n", 357 | "print(transcript_list)" 358 | ] 359 | }, 360 | { 361 | "cell_type": "code", 362 | "execution_count": 53, 363 | "metadata": { 364 | "id": "7XpA_I_nvb5M" 365 | }, 366 | "outputs": [], 367 | "source": [ 368 | "# trz7g-wilxs = \"5 RULES FOR THE REST OF YOUR LIFE\" | Matthew McConaughey MOTIVATIONAL SPEECH\n", 369 | "# UF9Iqmg94tk = \"Consistent Hashing | Algorithms You Should Know #1\"\n", 370 | "transcript = YouTubeTranscriptApi.get_transcript('UF9Iqmg94tk', 
languages=['en-US'])\n", 371 | "formatter = TextFormatter()\n", 372 | "text_formatted = formatter.format_transcript(transcript)\n", 373 | "with open('transcript.txt', 'w', encoding='utf-8') as text_file:\n", 374 | " text_file.write(text_formatted)" 375 | ] 376 | }, 377 | { 378 | "cell_type": "code", 379 | "execution_count": 54, 380 | "metadata": { 381 | "id": "1VsvtekHzMuC" 382 | }, 383 | "outputs": [ 384 | { 385 | "name": "stdout", 386 | "output_type": "stream", 387 | "text": [ 388 | "What do DynamoDB, Apache Cassandra, Discord, and \n", 389 | "Akamai CDN have in common? They all use consistent  \n", 390 | "hashing. Now, what is consistent hashing? Why do all \n", 391 | "the cool kids use it? In this video, we'll learn  \n", 392 | "all about it. Let's dive right in. In a large scale \n", 393 | "distributed system, data does not fit on a single  \n", 394 | "server. They are distributed across many machines. \n", 395 | "This is called horizontal scaling. To build such  \n", 396 | "a system with predictable performance. it is \n", 397 | "important to distribute the data evenly across  \n", 398 | "those servers. A common method to distribute \n", 399 | "data as evenly as possible among servers is  \n", 400 | "simple hashing. This is how it works. First, for each \n", 401 | "object, we hash its key with a hashing function  \n", 402 | "like MD5 or MurmurHash. This maps the object \n", 403 | "key to a known range of numerical values.  \n", 404 | "A good hashing function distributes the \n", 405 | "hashes evenly across the entire range.  \n", 406 | "Second, we perform the modulo operation on the \n", 407 | "hash against the number of servers. This determines  \n", 408 | "which servers the object belongs to. As long \n", 409 | "as the number of servers stays the same,  \n", 410 | "an object key will always map to the same server. \n", 411 | "Here's a concrete example. 
We have four servers  \n", 412 | "with eight string keys with simple hashing \n", 413 | "this is how we distribute the eight string  \n", 414 | "keys among the four servers. Now, this approach \n", 415 | "works well when the size of the cluster is fixed,  \n", 416 | "and the data distribution is even. But what happens \n", 417 | "when new servers get added to meet new demand  \n", 418 | "or when existing servers get removed? \n", 419 | "Back to our example. If server 1 goes down,  \n", 420 | "the size of the cluster is now three. Even though \n", 421 | "the hashes for the object keys stay the same,  \n", 422 | "we are now applying the modulo operation to a \n", 423 | "different set of n. In this case, it is now three.  \n", 424 | "The impact is pretty drastic. Most of the keys get \n", 425 | "redistributed. This affects almost all objects, it's  \n", 426 | "not just the objects originally stored in the \n", 427 | "server that is now offline. This triggers a storm  \n", 428 | "of misses and lots of objects to be moved. For \n", 429 | "situations where servers constantly come and go,  \n", 430 | "this design is untenable. Consistent hashing is \n", 431 | "an effective technique to mitigate this issue.  \n", 432 | "The goal of consistent hashing is this. We want \n", 433 | "almost all objects to stay assigned to the same  \n", 434 | "server even as the number of servers changes. \n", 435 | "Here is the core insight of consistent hashing.  \n", 436 | "In addition to hashing the object keys like before, \n", 437 | "we also hash the server names. The objects and  \n", 438 | "servers are hashed with the same hashing function \n", 439 | "to the same range of values. In our example we have  \n", 440 | "a range of x0 to xn. This range is called a hash \n", 441 | "space. Next, we connect both ends of the hash space  \n", 442 | "to form a ring. This is a hash ring. 
Using a hashing \n", 443 | "function we hash each server by its name or ip  \n", 444 | "address, and place the server onto the ring. Here, we \n", 445 | "place our four servers onto the ring. Next, we hash  \n", 446 | "each object by its key with the same hashing \n", 447 | "function. Unlike simple hashing where we perform  \n", 448 | "a modulo operation on the hash, here we use the \n", 449 | "hash directly to map the object key onto the ring.  \n", 450 | "Here is what it would look like for our four \n", 451 | "objects. To locate the server for a particular  \n", 452 | "object, we go clockwise from the location of the \n", 453 | "object key on the ring until a server is found.  \n", 454 | "Continue with our example, key 0 is on server 0, \n", 455 | "and key 1 is on server 1. Now, let's take a look at  \n", 456 | "what happens when we add a server. Here we insert \n", 457 | "a new server s4 to the left of s0 on the ring.  \n", 458 | "Note that only k0 needs to be moved from s0 to \n", 459 | "s4. This is because s4 is the first server k0  \n", 460 | "encounters by going clockwise from k0's position \n", 461 | "on the ring. Keys k1, k2 ,and k3 are not affected.  \n", 462 | "With simple hashing, when a new server is added \n", 463 | "almost all the keys need to be remapped. With  \n", 464 | "consistent hashing, adding a new server only \n", 465 | "requires redistribution of a fraction of the keys.  \n", 466 | "Let's walk through a quick example of removing \n", 467 | "a server. When s1 is removed, only k1 needs to be  \n", 468 | "remapped to s2. The rest of the keys are unaffected. \n", 469 | "Let's recap. What have we learned so far. One,  \n", 470 | "we map both servers and objects onto the hash \n", 471 | "ring using a uniformly distributed hash function.  \n", 472 | "Two, to locate a server for an object, we go \n", 473 | "clockwise on the ring from the object's position  \n", 474 | "until a server is found. 
Now, let's consider \n", 475 | "a potential issue with this design.  \n", 476 | "The distribution of the objects in the \n", 477 | "servers on the ring is likely to be uneven.  \n", 478 | "Conceptually, we pick n random points on the ring, \n", 479 | "we are very unlikely to get a perfect partition of  \n", 480 | "the ring into equally sized segments. For example ,\n", 481 | "if servers are mapped on the ring like this,  \n", 482 | "most of the objects are stored in s2, with s1 \n", 483 | "and s3 storing no data. This problem gets worse  \n", 484 | "if servers come and go frequently. In our example, \n", 485 | "even if the servers were originally evenly spaced,  \n", 486 | "if s1 is removed, the segment for s2 is now \n", 487 | "twice as large as the ones for s0 and s3.  \n", 488 | "Virtual nodes are used to fix this problem. The \n", 489 | "idea is to have each server appear at multiple  \n", 490 | "locations on the ring. Each location is a virtual \n", 491 | "node representing a server. In this hash ring, we  \n", 492 | "have two servers, with each having three virtual \n", 493 | "nodes. Instead of having s0 and s1, we now have  \n", 494 | "s0_0, s0_1, and s0_2 to represent server 0, and s1_0 \n", 495 | "s1_1, and s1_2 to represent server 1 on the ring.  \n", 496 | "With virtual nodes, each server handles multiple \n", 497 | "segments on the ring. In our example, the segments  \n", 498 | "labeled s0 are managed by server 0, and \n", 499 | "those labeled s1 are handled by server 1.  \n", 500 | "In real world systems, the number of \n", 501 | "virtual nodes is much larger than  \n", 502 | "three. As the number of virtual nodes increases, \n", 503 | "the distribution of objects becomes more balanced.  \n", 504 | "Having more virtual nodes means taking more space \n", 505 | "to store the metadata about the virtual nodes.  \n", 506 | "This is a trade-off, and we can tune the number \n", 507 | "of virtual nodes to fit our system requirements. 
\n", 508 | "Let's see how consistent hashing is used in \n", 509 | "the real world. Some popular NoSQL databases  \n", 510 | "like Amazon DynamoDB and Apache Cassandra use \n", 511 | "consistent hashing, where it is used for data  \n", 512 | "partitioning. It helps these databases minimize \n", 513 | "data movement during rebalancing. Content delivery  \n", 514 | "networks like Akamai use consistent hashing to \n", 515 | "help distribute web contents evenly among the  \n", 516 | "edge servers. Load balancers like Google Load \n", 517 | "Balancer use consistent hashing to distribute  \n", 518 | "persistent connections evenly across backend \n", 519 | "servers. This limits the number of connections  \n", 520 | "that need to be re-established when a backend \n", 521 | "server goes down. That's it for consistent hashing.  \n", 522 | "If you would like to learn more about system \n", 523 | "design, check out our books and weekly newsletters.  \n", 524 | "Please subscribe if you learned something new. \n", 525 | "Thank you so much, and we'll see you next time." 
526 | ] 527 | } 528 | ], 529 | "source": [ 530 | "%cat transcript.txt" 531 | ] 532 | }, 533 | { 534 | "cell_type": "code", 535 | "execution_count": 55, 536 | "metadata": { 537 | "id": "icFv_0W9z6j0" 538 | }, 539 | "outputs": [], 540 | "source": [ 541 | "from langchain.document_loaders import TextLoader" 542 | ] 543 | }, 544 | { 545 | "cell_type": "code", 546 | "execution_count": 56, 547 | "metadata": { 548 | "id": "gY5V-EjU0Mr9" 549 | }, 550 | "outputs": [], 551 | "source": [ 552 | "loader = TextLoader(\"./transcript.txt\", encoding=\"utf-8\")\n", 553 | "docs = loader.load()" 554 | ] 555 | }, 556 | { 557 | "cell_type": "code", 558 | "execution_count": 57, 559 | "metadata": { 560 | "id": "kFo5_CKF0WhH" 561 | }, 562 | "outputs": [ 563 | { 564 | "name": "stdout", 565 | "output_type": "stream", 566 | "text": [ 567 | "You have 1 document(s) in your data\n", 568 | "There are 6768 characters in your document\n" 569 | ] 570 | } 571 | ], 572 | "source": [ 573 | "print(f'You have {len(docs)} document(s) in your data')\n", 574 | "print(f'There are {len(docs[0].page_content)} characters in your document')" 575 | ] 576 | }, 577 | { 578 | "cell_type": "code", 579 | "execution_count": 58, 580 | "metadata": { 581 | "id": "kfgCPKjT0bIS" 582 | }, 583 | "outputs": [], 584 | "source": [ 585 | "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", 586 | "\n", 587 | "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", 588 | "split_docs = text_splitter.split_documents(docs)" 589 | ] 590 | }, 591 | { 592 | "cell_type": "code", 593 | "execution_count": 59, 594 | "metadata": { 595 | "id": "mZViCJ3V0uFn" 596 | }, 597 | "outputs": [ 598 | { 599 | "name": "stdout", 600 | "output_type": "stream", 601 | "text": [ 602 | "You have 7 split document(s)\n" 603 | ] 604 | } 605 | ], 606 | "source": [ 607 | "print(f'You have {len(split_docs)} split document(s)')" 608 | ] 609 | }, 610 | { 611 | "cell_type": "code", 612 | "execution_count": 60, 613 | "metadata": { 614 |
"id": "YdDLU0GS0fmp" 615 | }, 616 | "outputs": [], 617 | "source": [ 618 | "import os\n", 619 | "OPENAI_API_KEY = os.environ['OPENAI_API_KEY']" 620 | ] 621 | }, 622 | { 623 | "cell_type": "code", 624 | "execution_count": 61, 625 | "metadata": { 626 | "id": "EAWhbUfQ0jdk" 627 | }, 628 | "outputs": [], 629 | "source": [ 630 | "from langchain.llms import OpenAI\n", 631 | "from langchain.chains.summarize import load_summarize_chain" 632 | ] 633 | }, 634 | { 635 | "cell_type": "code", 636 | "execution_count": 62, 637 | "metadata": { 638 | "id": "6W6Nm5ZV0mSu" 639 | }, 640 | "outputs": [], 641 | "source": [ 642 | "llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY, model_name=\"text-ada-001\")\n", 643 | "chain = load_summarize_chain(llm, chain_type=\"map_reduce\", verbose=True)" 644 | ] 645 | }, 646 | { 647 | "cell_type": "code", 648 | "execution_count": 63, 649 | "metadata": { 650 | "id": "KLmvBohG01cY" 651 | }, 652 | "outputs": [], 653 | "source": [ 654 | "input_docs = split_docs[:2]" 655 | ] 656 | }, 657 | { 658 | "cell_type": "code", 659 | "execution_count": 64, 660 | "metadata": { 661 | "id": "YnVBRIIl03X5" 662 | }, 663 | "outputs": [ 664 | { 665 | "name": "stdout", 666 | "output_type": "stream", 667 | "text": [ 668 | "\n", 669 | "\n", 670 | "\u001b[1m> Entering new MapReduceDocumentsChain chain...\u001b[0m\n", 671 | "Prompt after formatting:\n", 672 | "\u001b[32;1m\u001b[1;3mWrite a concise summary of the following:\n", 673 | "\n", 674 | "\n", 675 | "\"What do DynamoDB, Apache Cassandra, Discord, and \n", 676 | "Akamai CDN have in common? They all use consistent  \n", 677 | "hashing. Now, what is consistent hashing? Why do all \n", 678 | "the cool kids use it? In this video, we'll learn  \n", 679 | "all about it. Let's dive right in. In a large scale \n", 680 | "distributed system, data does not fit on a single  \n", 681 | "server. They are distributed across many machines. \n", 682 | "This is called horizontal scaling. 
To build such  \n", 683 | "a system with predictable performance. it is \n", 684 | "important to distribute the data evenly across  \n", 685 | "those servers. A common method to distribute \n", 686 | "data as evenly as possible among servers is  \n", 687 | "simple hashing. This is how it works. First, for each \n", 688 | "object, we hash its key with a hashing function  \n", 689 | "like MD5 or MurmurHash. This maps the object \n", 690 | "key to a known range of numerical values.  \n", 691 | "A good hashing function distributes the \n", 692 | "hashes evenly across the entire range.  \n", 693 | "Second, we perform the modulo operation on the \n", 694 | "hash against the number of servers. This determines\"\n", 695 | "\n", 696 | "\n", 697 | "CONCISE SUMMARY:\u001b[0m\n", 698 | "Prompt after formatting:\n", 699 | "\u001b[32;1m\u001b[1;3mWrite a concise summary of the following:\n", 700 | "\n", 701 | "\n", 702 | "\"which servers the object belongs to. As long \n", 703 | "as the number of servers stays the same,  \n", 704 | "an object key will always map to the same server. \n", 705 | "Here's a concrete example. We have four servers  \n", 706 | "with eight string keys with simple hashing \n", 707 | "this is how we distribute the eight string  \n", 708 | "keys among the four servers. Now, this approach \n", 709 | "works well when the size of the cluster is fixed,  \n", 710 | "and the data distribution is even. But what happens \n", 711 | "when new servers get added to meet new demand  \n", 712 | "or when existing servers get removed? \n", 713 | "Back to our example. If server 1 goes down,  \n", 714 | "the size of the cluster is now three. Even though \n", 715 | "the hashes for the object keys stay the same,  \n", 716 | "we are now applying the modulo operation to a \n", 717 | "different set of n. In this case, it is now three.  \n", 718 | "The impact is pretty drastic. Most of the keys get \n", 719 | "redistributed. 
This affects almost all objects, it's  \n", 720 | "not just the objects originally stored in the \n", 721 | "server that is now offline. This triggers a storm\"\n", 722 | "\n", 723 | "\n", 724 | "CONCISE SUMMARY:\u001b[0m\n", 725 | "\n", 726 | "\n", 727 | "\u001b[1m> Entering new LLMChain chain...\u001b[0m\n", 728 | "Prompt after formatting:\n", 729 | "\u001b[32;1m\u001b[1;3mWrite a concise summary of the following:\n", 730 | "\n", 731 | "\n", 732 | "\"\n", 733 | "\n", 734 | "In a large scale distributed system, data is distributed across many machines. To build such a system with predictable performance, it is important to distribute the data evenly across those servers. A common method to distribute data as evenly as possible among servers is simple hashing. This is how it works: First, for each object, we hash its key with a hashing function like MD5 or MurmurHash. This maps the object key to a known range of numerical values. Second, we perform the modulo operation on the hash against the number of servers. This determines which server the object will be stored on.\n", 735 | "\n", 736 | "\n", 737 | "\n", 738 | "The article discusses how adding or removing servers from a cluster can impact the distribution of data. 
In particular, it explains how this can cause \"a storm\" of data redistribution.\"\n", 739 | "\n", 740 | "\n", 741 | "CONCISE SUMMARY:\u001b[0m\n", 742 | "\n", 743 | "\u001b[1m> Finished chain.\u001b[0m\n", 744 | "\n", 745 | "\u001b[1m> Finished chain.\u001b[0m\n", 746 | "Total Tokens: 894\n", 747 | "Prompt Tokens: 698\n", 748 | "Completion Tokens: 196\n", 749 | "Successful Requests: 2\n", 750 | "Total Cost (USD): $0.01788\n" 751 | ] 752 | } 753 | ], 754 | "source": [ 755 | "from langchain.callbacks import get_openai_callback\n", 756 | "\n", 757 | "with get_openai_callback() as cb:\n", 758 | " # Run the chain once; every call sends billable requests to the API.\n", 759 | " chain.run(input_documents=input_docs)\n", 760 | " print(f\"Total Tokens: {cb.total_tokens}\")\n", 761 | " print(f\"Prompt Tokens: {cb.prompt_tokens}\")\n", 762 | " print(f\"Completion Tokens: {cb.completion_tokens}\")\n", 763 | " print(f\"Successful Requests: {cb.successful_requests}\")\n", 764 | " print(f\"Total Cost (USD): ${cb.total_cost}\")" 765 | ] 766 | } 767 | ], 768 | "metadata": { 769 | "colab": { 770 | "authorship_tag": "ABX9TyMmoN24WxC9YPbZeCUtS0+a", 771 | "include_colab_link": true, 772 | "provenance": [] 773 | }, 774 | "kernelspec": { 775 | "display_name": "Python 3", 776 | "name": "python3" 777 | }, 778 | "language_info": { 779 | "codemirror_mode": { 780 | "name": "ipython", 781 | "version": 3 782 | }, 783 | "file_extension": ".py", 784 | "mimetype": "text/x-python", 785 | "name": "python", 786 | "nbconvert_exporter": "python", 787 | "pygments_lexer": "ipython3", 788 | "version": "3.9.16" 789 | } 790 | }, 791 | "nbformat": 4, 792 | "nbformat_minor": 0 793 | } 794 | --------------------------------------------------------------------------------
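
The verbose output in the notebook shows the shape of the "map_reduce" chain type: each chunk is summarized on its own, then the partial summaries are joined and summarized once more. The sketch below mirrors that flow in plain Python so it can be run without an API key. It is only an illustration under stated assumptions, not LangChain's implementation: `split_text`, `map_reduce_summarize`, and `first_words` are hypothetical names, the splitter cuts at fixed character offsets (RecursiveCharacterTextSplitter prefers to cut at separators instead), and `first_words` is a stand-in for the LLM call.

```python
from typing import Callable, List


def split_text(text: str, chunk_size: int) -> List[str]:
    """Cut text into consecutive chunks of at most chunk_size characters, no overlap.

    Hypothetical helper: a naive fixed-offset splitter, not LangChain's splitter.
    """
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def map_reduce_summarize(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk separately (map), then summarize the joined results (reduce)."""
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n\n".join(partial_summaries))


def first_words(text: str, limit: int = 8) -> str:
    """Hypothetical stand-in for an LLM call: keep only the first `limit` words."""
    return " ".join(text.split()[:limit])


if __name__ == "__main__":
    # Placeholder text; in the notebook this would be the transcript file contents.
    sample = "Consistent hashing keeps most keys on the same server when servers change. " * 40
    chunks = split_text(sample, chunk_size=1000)
    print(f"You have {len(chunks)} split document(s)")
    print(map_reduce_summarize(chunks, first_words))
```

With fixed offsets, a 6768-character text and `chunk_size=1000` yields 7 chunks, which happens to match the count the notebook reports; a separator-aware splitter is not guaranteed to produce the same number. Swapping `first_words` for a function that calls a real model keeps the same map-then-reduce structure.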