├── youtube_transcript_api ├── py.typed ├── test │ ├── __init__.py │ ├── assets │ │ ├── __init__.py │ │ ├── transcript.xml.static │ │ ├── youtube_too_many_requests.html.static │ │ ├── youtube_video_unavailable.innertube.json.static │ │ ├── youtube_unplayable.innertube.json.static │ │ ├── youtube_request_blocked.innertube.json.static │ │ ├── youtube_consent_page_invalid.html.static │ │ ├── youtube_consent_page.html.static │ │ └── youtube_age_restricted.innertube.json.static │ ├── test_proxies.py │ ├── test_formatters.py │ ├── test_cli.py │ └── test_api.py ├── _settings.py ├── __main__.py ├── __init__.py ├── _api.py ├── proxies.py ├── _cli.py ├── formatters.py ├── _errors.py └── _transcripts.py ├── .github ├── FUNDING.yml ├── ISSUE_TEMPLATE │ ├── feature_request.md │ └── bug_report.md └── workflows │ └── ci.yml ├── .gitignore ├── LICENSE ├── pyproject.toml └── README.md /youtube_transcript_api/py.typed: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /youtube_transcript_api/test/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /youtube_transcript_api/test/assets/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /.github/FUNDING.yml: -------------------------------------------------------------------------------- 1 | github: jdepoix 2 | custom: https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url 3 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .idea 2 | .venv 3 | virtualenv 4 | *.pyc 5 | dist 6 | build 7 | *.egg-info 
8 | upload_new_version.sh 9 | .coverage 10 | coverage.xml 11 | .DS_STORE -------------------------------------------------------------------------------- /youtube_transcript_api/_settings.py: -------------------------------------------------------------------------------- 1 | WATCH_URL = "https://www.youtube.com/watch?v={video_id}" 2 | INNERTUBE_API_URL = "https://www.youtube.com/youtubei/v1/player?key={api_key}" 3 | INNERTUBE_CONTEXT = {"client": {"clientName": "ANDROID", "clientVersion": "20.10.38"}} 4 | -------------------------------------------------------------------------------- /youtube_transcript_api/__main__.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | import logging 4 | 5 | from ._cli import YouTubeTranscriptCli 6 | 7 | 8 | def main(): 9 | logging.basicConfig() 10 | 11 | print(YouTubeTranscriptCli(sys.argv[1:]).run()) 12 | 13 | 14 | if __name__ == "__main__": 15 | main() 16 | -------------------------------------------------------------------------------- /youtube_transcript_api/test/assets/transcript.xml.static: -------------------------------------------------------------------------------- 1 | 2 | 3 | Hey, this is just a test 4 | this is <i>not</i> the original transcript 5 | 6 | just something shorter, I made up for testing 7 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature request 3 | about: Suggest an idea for this project 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Is your feature request related to a problem? Please describe.** 11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 12 | 13 | **Describe the solution you'd like** 14 | A clear and concise description of how you want youtube-transcript-api to solve your problem. 
15 | 16 | **Describe alternatives you've considered** 17 | A clear and concise description of any alternative solutions or features you've considered. 18 | 19 | **Additional context** 20 | Add any other context about the feature request here. If you have any additional technical information which could be relevant for the implementation, feel free to share them here. 21 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Jonas Depoix 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | DO NOT DELETE THIS! Please take the time to fill this out properly. I am not able to help you if I do not know what you are executing and what error messages you are getting. If you are having problems with a specific video make sure to **include the video ID**. 11 | 12 | # To Reproduce 13 | Steps to reproduce the behavior: 14 | 15 | ### What code / CLI command are you executing? 16 | For example: I am running 17 | ``` 18 | YouTubeTranscriptApi().fetch() ... 19 | ``` 20 | 21 | ### Which Python version are you using? 22 | Python x.y 23 | 24 | ### Which version of youtube-transcript-api are you using? 25 | youtube-transcript-api x.y.z 26 | 27 | 28 | # Expected behavior 29 | Describe what you expected to happen. 30 | 31 | For example: I expected to receive the English transcript 32 | 33 | # Actual behavior 34 | Describe what is happening instead of the **Expected behavior**. Add **error messages** if there are any. 35 | 36 | For example: Instead I received the following error message: 37 | ``` 38 | # ... error message ... 
39 | ``` 40 | -------------------------------------------------------------------------------- /youtube_transcript_api/__init__.py: -------------------------------------------------------------------------------- 1 | # ruff: noqa: F401 2 | from ._api import YouTubeTranscriptApi 3 | from ._transcripts import ( 4 | TranscriptList, 5 | Transcript, 6 | FetchedTranscript, 7 | FetchedTranscriptSnippet, 8 | ) 9 | from ._errors import ( 10 | YouTubeTranscriptApiException, 11 | CookieError, 12 | CookiePathInvalid, 13 | CookieInvalid, 14 | TranscriptsDisabled, 15 | NoTranscriptFound, 16 | CouldNotRetrieveTranscript, 17 | VideoUnavailable, 18 | VideoUnplayable, 19 | IpBlocked, 20 | RequestBlocked, 21 | NotTranslatable, 22 | TranslationLanguageNotAvailable, 23 | FailedToCreateConsentCookie, 24 | YouTubeRequestFailed, 25 | InvalidVideoId, 26 | AgeRestricted, 27 | YouTubeDataUnparsable, 28 | PoTokenRequired, 29 | ) 30 | 31 | __all__ = [ 32 | "YouTubeTranscriptApi", 33 | "TranscriptList", 34 | "Transcript", 35 | "FetchedTranscript", 36 | "FetchedTranscriptSnippet", 37 | "YouTubeTranscriptApiException", 38 | "CookieError", 39 | "CookiePathInvalid", 40 | "CookieInvalid", 41 | "TranscriptsDisabled", 42 | "NoTranscriptFound", 43 | "CouldNotRetrieveTranscript", 44 | "VideoUnavailable", 45 | "VideoUnplayable", 46 | "IpBlocked", 47 | "RequestBlocked", 48 | "NotTranslatable", 49 | "TranslationLanguageNotAvailable", 50 | "FailedToCreateConsentCookie", 51 | "YouTubeRequestFailed", 52 | "InvalidVideoId", 53 | "AgeRestricted", 54 | "YouTubeDataUnparsable", 55 | "PoTokenRequired", 56 | ] 57 | -------------------------------------------------------------------------------- /.github/workflows/ci.yml: -------------------------------------------------------------------------------- 1 | name: CI 2 | 3 | on: 4 | push: 5 | branches: [ "master" ] 6 | tags: 7 | - '**' 8 | pull_request: 9 | 10 | jobs: 11 | static-checks: 12 | runs-on: ubuntu-latest 13 | 14 | steps: 15 | - uses: actions/checkout@v4 16 | 
- name: Set up Python 3.9 17 | uses: actions/setup-python@v5 18 | with: 19 | python-version: 3.9 20 | - name: Install dependencies 21 | run: | 22 | pip install poetry poethepoet 23 | poetry install --only dev 24 | - name: Format 25 | run: poe ci-format 26 | - name: Lint 27 | run: poe lint 28 | 29 | test: 30 | runs-on: ubuntu-latest 31 | strategy: 32 | fail-fast: false 33 | matrix: 34 | python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "3.14"] 35 | 36 | steps: 37 | - uses: actions/checkout@v4 38 | - name: Set up Python ${{ matrix.python-version }} 39 | uses: actions/setup-python@v5 40 | with: 41 | python-version: ${{ matrix.python-version }} 42 | - name: Install dependencies 43 | run: | 44 | pip install poetry poethepoet 45 | poetry install --with test 46 | - name: Run tests 47 | run: | 48 | poe ci-test 49 | - name: Report intermediate coverage report 50 | uses: coverallsapp/github-action@v2 51 | with: 52 | file: coverage.xml 53 | format: cobertura 54 | flag-name: run-python-${{ matrix.python-version }} 55 | parallel: true 56 | 57 | coverage: 58 | needs: test 59 | runs-on: ubuntu-latest 60 | 61 | steps: 62 | - name: Finalize coverage report 63 | uses: coverallsapp/github-action@v2 64 | with: 65 | parallel-finished: true 66 | carryforward: "run-python-3.8,run-python-3.9,run-python-3.10,run-python-3.11,run-python-3.12,run-python-3.13,run-python-3.14" 67 | - uses: actions/checkout@v4 68 | - name: Set up Python 3.9 69 | uses: actions/setup-python@v5 70 | with: 71 | python-version: 3.9 72 | - name: Install dependencies 73 | run: | 74 | pip install poetry poethepoet 75 | poetry install --with test 76 | - name: Check coverage 77 | run: poe coverage 78 | 79 | publish: 80 | if: github.event_name == 'push' && contains(github.ref, 'refs/tags/') 81 | needs: [coverage, static-checks] 82 | runs-on: ubuntu-latest 83 | 84 | steps: 85 | - uses: actions/checkout@v4 86 | - name: Set up Python 3.9 87 | uses: actions/setup-python@v5 88 | with: 89 | python-version: 3.9 90 | 
- name: Install dependencies 91 | run: | 92 | pip install poetry 93 | poetry install 94 | - name: Build 95 | run: poetry build 96 | - name: Publish 97 | run: poetry publish -u __token__ -p ${{ secrets.PYPI_TOKEN }} 98 | -------------------------------------------------------------------------------- /youtube_transcript_api/test/test_proxies.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | 3 | from youtube_transcript_api.proxies import ( 4 | GenericProxyConfig, 5 | InvalidProxyConfig, 6 | WebshareProxyConfig, 7 | ) 8 | 9 | 10 | class TestGenericProxyConfig: 11 | def test_to_requests_dict(self): 12 | proxy_config = GenericProxyConfig( 13 | http_url="http://myproxy.com", 14 | https_url="https://myproxy.com", 15 | ) 16 | 17 | request_dict = proxy_config.to_requests_dict() 18 | 19 | assert request_dict == { 20 | "http": "http://myproxy.com", 21 | "https": "https://myproxy.com", 22 | } 23 | 24 | def test_to_requests_dict__only_http(self): 25 | proxy_config = GenericProxyConfig( 26 | http_url="http://myproxy.com", 27 | ) 28 | 29 | request_dict = proxy_config.to_requests_dict() 30 | 31 | assert request_dict == { 32 | "http": "http://myproxy.com", 33 | "https": "http://myproxy.com", 34 | } 35 | 36 | def test_to_requests_dict__only_https(self): 37 | proxy_config = GenericProxyConfig( 38 | https_url="https://myproxy.com", 39 | ) 40 | 41 | request_dict = proxy_config.to_requests_dict() 42 | 43 | assert request_dict == { 44 | "http": "https://myproxy.com", 45 | "https": "https://myproxy.com", 46 | } 47 | 48 | def test__invalid_config(self): 49 | with pytest.raises(InvalidProxyConfig): 50 | GenericProxyConfig() 51 | 52 | 53 | class TestWebshareProxyConfig: 54 | def test_to_requests_dict(self): 55 | proxy_config = WebshareProxyConfig( 56 | proxy_username="user", 57 | proxy_password="password", 58 | ) 59 | 60 | request_dict = proxy_config.to_requests_dict() 61 | 62 | assert request_dict == { 63 | "http": 
"http://user-rotate:password@p.webshare.io:80/", 64 | "https": "http://user-rotate:password@p.webshare.io:80/", 65 | } 66 | 67 | def test_to_requests_dict__with_location_filter(self): 68 | proxy_config = WebshareProxyConfig( 69 | proxy_username="user", 70 | proxy_password="password", 71 | filter_ip_locations=["us"], 72 | ) 73 | 74 | request_dict = proxy_config.to_requests_dict() 75 | 76 | assert request_dict == { 77 | "http": "http://user-US-rotate:password@p.webshare.io:80/", 78 | "https": "http://user-US-rotate:password@p.webshare.io:80/", 79 | } 80 | 81 | def test_to_requests_dict__with_multiple_location_filters(self): 82 | proxy_config = WebshareProxyConfig( 83 | proxy_username="user", 84 | proxy_password="password", 85 | filter_ip_locations=["de", "us"], 86 | ) 87 | 88 | request_dict = proxy_config.to_requests_dict() 89 | 90 | assert request_dict == { 91 | "http": "http://user-DE-US-rotate:password@p.webshare.io:80/", 92 | "https": "http://user-DE-US-rotate:password@p.webshare.io:80/", 93 | } 94 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [build-system] 2 | requires = ["poetry-core"] 3 | build-backend = "poetry.core.masonry.api" 4 | 5 | [tool.poetry] 6 | name = "youtube-transcript-api" 7 | version = "1.2.3" 8 | description = "This is an python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles and it does not require a headless browser, like other selenium based solutions do!" 
9 | readme = "README.md" 10 | license = "MIT" 11 | authors = [ 12 | "Jonas Depoix ", 13 | ] 14 | homepage = "https://github.com/jdepoix/youtube-transcript-api" 15 | repository = "https://github.com/jdepoix/youtube-transcript-api" 16 | keywords = [ 17 | "cli", 18 | "subtitle", 19 | "subtitles", 20 | "transcript", 21 | "transcripts", 22 | "youtube", 23 | "youtube-api", 24 | "youtube-subtitles", 25 | "youtube-transcripts", 26 | ] 27 | classifiers = [ 28 | "License :: OSI Approved :: MIT License", 29 | "Operating System :: OS Independent", 30 | "Programming Language :: Python :: 3.8", 31 | "Programming Language :: Python :: 3.9", 32 | "Programming Language :: Python :: 3.10", 33 | "Programming Language :: Python :: 3.11", 34 | "Programming Language :: Python :: 3.12", 35 | "Programming Language :: Python :: 3.13", 36 | "Programming Language :: Python :: 3.14", 37 | ] 38 | 39 | [tool.poetry.scripts] 40 | youtube_transcript_api = "youtube_transcript_api.__main__:main" 41 | 42 | [tool.poe.tasks] 43 | test = "pytest youtube_transcript_api" 44 | ci-test.shell = "coverage run -m pytest youtube_transcript_api && coverage xml" 45 | coverage.shell = "coverage run -m pytest youtube_transcript_api && coverage report -m --fail-under=100" 46 | format = "ruff format youtube_transcript_api" 47 | ci-format = "ruff format youtube_transcript_api --check" 48 | lint = "ruff check youtube_transcript_api" 49 | precommit.shell = "poe format && poe lint && poe coverage" 50 | 51 | [tool.poetry.dependencies] 52 | python = ">=3.8,<3.15" 53 | requests = "*" 54 | defusedxml = "^0.7.1" 55 | 56 | [tool.poetry.group.test] 57 | optional = true 58 | 59 | [tool.poetry.group.test.dependencies] 60 | pytest = "^8.3.3" 61 | coverage = "^7.6.1" 62 | httpretty = "<1.1" 63 | 64 | [tool.poetry.group.dev] 65 | optional = true 66 | 67 | [tool.poetry.group.dev.dependencies] 68 | ruff = "^0.6.8" 69 | 70 | [tool.coverage.run] 71 | source = ["youtube_transcript_api"] 72 | 73 | [tool.coverage.report] 74 | omit = 
["*/__main__.py", "youtube_transcript_api/test/*"] 75 | exclude_lines = [ 76 | "pragma: no cover", 77 | 78 | # Don't complain about missing debug-only code: 79 | "def __unicode__", 80 | "def __repr__", 81 | "if self\\.debug", 82 | 83 | # Don't complain if tests don't hit defensive assertion code: 84 | "raise AssertionError", 85 | "raise NotImplementedError", 86 | 87 | # Don't complain if non-runnable code isn't run: 88 | "if 0:", 89 | "if __name__ == .__main__.:", 90 | 91 | # Don't complain about empty stubs of abstract methods 92 | "@abstractmethod", 93 | "@abstractclassmethod", 94 | "@abstractstaticmethod" 95 | ] 96 | show_missing = true -------------------------------------------------------------------------------- /youtube_transcript_api/test/test_formatters.py: -------------------------------------------------------------------------------- 1 | from unittest import TestCase 2 | 3 | import json 4 | 5 | import pprint 6 | 7 | from youtube_transcript_api.formatters import ( 8 | FetchedTranscript, 9 | FetchedTranscriptSnippet, 10 | Formatter, 11 | JSONFormatter, 12 | TextFormatter, 13 | SRTFormatter, 14 | WebVTTFormatter, 15 | PrettyPrintFormatter, 16 | FormatterLoader, 17 | ) 18 | 19 | 20 | class TestFormatters(TestCase): 21 | def setUp(self): 22 | self.transcript = FetchedTranscript( 23 | snippets=[ 24 | FetchedTranscriptSnippet(text="Test line 1", start=0.0, duration=1.50), 25 | FetchedTranscriptSnippet(text="line between", start=1.5, duration=2.0), 26 | FetchedTranscriptSnippet( 27 | text="testing the end line", start=2.5, duration=3.25 28 | ), 29 | ], 30 | language="English", 31 | language_code="en", 32 | is_generated=True, 33 | video_id="12345", 34 | ) 35 | self.transcripts = [self.transcript, self.transcript] 36 | self.transcript_raw = self.transcript.to_raw_data() 37 | self.transcripts_raw = [ 38 | transcript.to_raw_data() for transcript in self.transcripts 39 | ] 40 | 41 | def test_base_formatter_format_call(self): 42 | with 
self.assertRaises(NotImplementedError): 43 | Formatter().format_transcript(self.transcript) 44 | with self.assertRaises(NotImplementedError): 45 | Formatter().format_transcripts([self.transcript]) 46 | 47 | def test_srt_formatter_starting(self): 48 | content = SRTFormatter().format_transcript(self.transcript) 49 | lines = content.split("\n") 50 | 51 | # test starting lines 52 | self.assertEqual(lines[0], "1") 53 | self.assertEqual(lines[1], "00:00:00,000 --> 00:00:01,500") 54 | 55 | def test_srt_formatter_middle(self): 56 | content = SRTFormatter().format_transcript(self.transcript) 57 | lines = content.split("\n") 58 | 59 | # test middle lines 60 | self.assertEqual(lines[4], "2") 61 | self.assertEqual(lines[5], "00:00:01,500 --> 00:00:02,500") 62 | self.assertEqual(lines[6], self.transcript_raw[1]["text"]) 63 | 64 | def test_srt_formatter_ending(self): 65 | content = SRTFormatter().format_transcript(self.transcript) 66 | lines = content.split("\n") 67 | 68 | # test ending lines 69 | self.assertEqual(lines[-2], self.transcript_raw[-1]["text"]) 70 | self.assertEqual(lines[-1], "") 71 | 72 | def test_srt_formatter_many(self): 73 | formatter = SRTFormatter() 74 | content = formatter.format_transcripts(self.transcripts) 75 | formatted_single_transcript = formatter.format_transcript(self.transcript) 76 | 77 | self.assertEqual( 78 | content, 79 | formatted_single_transcript + "\n\n\n" + formatted_single_transcript, 80 | ) 81 | 82 | def test_webvtt_formatter_starting(self): 83 | content = WebVTTFormatter().format_transcript(self.transcript) 84 | lines = content.split("\n") 85 | 86 | # test starting lines 87 | self.assertEqual(lines[0], "WEBVTT") 88 | self.assertEqual(lines[1], "") 89 | 90 | def test_webvtt_formatter_ending(self): 91 | content = WebVTTFormatter().format_transcript(self.transcript) 92 | lines = content.split("\n") 93 | 94 | # test ending lines 95 | self.assertEqual(lines[-2], self.transcript_raw[-1]["text"]) 96 | self.assertEqual(lines[-1], "") 97 | 98 | 
def test_webvtt_formatter_many(self): 99 | formatter = WebVTTFormatter() 100 | content = formatter.format_transcripts(self.transcripts) 101 | formatted_single_transcript = formatter.format_transcript(self.transcript) 102 | 103 | self.assertEqual( 104 | content, 105 | formatted_single_transcript + "\n\n\n" + formatted_single_transcript, 106 | ) 107 | 108 | def test_pretty_print_formatter(self): 109 | content = PrettyPrintFormatter().format_transcript(self.transcript) 110 | 111 | self.assertEqual(content, pprint.pformat(self.transcript_raw)) 112 | 113 | def test_pretty_print_formatter_many(self): 114 | content = PrettyPrintFormatter().format_transcripts(self.transcripts) 115 | 116 | self.assertEqual(content, pprint.pformat(self.transcripts_raw)) 117 | 118 | def test_json_formatter(self): 119 | content = JSONFormatter().format_transcript(self.transcript) 120 | 121 | self.assertEqual(json.loads(content), self.transcript_raw) 122 | 123 | def test_json_formatter_many(self): 124 | content = JSONFormatter().format_transcripts(self.transcripts) 125 | 126 | self.assertEqual(json.loads(content), self.transcripts_raw) 127 | 128 | def test_text_formatter(self): 129 | content = TextFormatter().format_transcript(self.transcript) 130 | lines = content.split("\n") 131 | 132 | self.assertEqual(lines[0], self.transcript_raw[0]["text"]) 133 | self.assertEqual(lines[-1], self.transcript_raw[-1]["text"]) 134 | 135 | def test_text_formatter_many(self): 136 | formatter = TextFormatter() 137 | content = formatter.format_transcripts(self.transcripts) 138 | formatted_single_transcript = formatter.format_transcript(self.transcript) 139 | 140 | self.assertEqual( 141 | content, 142 | formatted_single_transcript + "\n\n\n" + formatted_single_transcript, 143 | ) 144 | 145 | def test_formatter_loader(self): 146 | loader = FormatterLoader() 147 | formatter = loader.load("json") 148 | 149 | self.assertTrue(isinstance(formatter, JSONFormatter)) 150 | 151 | def 
test_formatter_loader__default_formatter(self): 152 | loader = FormatterLoader() 153 | formatter = loader.load() 154 | 155 | self.assertTrue(isinstance(formatter, PrettyPrintFormatter)) 156 | 157 | def test_formatter_loader__unknown_format(self): 158 | with self.assertRaises(FormatterLoader.UnknownFormatterType): 159 | FormatterLoader().load("png") 160 | -------------------------------------------------------------------------------- /youtube_transcript_api/_api.py: -------------------------------------------------------------------------------- 1 | from typing import Optional, Iterable 2 | 3 | from requests import Session 4 | from requests.adapters import HTTPAdapter 5 | from urllib3 import Retry 6 | 7 | from .proxies import ProxyConfig 8 | 9 | from ._transcripts import TranscriptListFetcher, FetchedTranscript, TranscriptList 10 | 11 | 12 | class YouTubeTranscriptApi: 13 | def __init__( 14 | self, 15 | proxy_config: Optional[ProxyConfig] = None, 16 | http_client: Optional[Session] = None, 17 | ): 18 | """ 19 | Note on thread-safety: As this class will initialize a `requests.Session` 20 | object, it is not thread-safe. Make sure to initialize an instance of 21 | `YouTubeTranscriptApi` per thread, if used in a multi-threading scenario! 22 | 23 | :param proxy_config: an optional ProxyConfig object, defining proxies used for 24 | all network requests. This can be used to work around your IP being blocked 25 | by YouTube, as described in the "Working around IP bans" section of the 26 | README 27 | (https://github.com/jdepoix/youtube-transcript-api?tab=readme-ov-file#working-around-ip-bans-requestblocked-or-ipblocked-exception) 28 | :param http_client: You can optionally pass in a requests.Session object, if you 29 | manually want to share cookies between different instances of 30 | `YouTubeTranscriptApi`, overwrite defaults, specify SSL certificates, etc. 
31 | """ 32 | http_client = Session() if http_client is None else http_client 33 | http_client.headers.update({"Accept-Language": "en-US"}) 34 | # Cookie auth has been temporarily disabled, as it is not working properly with 35 | # YouTube's most recent changes. 36 | # if cookie_path is not None: 37 | # http_client.cookies = _load_cookie_jar(cookie_path) 38 | if proxy_config is not None: 39 | http_client.proxies = proxy_config.to_requests_dict() 40 | if proxy_config.prevent_keeping_connections_alive: 41 | http_client.headers.update({"Connection": "close"}) 42 | if proxy_config.retries_when_blocked > 0: 43 | retry_config = Retry( 44 | total=proxy_config.retries_when_blocked, 45 | status_forcelist=[429], 46 | ) 47 | http_client.mount("http://", HTTPAdapter(max_retries=retry_config)) 48 | http_client.mount("https://", HTTPAdapter(max_retries=retry_config)) 49 | self._fetcher = TranscriptListFetcher(http_client, proxy_config=proxy_config) 50 | 51 | def fetch( 52 | self, 53 | video_id: str, 54 | languages: Iterable[str] = ("en",), 55 | preserve_formatting: bool = False, 56 | ) -> FetchedTranscript: 57 | """ 58 | Retrieves the transcript for a single video. This is just a shortcut for 59 | calling: 60 | `YouTubeTranscriptApi().list(video_id).find_transcript(languages).fetch(preserve_formatting=preserve_formatting)` 61 | 62 | :param video_id: the ID of the video you want to retrieve the transcript for. 63 | Make sure that this is the actual ID, NOT the full URL to the video! 64 | :param languages: A list of language codes in descending priority. For 65 | example, if this is set to ["de", "en"] it will first try to fetch the 66 | German transcript (de) and then fetch the English transcript (en) if 67 | it fails to do so. This defaults to ["en"]. 
68 | :param preserve_formatting: whether to keep select HTML text formatting 69 | """ 70 | return ( 71 | self.list(video_id) 72 | .find_transcript(languages) 73 | .fetch(preserve_formatting=preserve_formatting) 74 | ) 75 | 76 | def list( 77 | self, 78 | video_id: str, 79 | ) -> TranscriptList: 80 | """ 81 | Retrieves the list of transcripts which are available for a given video. It 82 | returns a `TranscriptList` object which is iterable and provides methods to 83 | filter the list of transcripts for specific languages. While iterating over 84 | the `TranscriptList` the individual transcripts are represented by 85 | `Transcript` objects, which provide metadata and can either be fetched by 86 | calling `transcript.fetch()` or translated by calling `transcript.translate( 87 | 'en')`. Example: 88 | 89 | ``` 90 | ytt_api = YouTubeTranscriptApi() 91 | 92 | # retrieve the available transcripts 93 | transcript_list = ytt_api.list('video_id') 94 | 95 | # iterate over all available transcripts 96 | for transcript in transcript_list: 97 | # the Transcript object provides metadata properties 98 | print( 99 | transcript.video_id, 100 | transcript.language, 101 | transcript.language_code, 102 | # whether it has been manually created or generated by YouTube 103 | transcript.is_generated, 104 | # a list of languages the transcript can be translated to 105 | transcript.translation_languages, 106 | ) 107 | 108 | # fetch the actual transcript data 109 | print(transcript.fetch()) 110 | 111 | # translating the transcript will return another transcript object 112 | print(transcript.translate('en').fetch()) 113 | 114 | # you can also directly filter for the language you are looking for, using the transcript list 115 | transcript = transcript_list.find_transcript(['de', 'en']) 116 | 117 | # or just filter for manually created transcripts 118 | transcript = transcript_list.find_manually_created_transcript(['de', 'en']) 119 | 120 | # or automatically generated ones 121 | transcript = 
transcript_list.find_generated_transcript(['de', 'en']) 122 | ``` 123 | 124 | :param video_id: the ID of the video you want to retrieve the transcript for. 125 | Make sure that this is the actual ID, NOT the full URL to the video! 126 | """ 127 | return self._fetcher.fetch(video_id) 128 | -------------------------------------------------------------------------------- /youtube_transcript_api/proxies.py: -------------------------------------------------------------------------------- 1 | from abc import ABC, abstractmethod 2 | from typing import TypedDict, Optional, List 3 | 4 | 5 | class InvalidProxyConfig(Exception): 6 | pass 7 | 8 | 9 | class RequestsProxyConfigDict(TypedDict): 10 | """ 11 | This type represents the Dict that is used by the requests library to configure 12 | the proxies used. More information on this can be found in the official requests 13 | documentation: https://requests.readthedocs.io/en/latest/user/advanced/#proxies 14 | """ 15 | 16 | http: str 17 | https: str 18 | 19 | 20 | class ProxyConfig(ABC): 21 | """ 22 | The base class for all proxy configs. Anything can be a proxy config, as long as 23 | it can be turned into a `RequestsProxyConfigDict` by calling `to_requests_dict`. 24 | """ 25 | 26 | @abstractmethod 27 | def to_requests_dict(self) -> RequestsProxyConfigDict: 28 | """ 29 | Turns this proxy config into the Dict that is expected by the requests library. 30 | More information on this can be found in the official requests documentation: 31 | https://requests.readthedocs.io/en/latest/user/advanced/#proxies 32 | """ 33 | pass 34 | 35 | @property 36 | def prevent_keeping_connections_alive(self) -> bool: 37 | """ 38 | If you are using rotating proxies, it can be useful to prevent the HTTP 39 | client from keeping TCP connections alive, as your IP won't be rotated on 40 | every request, if your connection stays open. 
41 | """ 42 | return False 43 | 44 | @property 45 | def retries_when_blocked(self) -> int: 46 | """ 47 | Defines how many times we should retry if a request is blocked. When using 48 | rotating residential proxies with a large IP pool it can make sense to retry a 49 | couple of times when a blocked IP is encountered, since a retry will trigger 50 | an IP rotation and the next IP might not be blocked. 51 | """ 52 | return 0 53 | 54 | 55 | class GenericProxyConfig(ProxyConfig): 56 | """ 57 | This proxy config can be used to set up any generic HTTP/HTTPS/SOCKS proxy. As 58 | the requests library is used under the hood, you can follow the requests 59 | documentation to get more detailed information on how to set up proxies: 60 | https://requests.readthedocs.io/en/latest/user/advanced/#proxies 61 | 62 | If only an HTTP or an HTTPS proxy is provided, it will be used for both types of 63 | connections. However, you will have to provide at least one of the two. 64 | """ 65 | 66 | def __init__(self, http_url: Optional[str] = None, https_url: Optional[str] = None): 67 | """ 68 | If only an HTTP or an HTTPS proxy is provided, it will be used for both types of 69 | connections. However, you will have to provide at least one of the two. 70 | 71 | :param http_url: the proxy URL used for HTTP requests. Defaults to `https_url` 72 | if None. 73 | :param https_url: the proxy URL used for HTTPS requests. Defaults to `http_url` 74 | if None. 
75 | """ 76 | if not http_url and not https_url: 77 | raise InvalidProxyConfig( 78 | "GenericProxyConfig requires you to define at least one of the two: " 79 | "http or https" 80 | ) 81 | self.http_url = http_url 82 | self.https_url = https_url 83 | 84 | def to_requests_dict(self) -> RequestsProxyConfigDict: 85 | return { 86 | "http": self.http_url or self.https_url, 87 | "https": self.https_url or self.http_url, 88 | } 89 | 90 | 91 | class WebshareProxyConfig(GenericProxyConfig): 92 | """ 93 | Webshare is a provider offering rotating residential proxies, which is the 94 | most reliable way to work around being blocked by YouTube. 95 | 96 | If you don't have a Webshare account yet, you will have to create one 97 | at https://www.webshare.io/?referral_code=w0xno53eb50g and purchase a "Residential" 98 | proxy package that suits your workload, to be able to use this proxy config (make 99 | sure NOT to purchase "Proxy Server" or "Static Residential"!). 100 | 101 | Once you have created an account you only need the "Proxy Username" and 102 | "Proxy Password" that you can find in your Webshare settings 103 | at https://dashboard.webshare.io/proxy/settings to set up this config class, which 104 | will take care of setting up your proxies as needed, by defaulting to rotating 105 | proxies. 106 | 107 | Note that referral links are used here and any purchases made through these links 108 | will support this Open Source project, which is very much appreciated! :) 109 | However, you can of course integrate your own proxy solution by using the 110 | `GenericProxyConfig` class, if that's what you prefer. 
111 | """ 112 | 113 | DEFAULT_DOMAIN_NAME = "p.webshare.io" 114 | DEFAULT_PORT = 80 115 | 116 | def __init__( 117 | self, 118 | proxy_username: str, 119 | proxy_password: str, 120 | filter_ip_locations: Optional[List[str]] = None, 121 | retries_when_blocked: int = 10, 122 | domain_name: str = DEFAULT_DOMAIN_NAME, 123 | proxy_port: int = DEFAULT_PORT, 124 | ): 125 | """ 126 | Once you have created a Webshare account at 127 | https://www.webshare.io/?referral_code=w0xno53eb50g and purchased a 128 | "Residential" package (make sure NOT to purchase "Proxy Server" or 129 | "Static Residential"!), this config class allows you to easily use it, 130 | by defaulting to the most reliable proxy settings (rotating residential 131 | proxies). 132 | 133 | :param proxy_username: "Proxy Username" found at 134 | https://dashboard.webshare.io/proxy/settings 135 | :param proxy_password: "Proxy Password" found at 136 | https://dashboard.webshare.io/proxy/settings 137 | :param filter_ip_locations: If you want to limit the pool of IPs that you will 138 | be rotating through to those located in specific countries, you can provide 139 | a list of location codes here. By choosing locations that are close to the 140 | machine that is running this code, you can reduce latency. Also, this can 141 | be used to work around location-based restrictions. 142 | You can find the full list of available locations (and how many IPs are 143 | available in each location) at 144 | https://www.webshare.io/features/proxy-locations?referral_code=w0xno53eb50g 145 | :param retries_when_blocked: Define how many times we should retry if a request 146 | is blocked. When using rotating residential proxies with a large IP pool it 147 | makes sense to retry a couple of times when a blocked IP is encountered, 148 | since a retry will trigger an IP rotation and the next IP might not be 149 | blocked. Defaults to 10. 
150 | """ 151 | self.proxy_username = proxy_username 152 | self.proxy_password = proxy_password 153 | self.domain_name = domain_name 154 | self.proxy_port = proxy_port 155 | self._filter_ip_locations = filter_ip_locations or [] 156 | self._retries_when_blocked = retries_when_blocked 157 | 158 | @property 159 | def url(self) -> str: 160 | location_codes = "".join( 161 | f"-{location_code.upper()}" for location_code in self._filter_ip_locations 162 | ) 163 | return ( 164 | f"http://{self.proxy_username}{location_codes}-rotate:{self.proxy_password}" 165 | f"@{self.domain_name}:{self.proxy_port}/" 166 | ) 167 | 168 | @property 169 | def http_url(self) -> str: 170 | return self.url 171 | 172 | @property 173 | def https_url(self) -> str: 174 | return self.url 175 | 176 | @property 177 | def prevent_keeping_connections_alive(self) -> bool: 178 | return True 179 | 180 | @property 181 | def retries_when_blocked(self) -> int: 182 | return self._retries_when_blocked 183 | -------------------------------------------------------------------------------- /youtube_transcript_api/_cli.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from importlib.metadata import PackageNotFoundError, version 3 | from typing import List 4 | 5 | from .proxies import GenericProxyConfig, WebshareProxyConfig 6 | from .formatters import FormatterLoader 7 | 8 | from ._api import YouTubeTranscriptApi, FetchedTranscript, TranscriptList 9 | 10 | 11 | class YouTubeTranscriptCli: 12 | def __init__(self, args: List[str]): 13 | self._args = args 14 | 15 | def run(self) -> str: 16 | parsed_args = self._parse_args() 17 | 18 | if parsed_args.exclude_manually_created and parsed_args.exclude_generated: 19 | return "" 20 | 21 | proxy_config = None 22 | if parsed_args.http_proxy != "" or parsed_args.https_proxy != "": 23 | proxy_config = GenericProxyConfig( 24 | http_url=parsed_args.http_proxy, 25 | https_url=parsed_args.https_proxy, 26 | ) 27 | 28 | if ( 29 
| parsed_args.webshare_proxy_username is not None 30 | or parsed_args.webshare_proxy_password is not None 31 | ): 32 | proxy_config = WebshareProxyConfig( 33 | proxy_username=parsed_args.webshare_proxy_username, 34 | proxy_password=parsed_args.webshare_proxy_password, 35 | ) 36 | 37 | transcripts = [] 38 | exceptions = [] 39 | 40 | ytt_api = YouTubeTranscriptApi( 41 | proxy_config=proxy_config, 42 | ) 43 | 44 | for video_id in parsed_args.video_ids: 45 | try: 46 | transcript_list = ytt_api.list(video_id) 47 | if parsed_args.list_transcripts: 48 | transcripts.append(transcript_list) 49 | else: 50 | transcripts.append( 51 | self._fetch_transcript( 52 | parsed_args, 53 | transcript_list, 54 | ) 55 | ) 56 | except Exception as exception: 57 | exceptions.append(exception) 58 | 59 | print_sections = [str(exception) for exception in exceptions] 60 | if transcripts: 61 | if parsed_args.list_transcripts: 62 | print_sections.extend( 63 | str(transcript_list) for transcript_list in transcripts 64 | ) 65 | else: 66 | print_sections.append( 67 | FormatterLoader() 68 | .load(parsed_args.format) 69 | .format_transcripts(transcripts) 70 | ) 71 | 72 | return "\n\n".join(print_sections) 73 | 74 | def _fetch_transcript( 75 | self, 76 | parsed_args, 77 | transcript_list: TranscriptList, 78 | ) -> FetchedTranscript: 79 | if parsed_args.exclude_manually_created: 80 | transcript = transcript_list.find_generated_transcript( 81 | parsed_args.languages 82 | ) 83 | elif parsed_args.exclude_generated: 84 | transcript = transcript_list.find_manually_created_transcript( 85 | parsed_args.languages 86 | ) 87 | else: 88 | transcript = transcript_list.find_transcript(parsed_args.languages) 89 | 90 | if parsed_args.translate: 91 | transcript = transcript.translate(parsed_args.translate) 92 | 93 | return transcript.fetch() 94 | 95 | def _get_version(self): 96 | try: 97 | return version("youtube-transcript-api") 98 | except PackageNotFoundError: 99 | return "unknown" 100 | 101 | def _parse_args(self): 
102 | parser = argparse.ArgumentParser( 103 | description=( 104 | "This is a Python API which allows you to get the transcripts/subtitles for a given YouTube video. " 105 | "It also works for automatically generated subtitles and it does not require a headless browser, like " 106 | "other Selenium-based solutions do!" 107 | ) 108 | ) 109 | parser.add_argument( 110 | "--version", 111 | action="version", 112 | version=f"%(prog)s, version {self._get_version()}", 113 | ) 114 | parser.add_argument( 115 | "--list-transcripts", 116 | action="store_const", 117 | const=True, 118 | default=False, 119 | help="This will list the languages in which the given videos are available.", 120 | ) 121 | parser.add_argument( 122 | "video_ids", nargs="+", type=str, help="List of YouTube video IDs." 123 | ) 124 | parser.add_argument( 125 | "--languages", 126 | nargs="*", 127 | default=[ 128 | "en", 129 | ], 130 | type=str, 131 | help=( 132 | 'A list of language codes in descending priority. For example, if this is set to "de en" it will ' 133 | "first try to fetch the German transcript (de) and then fetch the English transcript (en) if it fails " 134 | "to do so. As I can't provide a complete list of all working language codes with full certainty, you " 135 | "may have to play around with the language codes a bit, to find the one which is working for you!" 
136 | ), 137 | ) 138 | parser.add_argument( 139 | "--exclude-generated", 140 | action="store_const", 141 | const=True, 142 | default=False, 143 | help="If this flag is set transcripts which have been generated by YouTube will not be retrieved.", 144 | ) 145 | parser.add_argument( 146 | "--exclude-manually-created", 147 | action="store_const", 148 | const=True, 149 | default=False, 150 | help="If this flag is set transcripts which have been manually created will not be retrieved.", 151 | ) 152 | parser.add_argument( 153 | "--format", 154 | type=str, 155 | default="pretty", 156 | choices=tuple(FormatterLoader.TYPES.keys()), 157 | ) 158 | parser.add_argument( 159 | "--translate", 160 | default="", 161 | help=( 162 | "The language code for the language you want this transcript to be translated to. Use the " 163 | "--list-transcripts feature to find out which languages are translatable and which translation " 164 | "languages are available." 165 | ), 166 | ) 167 | parser.add_argument( 168 | "--webshare-proxy-username", 169 | default=None, 170 | type=str, 171 | help='Specify your Webshare "Proxy Username" found at https://dashboard.webshare.io/proxy/settings', 172 | ) 173 | parser.add_argument( 174 | "--webshare-proxy-password", 175 | default=None, 176 | type=str, 177 | help='Specify your Webshare "Proxy Password" found at https://dashboard.webshare.io/proxy/settings', 178 | ) 179 | parser.add_argument( 180 | "--http-proxy", 181 | default="", 182 | metavar="URL", 183 | help="Use the specified HTTP proxy.", 184 | ) 185 | parser.add_argument( 186 | "--https-proxy", 187 | default="", 188 | metavar="URL", 189 | help="Use the specified HTTPS proxy.", 190 | ) 191 | # Cookie auth has been temporarily disabled, as it is not working properly with 192 | # YouTube's most recent changes. 
193 | # parser.add_argument( 194 | # "--cookies", 195 | # default=None, 196 | # help="The cookie file that will be used for authorization with youtube.", 197 | # ) 198 | 199 | return self._sanitize_video_ids(parser.parse_args(self._args)) 200 | 201 | def _sanitize_video_ids(self, args): 202 | args.video_ids = [video_id.replace("\\", "") for video_id in args.video_ids] 203 | return args 204 | -------------------------------------------------------------------------------- /youtube_transcript_api/test/assets/youtube_too_many_requests.html.static: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | YouTube 5 | 11 | 54 | 59 | 64 | 69 | 74 | 79 | 80 | 81 |

82 |

83 |

84 | Perdón por la interrupción. Hemos recibido un gran número de 85 | solicitudes de tu red. 86 |

87 |

88 | Para seguir disfrutando de YouTube, rellena el siguiente formulario. 89 |

90 | 105 | 198 | 235 |

236 | 237 |

238 | 239 | 240 | -------------------------------------------------------------------------------- /youtube_transcript_api/formatters.py: -------------------------------------------------------------------------------- 1 | import json 2 | 3 | import pprint 4 | from typing import List, Iterable 5 | 6 | from ._transcripts import FetchedTranscript, FetchedTranscriptSnippet 7 | 8 | 9 | class Formatter: 10 | """Formatter should be used as an abstract base class. 11 | 12 | Formatter classes should inherit from this class and implement 13 | their own .format_transcript() and .format_transcripts() methods, which 14 | should each return a string. A transcript is passed in as a FetchedTranscript. 15 | """ 16 | 17 | def format_transcript(self, transcript: FetchedTranscript, **kwargs) -> str: 18 | raise NotImplementedError( 19 | "A subclass of Formatter must implement " 20 | "their own .format_transcript() method." 21 | ) 22 | 23 | def format_transcripts(self, transcripts: List[FetchedTranscript], **kwargs): 24 | raise NotImplementedError( 25 | "A subclass of Formatter must implement " 26 | "their own .format_transcripts() method." 27 | ) 28 | 29 | 30 | class PrettyPrintFormatter(Formatter): 31 | def format_transcript(self, transcript: FetchedTranscript, **kwargs) -> str: 32 | """Pretty prints a transcript. 33 | 34 | :param transcript: 35 | :return: A pretty printed string representation of the transcript. 36 | """ 37 | return pprint.pformat(transcript.to_raw_data(), **kwargs) 38 | 39 | def format_transcripts(self, transcripts: List[FetchedTranscript], **kwargs) -> str: 40 | """Pretty prints a list of transcripts. 41 | 42 | :param transcripts: 43 | :return: A pretty printed string representation of the transcripts. 44 | """ 45 | return pprint.pformat( 46 | [transcript.to_raw_data() for transcript in transcripts], **kwargs 47 | ) 48 | 49 | 50 | class JSONFormatter(Formatter): 51 | def format_transcript(self, transcript: FetchedTranscript, **kwargs) -> str: 52 | """Converts a transcript into a JSON string. 
53 | 54 | :param transcript: 55 | :return: A JSON string representation of the transcript. 56 | """ 57 | return json.dumps(transcript.to_raw_data(), **kwargs) 58 | 59 | def format_transcripts(self, transcripts: List[FetchedTranscript], **kwargs) -> str: 60 | """Converts a list of transcripts into a JSON string. 61 | 62 | :param transcripts: 63 | :return: A JSON string representation of the transcripts. 64 | """ 65 | return json.dumps( 66 | [transcript.to_raw_data() for transcript in transcripts], **kwargs 67 | ) 68 | 69 | 70 | class TextFormatter(Formatter): 71 | def format_transcript(self, transcript: FetchedTranscript, **kwargs) -> str: 72 | """Converts a transcript into plain text with no timestamps. 73 | 74 | :param transcript: 75 | :return: all transcript text lines separated by newline breaks. 76 | """ 77 | return "\n".join(line.text for line in transcript) 78 | 79 | def format_transcripts(self, transcripts: List[FetchedTranscript], **kwargs) -> str: 80 | """Converts a list of transcripts into plain text with no timestamps. 81 | 82 | :param transcripts: 83 | :return: the plain text of each transcript, with consecutive transcripts separated by two blank lines. 84 | """ 85 | return "\n\n\n".join( 86 | [self.format_transcript(transcript, **kwargs) for transcript in transcripts] 87 | ) 88 | 89 | 90 | class _TextBasedFormatter(TextFormatter): 91 | def _format_timestamp(self, hours: int, mins: int, secs: int, ms: int) -> str: 92 | raise NotImplementedError( 93 | "A subclass of _TextBasedFormatter must implement " 94 | "their own _format_timestamp method." 95 | ) 96 | 97 | def _format_transcript_header(self, lines: Iterable[str]) -> str: 98 | raise NotImplementedError( 99 | "A subclass of _TextBasedFormatter must implement " 100 | "their own _format_transcript_header method." 
101 | ) 102 | 103 | def _format_transcript_helper( 104 | self, i: int, time_text: str, snippet: FetchedTranscriptSnippet 105 | ) -> str: 106 | raise NotImplementedError( 107 | "A subclass of _TextBasedFormatter must implement " 108 | "their own _format_transcript_helper method." 109 | ) 110 | 111 | def _seconds_to_timestamp(self, time: float) -> str: 112 | """Helper that converts `time` into a transcript cue timestamp. 113 | 114 | :reference: https://www.w3.org/TR/webvtt1/#webvtt-timestamp 115 | 116 | :param time: a float representing time in seconds. 117 | :type time: float 118 | :return: a string formatted as a cue timestamp, 'HH:MM:SS.MS' 119 | :example: 120 | >>> self._seconds_to_timestamp(6.93) 121 | '00:00:06.930' 122 | """ 123 | time = float(time) 124 | hours_float, remainder = divmod(time, 3600) 125 | mins_float, secs_float = divmod(remainder, 60) 126 | hours, mins, secs = int(hours_float), int(mins_float), int(secs_float) 127 | ms = int(round((time - int(time)) * 1000, 2)) 128 | return self._format_timestamp(hours, mins, secs, ms) 129 | 130 | def format_transcript(self, transcript: FetchedTranscript, **kwargs) -> str: 131 | """A basic implementation of WEBVTT/SRT formatting. 
132 | 133 | :param transcript: 134 | :reference: 135 | https://www.w3.org/TR/webvtt1/#introduction-caption 136 | https://www.3playmedia.com/blog/create-srt-file/ 137 | """ 138 | lines = [] 139 | for i, line in enumerate(transcript): 140 | end = line.start + line.duration 141 | time_text = "{} --> {}".format( 142 | self._seconds_to_timestamp(line.start), 143 | self._seconds_to_timestamp( 144 | transcript[i + 1].start 145 | if i < len(transcript) - 1 and transcript[i + 1].start < end 146 | else end 147 | ), 148 | ) 149 | lines.append(self._format_transcript_helper(i, time_text, line)) 150 | 151 | return self._format_transcript_header(lines) 152 | 153 | 154 | class SRTFormatter(_TextBasedFormatter): 155 | def _format_timestamp(self, hours: int, mins: int, secs: int, ms: int) -> str: 156 | return "{:02d}:{:02d}:{:02d},{:03d}".format(hours, mins, secs, ms) 157 | 158 | def _format_transcript_header(self, lines: Iterable[str]) -> str: 159 | return "\n\n".join(lines) + "\n" 160 | 161 | def _format_transcript_helper( 162 | self, i: int, time_text: str, snippet: FetchedTranscriptSnippet 163 | ) -> str: 164 | return "{}\n{}\n{}".format(i + 1, time_text, snippet.text) 165 | 166 | 167 | class WebVTTFormatter(_TextBasedFormatter): 168 | def _format_timestamp(self, hours: int, mins: int, secs: int, ms: int) -> str: 169 | return "{:02d}:{:02d}:{:02d}.{:03d}".format(hours, mins, secs, ms) 170 | 171 | def _format_transcript_header(self, lines: Iterable[str]) -> str: 172 | return "WEBVTT\n\n" + "\n\n".join(lines) + "\n" 173 | 174 | def _format_transcript_helper( 175 | self, i: int, time_text: str, snippet: FetchedTranscriptSnippet 176 | ) -> str: 177 | return "{}\n{}".format(time_text, snippet.text) 178 | 179 | 180 | class FormatterLoader: 181 | TYPES = { 182 | "json": JSONFormatter, 183 | "pretty": PrettyPrintFormatter, 184 | "text": TextFormatter, 185 | "webvtt": WebVTTFormatter, 186 | "srt": SRTFormatter, 187 | } 188 | 189 | class UnknownFormatterType(Exception): 190 | def 
__init__(self, formatter_type: str): 191 | super().__init__( 192 | "The format '{formatter_type}' is not supported. " 193 | "Choose one of the following formats: {supported_formatter_types}".format( 194 | formatter_type=formatter_type, 195 | supported_formatter_types=", ".join(FormatterLoader.TYPES.keys()), 196 | ) 197 | ) 198 | 199 | def load(self, formatter_type: str = "pretty") -> Formatter: 200 | """ 201 | Loads the Formatter for the given formatter type. 202 | 203 | :param formatter_type: 204 | :return: Formatter object 205 | """ 206 | if formatter_type not in FormatterLoader.TYPES.keys(): 207 | raise FormatterLoader.UnknownFormatterType(formatter_type) 208 | return FormatterLoader.TYPES[formatter_type]() 209 | -------------------------------------------------------------------------------- /youtube_transcript_api/test/assets/youtube_video_unavailable.innertube.json.static: -------------------------------------------------------------------------------- 1 | { 2 | "responseContext": { 3 | "visitorData": "CgtkVnYwR1MzN3pQTSjGu6bCBjIKCgJERRIEEgAgLjoMCAEgzKLJvOe456RoWJff7Nfpi977SA%3D%3D", 4 | "serviceTrackingParams": [ 5 | { 6 | "service": "GFEEDBACK", 7 | "params": [ 8 | { 9 | "key": "is_viewed_live", 10 | "value": "False" 11 | }, 12 | { 13 | "key": "logged_in", 14 | "value": "0" 15 | }, 16 | { 17 | "key": "e", 18 | "value": 
"9406121,23888716,24004644,24077241,24078649,24104894,24132305,24143331,24166867,24181174,24230811,24232551,24241378,24290153,24397985,24484132,24522874,24556101,24585737,39325413,39328442,39328646,51010008,51020570,51025415,51054999,51067700,51068313,51080128,51086511,51095478,51115184,51129105,51132535,51137671,51141472,51152050,51175149,51179435,51179748,51183208,51183910,51217334,51224491,51227037,51237842,51242448,51248777,51256074,51272458,51295372,51303432,51306453,51313109,51313767,51314496,51324733,51326139,51349914,51353393,51354083,51354114,51356621,51366423,51366620,51372971,51375205,51375719,51386541,51388661,51397332,51398647,51402689,51404808,51404810,51420457,51421832,51428417,51428624,51429106,51430311,51432294,51432529,51432560,51436956,51437205,51439763,51439874,51441100,51442501,51443707,51447191,51452420,51453239,51456413,51456629,51458927,51459424,51460559,51461268,51462839,51463930,51466698,51466900,51467076,51467525,51469820,51471138,51471685,51475592,51475960,51478931,51479781,51483631,51483888,51484221,51484412,51484746,51485249,51485417,51486018,51486471,51487681,51489047,51490158,51491436,51492252,51495585,51495706,51495859,51497133,51498591,51503024,51503027,51504828,51506681,51506715,51507237,51508739,51508979,51509214,51509314,51509614,51512708" 19 | }, 20 | { 21 | "key": "visitor_data", 22 | "value": "CgtkVnYwR1MzN3pQTSjGu6bCBjIKCgJERRIEEgAgLjoMCAEgzKLJvOe456Ro" 23 | } 24 | ] 25 | }, 26 | { 27 | "service": "CSI", 28 | "params": [ 29 | { 30 | "key": "c", 31 | "value": "ANDROID" 32 | }, 33 | { 34 | "key": "cver", 35 | "value": "20.10.38" 36 | }, 37 | { 38 | "key": "yt_li", 39 | "value": "0" 40 | }, 41 | { 42 | "key": "GetPlayer_rid", 43 | "value": "0xfa514d3157e18a41" 44 | } 45 | ] 46 | }, 47 | { 48 | "service": "GUIDED_HELP", 49 | "params": [ 50 | { 51 | "key": "logged_in", 52 | "value": "0" 53 | } 54 | ] 55 | }, 56 | { 57 | "service": "ECATCHER", 58 | "params": [ 59 | { 60 | "key": "client.version", 61 | "value": "20.10" 62 | }, 63 | 
{ 64 | "key": "client.name", 65 | "value": "ANDROID" 66 | } 67 | ] 68 | }, 69 | { 70 | "service": "LISTNR", 71 | "params": [ 72 | { 73 | "key": "e", 74 | "value": "51498591,51183208,24181216,51366423,51248777,51397332,51478931,24522874,51442501,51175149,51388661,24024517,51508739,51466900,51479781,51486471,51461268,51495585,51421832,51375205,24195012,51439763,51504828,51441100,51489047,51189308,51020570,51086511,51456413,51306453,51353393,51137671,51313767,51430311,51354114,51217334,51469823,51471138,51490158,51303432,51054999,51461791,51467525,51429106,24104894,51466698,39328646,51237842,51506715,24181174,51475960,51469820,51483888,51010008,51486018,51506681,39325413,51179748,51483631,51404808,51375719,51484412,51452420,24143331,51467076,24290153,51492252,51507237,51314496,51491436,51436956,51512708,51256074,51461795,51067700,51485417,51495706,51402689,24033252,51295372,51428417,24397985,51272458,51484746,51453239,51192010,51459424,51432294,51484221,51503027,24585737,51432560,51354083,24556101,24166867,51509214,51475592,24254870,51485249,51132535,51324733,51509314,51179435,51428624,51447191,24220751,51270362,24250570,51372971,51509614,51456629,51497133,51202133,51242448,51349914,51387900,51439874,51080128,51443707,24232551,51025415,51458927,51462839,24230811,51129105,51141472,51404810,51095478,51463930,51495859,51152050,51508979,51420457,51313109,24286257,51366620,39328442" 75 | } 76 | ] 77 | } 78 | ], 79 | "maxAgeSeconds": 0, 80 | "rolloutToken": "CMimoJKC4PuILBDn0eTd1OmNAxjn0eTd1OmNAw%3D%3D" 81 | }, 82 | "playabilityStatus": { 83 | "status": "ERROR", 84 | "reason": "This video is unavailable", 85 | "errorScreen": { 86 | "playerErrorMessageRenderer": { 87 | "reason": { 88 | "runs": [ 89 | { 90 | "text": "This video is unavailable" 91 | } 92 | ] 93 | }, 94 | "thumbnail": { 95 | "thumbnails": [ 96 | { 97 | "url": "//s.ytimg.com/yts/img/meh7-vflGevej7.png", 98 | "width": 140, 99 | "height": 100 100 | } 101 | ] 102 | }, 103 | "icon": { 104 | "iconType": 
"ERROR_OUTLINE" 105 | } 106 | } 107 | }, 108 | "contextParams": "Q0FBU0FnZ0E=" 109 | }, 110 | "trackingParams": "CAAQu2kiEwi20OTd1OmNAxUY-UIFHVxGIYo=", 111 | "onResponseReceivedActions": [ 112 | { 113 | "clickTrackingParams": "CAAQu2kiEwi20OTd1OmNAxUY-UIFHVxGIYo=", 114 | "startEomFlowCommand": { 115 | "eomFlowRenderer": { 116 | "webViewRenderer": { 117 | "url": { 118 | "privateDoNotAccessOrElseTrustedResourceUrlWrappedValue": "https://consent.youtube.com/yt-app-main?gl=DE&m=1&pc=yt&cm=2&hl=en&src=1&app=1&vd=CgtkVnYwR1MzN3pQTSjGu6bCBjIKCgJERRIEEgAgLjoMCAEgzKLJvOe456Ro&utm_source=YT_ANDROID&dt=0&av=20.10.38" 119 | }, 120 | "onFailureCommand": { 121 | "clickTrackingParams": "CAIQmawJIhMIttDk3dTpjQMVGPlCBR1cRiGK", 122 | "updateEomStateCommand": { 123 | "mobileEomFlowState": { 124 | "updatedVisitorData": "CgtkVnYwR1MzN3pQTSjGu6bCBjIKCgJERRIEEgAgLjoaCAEaDAjGu6bCBhDA5v3UAiDMosm857jnpGg%3D", 125 | "isError": true 126 | } 127 | } 128 | }, 129 | "trackingParams": "CAIQmawJIhMIttDk3dTpjQMVGPlCBR1cRiGK", 130 | "webViewEntityKey": "Eg5Fb21GbG93V2VidmlldyD4AigB", 131 | "webToNativeMessageMap": [ 132 | { 133 | "key": "sign_in_endpoint", 134 | "value": { 135 | "clickTrackingParams": "CAIQmawJIhMIttDk3dTpjQMVGPlCBR1cRiGK", 136 | "signInEndpoint": { 137 | "hack": true 138 | } 139 | } 140 | }, 141 | { 142 | "key": "update_eom_state_command", 143 | "value": { 144 | "clickTrackingParams": "CAIQmawJIhMIttDk3dTpjQMVGPlCBR1cRiGK", 145 | "updateEomStateCommand": { 146 | "hack": true 147 | } 148 | } 149 | } 150 | ], 151 | "webViewUseCase": "WEB_VIEW_USE_CASE_EOM_CONSENT", 152 | "openInBrowserUrls": [ 153 | "https://policies.google.com", 154 | "https://support.google.com" 155 | ], 156 | "firstPartyHostNameAllowList": [ 157 | "consent.youtube.com" 158 | ] 159 | } 160 | }, 161 | "consentMoment": "CONSENT_MOMENT_INITIAL" 162 | } 163 | } 164 | ], 165 | "playerSettingsMenuData": { 166 | "loggingDirectives": { 167 | "trackingParams": "CAEQtc4GIhMIttDk3dTpjQMVGPlCBR1cRiGK", 168 | "visibility": { 
169 | "types": "12" 170 | } 171 | } 172 | }, 173 | "frameworkUpdates": { 174 | "entityBatchUpdate": { 175 | "mutations": [ 176 | { 177 | "entityKey": "Eihjb21wb3NpdGUtbGl2ZS1zdHJlYW0tb2ZmbGluZS1lbnRpdHkta2V5IIUEKAE%3D", 178 | "type": "ENTITY_MUTATION_TYPE_DELETE" 179 | }, 180 | { 181 | "entityKey": "EgUKA2FiYyD2ASgB", 182 | "type": "ENTITY_MUTATION_TYPE_REPLACE", 183 | "payload": { 184 | "offlineabilityEntity": { 185 | "key": "EgUKA2FiYyD2ASgB", 186 | "addToOfflineButtonState": "ADD_TO_OFFLINE_BUTTON_STATE_UNKNOWN" 187 | } 188 | } 189 | } 190 | ], 191 | "timestamp": { 192 | "seconds": "1749654982", 193 | "nanos": 715113710 194 | } 195 | } 196 | } 197 | } 198 | -------------------------------------------------------------------------------- /youtube_transcript_api/test/assets/youtube_unplayable.innertube.json.static: -------------------------------------------------------------------------------- 1 | { 2 | "responseContext": { 3 | "visitorData": "Cgs3NmJkd2VWU1N2USj4uaLCBjIKCgJTQRIEGgAgJzoMCAEgkoPMhYCfp6RoWJfwuc_yqZ2rVQ%3D%3D", 4 | "serviceTrackingParams": [ 5 | { 6 | "service": "GFEEDBACK", 7 | "params": [ 8 | { 9 | "key": "is_viewed_live", 10 | "value": "False" 11 | }, 12 | { 13 | "key": "ipcc", 14 | "value": "0" 15 | }, 16 | { 17 | "key": "is_alc_surface", 18 | "value": "false" 19 | }, 20 | { 21 | "key": "logged_in", 22 | "value": "0" 23 | }, 24 | { 25 | "key": "e", 26 | "value": 
"24004644,24077241,24078649,24104894,24143331,24166867,24181174,24230811,24232551,24241378,24290153,24397985,24513381,24522874,24556101,24585737,39325413,39328442,39328647,51010008,51020570,51025415,51037346,51037353,51054999,51067700,51068313,51080128,51086511,51095478,51115184,51129105,51132535,51137671,51141472,51152050,51175149,51178320,51178333,51178342,51178355,51179435,51179748,51183909,51217334,51227037,51237842,51242448,51248777,51256074,51272458,51295372,51303432,51306453,51311031,51311036,51313109,51313767,51314496,51324733,51326139,51330753,51341228,51346985,51349914,51353393,51354083,51354114,51354567,51356621,51359177,51361830,51362071,51366123,51366423,51366620,51367487,51372971,51375205,51375719,51386541,51388661,51394776,51394779,51397332,51402689,51404808,51404810,51405647,51407634,51417450,51417473,51417480,51417497,51417508,51417523,51420458,51421832,51427573,51428417,51428624,51429106,51430311,51432294,51432529,51432560,51433499,51435249,51435805,51435845,51435875,51435884,51435893,51435905,51435912,51435918,51436950,51437205,51439763,51439874,51440725,51441100,51441710,51442501,51443707,51444218,51447191,51448332,51452420,51452479,51452495,51453239,51456413,51456421,51456628,51458927,51459425,51461268,51462839,51463532,51463930,51465523,51465804,51465955,51466698,51466900,51467076,51467525,51468320,51469820,51471138,51471685,51471785,51472877,51473079,51473810,51475247,51475592,51475686,51475960,51476590,51477496,51477506,51477582,51477845,51478690,51478931,51479232,51479706,51479906,51480511,51481240,51483631,51483888,51484222,51484412,51484746,51484750,51485249,51485417,51485661,51486018,51486232,51486471,51488577,51489047,51489149,51489197,51490157,51490995,51491436,51492251,51492548,51495585,51495706,51495859,51496968,51497133,51498591,51499562,51500337,51500785,51503024,51504828,51505739,51506681,51507237,51508242,51508738,51508979,51509314,51509613,51509678,51510317,51510817,51511950,51512707,51512805,51512852,51514264" 27 | }, 28 | { 29 
| "key": "visitor_data", 30 | "value": "Cgs3NmJkd2VWU1N2USj4uaLCBjIKCgJTQRIEGgAgJzoMCAEgkoPMhYCfp6Ro" 31 | } 32 | ] 33 | }, 34 | { 35 | "service": "CSI", 36 | "params": [ 37 | { 38 | "key": "c", 39 | "value": "ANDROID" 40 | }, 41 | { 42 | "key": "cver", 43 | "value": "20.10.38" 44 | }, 45 | { 46 | "key": "yt_li", 47 | "value": "0" 48 | }, 49 | { 50 | "key": "GetPlayer_rid", 51 | "value": "0xbc62793ea45e2d9b" 52 | } 53 | ] 54 | }, 55 | { 56 | "service": "GUIDED_HELP", 57 | "params": [ 58 | { 59 | "key": "logged_in", 60 | "value": "0" 61 | } 62 | ] 63 | }, 64 | { 65 | "service": "ECATCHER", 66 | "params": [ 67 | { 68 | "key": "client.version", 69 | "value": "20.10" 70 | }, 71 | { 72 | "key": "client.name", 73 | "value": "ANDROID" 74 | } 75 | ] 76 | }, 77 | { 78 | "service": "LISTNR", 79 | "params": [ 80 | { 81 | "key": "e", 82 | "value": "51237842,51484222,51010008,51459425,51490157,51469820,51507237,51428417,51272458,51435249,51484746,51500337,51491436,51485417,51067700,51295372,51461795,51256074,24195012,51314496,51420458,51453239,51510365,51192010,24250570,24232551,51475592,51485249,51132535,51324733,51179435,51465955,51508738,51354083,24286257,51432294,51432560,51428624,51512707,24181174,51509314,51500785,51447191,51080128,51443707,51458927,51025415,51492251,51463930,51486232,51495859,51095478,51372971,51497133,51270362,24024517,24181216,39328442,51313109,51248777,51366620,51495706,51508979,51152050,51129105,51402689,51404810,51141472,51436950,24522874,51366423,51509613,51397332,51510817,51462839,51498591,24220751,51486471,51456628,51189308,51489047,24104894,24166867,51504828,24585737,51388661,51478931,51442501,51466900,51472877,51488577,51137671,51461268,51217334,51353393,24513381,51020570,51086511,51439874,51306453,51456413,51349914,51387900,51430311,39328647,51202133,51242448,51441100,51375205,51495585,51421832,24143331,51354114,51469823,51439763,51477845,51313767,24230811,51429106,51179748,51467525,39325413,51054999,51471138,51461791,51303432,51466698,24290153
,51475960,51404808,51483631,24397985,51489197,51484412,51375719,24033252,51452420,51486018,24556101,51175149,51467076,51506681,24254870,51483888" 83 | } 84 | ] 85 | } 86 | ], 87 | "maxAgeSeconds": 0, 88 | "rolloutToken": "CNrZvdmasunZkgEQ3K6o6d_njQMY3a6o6d_njQM%3D" 89 | }, 90 | "playabilityStatus": { 91 | "status": "CUSTOM", 92 | "reason": "Custom Reason", 93 | "errorScreen": { 94 | "playerErrorMessageRenderer": { 95 | "subreason": { 96 | "runs": [ 97 | { 98 | "text": "Sub Reason 1" 99 | }, 100 | { 101 | "text": "Sub Reason 2", 102 | "navigationEndpoint": { 103 | "clickTrackingParams": "CAAQu2kiEwiZrKjp3-eNAxXimsIBHTHfKdc=", 104 | "urlEndpoint": { 105 | "url": "https://support.google.com/youtube/answer/3037019#zippy=%2Ccheck-that-youre-signed-into-youtube" 106 | } 107 | } 108 | } 109 | ] 110 | }, 111 | "reason": { 112 | "runs": [ 113 | { 114 | "text": "Sign in to confirm you’re not a bot" 115 | } 116 | ] 117 | }, 118 | "proceedButton": { 119 | "buttonRenderer": { 120 | "style": "STYLE_PRIMARY", 121 | "size": "SIZE_DEFAULT", 122 | "isDisabled": false, 123 | "text": { 124 | "simpleText": "Sign in" 125 | }, 126 | "navigationEndpoint": { 127 | "clickTrackingParams": "CAIQptEGIhMImayo6d_njQMV4prCAR0x3ynX", 128 | "signInEndpoint": { 129 | "nextEndpoint": { 130 | "clickTrackingParams": "CAIQptEGIhMImayo6d_njQMV4prCAR0x3ynX", 131 | "urlEndpoint": { 132 | "url": "" 133 | } 134 | } 135 | } 136 | }, 137 | "trackingParams": "CAIQptEGIhMImayo6d_njQMV4prCAR0x3ynX" 138 | } 139 | }, 140 | "thumbnail": { 141 | "thumbnails": [ 142 | { 143 | "url": "//s.ytimg.com/yts/img/meh7-vflGevej7.png", 144 | "width": 140, 145 | "height": 100 146 | } 147 | ] 148 | }, 149 | "icon": { 150 | "iconType": "ERROR_OUTLINE" 151 | } 152 | } 153 | }, 154 | "skip": { 155 | "playabilityErrorSkipConfig": { 156 | "skipOnPlayabilityError": false 157 | } 158 | }, 159 | "contextParams": "Q0FFU0FnZ0M=" 160 | }, 161 | "trackingParams": "CAAQu2kiEwiZrKjp3-eNAxXimsIBHTHfKdc=", 162 | "playerSettingsMenuData": { 163 | 
"loggingDirectives": { 164 | "trackingParams": "CAEQtc4GIhMImayo6d_njQMV4prCAR0x3ynX", 165 | "visibility": { 166 | "types": "12" 167 | } 168 | } 169 | }, 170 | "adBreakHeartbeatParams": "Q0FBJTNE", 171 | "frameworkUpdates": { 172 | "entityBatchUpdate": { 173 | "mutations": [ 174 | { 175 | "entityKey": "Eihjb21wb3NpdGUtbGl2ZS1zdHJlYW0tb2ZmbGluZS1lbnRpdHkta2V5IIUEKAE%3D", 176 | "type": "ENTITY_MUTATION_TYPE_DELETE" 177 | }, 178 | { 179 | "entityKey": "Eg0KC3d1dnd6SkY0eTdvIPYBKAE%3D", 180 | "type": "ENTITY_MUTATION_TYPE_REPLACE", 181 | "payload": { 182 | "offlineabilityEntity": { 183 | "key": "Eg0KC3d1dnd6SkY0eTdvIPYBKAE%3D", 184 | "addToOfflineButtonState": "ADD_TO_OFFLINE_BUTTON_STATE_UNKNOWN" 185 | } 186 | } 187 | } 188 | ], 189 | "timestamp": { 190 | "seconds": "1749589240", 191 | "nanos": 287157676 192 | } 193 | } 194 | } 195 | } 196 | -------------------------------------------------------------------------------- /youtube_transcript_api/test/assets/youtube_request_blocked.innertube.json.static: -------------------------------------------------------------------------------- 1 | { 2 | "responseContext": { 3 | "visitorData": "Cgs3NmJkd2VWU1N2USj4uaLCBjIKCgJTQRIEGgAgJzoMCAEgkoPMhYCfp6RoWJfwuc_yqZ2rVQ%3D%3D", 4 | "serviceTrackingParams": [ 5 | { 6 | "service": "GFEEDBACK", 7 | "params": [ 8 | { 9 | "key": "is_viewed_live", 10 | "value": "False" 11 | }, 12 | { 13 | "key": "ipcc", 14 | "value": "0" 15 | }, 16 | { 17 | "key": "is_alc_surface", 18 | "value": "false" 19 | }, 20 | { 21 | "key": "logged_in", 22 | "value": "0" 23 | }, 24 | { 25 | "key": "e", 26 | "value": 
"24004644,24077241,24078649,24104894,24143331,24166867,24181174,24230811,24232551,24241378,24290153,24397985,24513381,24522874,24556101,24585737,39325413,39328442,39328647,51010008,51020570,51025415,51037346,51037353,51054999,51067700,51068313,51080128,51086511,51095478,51115184,51129105,51132535,51137671,51141472,51152050,51175149,51178320,51178333,51178342,51178355,51179435,51179748,51183909,51217334,51227037,51237842,51242448,51248777,51256074,51272458,51295372,51303432,51306453,51311031,51311036,51313109,51313767,51314496,51324733,51326139,51330753,51341228,51346985,51349914,51353393,51354083,51354114,51354567,51356621,51359177,51361830,51362071,51366123,51366423,51366620,51367487,51372971,51375205,51375719,51386541,51388661,51394776,51394779,51397332,51402689,51404808,51404810,51405647,51407634,51417450,51417473,51417480,51417497,51417508,51417523,51420458,51421832,51427573,51428417,51428624,51429106,51430311,51432294,51432529,51432560,51433499,51435249,51435805,51435845,51435875,51435884,51435893,51435905,51435912,51435918,51436950,51437205,51439763,51439874,51440725,51441100,51441710,51442501,51443707,51444218,51447191,51448332,51452420,51452479,51452495,51453239,51456413,51456421,51456628,51458927,51459425,51461268,51462839,51463532,51463930,51465523,51465804,51465955,51466698,51466900,51467076,51467525,51468320,51469820,51471138,51471685,51471785,51472877,51473079,51473810,51475247,51475592,51475686,51475960,51476590,51477496,51477506,51477582,51477845,51478690,51478931,51479232,51479706,51479906,51480511,51481240,51483631,51483888,51484222,51484412,51484746,51484750,51485249,51485417,51485661,51486018,51486232,51486471,51488577,51489047,51489149,51489197,51490157,51490995,51491436,51492251,51492548,51495585,51495706,51495859,51496968,51497133,51498591,51499562,51500337,51500785,51503024,51504828,51505739,51506681,51507237,51508242,51508738,51508979,51509314,51509613,51509678,51510317,51510817,51511950,51512707,51512805,51512852,51514264" 27 | }, 28 | { 29 
| "key": "visitor_data", 30 | "value": "Cgs3NmJkd2VWU1N2USj4uaLCBjIKCgJTQRIEGgAgJzoMCAEgkoPMhYCfp6Ro" 31 | } 32 | ] 33 | }, 34 | { 35 | "service": "CSI", 36 | "params": [ 37 | { 38 | "key": "c", 39 | "value": "ANDROID" 40 | }, 41 | { 42 | "key": "cver", 43 | "value": "20.10.38" 44 | }, 45 | { 46 | "key": "yt_li", 47 | "value": "0" 48 | }, 49 | { 50 | "key": "GetPlayer_rid", 51 | "value": "0xbc62793ea45e2d9b" 52 | } 53 | ] 54 | }, 55 | { 56 | "service": "GUIDED_HELP", 57 | "params": [ 58 | { 59 | "key": "logged_in", 60 | "value": "0" 61 | } 62 | ] 63 | }, 64 | { 65 | "service": "ECATCHER", 66 | "params": [ 67 | { 68 | "key": "client.version", 69 | "value": "20.10" 70 | }, 71 | { 72 | "key": "client.name", 73 | "value": "ANDROID" 74 | } 75 | ] 76 | }, 77 | { 78 | "service": "LISTNR", 79 | "params": [ 80 | { 81 | "key": "e", 82 | "value": "51237842,51484222,51010008,51459425,51490157,51469820,51507237,51428417,51272458,51435249,51484746,51500337,51491436,51485417,51067700,51295372,51461795,51256074,24195012,51314496,51420458,51453239,51510365,51192010,24250570,24232551,51475592,51485249,51132535,51324733,51179435,51465955,51508738,51354083,24286257,51432294,51432560,51428624,51512707,24181174,51509314,51500785,51447191,51080128,51443707,51458927,51025415,51492251,51463930,51486232,51495859,51095478,51372971,51497133,51270362,24024517,24181216,39328442,51313109,51248777,51366620,51495706,51508979,51152050,51129105,51402689,51404810,51141472,51436950,24522874,51366423,51509613,51397332,51510817,51462839,51498591,24220751,51486471,51456628,51189308,51489047,24104894,24166867,51504828,24585737,51388661,51478931,51442501,51466900,51472877,51488577,51137671,51461268,51217334,51353393,24513381,51020570,51086511,51439874,51306453,51456413,51349914,51387900,51430311,39328647,51202133,51242448,51441100,51375205,51495585,51421832,24143331,51354114,51469823,51439763,51477845,51313767,24230811,51429106,51179748,51467525,39325413,51054999,51471138,51461791,51303432,51466698,24290153
,51475960,51404808,51483631,24397985,51489197,51484412,51375719,24033252,51452420,51486018,24556101,51175149,51467076,51506681,24254870,51483888" 83 | } 84 | ] 85 | } 86 | ], 87 | "maxAgeSeconds": 0, 88 | "rolloutToken": "CNrZvdmasunZkgEQ3K6o6d_njQMY3a6o6d_njQM%3D" 89 | }, 90 | "playabilityStatus": { 91 | "status": "LOGIN_REQUIRED", 92 | "reason": "Sign in to confirm you’re not a bot", 93 | "errorScreen": { 94 | "playerErrorMessageRenderer": { 95 | "subreason": { 96 | "runs": [ 97 | { 98 | "text": "This helps protect our community. " 99 | }, 100 | { 101 | "text": "Learn more", 102 | "navigationEndpoint": { 103 | "clickTrackingParams": "CAAQu2kiEwiZrKjp3-eNAxXimsIBHTHfKdc=", 104 | "urlEndpoint": { 105 | "url": "https://support.google.com/youtube/answer/3037019#zippy=%2Ccheck-that-youre-signed-into-youtube" 106 | } 107 | } 108 | } 109 | ] 110 | }, 111 | "reason": { 112 | "runs": [ 113 | { 114 | "text": "Sign in to confirm you’re not a bot" 115 | } 116 | ] 117 | }, 118 | "proceedButton": { 119 | "buttonRenderer": { 120 | "style": "STYLE_PRIMARY", 121 | "size": "SIZE_DEFAULT", 122 | "isDisabled": false, 123 | "text": { 124 | "simpleText": "Sign in" 125 | }, 126 | "navigationEndpoint": { 127 | "clickTrackingParams": "CAIQptEGIhMImayo6d_njQMV4prCAR0x3ynX", 128 | "signInEndpoint": { 129 | "nextEndpoint": { 130 | "clickTrackingParams": "CAIQptEGIhMImayo6d_njQMV4prCAR0x3ynX", 131 | "urlEndpoint": { 132 | "url": "" 133 | } 134 | } 135 | } 136 | }, 137 | "trackingParams": "CAIQptEGIhMImayo6d_njQMV4prCAR0x3ynX" 138 | } 139 | }, 140 | "thumbnail": { 141 | "thumbnails": [ 142 | { 143 | "url": "//s.ytimg.com/yts/img/meh7-vflGevej7.png", 144 | "width": 140, 145 | "height": 100 146 | } 147 | ] 148 | }, 149 | "icon": { 150 | "iconType": "ERROR_OUTLINE" 151 | } 152 | } 153 | }, 154 | "skip": { 155 | "playabilityErrorSkipConfig": { 156 | "skipOnPlayabilityError": false 157 | } 158 | }, 159 | "contextParams": "Q0FFU0FnZ0M=" 160 | }, 161 | "trackingParams": 
"CAAQu2kiEwiZrKjp3-eNAxXimsIBHTHfKdc=", 162 | "playerSettingsMenuData": { 163 | "loggingDirectives": { 164 | "trackingParams": "CAEQtc4GIhMImayo6d_njQMV4prCAR0x3ynX", 165 | "visibility": { 166 | "types": "12" 167 | } 168 | } 169 | }, 170 | "adBreakHeartbeatParams": "Q0FBJTNE", 171 | "frameworkUpdates": { 172 | "entityBatchUpdate": { 173 | "mutations": [ 174 | { 175 | "entityKey": "Eihjb21wb3NpdGUtbGl2ZS1zdHJlYW0tb2ZmbGluZS1lbnRpdHkta2V5IIUEKAE%3D", 176 | "type": "ENTITY_MUTATION_TYPE_DELETE" 177 | }, 178 | { 179 | "entityKey": "Eg0KC3d1dnd6SkY0eTdvIPYBKAE%3D", 180 | "type": "ENTITY_MUTATION_TYPE_REPLACE", 181 | "payload": { 182 | "offlineabilityEntity": { 183 | "key": "Eg0KC3d1dnd6SkY0eTdvIPYBKAE%3D", 184 | "addToOfflineButtonState": "ADD_TO_OFFLINE_BUTTON_STATE_UNKNOWN" 185 | } 186 | } 187 | } 188 | ], 189 | "timestamp": { 190 | "seconds": "1749589240", 191 | "nanos": 287157676 192 | } 193 | } 194 | } 195 | } 196 | -------------------------------------------------------------------------------- /youtube_transcript_api/test/assets/youtube_consent_page_invalid.html.static: -------------------------------------------------------------------------------- 1 | Bevor Sie zu YouTube weitergehen

ein Google-Unternehmen

Bevor Sie zu YouTube weitergehen

Google verwendet Cookies und Daten, um Dienste und Werbung zur Verfügung zu stellen, zu verwalten und zu verbessern. Wenn Sie zustimmen, nutzen wir Cookies für diese Zwecke und dazu, Inhalte und Werbung für Sie zu personalisieren, damit Sie z. B. relevantere Google-Suchergebnisse und relevantere Werbung bei YouTube erhalten. Die Personalisierung erfolgt auf Grundlage Ihrer Aktivitäten, beispielsweise Ihrer Google-Suchanfragen und der Videos, die Sie sich bei YouTube ansehen. Wir verwenden diese Daten auch für Analysen und Messungen. Klicken Sie auf „Anpassen“, um sich weitere Optionen anzusehen, oder besuchen Sie g.co/privacytools. Darüber hinaus haben Sie die Möglichkeit, Ihre Browsereinstellungen so zu konfigurieren, dass einige oder alle Cookies blockiert werden.

Anpassen

-------------------------------------------------------------------------------- /youtube_transcript_api/test/assets/youtube_consent_page.html.static: -------------------------------------------------------------------------------- 1 | Bevor Sie zu YouTube weitergehen

ein Google-Unternehmen

Bevor Sie zu YouTube weitergehen

Anpassen

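The innertube fixtures above report failures through a `playabilityStatus` object: a `status`, a top-level `reason`, and optional `subreason.runs` entries under `errorScreen.playerErrorMessageRenderer`. As a minimal sketch (this is not the library's actual parsing code; `summarize_playability` and `SAMPLE` are hypothetical names, and `SAMPLE` is a hand-trimmed copy of the fixture with the `CUSTOM` status), those fields could be pulled out like this:

```python
from typing import Any, Dict, List, Optional, Tuple

# Hand-trimmed from the "CUSTOM" playability fixture; all other keys are omitted.
SAMPLE: Dict[str, Any] = {
    "playabilityStatus": {
        "status": "CUSTOM",
        "reason": "Custom Reason",
        "errorScreen": {
            "playerErrorMessageRenderer": {
                "subreason": {
                    "runs": [
                        {"text": "Sub Reason 1"},
                        {"text": "Sub Reason 2"},
                    ]
                }
            }
        },
    }
}


def summarize_playability(
    player_response: Dict[str, Any],
) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Return (status, reason, sub_reasons) from a player response dict.

    Hypothetical helper: missing keys yield None or an empty list
    instead of raising.
    """
    playability = player_response.get("playabilityStatus", {})
    status = playability.get("status")
    reason = playability.get("reason")
    renderer = playability.get("errorScreen", {}).get(
        "playerErrorMessageRenderer", {}
    )
    runs = renderer.get("subreason", {}).get("runs", [])
    sub_reasons = [run.get("text", "") for run in runs]
    return status, reason, sub_reasons


if __name__ == "__main__":
    print(summarize_playability(SAMPLE))
```

The returned reason and sub-reason list have the same shape as the `reason` and `sub_reasons` arguments that `VideoUnplayable` in `_errors.py` accepts. For a response that uses a `confirmDialogRenderer` instead, as the age-restricted fixture does, the sub-reason list would simply be empty.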
--------------------------------------------------------------------------------
/youtube_transcript_api/_errors.py:
--------------------------------------------------------------------------------
from pathlib import Path
from typing import Iterable, Optional, List

from requests import HTTPError

from ._settings import WATCH_URL
from .proxies import ProxyConfig, GenericProxyConfig, WebshareProxyConfig


class YouTubeTranscriptApiException(Exception):
    pass


class CookieError(YouTubeTranscriptApiException):
    pass


class CookiePathInvalid(CookieError):
    def __init__(
        self, cookie_path: Path
    ):  # pragma: no cover until cookie authentication is re-implemented
        super().__init__(f"Can't load the provided cookie file: {cookie_path}")


class CookieInvalid(CookieError):
    def __init__(
        self, cookie_path: Path
    ):  # pragma: no cover until cookie authentication is re-implemented
        super().__init__(
            f"The cookies provided are not valid (may have expired): {cookie_path}"
        )


class CouldNotRetrieveTranscript(YouTubeTranscriptApiException):
    """
    Raised if a transcript could not be retrieved.
    """

    ERROR_MESSAGE = "\nCould not retrieve a transcript for the video {video_url}!"
    CAUSE_MESSAGE_INTRO = " This is most likely caused by:\n\n{cause}"
    CAUSE_MESSAGE = ""
    GITHUB_REFERRAL = (
        "\n\nIf you are sure that the described cause is not responsible for this error "
        "and that a transcript should be retrievable, please create an issue at "
        "https://github.com/jdepoix/youtube-transcript-api/issues. "
        "Please add which version of youtube_transcript_api you are using "
        "and provide the information needed to replicate the error. "
        "Also make sure that there are no open issues which already describe your problem!"
    )

    def __init__(self, video_id: str):
        self.video_id = video_id
        super().__init__()

    def _build_error_message(self) -> str:
        error_message = self.ERROR_MESSAGE.format(
            video_url=WATCH_URL.format(video_id=self.video_id)
        )

        cause = self.cause
        if cause:
            error_message += (
                self.CAUSE_MESSAGE_INTRO.format(cause=cause) + self.GITHUB_REFERRAL
            )

        return error_message

    @property
    def cause(self) -> str:
        return self.CAUSE_MESSAGE

    def __str__(self) -> str:
        return self._build_error_message()


class YouTubeDataUnparsable(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = (
        "The data required to fetch the transcript is not parsable. This should "
        "not happen, please open an issue (make sure to include the video ID)!"
    )


class YouTubeRequestFailed(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = "Request to YouTube failed: {reason}"

    def __init__(self, video_id: str, http_error: HTTPError):
        self.reason = str(http_error)
        super().__init__(video_id)

    @property
    def cause(self) -> str:
        return self.CAUSE_MESSAGE.format(
            reason=self.reason,
        )


class VideoUnplayable(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = "The video is unplayable for the following reason: {reason}"
    SUBREASON_MESSAGE = "\n\nAdditional Details:\n{sub_reasons}"

    def __init__(self, video_id: str, reason: Optional[str], sub_reasons: List[str]):
        self.reason = reason
        self.sub_reasons = sub_reasons
        super().__init__(video_id)

    @property
    def cause(self):
        reason = "No reason specified!" if self.reason is None else self.reason
        if self.sub_reasons:
            sub_reasons = "\n".join(
                f" - {sub_reason}" for sub_reason in self.sub_reasons
            )
            reason = f"{reason}{self.SUBREASON_MESSAGE.format(sub_reasons=sub_reasons)}"
        return self.CAUSE_MESSAGE.format(
            reason=reason,
        )


class VideoUnavailable(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = "The video is no longer available"


class InvalidVideoId(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = (
        "You provided an invalid video id. Make sure you are using the video id and NOT the url!\n\n"
        'Do NOT run: `YouTubeTranscriptApi().fetch("https://www.youtube.com/watch?v=1234")`\n'
        'Instead run: `YouTubeTranscriptApi().fetch("1234")`'
    )


class RequestBlocked(CouldNotRetrieveTranscript):
    BASE_CAUSE_MESSAGE = (
        "YouTube is blocking requests from your IP. This usually is due to one of the "
        "following reasons:\n"
        "- You have done too many requests and your IP has been blocked by YouTube\n"
        "- You are doing requests from an IP belonging to a cloud provider (like AWS, "
        "Google Cloud Platform, Azure, etc.). Unfortunately, most IPs from cloud "
        "providers are blocked by YouTube.\n\n"
    )
    CAUSE_MESSAGE = (
        f"{BASE_CAUSE_MESSAGE}"
        "There are two things you can do to work around this:\n"
        '1. Use proxies to hide your IP address, as explained in the "Working around '
        'IP bans" section of the README '
        "(https://github.com/jdepoix/youtube-transcript-api"
        "?tab=readme-ov-file"
        "#working-around-ip-bans-requestblocked-or-ipblocked-exception).\n"
        "2. (NOT RECOMMENDED) If you authenticate your requests using cookies, you "
        "will be able to continue doing requests for a while. However, YouTube will "
        "eventually permanently ban the account that you have used to authenticate "
        "with! So only do this if you don't mind your account being banned!"
    )
    WITH_GENERIC_PROXY_CAUSE_MESSAGE = (
        "YouTube is blocking your requests, despite you using proxies. Keep in mind "
        "that a proxy is just a way to hide your real IP behind the IP of that proxy, "
        "but there is no guarantee that the IP of that proxy won't be blocked as "
        "well.\n\n"
        "The only truly reliable way to prevent IP blocks is rotating through a large "
        "pool of residential IPs, by using a provider like Webshare "
        "(https://www.webshare.io/?referral_code=w0xno53eb50g), which provides you "
        "with a pool of >30M residential IPs (make sure to purchase "
        '"Residential" proxies, NOT "Proxy Server" or "Static Residential"!).\n\n'
        "You will find more information on how to easily integrate Webshare here: "
        "https://github.com/jdepoix/youtube-transcript-api"
        "?tab=readme-ov-file#using-webshare"
    )
    WITH_WEBSHARE_PROXY_CAUSE_MESSAGE = (
        "YouTube is blocking your requests, despite you using Webshare proxies. "
        'Please make sure that you have purchased "Residential" proxies and '
        'NOT "Proxy Server" or "Static Residential", as those won\'t work as '
        'reliably! The free tier also uses "Proxy Server" and will NOT work!\n\n'
        'The only reliable option is using "Residential" proxies (not "Static '
        'Residential"), as this allows you to rotate through a pool of over 30M IPs, '
        "which means you will always find an IP that hasn't been blocked by YouTube "
        "yet!\n\n"
        "You can support the development of this open source project by making your "
        "Webshare purchases through this affiliate link: "
        "https://www.webshare.io/?referral_code=w0xno53eb50g \n\n"
        "Thank you for your support! <3"
    )

    def __init__(self, video_id: str):
        self._proxy_config = None
        super().__init__(video_id)

    def with_proxy_config(
        self, proxy_config: Optional[ProxyConfig]
    ) -> "RequestBlocked":
        self._proxy_config = proxy_config
        return self

    @property
    def cause(self) -> str:
        if isinstance(self._proxy_config, WebshareProxyConfig):
            return self.WITH_WEBSHARE_PROXY_CAUSE_MESSAGE
        if isinstance(self._proxy_config, GenericProxyConfig):
            return self.WITH_GENERIC_PROXY_CAUSE_MESSAGE
        return super().cause


class IpBlocked(RequestBlocked):
    CAUSE_MESSAGE = (
        f"{RequestBlocked.BASE_CAUSE_MESSAGE}"
        'Ways to work around this are explained in the "Working around IP '
        'bans" section of the README (https://github.com/jdepoix/youtube-transcript-api'
        "?tab=readme-ov-file"
        "#working-around-ip-bans-requestblocked-or-ipblocked-exception).\n"
    )


class TranscriptsDisabled(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = "Subtitles are disabled for this video"


class AgeRestricted(CouldNotRetrieveTranscript):
    # CAUSE_MESSAGE = (
    #     "This video is age-restricted. Therefore, you will have to authenticate to be "
    #     "able to retrieve transcripts for it. You will have to provide a cookie to "
    #     'authenticate yourself, as explained in the "Cookie Authentication" section of '
    #     "the README (https://github.com/jdepoix/youtube-transcript-api"
    #     "?tab=readme-ov-file#cookie-authentication)"
    # )
    CAUSE_MESSAGE = (
        "This video is age-restricted. Therefore, you are unable to retrieve "
        "transcripts for it without authenticating yourself.\n\n"
        "Unfortunately, Cookie Authentication is temporarily unsupported in "
        "youtube-transcript-api, as recent changes in YouTube's API broke the previous "
        "implementation. I will do my best to re-implement it as soon as possible."
    )


class NotTranslatable(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = "The requested language is not translatable"


class TranslationLanguageNotAvailable(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = "The requested translation language is not available"


class FailedToCreateConsentCookie(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = "Failed to automatically give consent to saving cookies"


class NoTranscriptFound(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = (
        "No transcripts were found for any of the requested language codes: {requested_language_codes}\n\n"
        "{transcript_data}"
    )

    def __init__(
        self,
        video_id: str,
        requested_language_codes: Iterable[str],
        transcript_data: "TranscriptList",  # noqa: F821
    ):
        self._requested_language_codes = requested_language_codes
        self._transcript_data = transcript_data
        super().__init__(video_id)

    @property
    def cause(self) -> str:
        return self.CAUSE_MESSAGE.format(
            requested_language_codes=self._requested_language_codes,
            transcript_data=str(self._transcript_data),
        )


class PoTokenRequired(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = (
        "The requested video cannot be retrieved without a PO Token. If this happens, "
        "please open a GitHub issue!"
272 | ) 273 | -------------------------------------------------------------------------------- /youtube_transcript_api/test/assets/youtube_age_restricted.innertube.json.static: -------------------------------------------------------------------------------- 1 | { 2 | "responseContext": { 3 | "visitorData": "CgtsM0ROUWd2dG5HayjHu6bCBjIKCgJERRIEEgAgYjoMCAEg1-KZt_C456RoWPrW4uqYgNyukgE%3D", 4 | "serviceTrackingParams": [ 5 | { 6 | "service": "GFEEDBACK", 7 | "params": [ 8 | { 9 | "key": "is_viewed_live", 10 | "value": "False" 11 | }, 12 | { 13 | "key": "ipcc", 14 | "value": "0" 15 | }, 16 | { 17 | "key": "is_alc_surface", 18 | "value": "false" 19 | }, 20 | { 21 | "key": "logged_in", 22 | "value": "0" 23 | }, 24 | { 25 | "key": "e", 26 | "value": "9405982,23888717,24004644,24077241,24078649,24104894,24108447,24132305,24143331,24166867,24181174,24230811,24232551,24241378,24290153,24397985,24522874,24556101,24585737,39325413,39328442,39328646,51010008,51020570,51025415,51028056,51054999,51067700,51068313,51080128,51086511,51095478,51115184,51129105,51132535,51137671,51141472,51152050,51175149,51179435,51179748,51183910,51217334,51220160,51227037,51237842,51242448,51248777,51256074,51272458,51295372,51303432,51306453,51313109,51313767,51314496,51324733,51326139,51349914,51350816,51353393,51354083,51354114,51356621,51366123,51366423,51366620,51372971,51374199,51375205,51375719,51386540,51388660,51397095,51397332,51402689,51403603,51404808,51404810,51412775,51420457,51421832,51428417,51428624,51429106,51430311,51432294,51432529,51432560,51433499,51436953,51437206,51439763,51439874,51441100,51442501,51443707,51447191,51452420,51453239,51455371,51456413,51456628,51458927,51459425,51461268,51462839,51463930,51466900,51467076,51467524,51469820,51471138,51471685,51473771,51475246,51475592,51475961,51477846,51478931,51479906,51481239,51483631,51483888,51484221,51484412,51484746,51485249,51485417,51485662,51486018,51486232,51486471,51487681,51488801,51489047,51489568,51490158,514908
42,51490994,51491436,51492252,51494026,51495585,51495706,51495859,51496968,51497133,51498591,51498917,51503024,51503027,51504828,51506682,51507237,51508738,51508979,51509214,51509314,51509614,51510189,51510638,51512708,51513113,51513245,51513543" 27 | }, 28 | { 29 | "key": "visitor_data", 30 | "value": "CgtsM0ROUWd2dG5HayjHu6bCBjIKCgJERRIEEgAgYjoMCAEg1-KZt_C456Ro" 31 | } 32 | ] 33 | }, 34 | { 35 | "service": "CSI", 36 | "params": [ 37 | { 38 | "key": "c", 39 | "value": "ANDROID" 40 | }, 41 | { 42 | "key": "cver", 43 | "value": "20.10.38" 44 | }, 45 | { 46 | "key": "yt_li", 47 | "value": "0" 48 | }, 49 | { 50 | "key": "GetPlayer_rid", 51 | "value": "0x8e32eb06afe4ec8c" 52 | } 53 | ] 54 | }, 55 | { 56 | "service": "GUIDED_HELP", 57 | "params": [ 58 | { 59 | "key": "logged_in", 60 | "value": "0" 61 | } 62 | ] 63 | }, 64 | { 65 | "service": "ECATCHER", 66 | "params": [ 67 | { 68 | "key": "client.version", 69 | "value": "20.10" 70 | }, 71 | { 72 | "key": "client.name", 73 | "value": "ANDROID" 74 | } 75 | ] 76 | }, 77 | { 78 | "service": "LISTNR", 79 | "params": [ 80 | { 81 | "key": "e", 82 | "value": 
"51513113,51456413,51020570,51504828,51466900,51217334,24232551,24250570,51488801,51397332,51442501,51477846,51478931,51498591,51366423,51456628,51486471,24195012,51486018,51175149,51237842,51469820,51483888,51010008,51459425,24522874,51484412,51498917,51375719,51483631,51404808,51452420,51467076,24181216,51179748,51054999,51354114,51461791,51489568,51490158,51469823,51471138,51473771,24024517,51429106,51303432,51388660,39328646,51313767,51494026,51354083,51508738,51432560,39325413,24181174,51503027,51432294,51436953,51132535,51192010,51453239,51484221,51485249,51509214,51295372,24143331,51324733,51179435,51485417,51461795,51475592,51512708,51314496,51067700,51256074,51491436,51510638,51507237,51492252,51028056,51467524,51428417,51272458,24585737,51484746,51141472,51129105,51404810,24104894,51402689,51495706,51462839,24556101,24166867,24220751,51420457,51313109,51508979,24254870,51248777,51366620,51152050,24290153,51095478,51486232,51495859,51463930,24397985,51372971,51497133,51270362,51025415,24033252,51509614,51509314,51387900,39328442,51080128,51428624,51443707,51447191,51458927,51506682,51439763,51475961,51202133,51242448,24230811,51349914,51421832,51439874,51375205,51495585,51353393,51137671,51441100,51461268,51489047,51189308,51430311,51306453,24286257,51086511" 83 | } 84 | ] 85 | } 86 | ], 87 | "maxAgeSeconds": 0, 88 | "rolloutToken": "CNeAy8fdkLO_IBDgx4Te1OmNAxjgx4Te1OmNAw%3D%3D" 89 | }, 90 | "playabilityStatus": { 91 | "status": "LOGIN_REQUIRED", 92 | "reason": "This video may be inappropriate for some users.", 93 | "errorScreen": { 94 | "confirmDialogRenderer": { 95 | "title": { 96 | "runs": [ 97 | { 98 | "text": "You must sign in to view this video" 99 | } 100 | ] 101 | }, 102 | "trackingParams": "CAMQxjgiEwj-xITe1OmNAxXK60IFHf00F5c=", 103 | "dialogMessages": [ 104 | { 105 | "runs": [ 106 | { 107 | "text": "This video may be inappropriate for some users." 
108 | } 109 | ] 110 | } 111 | ], 112 | "confirmButton": { 113 | "buttonRenderer": { 114 | "style": "STYLE_BLUE_TEXT", 115 | "size": "SIZE_DEFAULT", 116 | "isDisabled": false, 117 | "text": { 118 | "runs": [ 119 | { 120 | "text": "Sign in" 121 | } 122 | ] 123 | }, 124 | "trackingParams": "CAUQ8FsiEwj-xITe1OmNAxXK60IFHf00F5c=" 125 | } 126 | }, 127 | "cancelButton": { 128 | "buttonRenderer": { 129 | "style": "STYLE_BLUE_TEXT", 130 | "size": "SIZE_DEFAULT", 131 | "isDisabled": false, 132 | "text": { 133 | "runs": [ 134 | { 135 | "text": "Go back" 136 | } 137 | ] 138 | }, 139 | "trackingParams": "CAQQ8FsiEwj-xITe1OmNAxXK60IFHf00F5c=" 140 | } 141 | } 142 | } 143 | }, 144 | "desktopLegacyAgeGateReason": 1, 145 | "reasonTitle": "You must sign in to view this video", 146 | "contextParams": "Q0FFU0FnZ0I=" 147 | }, 148 | "videoDetails": { 149 | "videoId": "Njp5uhTorCo", 150 | "title": "Laura Branigan - Self Control (Moreno J Remix) Age-restricted", 151 | "lengthSeconds": "452", 152 | "keywords": [ 153 | "Moreno J Remix", 154 | "Moreno J", 155 | "Remix", 156 | "Laura Branigan", 157 | "Self Control", 158 | "80s", 159 | "80s Music", 160 | "EDM", 161 | "Pop", 162 | "Poprock", 163 | "Italo Disco" 164 | ], 165 | "channelId": "UCJqyF-E8VW75fQz61ftchzg", 166 | "isOwnerViewing": false, 167 | "shortDescription": "Remixer: Moreno J\nVocal edit: Moreno J\nVocals: Laura Branigan\nSound mixing: Moreno J\nMastering: Moreno J\nVideo Edit: Moreno J\nVideo Scenes taken from movies:\nBe Cool (2005)\nBurlesque (2010) \nCinderella (2015) \nCoyote Ugly (2000)\nLove Actually (2003) \nShowgirls (1995)\nTropic Thunder (2008)\n\nInfo about the original artist (group members)\nhttps://en.wikipedia.org/wiki/Laura_Branigan\n\nThank You for Watching!\nRemember to Like, Share, and Subscribe to keep up to date with new remixes! \nLove, Moreno Remixes.\n\nFree download wav file.\nTo download the file go to the top right corner next to the login button of googledrive window after you clicket on the link. 
\nGoogledrive download link:\nhttps://drive.google.com/file/d/1pvcpcFjlGEOmuPR_NXSCNWRW8OQS6_1C/view?usp=sharing", 168 | "isCrawlable": true, 169 | "thumbnail": { 170 | "thumbnails": [ 171 | { 172 | "url": "https://i.ytimg.com/vi/Njp5uhTorCo/default.jpg?sqp=-oaymwEkCHgQWvKriqkDGvABAfgB_gmAAtAFigIMCAAQARhgIGAoYDAP&rs=AOn4CLBYdFfYcFUurCUG8z6f1N3UI3SWQQ", 173 | "width": 120, 174 | "height": 90 175 | }, 176 | { 177 | "url": "https://i.ytimg.com/vi/Njp5uhTorCo/mqdefault.jpg?sqp=-oaymwEmCMACELQB8quKqQMa8AEB-AH-CYAC0AWKAgwIABABGGAgYChgMA8=&rs=AOn4CLCS9KIwuVd7VVqbYzgfychekGW95Q", 178 | "width": 320, 179 | "height": 180 180 | }, 181 | { 182 | "url": "https://i.ytimg.com/vi/Njp5uhTorCo/hqdefault.jpg?sqp=-oaymwEmCOADEOgC8quKqQMa8AEB-AH-CYAC0AWKAgwIABABGGAgYChgMA8=&rs=AOn4CLCV_dNcDpHUYFXSmg6vHMevIGyadA", 183 | "width": 480, 184 | "height": 360 185 | }, 186 | { 187 | "url": "https://i.ytimg.com/vi/Njp5uhTorCo/sddefault.jpg?sqp=-oaymwEmCIAFEOAD8quKqQMa8AEB-AH-CYAC0AWKAgwIABABGGAgYChgMA8=&rs=AOn4CLB7IPqDokitsf0O1xRE9SZcN5qUVw", 188 | "width": 640, 189 | "height": 480 190 | } 191 | ] 192 | }, 193 | "allowRatings": true, 194 | "viewCount": "179579", 195 | "author": "Moreno J Remixes", 196 | "isPrivate": false, 197 | "isUnpluggedCorpus": false, 198 | "isLiveContent": false 199 | }, 200 | "trackingParams": "CAAQu2kiEwj-xITe1OmNAxXK60IFHf00F5c=", 201 | "onResponseReceivedActions": [ 202 | { 203 | "clickTrackingParams": "CAAQu2kiEwj-xITe1OmNAxXK60IFHf00F5c=", 204 | "startEomFlowCommand": { 205 | "eomFlowRenderer": { 206 | "webViewRenderer": { 207 | "url": { 208 | "privateDoNotAccessOrElseTrustedResourceUrlWrappedValue": "https://consent.youtube.com/yt-app-main?gl=DE&m=1&pc=yt&cm=2&hl=en&src=1&app=1&vd=CgtsM0ROUWd2dG5HayjHu6bCBjIKCgJERRIEEgAgYjoMCAEg1-KZt_C456Ro&utm_source=YT_ANDROID&dt=0&av=20.10.38" 209 | }, 210 | "onFailureCommand": { 211 | "clickTrackingParams": "CAIQmawJIhMI_sSE3tTpjQMVyutCBR39NBeX", 212 | "updateEomStateCommand": { 213 | "mobileEomFlowState": { 214 | 
"updatedVisitorData": "CgtsM0ROUWd2dG5HayjHu6bCBjIKCgJERRIEEgAgYjoZCAEaCwjHu6bCBhDV4cZ5INfimbfwuOekaA%3D%3D", 215 | "isError": true 216 | } 217 | } 218 | }, 219 | "trackingParams": "CAIQmawJIhMI_sSE3tTpjQMVyutCBR39NBeX", 220 | "webViewEntityKey": "Eg5Fb21GbG93V2VidmlldyD4AigB", 221 | "webToNativeMessageMap": [ 222 | { 223 | "key": "update_eom_state_command", 224 | "value": { 225 | "clickTrackingParams": "CAIQmawJIhMI_sSE3tTpjQMVyutCBR39NBeX", 226 | "updateEomStateCommand": { 227 | "hack": true 228 | } 229 | } 230 | }, 231 | { 232 | "key": "sign_in_endpoint", 233 | "value": { 234 | "clickTrackingParams": "CAIQmawJIhMI_sSE3tTpjQMVyutCBR39NBeX", 235 | "signInEndpoint": { 236 | "hack": true 237 | } 238 | } 239 | } 240 | ], 241 | "webViewUseCase": "WEB_VIEW_USE_CASE_EOM_CONSENT", 242 | "openInBrowserUrls": [ 243 | "https://policies.google.com", 244 | "https://support.google.com" 245 | ], 246 | "firstPartyHostNameAllowList": [ 247 | "consent.youtube.com" 248 | ] 249 | } 250 | }, 251 | "consentMoment": "CONSENT_MOMENT_INITIAL" 252 | } 253 | } 254 | ], 255 | "playerSettingsMenuData": { 256 | "loggingDirectives": { 257 | "trackingParams": "CAEQtc4GIhMI_sSE3tTpjQMVyutCBR39NBeX", 258 | "visibility": { 259 | "types": "12" 260 | } 261 | } 262 | }, 263 | "adBreakHeartbeatParams": "Q0FBJTNE", 264 | "frameworkUpdates": { 265 | "entityBatchUpdate": { 266 | "mutations": [ 267 | { 268 | "entityKey": "Eihjb21wb3NpdGUtbGl2ZS1zdHJlYW0tb2ZmbGluZS1lbnRpdHkta2V5IIUEKAE%3D", 269 | "type": "ENTITY_MUTATION_TYPE_DELETE" 270 | }, 271 | { 272 | "entityKey": "Eg0KC05qcDV1aFRvckNvIPYBKAE%3D", 273 | "type": "ENTITY_MUTATION_TYPE_REPLACE", 274 | "payload": { 275 | "offlineabilityEntity": { 276 | "key": "Eg0KC05qcDV1aFRvckNvIPYBKAE%3D", 277 | "addToOfflineButtonState": "ADD_TO_OFFLINE_BUTTON_STATE_UNKNOWN" 278 | } 279 | } 280 | } 281 | ], 282 | "timestamp": { 283 | "seconds": "1749654983", 284 | "nanos": 254943213 285 | } 286 | } 287 | } 288 | } 289 | 
--------------------------------------------------------------------------------
/youtube_transcript_api/test/test_cli.py:
--------------------------------------------------------------------------------
import pytest
from importlib.metadata import PackageNotFoundError, version
from unittest import TestCase
from unittest.mock import MagicMock, patch

import json
import subprocess

from youtube_transcript_api import (
    YouTubeTranscriptApi,
    VideoUnavailable,
    FetchedTranscript,
    FetchedTranscriptSnippet,
)
from youtube_transcript_api._cli import YouTubeTranscriptCli


class TestYouTubeTranscriptCli(TestCase):
    def setUp(self):
        self.transcript_mock = MagicMock()
        self.transcript_mock.fetch = MagicMock(
            return_value=FetchedTranscript(
                snippets=[
                    FetchedTranscriptSnippet(
                        text="Hey, this is just a test",
                        start=0.0,
                        duration=1.54,
                    ),
                    FetchedTranscriptSnippet(
                        text="this is not the original transcript",
                        start=1.54,
                        duration=4.16,
                    ),
                    FetchedTranscriptSnippet(
                        text="just something shorter, I made up for testing",
                        start=5.7,
                        duration=3.239,
                    ),
                ],
                language="English",
                language_code="en",
                is_generated=True,
                video_id="GJLlxj_dtq8",
            )
        )
        self.transcript_mock.translate = MagicMock(return_value=self.transcript_mock)

        self.transcript_list_mock = MagicMock()
        self.transcript_list_mock.find_generated_transcript = MagicMock(
            return_value=self.transcript_mock
        )
        self.transcript_list_mock.find_manually_created_transcript = MagicMock(
            return_value=self.transcript_mock
        )
        self.transcript_list_mock.find_transcript = MagicMock(
            return_value=self.transcript_mock
        )

        YouTubeTranscriptApi.__init__ = MagicMock(return_value=None)
        YouTubeTranscriptApi.list = MagicMock(return_value=self.transcript_list_mock)

    def test_argument_parsing(self):
        parsed_args = YouTubeTranscriptCli(
            "v1 v2 --format json --languages de en".split()
        )._parse_args()
        self.assertEqual(parsed_args.video_ids, ["v1", "v2"])
        self.assertEqual(parsed_args.format, "json")
        self.assertEqual(parsed_args.languages, ["de", "en"])
        self.assertEqual(parsed_args.http_proxy, "")
        self.assertEqual(parsed_args.https_proxy, "")

        parsed_args = YouTubeTranscriptCli(
            "v1 v2 --languages de en --format json".split()
        )._parse_args()
        self.assertEqual(parsed_args.video_ids, ["v1", "v2"])
        self.assertEqual(parsed_args.format, "json")
        self.assertEqual(parsed_args.languages, ["de", "en"])
        self.assertEqual(parsed_args.http_proxy, "")
        self.assertEqual(parsed_args.https_proxy, "")

        parsed_args = YouTubeTranscriptCli(
            " --format json v1 v2 --languages de en".split()
        )._parse_args()
        self.assertEqual(parsed_args.video_ids, ["v1", "v2"])
        self.assertEqual(parsed_args.format, "json")
        self.assertEqual(parsed_args.languages, ["de", "en"])
        self.assertEqual(parsed_args.http_proxy, "")
        self.assertEqual(parsed_args.https_proxy, "")

        parsed_args = YouTubeTranscriptCli(
            "v1 v2 --languages de en --format json "
            "--http-proxy http://user:pass@domain:port "
            "--https-proxy https://user:pass@domain:port".split()
        )._parse_args()
        self.assertEqual(parsed_args.video_ids, ["v1", "v2"])
        self.assertEqual(parsed_args.format, "json")
        self.assertEqual(parsed_args.languages, ["de", "en"])
        self.assertEqual(parsed_args.http_proxy, "http://user:pass@domain:port")
        self.assertEqual(parsed_args.https_proxy, "https://user:pass@domain:port")

        parsed_args = YouTubeTranscriptCli(
            "v1 v2 --languages de en --format json "
            "--webshare-proxy-username username "
            "--webshare-proxy-password password".split()
        )._parse_args()
        self.assertEqual(parsed_args.video_ids,
["v1", "v2"]) 107 | self.assertEqual(parsed_args.format, "json") 108 | self.assertEqual(parsed_args.languages, ["de", "en"]) 109 | self.assertEqual(parsed_args.webshare_proxy_username, "username") 110 | self.assertEqual(parsed_args.webshare_proxy_password, "password") 111 | 112 | parsed_args = YouTubeTranscriptCli( 113 | "v1 v2 --languages de en --format json --http-proxy http://user:pass@domain:port".split() 114 | )._parse_args() 115 | self.assertEqual(parsed_args.video_ids, ["v1", "v2"]) 116 | self.assertEqual(parsed_args.format, "json") 117 | self.assertEqual(parsed_args.languages, ["de", "en"]) 118 | self.assertEqual(parsed_args.http_proxy, "http://user:pass@domain:port") 119 | self.assertEqual(parsed_args.https_proxy, "") 120 | 121 | parsed_args = YouTubeTranscriptCli( 122 | "v1 v2 --languages de en --format json --https-proxy https://user:pass@domain:port".split() 123 | )._parse_args() 124 | self.assertEqual(parsed_args.video_ids, ["v1", "v2"]) 125 | self.assertEqual(parsed_args.format, "json") 126 | self.assertEqual(parsed_args.languages, ["de", "en"]) 127 | self.assertEqual(parsed_args.https_proxy, "https://user:pass@domain:port") 128 | self.assertEqual(parsed_args.http_proxy, "") 129 | 130 | def test_argument_parsing__only_video_ids(self): 131 | parsed_args = YouTubeTranscriptCli("v1 v2".split())._parse_args() 132 | self.assertEqual(parsed_args.video_ids, ["v1", "v2"]) 133 | self.assertEqual(parsed_args.format, "pretty") 134 | self.assertEqual(parsed_args.languages, ["en"]) 135 | 136 | def test_argument_parsing__video_ids_starting_with_dash(self): 137 | parsed_args = YouTubeTranscriptCli(r"\-v1 \-\-v2 \--v3".split())._parse_args() 138 | self.assertEqual(parsed_args.video_ids, ["-v1", "--v2", "--v3"]) 139 | self.assertEqual(parsed_args.format, "pretty") 140 | self.assertEqual(parsed_args.languages, ["en"]) 141 | 142 | def test_argument_parsing__fail_without_video_ids(self): 143 | with self.assertRaises(SystemExit): 144 | YouTubeTranscriptCli("--format 
json".split())._parse_args() 145 | 146 | def test_argument_parsing__json(self): 147 | parsed_args = YouTubeTranscriptCli("v1 v2 --format json".split())._parse_args() 148 | self.assertEqual(parsed_args.video_ids, ["v1", "v2"]) 149 | self.assertEqual(parsed_args.format, "json") 150 | self.assertEqual(parsed_args.languages, ["en"]) 151 | 152 | parsed_args = YouTubeTranscriptCli("--format json v1 v2".split())._parse_args() 153 | self.assertEqual(parsed_args.video_ids, ["v1", "v2"]) 154 | self.assertEqual(parsed_args.format, "json") 155 | self.assertEqual(parsed_args.languages, ["en"]) 156 | 157 | def test_argument_parsing__languages(self): 158 | parsed_args = YouTubeTranscriptCli( 159 | "v1 v2 --languages de en".split() 160 | )._parse_args() 161 | self.assertEqual(parsed_args.video_ids, ["v1", "v2"]) 162 | self.assertEqual(parsed_args.format, "pretty") 163 | self.assertEqual(parsed_args.languages, ["de", "en"]) 164 | 165 | def test_argument_parsing__proxies(self): 166 | parsed_args = YouTubeTranscriptCli( 167 | "v1 v2 --http-proxy http://user:pass@domain:port".split() 168 | )._parse_args() 169 | self.assertEqual(parsed_args.http_proxy, "http://user:pass@domain:port") 170 | 171 | parsed_args = YouTubeTranscriptCli( 172 | "v1 v2 --https-proxy https://user:pass@domain:port".split() 173 | )._parse_args() 174 | self.assertEqual(parsed_args.https_proxy, "https://user:pass@domain:port") 175 | 176 | parsed_args = YouTubeTranscriptCli( 177 | "v1 v2 --http-proxy http://user:pass@domain:port --https-proxy https://user:pass@domain:port".split() 178 | )._parse_args() 179 | self.assertEqual(parsed_args.http_proxy, "http://user:pass@domain:port") 180 | self.assertEqual(parsed_args.https_proxy, "https://user:pass@domain:port") 181 | 182 | parsed_args = YouTubeTranscriptCli("v1 v2".split())._parse_args() 183 | self.assertEqual(parsed_args.http_proxy, "") 184 | self.assertEqual(parsed_args.https_proxy, "") 185 | 186 | def test_argument_parsing__list_transcripts(self): 187 | parsed_args 
= YouTubeTranscriptCli( 188 | "--list-transcripts v1 v2".split() 189 | )._parse_args() 190 | self.assertEqual(parsed_args.video_ids, ["v1", "v2"]) 191 | self.assertTrue(parsed_args.list_transcripts) 192 | 193 | parsed_args = YouTubeTranscriptCli( 194 | "v1 v2 --list-transcripts".split() 195 | )._parse_args() 196 | self.assertEqual(parsed_args.video_ids, ["v1", "v2"]) 197 | self.assertTrue(parsed_args.list_transcripts) 198 | 199 | def test_argument_parsing__translate(self): 200 | parsed_args = YouTubeTranscriptCli( 201 | "v1 v2 --languages de en --translate cz".split() 202 | )._parse_args() 203 | self.assertEqual(parsed_args.video_ids, ["v1", "v2"]) 204 | self.assertEqual(parsed_args.format, "pretty") 205 | self.assertEqual(parsed_args.languages, ["de", "en"]) 206 | self.assertEqual(parsed_args.translate, "cz") 207 | 208 | parsed_args = YouTubeTranscriptCli( 209 | "v1 v2 --translate cz --languages de en".split() 210 | )._parse_args() 211 | self.assertEqual(parsed_args.video_ids, ["v1", "v2"]) 212 | self.assertEqual(parsed_args.format, "pretty") 213 | self.assertEqual(parsed_args.languages, ["de", "en"]) 214 | self.assertEqual(parsed_args.translate, "cz") 215 | 216 | def test_argument_parsing__manually_or_generated(self): 217 | parsed_args = YouTubeTranscriptCli( 218 | "v1 v2 --exclude-manually-created".split() 219 | )._parse_args() 220 | self.assertEqual(parsed_args.video_ids, ["v1", "v2"]) 221 | self.assertTrue(parsed_args.exclude_manually_created) 222 | self.assertFalse(parsed_args.exclude_generated) 223 | 224 | parsed_args = YouTubeTranscriptCli( 225 | "v1 v2 --exclude-generated".split() 226 | )._parse_args() 227 | self.assertEqual(parsed_args.video_ids, ["v1", "v2"]) 228 | self.assertFalse(parsed_args.exclude_manually_created) 229 | self.assertTrue(parsed_args.exclude_generated) 230 | 231 | parsed_args = YouTubeTranscriptCli( 232 | "v1 v2 --exclude-manually-created --exclude-generated".split() 233 | )._parse_args() 234 | self.assertEqual(parsed_args.video_ids, 
["v1", "v2"]) 235 | self.assertTrue(parsed_args.exclude_manually_created) 236 | self.assertTrue(parsed_args.exclude_generated) 237 | 238 | def test_run(self): 239 | YouTubeTranscriptCli("v1 v2 --languages de en".split()).run() 240 | 241 | YouTubeTranscriptApi.list.assert_any_call("v1") 242 | YouTubeTranscriptApi.list.assert_any_call("v2") 243 | 244 | self.transcript_list_mock.find_transcript.assert_any_call(["de", "en"]) 245 | 246 | def test_run__failing_transcripts(self): 247 | YouTubeTranscriptApi.list = MagicMock(side_effect=VideoUnavailable("video_id")) 248 | 249 | output = YouTubeTranscriptCli("v1 --languages de en".split()).run() 250 | 251 | self.assertEqual(output, str(VideoUnavailable("video_id"))) 252 | 253 | def test_run__exclude_generated(self): 254 | YouTubeTranscriptCli( 255 | "v1 v2 --languages de en --exclude-generated".split() 256 | ).run() 257 | 258 | self.transcript_list_mock.find_manually_created_transcript.assert_any_call( 259 | ["de", "en"] 260 | ) 261 | 262 | def test_run__exclude_manually_created(self): 263 | YouTubeTranscriptCli( 264 | "v1 v2 --languages de en --exclude-manually-created".split() 265 | ).run() 266 | 267 | self.transcript_list_mock.find_generated_transcript.assert_any_call( 268 | ["de", "en"] 269 | ) 270 | 271 | def test_run__exclude_manually_created_and_generated(self): 272 | self.assertEqual( 273 | YouTubeTranscriptCli( 274 | "v1 v2 --languages de en --exclude-manually-created --exclude-generated".split() 275 | ).run(), 276 | "", 277 | ) 278 | 279 | def test_run__translate(self): 280 | (YouTubeTranscriptCli("v1 v2 --languages de en --translate cz".split()).run(),) 281 | 282 | self.transcript_mock.translate.assert_any_call("cz") 283 | 284 | def test_run__list_transcripts(self): 285 | YouTubeTranscriptCli("--list-transcripts v1 v2".split()).run() 286 | 287 | YouTubeTranscriptApi.list.assert_any_call("v1") 288 | YouTubeTranscriptApi.list.assert_any_call("v2") 289 | 290 | def test_run__json_output(self): 291 | output = 
YouTubeTranscriptCli( 292 | "v1 v2 --languages de en --format json".split() 293 | ).run() 294 | 295 | # will fail if output is not valid json 296 | json.loads(output) 297 | 298 | def test_run__webshare_proxy_config(self): 299 | YouTubeTranscriptCli( 300 | ( 301 | "v1 v2 --languages de en " 302 | "--webshare-proxy-username username " 303 | "--webshare-proxy-password password" 304 | ).split() 305 | ).run() 306 | 307 | proxy_config = YouTubeTranscriptApi.__init__.call_args.kwargs.get( 308 | "proxy_config" 309 | ) 310 | 311 | self.assertIsNotNone(proxy_config) 312 | self.assertEqual(proxy_config.proxy_username, "username") 313 | self.assertEqual(proxy_config.proxy_password, "password") 314 | 315 | def test_run__generic_proxy_config(self): 316 | YouTubeTranscriptCli( 317 | ( 318 | "v1 v2 --languages de en " 319 | "--http-proxy http://user:pass@domain:port " 320 | "--https-proxy https://user:pass@domain:port" 321 | ).split() 322 | ).run() 323 | 324 | proxy_config = YouTubeTranscriptApi.__init__.call_args.kwargs.get( 325 | "proxy_config" 326 | ) 327 | 328 | self.assertIsNotNone(proxy_config) 329 | self.assertEqual(proxy_config.http_url, "http://user:pass@domain:port") 330 | self.assertEqual(proxy_config.https_url, "https://user:pass@domain:port") 331 | 332 | @pytest.mark.skip( 333 | reason="This test is temporarily disabled because cookie auth is currently not " 334 | "working due to YouTube changes." 335 | ) 336 | def test_run__cookies(self): 337 | YouTubeTranscriptCli( 338 | ("v1 v2 --languages de en " "--cookies blahblah.txt").split() 339 | ).run() 340 | 341 | YouTubeTranscriptApi.__init__.assert_any_call( 342 | proxy_config=None, 343 | cookie_path="blahblah.txt", 344 | ) 345 | 346 | def test_version_matches_metadata(self): 347 | """ 348 | `youtube_transcript_api --version` should return the same version as in the package metadata. 
349 | """ 350 | expected_version_msg = ( 351 | f"youtube_transcript_api, version {version('youtube-transcript-api')}" 352 | ) 353 | 354 | cli_version_msg = subprocess.run( 355 | ["youtube_transcript_api", "--version"], 356 | capture_output=True, 357 | text=True, 358 | check=True, 359 | ).stdout.strip() 360 | 361 | assert ( 362 | cli_version_msg == expected_version_msg 363 | ), f"Expected version '{expected_version_msg}', but got '{cli_version_msg}'" 364 | 365 | def test_get_version_package_not_found(self): 366 | with patch( 367 | "youtube_transcript_api._cli.version", side_effect=PackageNotFoundError 368 | ): 369 | cli = YouTubeTranscriptCli([]) 370 | assert cli._get_version() == "unknown" 371 | -------------------------------------------------------------------------------- /youtube_transcript_api/test/test_api.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | import os 3 | from pathlib import Path 4 | from unittest import TestCase 5 | from unittest.mock import patch 6 | 7 | import requests 8 | 9 | import httpretty 10 | 11 | from youtube_transcript_api import ( 12 | YouTubeTranscriptApi, 13 | TranscriptsDisabled, 14 | NoTranscriptFound, 15 | VideoUnavailable, 16 | IpBlocked, 17 | NotTranslatable, 18 | TranslationLanguageNotAvailable, 19 | CookiePathInvalid, 20 | CookieInvalid, 21 | FailedToCreateConsentCookie, 22 | YouTubeRequestFailed, 23 | InvalidVideoId, 24 | FetchedTranscript, 25 | FetchedTranscriptSnippet, 26 | AgeRestricted, 27 | RequestBlocked, 28 | VideoUnplayable, 29 | PoTokenRequired, 30 | ) 31 | from youtube_transcript_api.proxies import GenericProxyConfig, WebshareProxyConfig 32 | 33 | 34 | def get_asset_path(filename: str) -> Path: 35 | return Path( 36 | "{dirname}/assets/{filename}".format( 37 | dirname=os.path.dirname(__file__), filename=filename 38 | ) 39 | ) 40 | 41 | 42 | def load_asset(filename: str): 43 | with open(get_asset_path(filename), mode="rb") as file: 44 | return file.read() 45 | 
46 | 47 | class TestYouTubeTranscriptApi(TestCase): 48 | def setUp(self): 49 | self.ref_transcript = FetchedTranscript( 50 | snippets=[ 51 | FetchedTranscriptSnippet( 52 | text="Hey, this is just a test", 53 | start=0.0, 54 | duration=1.54, 55 | ), 56 | FetchedTranscriptSnippet( 57 | text="this is not the original transcript", 58 | start=1.54, 59 | duration=4.16, 60 | ), 61 | FetchedTranscriptSnippet( 62 | text="just something shorter, I made up for testing", 63 | start=5.7, 64 | duration=3.239, 65 | ), 66 | ], 67 | language="English", 68 | language_code="en", 69 | is_generated=False, 70 | video_id="GJLlxj_dtq8", 71 | ) 72 | self.ref_transcript_raw = self.ref_transcript.to_raw_data() 73 | httpretty.enable() 74 | httpretty.register_uri( 75 | httpretty.POST, 76 | "https://www.youtube.com/youtubei/v1/player", 77 | body=load_asset("youtube.innertube.json.static"), 78 | ) 79 | httpretty.register_uri( 80 | httpretty.GET, 81 | "https://www.youtube.com/watch", 82 | body=load_asset("youtube.html.static"), 83 | ) 84 | httpretty.register_uri( 85 | httpretty.GET, 86 | "https://www.youtube.com/api/timedtext", 87 | body=load_asset("transcript.xml.static"), 88 | ) 89 | 90 | def tearDown(self): 91 | httpretty.reset() 92 | httpretty.disable() 93 | 94 | def test_fetch(self): 95 | transcript = YouTubeTranscriptApi().fetch("GJLlxj_dtq8") 96 | 97 | self.assertEqual( 98 | transcript, 99 | self.ref_transcript, 100 | ) 101 | 102 | def test_fetch_formatted(self): 103 | transcript = YouTubeTranscriptApi().fetch( 104 | "GJLlxj_dtq8", preserve_formatting=True 105 | ) 106 | 107 | self.ref_transcript[1].text = "this is <i>not</i> the original transcript" 108 | 109 | self.assertEqual( 110 | transcript, 111 | self.ref_transcript, 112 | ) 113 | 114 | def test_fetch__with_altered_user_agent(self): 115 | httpretty.register_uri( 116 | httpretty.POST, 117 | "https://www.youtube.com/youtubei/v1/player", 118 | body=load_asset("youtube_altered_user_agent.innertube.json.static"), 119 | ) 120 | 121 | transcript = 
YouTubeTranscriptApi().fetch("GJLlxj_dtq8") 122 | 123 | self.assertEqual( 124 | transcript, 125 | self.ref_transcript, 126 | ) 127 | 128 | def test_list(self): 129 | transcript_list = YouTubeTranscriptApi().list("GJLlxj_dtq8") 130 | 131 | language_codes = {transcript.language_code for transcript in transcript_list} 132 | 133 | self.assertEqual( 134 | language_codes, {"zh", "de", "en", "hi", "ja", "ko", "es", "cs", "en"} 135 | ) 136 | 137 | def test_list__find_manually_created(self): 138 | transcript_list = YouTubeTranscriptApi().list("GJLlxj_dtq8") 139 | transcript = transcript_list.find_manually_created_transcript(["cs"]) 140 | 141 | self.assertFalse(transcript.is_generated) 142 | 143 | def test_list__find_generated(self): 144 | transcript_list = YouTubeTranscriptApi().list("GJLlxj_dtq8") 145 | 146 | with self.assertRaises(NoTranscriptFound): 147 | transcript_list.find_generated_transcript(["cs"]) 148 | 149 | transcript = transcript_list.find_generated_transcript(["en"]) 150 | 151 | self.assertTrue(transcript.is_generated) 152 | 153 | def test_list__url_as_video_id(self): 154 | httpretty.register_uri( 155 | httpretty.POST, 156 | "https://www.youtube.com/youtubei/v1/player", 157 | body=load_asset("youtube_video_unavailable.innertube.json.static"), 158 | ) 159 | 160 | with self.assertRaises(InvalidVideoId): 161 | YouTubeTranscriptApi().list( 162 | "https://www.youtube.com/youtubei/v1/player?v=GJLlxj_dtq8" 163 | ) 164 | 165 | def test_translate_transcript(self): 166 | transcript = YouTubeTranscriptApi().list("GJLlxj_dtq8").find_transcript(["en"]) 167 | 168 | translated_transcript = transcript.translate("ar") 169 | 170 | self.assertEqual(translated_transcript.language_code, "ar") 171 | self.assertIn("&tlang=ar", translated_transcript._url) 172 | 173 | def test_translate_transcript__translation_language_not_available(self): 174 | transcript = YouTubeTranscriptApi().list("GJLlxj_dtq8").find_transcript(["en"]) 175 | 176 | with 
self.assertRaises(TranslationLanguageNotAvailable): 177 | transcript.translate("xyz") 178 | 179 | def test_translate_transcript__not_translatable(self): 180 | transcript = YouTubeTranscriptApi().list("GJLlxj_dtq8").find_transcript(["en"]) 181 | transcript.translation_languages = [] 182 | 183 | with self.assertRaises(NotTranslatable): 184 | transcript.translate("af") 185 | 186 | def test_fetch__correct_language_is_used(self): 187 | YouTubeTranscriptApi().fetch("GJLlxj_dtq8", ["de", "en"]) 188 | query_string = httpretty.last_request().querystring 189 | 190 | self.assertIn("lang", query_string) 191 | self.assertEqual(len(query_string["lang"]), 1) 192 | self.assertEqual(query_string["lang"][0], "de") 193 | 194 | def test_fetch__fallback_language_is_used(self): 195 | httpretty.register_uri( 196 | httpretty.POST, 197 | "https://www.youtube.com/youtubei/v1/player", 198 | body=load_asset("youtube_ww1_nl_en.innertube.json.static"), 199 | ) 200 | 201 | YouTubeTranscriptApi().fetch("F1xioXWb8CY", ["de", "en"]) 202 | query_string = httpretty.last_request().querystring 203 | 204 | self.assertIn("lang", query_string) 205 | self.assertEqual(len(query_string["lang"]), 1) 206 | self.assertEqual(query_string["lang"][0], "en") 207 | 208 | def test_fetch__create_consent_cookie_if_needed(self): 209 | httpretty.register_uri( 210 | httpretty.GET, 211 | "https://www.youtube.com/watch", 212 | body=load_asset("youtube_consent_page.html.static"), 213 | ) 214 | 215 | YouTubeTranscriptApi().fetch("F1xioXWb8CY") 216 | self.assertEqual(len(httpretty.latest_requests()), 4) 217 | for request in httpretty.latest_requests()[1:]: 218 | self.assertEqual( 219 | request.headers["cookie"], "CONSENT=YES+cb.20210328-17-p0.de+FX+119" 220 | ) 221 | 222 | def test_fetch__exception_if_create_consent_cookie_failed(self): 223 | for _ in range(2): 224 | httpretty.register_uri( 225 | httpretty.GET, 226 | "https://www.youtube.com/watch", 227 | body=load_asset("youtube_consent_page.html.static"), 228 | ) 229 | 230 | 
with self.assertRaises(FailedToCreateConsentCookie): 231 | YouTubeTranscriptApi().fetch("F1xioXWb8CY") 232 | 233 | def test_fetch__exception_if_consent_cookie_age_invalid(self): 234 | httpretty.register_uri( 235 | httpretty.GET, 236 | "https://www.youtube.com/watch", 237 | body=load_asset("youtube_consent_page_invalid.html.static"), 238 | ) 239 | 240 | with self.assertRaises(FailedToCreateConsentCookie): 241 | YouTubeTranscriptApi().fetch("F1xioXWb8CY") 242 | 243 | def test_fetch__exception_if_video_unavailable(self): 244 | httpretty.register_uri( 245 | httpretty.POST, 246 | "https://www.youtube.com/youtubei/v1/player", 247 | body=load_asset("youtube_video_unavailable.innertube.json.static"), 248 | ) 249 | 250 | with self.assertRaises(VideoUnavailable): 251 | YouTubeTranscriptApi().fetch("abc") 252 | 253 | def test_fetch__exception_if_youtube_request_fails(self): 254 | httpretty.register_uri( 255 | httpretty.POST, "https://www.youtube.com/youtubei/v1/player", status=500 256 | ) 257 | 258 | with self.assertRaises(YouTubeRequestFailed) as cm: 259 | YouTubeTranscriptApi().fetch("abc") 260 | 261 | self.assertIn("Request to YouTube failed: ", str(cm.exception)) 262 | 263 | def test_fetch__exception_if_youtube_request_limit_reached( 264 | self, 265 | ): 266 | httpretty.register_uri( 267 | httpretty.GET, 268 | "https://www.youtube.com/watch", 269 | body=load_asset("youtube_too_many_requests.html.static"), 270 | ) 271 | 272 | with self.assertRaises(IpBlocked): 273 | YouTubeTranscriptApi().fetch("abc") 274 | 275 | def test_fetch__exception_if_timedtext_request_limit_reached( 276 | self, 277 | ): 278 | httpretty.register_uri( 279 | httpretty.GET, 280 | "https://www.youtube.com/api/timedtext", 281 | status=429, 282 | ) 283 | 284 | with self.assertRaises(IpBlocked): 285 | YouTubeTranscriptApi().fetch("abc") 286 | 287 | def test_fetch__exception_if_age_restricted(self): 288 | httpretty.register_uri( 289 | httpretty.POST, 290 | "https://www.youtube.com/youtubei/v1/player", 291 | 
body=load_asset("youtube_age_restricted.innertube.json.static"), 292 | ) 293 | 294 | with self.assertRaises(AgeRestricted): 295 | YouTubeTranscriptApi().fetch("Njp5uhTorCo") 296 | 297 | def test_fetch__exception_if_ip_blocked(self): 298 | httpretty.register_uri( 299 | httpretty.GET, 300 | "https://www.youtube.com/watch", 301 | body=load_asset("youtube_too_many_requests.html.static"), 302 | ) 303 | 304 | with self.assertRaises(IpBlocked): 305 | YouTubeTranscriptApi().fetch("abc") 306 | 307 | def test_fetch__exception_if_po_token_required(self): 308 | httpretty.register_uri( 309 | httpretty.POST, 310 | "https://www.youtube.com/youtubei/v1/player", 311 | body=load_asset("youtube_po_token_required.innertube.json.static"), 312 | ) 313 | 314 | with self.assertRaises(PoTokenRequired): 315 | YouTubeTranscriptApi().fetch("GJLlxj_dtq8") 316 | 317 | def test_fetch__exception_request_blocked(self): 318 | httpretty.register_uri( 319 | httpretty.POST, 320 | "https://www.youtube.com/youtubei/v1/player", 321 | body=load_asset("youtube_request_blocked.innertube.json.static"), 322 | ) 323 | 324 | with self.assertRaises(RequestBlocked) as cm: 325 | YouTubeTranscriptApi().fetch("Njp5uhTorCo") 326 | 327 | self.assertIn("YouTube is blocking requests from your IP", str(cm.exception)) 328 | 329 | def test_fetch__exception_unplayable(self): 330 | httpretty.register_uri( 331 | httpretty.POST, 332 | "https://www.youtube.com/youtubei/v1/player", 333 | body=load_asset("youtube_unplayable.innertube.json.static"), 334 | ) 335 | 336 | with self.assertRaises(VideoUnplayable) as cm: 337 | YouTubeTranscriptApi().fetch("Njp5uhTorCo") 338 | exception = cm.exception 339 | self.assertEqual(exception.reason, "Custom Reason") 340 | self.assertEqual(exception.sub_reasons, ["Sub Reason 1", "Sub Reason 2"]) 341 | self.assertIn("Custom Reason", str(exception)) 342 | 343 | def test_fetch__exception_if_transcripts_disabled(self): 344 | httpretty.register_uri( 345 | httpretty.POST, 346 | 
"https://www.youtube.com/youtubei/v1/player", 347 | body=load_asset("youtube_transcripts_disabled.innertube.json.static"), 348 | ) 349 | 350 | with self.assertRaises(TranscriptsDisabled): 351 | YouTubeTranscriptApi().fetch("dsMFmonKDD4") 352 | 353 | httpretty.register_uri( 354 | httpretty.POST, 355 | "https://www.youtube.com/youtubei/v1/player", 356 | body=load_asset("youtube_transcripts_disabled2.innertube.json.static"), 357 | ) 358 | with self.assertRaises(TranscriptsDisabled): 359 | YouTubeTranscriptApi().fetch("Fjg5lYqvzUs") 360 | 361 | def test_fetch__exception_if_language_unavailable(self): 362 | with self.assertRaises(NoTranscriptFound) as cm: 363 | YouTubeTranscriptApi().fetch("GJLlxj_dtq8", languages=["cz"]) 364 | 365 | self.assertIn("No transcripts were found for", str(cm.exception)) 366 | 367 | @patch("youtube_transcript_api.proxies.GenericProxyConfig.to_requests_dict") 368 | def test_fetch__with_proxy(self, to_requests_dict): 369 | proxy_config = GenericProxyConfig( 370 | http_url="http://localhost:8080", 371 | https_url="http://localhost:8080", 372 | ) 373 | transcript = YouTubeTranscriptApi(proxy_config=proxy_config).fetch( 374 | "GJLlxj_dtq8" 375 | ) 376 | self.assertEqual( 377 | transcript, 378 | self.ref_transcript, 379 | ) 380 | to_requests_dict.assert_any_call() 381 | 382 | @patch("youtube_transcript_api.proxies.GenericProxyConfig.to_requests_dict") 383 | def test_fetch__with_proxy_prevent_alive_connections(self, to_requests_dict): 384 | proxy_config = WebshareProxyConfig( 385 | proxy_username="username", proxy_password="password" 386 | ) 387 | 388 | YouTubeTranscriptApi(proxy_config=proxy_config).fetch("GJLlxj_dtq8") 389 | 390 | request = httpretty.last_request() 391 | self.assertEqual(request.headers.get("Connection"), "close") 392 | 393 | @patch("youtube_transcript_api.proxies.GenericProxyConfig.to_requests_dict") 394 | def test_fetch__with_proxy_retry_when_blocked(self, to_requests_dict): 395 | for _ in range(3): 396 | httpretty.register_uri( 
397 | httpretty.POST, 398 | "https://www.youtube.com/youtubei/v1/player", 399 | body=load_asset("youtube_request_blocked.innertube.json.static"), 400 | ) 401 | proxy_config = WebshareProxyConfig( 402 | proxy_username="username", 403 | proxy_password="password", 404 | ) 405 | 406 | YouTubeTranscriptApi(proxy_config=proxy_config).fetch("Njp5uhTorCo") 407 | 408 | self.assertEqual(len(httpretty.latest_requests()), 2 * 3 + 3) 409 | 410 | @patch("youtube_transcript_api.proxies.GenericProxyConfig.to_requests_dict") 411 | def test_fetch__with_webshare_proxy_reraise_when_blocked(self, to_requests_dict): 412 | retries = 5 413 | for _ in range(retries): 414 | httpretty.register_uri( 415 | httpretty.POST, 416 | "https://www.youtube.com/youtubei/v1/player", 417 | body=load_asset("youtube_request_blocked.innertube.json.static"), 418 | ) 419 | proxy_config = WebshareProxyConfig( 420 | proxy_username="username", 421 | proxy_password="password", 422 | retries_when_blocked=retries, 423 | ) 424 | 425 | with self.assertRaises(RequestBlocked) as cm: 426 | YouTubeTranscriptApi(proxy_config=proxy_config).fetch("Njp5uhTorCo") 427 | 428 | self.assertEqual(len(httpretty.latest_requests()), retries * 2) 429 | self.assertEqual(cm.exception._proxy_config, proxy_config) 430 | self.assertIn("Webshare", str(cm.exception)) 431 | 432 | @patch("youtube_transcript_api.proxies.GenericProxyConfig.to_requests_dict") 433 | def test_fetch__with_generic_proxy_reraise_when_blocked(self, to_requests_dict): 434 | httpretty.register_uri( 435 | httpretty.POST, 436 | "https://www.youtube.com/youtubei/v1/player", 437 | body=load_asset("youtube_request_blocked.innertube.json.static"), 438 | ) 439 | proxy_config = GenericProxyConfig( 440 | http_url="http://localhost:8080", 441 | https_url="http://localhost:8080", 442 | ) 443 | 444 | with self.assertRaises(RequestBlocked) as cm: 445 | YouTubeTranscriptApi(proxy_config=proxy_config).fetch("Njp5uhTorCo") 446 | 447 | self.assertEqual(len(httpretty.latest_requests()), 
2) 448 | self.assertEqual(cm.exception._proxy_config, proxy_config) 449 | self.assertIn("YouTube is blocking your requests", str(cm.exception)) 450 | 451 | @pytest.mark.skip( 452 | reason="This test is temporarily disabled because cookie auth is currently not " 453 | "working due to YouTube changes." 454 | ) 455 | def test_fetch__with_cookies(self): 456 | cookie_path = get_asset_path("example_cookies.txt") 457 | transcript = YouTubeTranscriptApi(cookie_path=cookie_path).fetch("GJLlxj_dtq8") 458 | 459 | self.assertEqual( 460 | transcript, 461 | self.ref_transcript, 462 | ) 463 | 464 | @pytest.mark.skip( 465 | reason="This test is temporarily disabled because cookie auth is currently not " 466 | "working due to YouTube changes." 467 | ) 468 | def test_load_cookies(self): 469 | cookie_path = get_asset_path("example_cookies.txt") 470 | 471 | ytt_api = YouTubeTranscriptApi(cookie_path=cookie_path) 472 | 473 | session_cookies = ytt_api._fetcher._http_client.cookies 474 | self.assertEqual( 475 | {"TEST_FIELD": "TEST_VALUE"}, 476 | requests.utils.dict_from_cookiejar(session_cookies), 477 | ) 478 | 479 | @pytest.mark.skip( 480 | reason="This test is temporarily disabled because cookie auth is currently not " 481 | "working due to YouTube changes." 482 | ) 483 | def test_load_cookies__bad_file_path(self): 484 | cookie_path = get_asset_path("nonexistent_cookies.txt") 485 | with self.assertRaises(CookiePathInvalid): 486 | YouTubeTranscriptApi(cookie_path=cookie_path) 487 | 488 | @pytest.mark.skip( 489 | reason="This test is temporarily disabled because cookie auth is currently not " 490 | "working due to YouTube changes." 
491 | ) 492 | def test_load_cookies__no_valid_cookies(self): 493 | cookie_path = get_asset_path("expired_example_cookies.txt") 494 | with self.assertRaises(CookieInvalid): 495 | YouTubeTranscriptApi(cookie_path=cookie_path) 496 | -------------------------------------------------------------------------------- /youtube_transcript_api/_transcripts.py: -------------------------------------------------------------------------------- 1 | from dataclasses import dataclass, asdict 2 | from enum import Enum 3 | from itertools import chain 4 | 5 | from html import unescape 6 | from typing import List, Dict, Iterator, Iterable, Pattern, Optional 7 | 8 | from defusedxml import ElementTree 9 | 10 | import re 11 | 12 | from requests import HTTPError, Session, Response 13 | 14 | from .proxies import ProxyConfig 15 | from ._settings import WATCH_URL, INNERTUBE_CONTEXT, INNERTUBE_API_URL 16 | from ._errors import ( 17 | VideoUnavailable, 18 | YouTubeRequestFailed, 19 | NoTranscriptFound, 20 | TranscriptsDisabled, 21 | NotTranslatable, 22 | TranslationLanguageNotAvailable, 23 | FailedToCreateConsentCookie, 24 | InvalidVideoId, 25 | IpBlocked, 26 | RequestBlocked, 27 | AgeRestricted, 28 | VideoUnplayable, 29 | YouTubeDataUnparsable, 30 | PoTokenRequired, 31 | ) 32 | 33 | 34 | @dataclass 35 | class FetchedTranscriptSnippet: 36 | text: str 37 | start: float 38 | """ 39 | The timestamp at which this transcript snippet appears on screen in seconds. 40 | """ 41 | duration: float 42 | """ 43 | The duration of the snippet in seconds. Be aware that this is not the 44 | duration of the transcribed speech, but how long the snippet stays on screen. 45 | Therefore, there can be overlaps between snippets! 46 | """ 47 | 48 | 49 | @dataclass 50 | class FetchedTranscript: 51 | """ 52 | Represents a fetched transcript. This object is iterable, which allows you to 53 | iterate over the transcript snippets. 
54 | """ 55 | 56 | snippets: List[FetchedTranscriptSnippet] 57 | video_id: str 58 | language: str 59 | language_code: str 60 | is_generated: bool 61 | 62 | def __iter__(self) -> Iterator[FetchedTranscriptSnippet]: 63 | return iter(self.snippets) 64 | 65 | def __getitem__(self, index) -> FetchedTranscriptSnippet: 66 | return self.snippets[index] 67 | 68 | def __len__(self) -> int: 69 | return len(self.snippets) 70 | 71 | def to_raw_data(self) -> List[Dict]: 72 | return [asdict(snippet) for snippet in self] 73 | 74 | 75 | @dataclass 76 | class _TranslationLanguage: 77 | language: str 78 | language_code: str 79 | 80 | 81 | class _PlayabilityStatus(str, Enum): 82 | OK = "OK" 83 | ERROR = "ERROR" 84 | LOGIN_REQUIRED = "LOGIN_REQUIRED" 85 | 86 | 87 | class _PlayabilityFailedReason(str, Enum): 88 | BOT_DETECTED = "Sign in to confirm you’re not a bot" 89 | AGE_RESTRICTED = "This video may be inappropriate for some users." 90 | VIDEO_UNAVAILABLE = "This video is unavailable" 91 | 92 | 93 | def _raise_http_errors(response: Response, video_id: str) -> Response: 94 | try: 95 | if response.status_code == 429: 96 | raise IpBlocked(video_id) 97 | response.raise_for_status() 98 | return response 99 | except HTTPError as error: 100 | raise YouTubeRequestFailed(video_id, error) 101 | 102 | 103 | class Transcript: 104 | def __init__( 105 | self, 106 | http_client: Session, 107 | video_id: str, 108 | url: str, 109 | language: str, 110 | language_code: str, 111 | is_generated: bool, 112 | translation_languages: List[_TranslationLanguage], 113 | ): 114 | """ 115 | You probably don't want to initialize this directly. Usually you'll access Transcript objects using a 116 | TranscriptList. 
117 | """ 118 | self._http_client = http_client 119 | self.video_id = video_id 120 | self._url = url 121 | self.language = language 122 | self.language_code = language_code 123 | self.is_generated = is_generated 124 | self.translation_languages = translation_languages 125 | self._translation_languages_dict = { 126 | translation_language.language_code: translation_language.language 127 | for translation_language in translation_languages 128 | } 129 | 130 | def fetch(self, preserve_formatting: bool = False) -> FetchedTranscript: 131 | """ 132 | Loads the actual transcript data. 133 | :param preserve_formatting: whether to keep select HTML text formatting 134 | """ 135 | if "&exp=xpe" in self._url: 136 | raise PoTokenRequired(self.video_id) 137 | response = self._http_client.get(self._url) 138 | snippets = _TranscriptParser(preserve_formatting=preserve_formatting).parse( 139 | _raise_http_errors(response, self.video_id).text, 140 | ) 141 | return FetchedTranscript( 142 | snippets=snippets, 143 | video_id=self.video_id, 144 | language=self.language, 145 | language_code=self.language_code, 146 | is_generated=self.is_generated, 147 | ) 148 | 149 | def __str__(self) -> str: 150 | return '{language_code} ("{language}"){translation_description}'.format( 151 | language=self.language, 152 | language_code=self.language_code, 153 | translation_description="[TRANSLATABLE]" if self.is_translatable else "", 154 | ) 155 | 156 | @property 157 | def is_translatable(self) -> bool: 158 | return len(self.translation_languages) > 0 159 | 160 | def translate(self, language_code: str) -> "Transcript": 161 | if not self.is_translatable: 162 | raise NotTranslatable(self.video_id) 163 | 164 | if language_code not in self._translation_languages_dict: 165 | raise TranslationLanguageNotAvailable(self.video_id) 166 | 167 | return Transcript( 168 | self._http_client, 169 | self.video_id, 170 | "{url}&tlang={language_code}".format( 171 | url=self._url, language_code=language_code 172 | ), 173 | 
self._translation_languages_dict[language_code], 174 | language_code, 175 | True, 176 | [], 177 | ) 178 | 179 | 180 | class TranscriptList: 181 | """ 182 | This object represents a list of transcripts. It can be iterated over to list all transcripts which are available 183 | for a given YouTube video. Also, it provides functionality to search for a transcript in a given language. 184 | """ 185 | 186 | def __init__( 187 | self, 188 | video_id: str, 189 | manually_created_transcripts: Dict[str, Transcript], 190 | generated_transcripts: Dict[str, Transcript], 191 | translation_languages: List[_TranslationLanguage], 192 | ): 193 | """ 194 | The constructor is only for internal use. Use the static build method instead. 195 | 196 | :param video_id: the id of the video this TranscriptList is for 197 | :param manually_created_transcripts: dict mapping language codes to the manually created transcripts 198 | :param generated_transcripts: dict mapping language codes to the generated transcripts 199 | :param translation_languages: list of languages which can be used for translatable languages 200 | """ 201 | self.video_id = video_id 202 | self._manually_created_transcripts = manually_created_transcripts 203 | self._generated_transcripts = generated_transcripts 204 | self._translation_languages = translation_languages 205 | 206 | @staticmethod 207 | def build( 208 | http_client: Session, video_id: str, captions_json: Dict 209 | ) -> "TranscriptList": 210 | """ 211 | Factory method for TranscriptList. 
212 | 213 | :param http_client: http client which is used to make the transcript retrieving http calls 214 | :param video_id: the id of the video this TranscriptList is for 215 | :param captions_json: the captions JSON extracted from the InnerTube API response 216 | :return: the created TranscriptList 217 | """ 218 | translation_languages = [ 219 | _TranslationLanguage( 220 | language=translation_language["languageName"]["runs"][0]["text"], 221 | language_code=translation_language["languageCode"], 222 | ) 223 | for translation_language in captions_json.get("translationLanguages", []) 224 | ] 225 | 226 | manually_created_transcripts = {} 227 | generated_transcripts = {} 228 | 229 | for caption in captions_json["captionTracks"]: 230 | if caption.get("kind", "") == "asr": 231 | transcript_dict = generated_transcripts 232 | else: 233 | transcript_dict = manually_created_transcripts 234 | 235 | transcript_dict[caption["languageCode"]] = Transcript( 236 | http_client, 237 | video_id, 238 | caption["baseUrl"].replace("&fmt=srv3", ""), 239 | caption["name"]["runs"][0]["text"], 240 | caption["languageCode"], 241 | caption.get("kind", "") == "asr", 242 | translation_languages if caption.get("isTranslatable", False) else [], 243 | ) 244 | 245 | return TranscriptList( 246 | video_id, 247 | manually_created_transcripts, 248 | generated_transcripts, 249 | translation_languages, 250 | ) 251 | 252 | def __iter__(self) -> Iterator[Transcript]: 253 | return chain( 254 | self._manually_created_transcripts.values(), 255 | self._generated_transcripts.values(), 256 | ) 257 | 258 | def find_transcript(self, language_codes: Iterable[str]) -> Transcript: 259 | """ 260 | Finds a transcript for a given language code. Manually created transcripts are returned first and only if none 261 | are found, generated transcripts are used. If you only want manually created transcripts use 262 | `find_manually_created_transcript` instead. 
263 | 264 | :param language_codes: A list of language codes in a descending priority. For example, if this is set to 265 | ['de', 'en'] it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if 266 | it fails to do so. 267 | :return: the found Transcript 268 | """ 269 | return self._find_transcript( 270 | language_codes, 271 | [self._manually_created_transcripts, self._generated_transcripts], 272 | ) 273 | 274 | def find_generated_transcript(self, language_codes: Iterable[str]) -> Transcript: 275 | """ 276 | Finds an automatically generated transcript for a given language code. 277 | 278 | :param language_codes: A list of language codes in a descending priority. For example, if this is set to 279 | ['de', 'en'] it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if 280 | it fails to do so. 281 | :return: the found Transcript 282 | """ 283 | return self._find_transcript(language_codes, [self._generated_transcripts]) 284 | 285 | def find_manually_created_transcript( 286 | self, language_codes: Iterable[str] 287 | ) -> Transcript: 288 | """ 289 | Finds a manually created transcript for a given language code. 290 | 291 | :param language_codes: A list of language codes in a descending priority. For example, if this is set to 292 | ['de', 'en'] it will first try to fetch the german transcript (de) and then fetch the english transcript (en) if 293 | it fails to do so. 
294 | :return: the found Transcript 295 | """ 296 | return self._find_transcript( 297 | language_codes, [self._manually_created_transcripts] 298 | ) 299 | 300 | def _find_transcript( 301 | self, 302 | language_codes: Iterable[str], 303 | transcript_dicts: List[Dict[str, Transcript]], 304 | ) -> Transcript: 305 | for language_code in language_codes: 306 | for transcript_dict in transcript_dicts: 307 | if language_code in transcript_dict: 308 | return transcript_dict[language_code] 309 | 310 | raise NoTranscriptFound(self.video_id, language_codes, self) 311 | 312 | def __str__(self) -> str: 313 | return ( 314 | "For this video ({video_id}) transcripts are available in the following languages:\n\n" 315 | "(MANUALLY CREATED)\n" 316 | "{available_manually_created_transcript_languages}\n\n" 317 | "(GENERATED)\n" 318 | "{available_generated_transcripts}\n\n" 319 | "(TRANSLATION LANGUAGES)\n" 320 | "{available_translation_languages}" 321 | ).format( 322 | video_id=self.video_id, 323 | available_manually_created_transcript_languages=self._get_language_description( 324 | str(transcript) 325 | for transcript in self._manually_created_transcripts.values() 326 | ), 327 | available_generated_transcripts=self._get_language_description( 328 | str(transcript) for transcript in self._generated_transcripts.values() 329 | ), 330 | available_translation_languages=self._get_language_description( 331 | '{language_code} ("{language}")'.format( 332 | language=translation_language.language, 333 | language_code=translation_language.language_code, 334 | ) 335 | for translation_language in self._translation_languages 336 | ), 337 | ) 338 | 339 | def _get_language_description(self, transcript_strings: Iterable[str]) -> str: 340 | description = "\n".join( 341 | " - {transcript}".format(transcript=transcript) 342 | for transcript in transcript_strings 343 | ) 344 | return description if description else "None" 345 | 346 | 347 | class TranscriptListFetcher: 348 | def __init__(self, http_client: 
Session, proxy_config: Optional[ProxyConfig]): 349 | self._http_client = http_client 350 | self._proxy_config = proxy_config 351 | 352 | def fetch(self, video_id: str) -> TranscriptList: 353 | return TranscriptList.build( 354 | self._http_client, 355 | video_id, 356 | self._fetch_captions_json(video_id), 357 | ) 358 | 359 | def _fetch_captions_json(self, video_id: str, try_number: int = 0) -> Dict: 360 | try: 361 | html = self._fetch_video_html(video_id) 362 | api_key = self._extract_innertube_api_key(html, video_id) 363 | innertube_data = self._fetch_innertube_data(video_id, api_key) 364 | return self._extract_captions_json(innertube_data, video_id) 365 | except RequestBlocked as exception: 366 | retries = ( 367 | 0 368 | if self._proxy_config is None 369 | else self._proxy_config.retries_when_blocked 370 | ) 371 | if try_number + 1 < retries: 372 | return self._fetch_captions_json(video_id, try_number=try_number + 1) 373 | raise exception.with_proxy_config(self._proxy_config) 374 | 375 | def _extract_innertube_api_key(self, html: str, video_id: str) -> str: 376 | pattern = r'"INNERTUBE_API_KEY":\s*"([a-zA-Z0-9_-]+)"' 377 | match = re.search(pattern, html) 378 | if match and len(match.groups()) == 1: 379 | return match.group(1) 380 | if 'class="g-recaptcha"' in html: 381 | raise IpBlocked(video_id) 382 | raise YouTubeDataUnparsable(video_id) # pragma: no cover 383 | 384 | def _extract_captions_json(self, innertube_data: Dict, video_id: str) -> Dict: 385 | self._assert_playability(innertube_data.get("playabilityStatus"), video_id) 386 | 387 | captions_json = innertube_data.get("captions", {}).get( 388 | "playerCaptionsTracklistRenderer" 389 | ) 390 | if captions_json is None or "captionTracks" not in captions_json: 391 | raise TranscriptsDisabled(video_id) 392 | 393 | return captions_json 394 | 395 | def _assert_playability(self, playability_status_data: Dict, video_id: str) -> None: 396 | playability_status = playability_status_data.get("status") 397 | if ( 398 | 
playability_status != _PlayabilityStatus.OK.value 399 | and playability_status is not None 400 | ): 401 | reason = playability_status_data.get("reason") 402 | if playability_status == _PlayabilityStatus.LOGIN_REQUIRED.value: 403 | if reason == _PlayabilityFailedReason.BOT_DETECTED.value: 404 | raise RequestBlocked(video_id) 405 | if reason == _PlayabilityFailedReason.AGE_RESTRICTED.value: 406 | raise AgeRestricted(video_id) 407 | if ( 408 | playability_status == _PlayabilityStatus.ERROR.value 409 | and reason == _PlayabilityFailedReason.VIDEO_UNAVAILABLE.value 410 | ): 411 | if video_id.startswith("http://") or video_id.startswith("https://"): 412 | raise InvalidVideoId(video_id) 413 | raise VideoUnavailable(video_id) 414 | subreasons = ( 415 | playability_status_data.get("errorScreen", {}) 416 | .get("playerErrorMessageRenderer", {}) 417 | .get("subreason", {}) 418 | .get("runs", []) 419 | ) 420 | raise VideoUnplayable( 421 | video_id, reason, [run.get("text", "") for run in subreasons] 422 | ) 423 | 424 | def _create_consent_cookie(self, html: str, video_id: str) -> None: 425 | match = re.search('name="v" value="(.*?)"', html) 426 | if match is None: 427 | raise FailedToCreateConsentCookie(video_id) 428 | self._http_client.cookies.set( 429 | "CONSENT", "YES+" + match.group(1), domain=".youtube.com" 430 | ) 431 | 432 | def _fetch_video_html(self, video_id: str) -> str: 433 | html = self._fetch_html(video_id) 434 | if 'action="https://consent.youtube.com/s"' in html: 435 | self._create_consent_cookie(html, video_id) 436 | html = self._fetch_html(video_id) 437 | if 'action="https://consent.youtube.com/s"' in html: 438 | raise FailedToCreateConsentCookie(video_id) 439 | return html 440 | 441 | def _fetch_html(self, video_id: str) -> str: 442 | response = self._http_client.get(WATCH_URL.format(video_id=video_id)) 443 | return unescape(_raise_http_errors(response, video_id).text) 444 | 445 | def _fetch_innertube_data(self, video_id: str, api_key: str) -> Dict: 446 | 
response = self._http_client.post( 447 | INNERTUBE_API_URL.format(api_key=api_key), 448 | json={ 449 | "context": INNERTUBE_CONTEXT, 450 | "videoId": video_id, 451 | }, 452 | ) 453 | data = _raise_http_errors(response, video_id).json() 454 | return data 455 | 456 | 457 | class _TranscriptParser: 458 | _FORMATTING_TAGS = [ 459 | "strong", # important 460 | "em", # emphasized 461 | "b", # bold 462 | "i", # italic 463 | "mark", # marked 464 | "small", # smaller 465 | "del", # deleted 466 | "ins", # inserted 467 | "sub", # subscript 468 | "sup", # superscript 469 | ] 470 | 471 | def __init__(self, preserve_formatting: bool = False): 472 | self._html_regex = self._get_html_regex(preserve_formatting) 473 | 474 | def _get_html_regex(self, preserve_formatting: bool) -> Pattern[str]: 475 | if preserve_formatting: 476 | formats_regex = "|".join(self._FORMATTING_TAGS) 477 | formats_regex = r"<\/?(?!\/?(" + formats_regex + r")\b).*?\b>" 478 | html_regex = re.compile(formats_regex, re.IGNORECASE) 479 | else: 480 | html_regex = re.compile(r"<[^>]*>", re.IGNORECASE) 481 | return html_regex 482 | 483 | def parse(self, raw_data: str) -> List[FetchedTranscriptSnippet]: 484 | return [ 485 | FetchedTranscriptSnippet( 486 | text=re.sub(self._html_regex, "", unescape(xml_element.text)), 487 | start=float(xml_element.attrib["start"]), 488 | duration=float(xml_element.attrib.get("dur", "0.0")), 489 | ) 490 | for xml_element in ElementTree.fromstring(raw_data) 491 | if xml_element.text is not None 492 | ] 493 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

2 | ✨ YouTube Transcript API ✨ 3 |

4 | 5 |


25 | 26 |

27 | This is a python API which allows you to retrieve the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles and it does not require a headless browser, like other selenium based solutions do! 28 |

29 |

30 | Maintenance of this project is made possible by all the contributors and sponsors. If you'd like to sponsor this project and have your avatar or company logo appear below click here. 💖 31 |

32 | 33 |

38 | SearchAPI

66 | 67 | ## Install 68 | 69 | It is recommended to [install this module by using pip](https://pypi.org/project/youtube-transcript-api/): 70 | 71 | ``` 72 | pip install youtube-transcript-api 73 | ``` 74 | 75 | You can either integrate this module [into an existing application](#api) or just use it via a [CLI](#cli). 76 | 77 | ## API 78 | 79 | The easiest way to get a transcript for a given video is to execute: 80 | 81 | ```python 82 | from youtube_transcript_api import YouTubeTranscriptApi 83 | 84 | ytt_api = YouTubeTranscriptApi() 85 | ytt_api.fetch(video_id) 86 | ``` 87 | 88 | > **Note:** By default, this will try to access the English transcript of the video. If your video has a different 89 | > language, or you are interested in fetching a transcript in a different language, please read the section below. 90 | 91 | > **Note:** Pass in the video ID, NOT the video URL. For a video with the URL `https://www.youtube.com/watch?v=12345` 92 | > the ID is `12345`. 93 | 94 | This will return a `FetchedTranscript` object looking somewhat like this: 95 | 96 | ```python 97 | FetchedTranscript( 98 | snippets=[ 99 | FetchedTranscriptSnippet( 100 | text="Hey there", 101 | start=0.0, 102 | duration=1.54, 103 | ), 104 | FetchedTranscriptSnippet( 105 | text="how are you", 106 | start=1.54, 107 | duration=4.16, 108 | ), 109 | # ... 
110 | ], 111 | video_id="12345", 112 | language="English", 113 | language_code="en", 114 | is_generated=False, 115 | ) 116 | ``` 117 | 118 | This object implements most interfaces of a `List`: 119 | 120 | ```python 121 | ytt_api = YouTubeTranscriptApi() 122 | fetched_transcript = ytt_api.fetch(video_id) 123 | 124 | # is iterable 125 | for snippet in fetched_transcript: 126 | print(snippet.text) 127 | 128 | # indexable 129 | last_snippet = fetched_transcript[-1] 130 | 131 | # provides a length 132 | snippet_count = len(fetched_transcript) 133 | ``` 134 | 135 | If you prefer to handle the raw transcript data you can call `fetched_transcript.to_raw_data()`, which will return 136 | a list of dictionaries: 137 | 138 | ```python 139 | [ 140 | { 141 | 'text': 'Hey there', 142 | 'start': 0.0, 143 | 'duration': 1.54 144 | }, 145 | { 146 | 'text': 'how are you', 147 | 'start': 1.54, 148 | 'duration': 4.16 149 | }, 150 | # ... 151 | ] 152 | ``` 153 | ### Retrieve different languages 154 | 155 | You can add the `languages` parameter if you want to make sure the transcripts are retrieved in your desired language 156 | (it defaults to English). 157 | 158 | ```python 159 | YouTubeTranscriptApi().fetch(video_id, languages=['de', 'en']) 160 | ``` 161 | 162 | It's a list of language codes in descending priority. In this example it will first try to fetch the German 163 | transcript (`'de'`) and then fetch the English transcript (`'en'`) if it fails to do so. If you want to find out 164 | which languages are available first, [have a look at `list()`](#list-available-transcripts). 165 | 166 | If you only want one language, you still need to format the `languages` argument as a list: 167 | 168 | ```python 169 | YouTubeTranscriptApi().fetch(video_id, languages=['de']) 170 | ``` 171 | 172 | ### Preserve formatting 173 | 174 | You can also add `preserve_formatting=True` if you'd like to keep HTML formatting elements such as `<i>` (italics) 175 | and `<b>` (bold). 
176 | 177 | ```python 178 | YouTubeTranscriptApi().fetch(video_id, languages=['de', 'en'], preserve_formatting=True) 179 | ``` 180 | 181 | ### List available transcripts 182 | 183 | If you want to list all transcripts which are available for a given video you can call: 184 | 185 | ```python 186 | ytt_api = YouTubeTranscriptApi() 187 | transcript_list = ytt_api.list(video_id) 188 | ``` 189 | 190 | This will return a `TranscriptList` object which is iterable and provides methods to filter the list of transcripts for 191 | specific languages and types, like: 192 | 193 | ```python 194 | transcript = transcript_list.find_transcript(['de', 'en']) 195 | ``` 196 | 197 | By default, this module prefers manually created transcripts over automatically generated ones when a transcript in 198 | the requested language is available in both forms. The `TranscriptList` allows you to bypass this 199 | default behaviour by searching for specific transcript types: 200 | 201 | ```python 202 | # filter for manually created transcripts 203 | transcript = transcript_list.find_manually_created_transcript(['de', 'en']) 204 | 205 | # or automatically generated ones 206 | transcript = transcript_list.find_generated_transcript(['de', 'en']) 207 | ``` 208 | 209 | The methods `find_generated_transcript`, `find_manually_created_transcript`, and `find_transcript` return `Transcript` 210 | objects. 
They contain metadata regarding the transcript: 211 | 212 | ```python 213 | print( 214 | transcript.video_id, 215 | transcript.language, 216 | transcript.language_code, 217 | # whether it has been manually created or generated by YouTube 218 | transcript.is_generated, 219 | # whether this transcript can be translated or not 220 | transcript.is_translatable, 221 | # a list of languages the transcript can be translated to 222 | transcript.translation_languages, 223 | ) 224 | ``` 225 | 226 | and provide the method, which allows you to fetch the actual transcript data: 227 | 228 | ```python 229 | transcript.fetch() 230 | ``` 231 | 232 | This returns a `FetchedTranscript` object, just like `YouTubeTranscriptApi().fetch()` does. 233 | 234 | ### Translate transcript 235 | 236 | YouTube has a feature which allows you to automatically translate subtitles. This module also makes it possible to 237 | access this feature. To do so `Transcript` objects provide a `translate()` method, which returns a new translated 238 | `Transcript` object: 239 | 240 | ```python 241 | transcript = transcript_list.find_transcript(['en']) 242 | translated_transcript = transcript.translate('de') 243 | print(translated_transcript.fetch()) 244 | ``` 245 | 246 | ### By example 247 | ```python 248 | from youtube_transcript_api import YouTubeTranscriptApi 249 | 250 | ytt_api = YouTubeTranscriptApi() 251 | 252 | # retrieve the available transcripts 253 | transcript_list = ytt_api.list('video_id') 254 | 255 | # iterate over all available transcripts 256 | for transcript in transcript_list: 257 | 258 | # the Transcript object provides metadata properties 259 | print( 260 | transcript.video_id, 261 | transcript.language, 262 | transcript.language_code, 263 | # whether it has been manually created or generated by YouTube 264 | transcript.is_generated, 265 | # whether this transcript can be translated or not 266 | transcript.is_translatable, 267 | # a list of languages the transcript can be translated to 268 
| transcript.translation_languages, 269 | ) 270 | 271 | # fetch the actual transcript data 272 | print(transcript.fetch()) 273 | 274 | # translating the transcript will return another transcript object 275 | print(transcript.translate('en').fetch()) 276 | 277 | # you can also directly filter for the language you are looking for, using the transcript list 278 | transcript = transcript_list.find_transcript(['de', 'en']) 279 | 280 | # or just filter for manually created transcripts 281 | transcript = transcript_list.find_manually_created_transcript(['de', 'en']) 282 | 283 | # or automatically generated ones 284 | transcript = transcript_list.find_generated_transcript(['de', 'en']) 285 | ``` 286 | 287 | ## Working around IP bans (`RequestBlocked` or `IpBlocked` exception) 288 | 289 | Unfortunately, YouTube has started blocking most IPs that are known to belong to cloud providers (like AWS, Google Cloud 290 | Platform, Azure, etc.), which means you will most likely run into `RequestBlocked` or `IpBlocked` exceptions when 291 | deploying your code to any cloud solution. The same can happen to the IP of your self-hosted solution, if you are making 292 | too many requests. You can work around these IP bans using proxies. However, since YouTube will ban static proxies 293 | after extended use, going for rotating residential proxies is the most reliable option. 294 | 295 | There are different providers that offer rotating residential proxies, but after testing different 296 | offerings I have found [Webshare](https://www.webshare.io/?referral_code=w0xno53eb50g) to be the most reliable and have 297 | therefore integrated it into this module, to make setting it up as easy as possible. 298 | 
298 | 299 | ### Using [Webshare](https://www.webshare.io/?referral_code=w0xno53eb50g) 300 | 301 | Once you have created a [Webshare account](https://www.webshare.io/?referral_code=w0xno53eb50g) and purchased a 302 | "Residential" proxy package that suits your workload (make sure NOT to purchase "Proxy Server" or 303 | "Static Residential"!), open the 304 | [Webshare Proxy Settings](https://dashboard.webshare.io/proxy/settings?referral_code=w0xno53eb50g) to retrieve 305 | your "Proxy Username" and "Proxy Password". Using this information you can initialize the `YouTubeTranscriptApi` as 306 | follows: 307 | 308 | ```python 309 | from youtube_transcript_api import YouTubeTranscriptApi 310 | from youtube_transcript_api.proxies import WebshareProxyConfig 311 | 312 | ytt_api = YouTubeTranscriptApi( 313 | proxy_config=WebshareProxyConfig( 314 | proxy_username="", 315 | proxy_password="", 316 | ) 317 | ) 318 | 319 | # all requests done by ytt_api will now be proxied through Webshare 320 | ytt_api.fetch(video_id) 321 | ``` 322 | 323 | Using the `WebshareProxyConfig` will default to using rotating residential proxies and requires no further 324 | configuration. 325 | 326 | You can also limit the pool of IPs that you will be rotating through to those located in specific countries. By 327 | choosing locations that are close to the machine that is running your code, you can reduce latency. Also, this 328 | can be used to work around location-based restrictions. 329 | 330 | ```python 331 | ytt_api = YouTubeTranscriptApi( 332 | proxy_config=WebshareProxyConfig( 333 | proxy_username="", 334 | proxy_password="", 335 | filter_ip_locations=["de", "us"], 336 | ) 337 | ) 338 | 339 | # Webshare will now only rotate through IPs located in Germany or the United States! 
340 | ytt_api.fetch(video_id) 341 | ``` 342 | 343 | You can find the 344 | full list of available locations (and how many IPs are available in each location) 345 | [here](https://www.webshare.io/features/proxy-locations?referral_code=w0xno53eb50g). 346 | 347 | Note that [referral links are used here](https://www.webshare.io/?referral_code=w0xno53eb50g) and any purchases 348 | made through these links will support this Open Source project (at no additional cost of course!), which is very much 349 | appreciated! 💖😊🙏💖 350 | 351 | However, you are of course free to integrate your own proxy solution using the `GenericProxyConfig` class, if you 352 | prefer using another provider or want to implement your own solution, as covered by the following section. 353 | 354 | ### Using other Proxy solutions 355 | 356 | As an alternative to [Webshare](#using-webshare), you can set up any generic HTTP/HTTPS/SOCKS proxy using the 357 | `GenericProxyConfig` class: 358 | 359 | ```python 360 | from youtube_transcript_api import YouTubeTranscriptApi 361 | from youtube_transcript_api.proxies import GenericProxyConfig 362 | 363 | ytt_api = YouTubeTranscriptApi( 364 | proxy_config=GenericProxyConfig( 365 | http_url="http://user:pass@my-custom-proxy.org:port", 366 | https_url="https://user:pass@my-custom-proxy.org:port", 367 | ) 368 | ) 369 | 370 | # all requests done by ytt_api will now be proxied using the defined proxy URLs 371 | ytt_api.fetch(video_id) 372 | ``` 373 | 374 | Be aware that using a proxy doesn't guarantee that you won't be blocked, as YouTube can always block the IP of your 375 | proxy! Therefore, you should always choose a solution that rotates through a pool of proxy addresses, if you want to 376 | maximize reliability. 377 | 378 | ## Overwriting request defaults 379 | 380 | When a `YouTubeTranscriptApi` object is initialized, it creates a `requests.Session` which is used for all 381 | HTTP(S) requests. 
This allows for caching cookies when making multiple requests. However, you can optionally pass a 382 | `requests.Session` object into its constructor, if you manually want to share cookies between different instances of 383 | `YouTubeTranscriptApi`, overwrite defaults, set custom headers, specify SSL certificates, etc. 384 | 385 | ```python 386 | from requests import Session 387 | 388 | http_client = Session() 389 | 390 | # set custom header 391 | http_client.headers.update({"Accept-Encoding": "gzip, deflate"}) 392 | 393 | # set path to CA_BUNDLE file 394 | http_client.verify = "/path/to/certfile" 395 | 396 | ytt_api = YouTubeTranscriptApi(http_client=http_client) 397 | ytt_api.fetch(video_id) 398 | 399 | # share same Session between two instances of YouTubeTranscriptApi 400 | ytt_api_2 = YouTubeTranscriptApi(http_client=http_client) 401 | # now shares cookies with ytt_api 402 | ytt_api_2.fetch(video_id) 403 | ``` 404 | 405 | ## Cookie Authentication 406 | 407 | Some videos are age restricted, so this module won't be able to access those videos without some sort of 408 | authentication. Unfortunately, some recent changes to the YouTube API have broken the current implementation of cookie 409 | based authentication, so this feature is currently not available. 410 | 411 | ## Using Formatters 412 | Formatters provide an additional layer of processing for the transcript you pass to them. The goal is to convert a 413 | `FetchedTranscript` object into a consistent string of a given "format", such as basic text (`.txt`) or formats 414 | that have a defined specification, such as JSON (`.json`), WebVTT (`.vtt`), SRT (`.srt`), comma-separated values 415 | (`.csv`), etc. 416 | 
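To make the idea of "transcript in, string out" concrete, here is a minimal, self-contained sketch of the kind of conversion a formatter performs. It does not use the library: `Snippet` is a hypothetical stand-in that only mirrors the `text`, `start` and `duration` fields of `FetchedTranscriptSnippet`, and `to_timestamped_text` is an illustrative helper name, not part of the `formatters` module.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snippet:
    # Hypothetical stand-in mirroring the fields of FetchedTranscriptSnippet.
    text: str
    start: float
    duration: float


def to_timestamped_text(snippets: List[Snippet]) -> str:
    """Render each snippet as '[MM:SS] text', one snippet per line."""
    lines = []
    for snippet in snippets:
        # Truncate the start time to whole seconds, then split into minutes/seconds.
        minutes, seconds = divmod(int(snippet.start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {snippet.text}")
    return "\n".join(lines)


snippets = [
    Snippet(text="Hey there", start=0.0, duration=1.54),
    Snippet(text="how are you", start=1.54, duration=4.16),
]
print(to_timestamped_text(snippets))
```

A real formatter would wrap the same kind of logic in a class so that it can be swapped for another output format without changing the calling code.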
417 | The `formatters` submodule provides a few basic formatters, which can be used as is, or extended to your needs: 418 | 419 | - JSONFormatter 420 | - PrettyPrintFormatter 421 | - TextFormatter 422 | - WebVTTFormatter 423 | - SRTFormatter 424 | 425 | Here is how to import from the `formatters` module. 426 | 427 | ```python 428 | # the base class to inherit from when creating your own formatter. 429 | from youtube_transcript_api.formatters import Formatter 430 | 431 | # some provided subclasses, each outputs a different string format. 432 | from youtube_transcript_api.formatters import JSONFormatter 433 | from youtube_transcript_api.formatters import TextFormatter 434 | from youtube_transcript_api.formatters import WebVTTFormatter 435 | from youtube_transcript_api.formatters import SRTFormatter 436 | ``` 437 | 438 | ### Formatter Example 439 | Let's say we wanted to retrieve a transcript and store it to a JSON file. That would look something like this: 440 | 441 | ```python 442 | # your_custom_script.py 443 | 444 | from youtube_transcript_api import YouTubeTranscriptApi 445 | from youtube_transcript_api.formatters import JSONFormatter 446 | 447 | ytt_api = YouTubeTranscriptApi() 448 | transcript = ytt_api.fetch(video_id) 449 | 450 | formatter = JSONFormatter() 451 | 452 | # .format_transcript(transcript) turns the transcript into a JSON string. 453 | json_formatted = formatter.format_transcript(transcript) 454 | 455 | # Now we can write it out to a file. 456 | with open('your_filename.json', 'w', encoding='utf-8') as json_file: 457 | json_file.write(json_formatted) 458 | 459 | # You should now have a new JSON file that you can easily read back into Python. 460 | ``` 461 | 462 | **Passing extra keyword arguments** 463 | 464 | Since `JSONFormatter` leverages `json.dumps()`, you can also forward keyword arguments to 465 | `.format_transcript(transcript)`, for example `indent=2` to make the file output prettier. 466 | 

```python
json_formatted = JSONFormatter().format_transcript(transcript, indent=2)
```

### Custom Formatter Example
You can implement your own formatter class. Just inherit from the `Formatter` base class and implement the
`format_transcript(self, transcript: FetchedTranscript, **kwargs) -> str` and
`format_transcripts(self, transcripts: List[FetchedTranscript], **kwargs) -> str` methods, both of which should
return a string when called on your formatter instance.

```python
from typing import List

from youtube_transcript_api import FetchedTranscript
from youtube_transcript_api.formatters import Formatter


class MyCustomFormatter(Formatter):
    def format_transcript(self, transcript: FetchedTranscript, **kwargs) -> str:
        # Do your custom work in here, but return a string.
        return 'your processed output data as a string.'

    def format_transcripts(self, transcripts: List[FetchedTranscript], **kwargs) -> str:
        # Do your custom work in here to format a list of transcripts, but return a string.
        return 'your processed output data as a string.'
```

## CLI

Execute the CLI script using the video ids as parameters and the results will be printed out to the command line:

```
youtube_transcript_api <first_video_id> <second_video_id> ...
```

The CLI also gives you the option to provide a list of preferred languages:

```
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en
```

You can also specify if you want to exclude automatically generated or manually created subtitles:

```
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-generated
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --exclude-manually-created
```

If you would prefer to write the results into a file or pipe them into another application, you can also output them as
JSON using the following line:

```
youtube_transcript_api <first_video_id> <second_video_id> ... --languages de en --format json > transcripts.json
```

Translating transcripts using the CLI is also possible:

```
youtube_transcript_api <first_video_id> <second_video_id> ... --languages en --translate de
```

If you are not sure which languages are available for a given video, you can list all available transcripts with:

```
youtube_transcript_api --list-transcripts <first_video_id>
```

If a video's ID starts with a hyphen you'll have to mask the hyphen using `\` to prevent the CLI from mistaking it for
an argument name. For example, to get the transcript for the video with the ID `-abc123` run:

```
youtube_transcript_api "\-abc123"
```

### Working around IP bans using the CLI

If you are running into `RequestBlocked` or `IpBlocked` errors because YouTube blocks your IP, you can work around this
using residential proxies as explained in
[Working around IP bans](#working-around-ip-bans-requestblocked-or-ipblocked-exception). To use
[Webshare "Residential" proxies](https://www.webshare.io/?referral_code=w0xno53eb50g) through the CLI, you will have to
create a [Webshare account](https://www.webshare.io/?referral_code=w0xno53eb50g) and purchase a "Residential" proxy
package that suits your workload (make sure NOT to purchase "Proxy Server" or "Static Residential"!).
Then you can use the "Proxy Username" and "Proxy Password", which you can find in your
[Webshare Proxy Settings](https://dashboard.webshare.io/proxy/settings?referral_code=w0xno53eb50g), to run the following command:

```
youtube_transcript_api <first_video_id> <second_video_id> --webshare-proxy-username "username" --webshare-proxy-password "password"
```

If you prefer to use another proxy solution, you can set up a generic HTTP/HTTPS proxy using the following command:

```
youtube_transcript_api <first_video_id> <second_video_id> --http-proxy http://user:pass@domain:port --https-proxy https://user:pass@domain:port
```

### Cookie Authentication using the CLI

To authenticate using cookies through the CLI as explained in [Cookie Authentication](#cookie-authentication) run:

```
youtube_transcript_api <first_video_id> <second_video_id> --cookies /path/to/your/cookies.txt
```

## Warning

This code uses an undocumented part of the YouTube API, which is called by the YouTube web-client. So there is no
guarantee that it won't stop working tomorrow if they change how things work. I will, however, do my best to get things
working again as soon as possible if that happens. So if it stops working, let me know!

## Contributing

To set up the project locally, run the following (requires [poetry](https://python-poetry.org/docs/) to be installed):
```shell
poetry install --with test,dev
```

There are [poe](https://github.com/nat-n/poethepoet?tab=readme-ov-file#quick-start) tasks to run tests, coverage, the
linter and formatter (you'll need to pass all of those for the build to pass):
```shell
poe test
poe coverage
poe format
poe lint
```

If you just want to make sure that your code passes all the necessary checks to get a green build, you can simply run:
```shell
poe precommit
```

## Donations

If this project makes you happy by reducing your development time, you can make me happy by treating me to a cup of
coffee, or by becoming a [Sponsor of this project](https://github.com/sponsors/jdepoix) :)

[![Donate](https://www.paypalobjects.com/en_US/i/btn/btn_donateCC_LG.gif)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BAENLEW8VUJ6G&source=url)

--------------------------------------------------------------------------------