├── LICENSE
├── README.md
├── .gitignore
├── add_fades_captions_to_video.py
├── transcribe_from_video_aws.py
├── clean_video_from_transcription.py
├── transcribe_from_video_whisper.py
└── summary_chapters_blog.py
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 Roy Shilkrot
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Video Transcript Helper
2 |
3 |
4 |
5 | [Join our Discord](https://discord.gg/KbjGU2vvUz)
6 |
7 |
8 |
9 | A comprehensive toolkit designed for content creators, educators, digital marketers, and video editing enthusiasts.
10 | It harnesses the power of AI and video processing through a suite of Python scripts that simplify the post-production process.
11 | This free, open-source project aims to transform the way users handle video content, turning hours of editing into a few command lines.
12 |
13 | This project contains four scripts:
14 | - `transcribe_from_video_XXX.py`: Transcribe a video (via AWS Transcribe or Faster-Whisper)
15 | - `clean_video_from_transcription.py`: Zap filler words ('uh', 'um') from videos using FFmpeg
16 | - `summary_chapters_blog.py`: Generate a summary, video chapters and a blog post
17 | - `add_fades_captions_to_video.py`: Add fade effects and timed chapter captions to a video
17 |
18 | Roadmap of future features:
19 | - Remove or speed up (shorten) periods of "silence"
20 | - Enhance speech with voice-separation models
21 | - Generate a supercut for a quick video snippet
22 | - Add audiogram / karaoke-style subtitles to the video
23 | - Translate the subtitles to any language
24 |
25 | ## Usage
26 | Transcribe the video (with either the AWS Transcribe API or [Faster-Whisper](https://github.com/guillaumekln/faster-whisper)):
27 |
28 | ```sh
29 | $ python transcribe_from_video_XXX.py <path/to/video>
30 | ```
31 |
32 | The output will be a file called `<video-name>.json` in the same directory as the video.
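
The JSON follows the AWS Transcribe output shape (the Whisper script emits the same structure), roughly:

```json
{
  "results": {
    "transcripts": [{ "transcript": "the full transcript" }],
    "items": [
      {
        "alternatives": [{ "content": "word", "confidence": 0.98 }],
        "start_time": 0.0,
        "end_time": 0.4,
        "type": "pronunciation"
      }
    ]
  }
}
```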
33 |
34 | Zap the filler words:
35 |
36 | ```sh
37 | $ python clean_video_from_transcription.py <path/to/video> <path/to/transcription.json>
38 | ```
39 |
40 | The output will be a file called `<video-name>_cleaned.mp4` in the same directory as the video.
41 |
42 | Generate the summary, chapters and blog post:
43 |
44 | ```sh
45 | $ python summary_chapters_blog.py <path/to/transcription.json> --generate_summary --generate_chapters --generate_blog
46 | ```
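
For example, a full pass over a hypothetical `talk.mp4` (saving the generated chapters to a `chapters.txt` file by hand) might look like:

```sh
$ python transcribe_from_video_whisper.py talk.mp4              # writes talk.json
$ python clean_video_from_transcription.py talk.mp4 talk.json   # writes talk_cleaned.mp4
$ python summary_chapters_blog.py talk.json --generate_summary --generate_chapters
$ python add_fades_captions_to_video.py talk.mp4 chapters.txt   # writes talk_with_captions.mp4
```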
47 |
48 | ## Dependencies
49 | - Python 3.6+
50 | - [FFmpeg](https://ffmpeg.org/)
51 | - [AWS CLI](https://aws.amazon.com/cli/) (for `transcribe_from_video_aws.py`)
52 | - [faster-whisper](https://github.com/guillaumekln/faster-whisper) (for `transcribe_from_video_whisper.py`)
53 | - [openai](https://pypi.org/project/openai/) (for `summary_chapters_blog.py`)
54 | 
55 | Make sure to configure your AWS CLI with your credentials and region, and set your OpenAI API key (the `openai` library reads `OPENAI_API_KEY` from the environment).
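
A minimal setup might look like:

```sh
$ pip install faster-whisper openai
$ aws configure                  # set your AWS access key, secret key, and region
$ export OPENAI_API_KEY=sk-...   # your OpenAI API key
```
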
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | share/python-wheels/
24 | *.egg-info/
25 | .installed.cfg
26 | *.egg
27 | MANIFEST
28 |
29 | # PyInstaller
30 | # Usually these files are written by a python script from a template
31 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
32 | *.manifest
33 | *.spec
34 |
35 | # Installer logs
36 | pip-log.txt
37 | pip-delete-this-directory.txt
38 |
39 | # Unit test / coverage reports
40 | htmlcov/
41 | .tox/
42 | .nox/
43 | .coverage
44 | .coverage.*
45 | .cache
46 | nosetests.xml
47 | coverage.xml
48 | *.cover
49 | *.py,cover
50 | .hypothesis/
51 | .pytest_cache/
52 | cover/
53 |
54 | # Translations
55 | *.mo
56 | *.pot
57 |
58 | # Django stuff:
59 | *.log
60 | local_settings.py
61 | db.sqlite3
62 | db.sqlite3-journal
63 |
64 | # Flask stuff:
65 | instance/
66 | .webassets-cache
67 |
68 | # Scrapy stuff:
69 | .scrapy
70 |
71 | # Sphinx documentation
72 | docs/_build/
73 |
74 | # PyBuilder
75 | .pybuilder/
76 | target/
77 |
78 | # Jupyter Notebook
79 | .ipynb_checkpoints
80 |
81 | # IPython
82 | profile_default/
83 | ipython_config.py
84 |
85 | # pyenv
86 | # For a library or package, you might want to ignore these files since the code is
87 | # intended to run in multiple environments; otherwise, check them in:
88 | # .python-version
89 |
90 | # pipenv
91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
94 | # install all needed dependencies.
95 | #Pipfile.lock
96 |
97 | # poetry
98 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
99 | # This is especially recommended for binary packages to ensure reproducibility, and is more
100 | # commonly ignored for libraries.
101 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
102 | #poetry.lock
103 |
104 | # pdm
105 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
106 | #pdm.lock
107 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
108 | # in version control.
109 | # https://pdm.fming.dev/#use-with-ide
110 | .pdm.toml
111 |
112 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
113 | __pypackages__/
114 |
115 | # Celery stuff
116 | celerybeat-schedule
117 | celerybeat.pid
118 |
119 | # SageMath parsed files
120 | *.sage.py
121 |
122 | # Environments
123 | .env
124 | .venv
125 | env/
126 | venv/
127 | ENV/
128 | env.bak/
129 | venv.bak/
130 |
131 | # Spyder project settings
132 | .spyderproject
133 | .spyproject
134 |
135 | # Rope project settings
136 | .ropeproject
137 |
138 | # mkdocs documentation
139 | /site
140 |
141 | # mypy
142 | .mypy_cache/
143 | .dmypy.json
144 | dmypy.json
145 |
146 | # Pyre type checker
147 | .pyre/
148 |
149 | # pytype static type analyzer
150 | .pytype/
151 |
152 | # Cython debug symbols
153 | cython_debug/
154 |
155 | # PyCharm
156 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
157 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
158 | # and can be added to the global gitignore or merged into this file. For a more nuclear
159 | # option (not recommended) you can uncomment the following to ignore the entire idea folder.
160 | #.idea/
161 |
162 | # Mac
163 | .DS_Store
164 | .AppleDouble
165 | .LSOverride
166 |
167 | # Thumbnails
168 | ._*
169 |
170 | # Files that might appear on external disk
171 | .Spotlight-V100
172 | .Trashes
173 |
174 | # Directories potentially created on remote AFP share
175 | .AppleDB
176 | .AppleDesktop
177 | Network Trash Folder
178 | Temporary Items
179 | .apdisk
180 |
181 | # Windows
182 | # Windows thumbnail cache files
183 | Thumbs.db
184 | ehthumbs.db
185 |
186 | # Folder config file
187 | Desktop.ini
188 |
189 | # Recycle Bin used on file shares
190 | $RECYCLE.BIN/
191 |
--------------------------------------------------------------------------------
/add_fades_captions_to_video.py:
--------------------------------------------------------------------------------
1 | # this script adds fade-in/fade-out effects and chapter captions to a video,
2 | # based on an input timed-chapters file (the output of summary_chapters_blog.py)
3 |
4 | import argparse
5 | import json
6 | import subprocess
7 | import os
8 |
9 | # get the input video file and the output text file
10 | parser = argparse.ArgumentParser()
11 | parser.add_argument("input_video_file", help="input video file")
12 | parser.add_argument(
13 | "input_timed_chapters_file", help="input text file with timed chapters"
14 | )
15 | args = parser.parse_args()
16 |
17 | # get the input video file name and the output text file name
18 | input_video_file = args.input_video_file
19 | input_timed_chapters_file = args.input_timed_chapters_file
20 |
21 | chapters = []
22 |
23 | # read the input text file
24 | print("Parsing the input text file...")
25 | with open(input_timed_chapters_file) as f:
26 |     # each line in the file is a chapter, in the format:
27 |     # <start_time> - <end_time> <chapter_title>
28 | # e.g.
29 | # 00:00 - 00:10 Introduction
30 | # 00:10 - 00:20 Chapter 1
31 |
32 | # read the lines
33 | lines = f.readlines()
34 |
35 | # split each line into start_time, end_time and chapter_title
36 | # and convert the start_time and end_time to seconds
37 | for line in lines:
38 | if line == "\n" or line == "":
39 | continue
40 | # split the line into start_time, end_time and chapter_title
41 | start_time, end_time_and_chapter_title = line.split(" - ")
42 | end_time, chapter_title = end_time_and_chapter_title.split(" ", 1)
43 |
44 |         # convert the start_time and end_time (MM:SS) to seconds
45 | start_time_seconds = sum(
46 | x * float(t) for x, t in zip([60, 1], start_time.split(":"))
47 | )
48 | end_time_seconds = sum(
49 | x * float(t) for x, t in zip([60, 1], end_time.split(":"))
50 | )
51 |
52 | # add the chapter to the list of chapters
53 | chapters.append((start_time_seconds, end_time_seconds, chapter_title.strip()))
54 |
55 | print(f"Found {len(chapters)} chapters.")
56 |
57 | # sort the chapters by start_time
58 | chapters.sort(key=lambda x: x[0])
59 |
60 | # create an .ass file with the captions in Advanced SSA format
61 | # each chapter will have a caption at the beginning of the chapter
62 |
63 | # create the output file name
64 | output_ass_file = os.path.splitext(input_video_file)[0] + ".ass"
65 | print(f"Creating the output file {output_ass_file}...")
66 |
67 | # format a time in seconds as H:MM:SS.cc, the timestamp format used in ASS Dialogue lines
68 | # e.g. 83.0 -> "0:01:23.00"
69 | def write_ass_time(x):
70 |     return f"{int(x // 3600)}:{int((x % 3600) // 60):02d}:{int(x % 60):02d}.00"
71 |
72 | ssa_prefix = """
73 | [Script Info]
74 | Title:
75 | ScriptType: v4.00+
76 | Collisions: Normal
77 | PlayDepth: 0
78 |
79 | [V4+ Styles]
80 | Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
81 | Style: Default,Arial,20,&H00FFFFFF,&H000080FF,&H00000000,&H80000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,20,0
82 |
83 | [Events]
84 | Format: Layer, Start, End, Style, Actor, MarginL, MarginR, MarginV, Effect, Text
85 | """
86 |
87 | # create the output file
88 | with open(output_ass_file, "w") as f:
89 | # write the prefix
90 | f.write(ssa_prefix)
91 | # write the captions
92 | for i, chapter in enumerate(chapters):
93 | # each subtitle is of the form e.g.
94 | # Dialogue: 0,0:00:03.00,0:00:08.00,Default,,0,0,0,,subtitle text
95 |
96 |         # write the subtitle line to the file; the caption is shown for 5 seconds with a fade
97 |         f.write(
98 |             f"Dialogue: 0,{write_ass_time(chapter[0])},{write_ass_time(chapter[0] + 5)},Default,,0,0,0,,{'{'}\\fad(1200,250){'}'}{chapter[2]}\n"
99 |         )
100 |
102 |
103 | # get the duration of the video
104 | print("Getting the duration of the video...")
105 | result = subprocess.run(
106 | [
107 | "ffprobe",
108 | "-v",
109 | "error",
110 | "-show_entries",
111 | "format=duration",
112 | "-of",
113 | "default=noprint_wrappers=1:nokey=1",
114 | input_video_file,
115 | ],
116 | stdout=subprocess.PIPE,
117 | stderr=subprocess.STDOUT,
118 | )
119 | duration = int(float(result.stdout))
120 |
121 | print(f"Video duration: {duration} seconds.")
122 |
123 | output_video_file_path = os.path.splitext(input_video_file)[0] + "_with_captions.mp4"
124 |
125 | # add the captions to the video with ffmpeg
126 | print("Adding captions and fades to the video...")
127 | subprocess.run(
128 | [
129 | "ffmpeg",
130 | "-i",
131 | input_video_file,
132 | "-vf",
133 | f"subtitles={output_srt_file}:force_style='Fontsize=24,PrimaryColour=&Hffffff&'[v];[v]fade=in:st=0:n=30,fade=out:st={duration-30}:n=30",
134 | "-c:v",
135 | "libx264",
136 | "-c:a",
137 | "copy",
138 | "-y",
139 | output_video_file_path,
140 | ]
141 | )
142 |
143 | # delete the temporary files
144 | print("Deleting the temporary files...")
145 | os.remove(output_ass_file)
146 |
147 | print("Done!")
148 |
--------------------------------------------------------------------------------
/transcribe_from_video_aws.py:
--------------------------------------------------------------------------------
1 | # this script will transcribe the audio from an input video file
2 | # the output will be a JSON file with the transcription
3 | # use argparse to get the input video file and the output text file
4 | # use the AWS transcribe using the AWS CLI to transcribe the audio
5 | #
6 | # Usage:
7 | #   python transcribe_from_video_aws.py <input_video_file>
8 | #
9 | # The output JSON file will be saved in the same directory as the input video file
10 | #
11 | # Example:
12 | #   python transcribe_from_video_aws.py "input_video.mp4"
13 | #
14 | # The output JSON file will have the name "input_video.json"
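#
# Note: this assumes the configured AWS credentials can create and delete S3
# buckets, upload and remove objects, and start/get/delete AWS Transcribe jobs
# in the us-east-1 region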
15 |
16 | import argparse
17 | import json
18 | import subprocess
19 | import os
20 | import re
21 | import time
22 | import uuid
22 |
23 | # get the input video file and the output text file
24 | parser = argparse.ArgumentParser()
25 | parser.add_argument("input_video_file", help="input video file")
26 | args = parser.parse_args()
27 |
28 | # get the input video file name and the output text file name
29 | input_video_file = args.input_video_file
30 |
31 | # get the input video file name without the extension
32 | input_video_file_name = os.path.splitext(input_video_file)[0]
33 |
34 | # get the input video file name without the extension and without the path
35 | input_video_file_name_without_path = os.path.basename(input_video_file_name)
36 |
37 |
38 | def cleanup(job_name, s3_uri, flac_audio_file):
39 | if s3_uri is not None:
40 | # delete the temporary S3 audio file
41 | print("Deleting the temporary S3 audio file...")
42 | subprocess.run(["aws", "s3", "rm", s3_uri])
43 |
44 | if job_name is not None:
45 |         # delete the transcription job in AWS Transcribe
46 | print("Deleting the transcription job in AWS Transcribe...")
47 | subprocess.run(["aws", "transcribe", "delete-transcription-job",
48 | "--region", "us-east-1", "--transcription-job-name", job_name])
49 |
50 | # delete the temporary S3 bucket
51 | print("Deleting the temporary S3 bucket...")
52 | subprocess.run(["aws", "s3", "rb", f"s3://{job_name}", "--force"])
53 |
54 | if flac_audio_file is not None:
55 | # delete the FLAC audio file
56 | print("Deleting the local FLAC audio file...")
57 | subprocess.run(["rm", flac_audio_file])
58 |
59 |
60 | # convert the video file to a FLAC audio file using ffmpeg (quiet mode)
61 | # the FLAC audio file will be saved in the same directory as the input video file
62 | # the FLAC audio file will have the same name as the input video file but with a FLAC extension
63 | flac_audio_file = input_video_file_name_without_path
64 | # make sure the file has an S3-compatible name
65 | flac_audio_file = re.sub('[^0-9a-zA-Z]+', '-', flac_audio_file)
66 | flac_audio_file_without_path = flac_audio_file + ".flac"
67 | # add the path to the FLAC audio file
68 | flac_audio_file = os.path.join(os.path.dirname(input_video_file), flac_audio_file_without_path)
69 |
70 | print(f"Converting video file to FLAC audio file using ffmpeg... {flac_audio_file}")
71 | subprocess.run(["ffmpeg", "-i", input_video_file, "-vn", "-ac", "1", "-ar", "16000", "-c:a", "flac",
72 | "-qscale:a", "0", "-loglevel", "quiet", "-copyts", "-y", flac_audio_file])
73 |
74 | # generate a UUID for the job name
75 | job_name = f"transcribe-job-{uuid.uuid4().hex}"
76 |
77 | # create a temporary S3 bucket for the transcription job
78 | # the bucket name will be the same as the job name
79 | print("Creating temporary S3 bucket for the transcription job...")
80 | process = subprocess.run(["aws", "s3", "mb", f"s3://{job_name}"])
81 |
82 | if process.returncode != 0:
83 | print("Error creating temporary S3 bucket for the transcription job")
84 | exit(1)
85 |
86 | # upload the FLAC audio file to the temporary S3 bucket
87 | print("Uploading FLAC audio file to the temporary S3 bucket...")
88 | process = subprocess.run(["aws", "s3", "cp", flac_audio_file, f"s3://{job_name}"])
89 |
90 | if process.returncode != 0:
91 | print("Error uploading FLAC audio file to the temporary S3 bucket")
92 | cleanup(job_name, None, flac_audio_file)
93 | exit(1)
94 |
95 | # get the S3 URI for the FLAC audio file
96 | s3_uri = f"s3://{job_name}/{flac_audio_file_without_path}"
97 |
98 | print(s3_uri)
99 |
100 | # start the transcription job
101 | # aws transcribe start-transcription-job \
102 | # --region us-east-1 \
103 | # --transcription-job-name "$TEMP_NAME" \
104 | # --media "MediaFileUri=$S3_URI" \
105 | #  --language-code en-US
106 | print("Starting the transcription job...")
107 | process = subprocess.run(["aws", "transcribe", "start-transcription-job",
108 | "--region", "us-east-1", "--transcription-job-name", job_name,
109 | "--media", f"MediaFileUri={s3_uri}",
110 | "--language-code", "en-US"])
111 |
112 | if process.returncode != 0:
113 | print("Error starting the transcription job")
114 | cleanup(job_name, s3_uri, flac_audio_file)
115 | exit(1)
116 |
117 | # wait for the transcription job to complete
118 | # run `aws transcribe get-transcription-job`` and capture the output JSON
119 | # e.g. aws transcribe get-transcription-job \
120 | # --region us-east-1 \
121 | # --transcription-job-name "$TEMP_NAME"
122 | # check the `TranscriptionJobStatus` field in the JSON if it is `COMPLETED`
123 | # if it is not `COMPLETED`, wait for 5 seconds and then check again
124 | # if it is `COMPLETED`, then break out of the loop
125 | print("Waiting for the transcription job to complete...")
126 | while True:
127 | process = subprocess.run(["aws", "transcribe", "get-transcription-job",
128 | "--region", "us-east-1", "--transcription-job-name", job_name],
129 | capture_output=True)
130 | output = process.stdout.decode("utf-8")
131 | if "COMPLETED" in output:
132 | break
133 | else:
134 | print("Transcription job not completed yet. Waiting for 5 seconds...")
135 | subprocess.run(["sleep", "5"])
136 |
137 | # get the transcription job output JSON
138 | # use the last output JSON from the previous loop iteration to get the output JSON
139 | # from the `TranscriptionJob.Transcript.TranscriptFileUri` field
140 | # parse the JSON and get the `TranscriptFileUri` field
141 | parsed = json.loads(output)
142 | output_uri = parsed["TranscriptionJob"]["Transcript"]["TranscriptFileUri"]
143 |
144 | # download the transcription job output JSON file using regular `curl`
145 | # the transcription job output JSON file will be saved in the same directory as the input video file
146 | # and have the same name as the input video file but with a JSON extension
147 | output_json_file = input_video_file_name + ".json"
148 | print("Downloading the transcription job output JSON file...")
149 | subprocess.run(["curl", "-o", output_json_file, output_uri])
150 |
151 | cleanup(job_name, s3_uri, flac_audio_file)
152 |
--------------------------------------------------------------------------------
/clean_video_from_transcription.py:
--------------------------------------------------------------------------------
1 | # this script will read the transcription from the output JSON file and then clean the video
2 | # from filler words (e.g. um, uh, like, etc.)
3 | #
4 | # Usage:
5 | #   python clean_video_from_transcription.py <input_video_file> <input_json_file>
6 | #
7 | # The output video file will be saved in the same directory as the input video file
8 | #
9 | # Example:
10 | # python clean_video_from_transcription.py "input_video.mp4" "input_json.json"
11 |
12 | import argparse
13 | import json
14 | import subprocess
15 | import os
16 |
17 | # get the input video file and the output text file
18 | parser = argparse.ArgumentParser()
19 | parser.add_argument("input_video_file", help="input video file")
20 | parser.add_argument("input_json_file", help="input json transcription file")
21 | args = parser.parse_args()
22 |
23 | # get the input video file name and the output text file name
24 | input_video_file = args.input_video_file
25 | input_json_file = args.input_json_file
26 |
27 | # read the input JSON file
28 | print("Parsing the input JSON file...")
29 | with open(input_json_file) as f:
30 | data = json.load(f)
31 |
32 | # get all the items where .results.items.alternatives.content is a filler word
33 | filler_words = ["um", "uh", "so"]
34 |
35 | # filter to keep only pronunciations
36 | pronunciation_items = list(
37 | filter(lambda x: x["type"] == "pronunciation", data["results"]["items"])
38 | )
39 |
40 | # merge consecutive filler words in pronunciation_items
41 | i = 0
42 | while i < len(pronunciation_items) - 1:
43 | if (
44 | pronunciation_items[i]["alternatives"][0]["content"].lower() in filler_words
45 | and pronunciation_items[i + 1]["alternatives"][0]["content"].lower()
46 | in filler_words
47 | ):
48 |         print(
49 |             "Found consecutive filler words: "
50 |             f"{pronunciation_items[i]['alternatives'][0]['content']} "
51 |             f"{pronunciation_items[i+1]['alternatives'][0]['content']} "
52 |             "at "
53 |             f"{pronunciation_items[i]['start_time']} "
54 |             f"{pronunciation_items[i+1]['start_time']}"
55 |         )
56 | # merge the start and end timings of the two items
57 | pronunciation_items[i]["end_time"] = pronunciation_items[i + 1]["end_time"]
58 |
59 | # remove the second item
60 | pronunciation_items.pop(i + 1)
61 | else:
62 | i += 1
63 |
64 | # extract the timings from the filler words items, in (start, end) tuples
65 | # parse float from string
66 | # the end time of a filler word is the start time of the next pronunciation
67 | # unless the next pronunciation is also a filler word, in which case the end time is the end time
68 | # of the next pronunciation
69 | filler_words_timings = [(0.0, 0.0)]
70 | for i, item in enumerate(pronunciation_items[:-1]):
71 | # check in lowercase
72 | if item["alternatives"][0]["content"].lower() in filler_words:
73 | # get the start & end time of the filler word
74 | start_time = float(item["start_time"])
75 | # end_time = float(pronunciation_items[i+1]["start_time"]) + 0.1
76 | end_time = float(item["end_time"])
77 | # the duration of a filler word is at least 0.3 seconds
78 | if end_time - start_time < 0.3:
79 | end_time = start_time + 0.3
80 |         # if the next pronunciation starts after this (padded) end time, extend the cut
81 |         # to the start of the next pronunciation
82 | if float(pronunciation_items[i + 1]["start_time"]) > end_time:
83 | end_time = float(pronunciation_items[i + 1]["start_time"])
84 |
85 | if start_time >= end_time:
86 | continue
87 |
88 | filler_words_timings.append((start_time, end_time))
89 |
90 | # append in the end the duration of the video
91 | # find the duration of the video using ffprobe
92 | print("Finding the duration of the video...")
93 | ffprobe_output = subprocess.check_output(
94 | [
95 | "ffprobe",
96 | "-v",
97 | "error",
98 | "-show_entries",
99 | "format=duration",
100 | "-of",
101 | "default=noprint_wrappers=1:nokey=1",
102 | input_video_file,
103 | ]
104 | )
105 | video_duration = float(ffprobe_output)
106 | filler_words_timings.append((video_duration, video_duration))
107 |
108 | # sort the filler words timings by start time
109 | filler_words_timings.sort(key=lambda x: x[0])
110 |
111 | print(f"Found {len(filler_words_timings)-2} filler words in the video.")
112 |
113 | print("Filler words timings:")
114 | print(filler_words_timings[:5] + ["..."] + filler_words_timings[-5:])
115 |
116 | # build an ffmpeg filter to remove the filler words by using the timings
117 | # e.g.
118 | # [0:v]trim=start=10:end=20,setpts=PTS-STARTPTS,format=yuv420p[0v];
119 | # [0:a]atrim=start=10:end=20,asetpts=PTS-STARTPTS[0a];
120 | # [0:v]trim=start=30:end=40,setpts=PTS-STARTPTS,format=yuv420p[1v];
121 | # [0:a]atrim=start=30:end=40,asetpts=PTS-STARTPTS[1a];
122 | # [0:v]trim=start=30:end=40,setpts=PTS-STARTPTS,format=yuv420p[2v];
123 | # [0:a]atrim=start=30:end=40,asetpts=PTS-STARTPTS[2a];
124 | # and then concatenate the inputs
125 | # [0v][0a][1v][1a][2v][2a]concat=n=3:v=1:a=1[outv][outa]
126 |
127 |
128 |
129 | def build_ffmpeg_cmd_with_filter():
130 | n_filrs = len(filler_words_timings)
131 | filter = ""
132 | for i in range(1, n_filrs):
133 | # stagger the start and end time of the video and audio filters
134 | # so that we take the "non-filler" portion of the video
135 | start_time = filler_words_timings[i - 1][1]
136 | end_time = filler_words_timings[i][0]
137 |
138 | # add the video filter
139 | filter += (
140 | f"[0:v]trim=start={start_time}:end={end_time},setpts=PTS-STARTPTS[{i}v];"
141 | )
142 |
143 | # add the audio filter
144 | filter += (
145 | f"[0:a]atrim=start={start_time}:end={end_time},asetpts=PTS-STARTPTS[{i}a];"
146 | )
147 |
148 |     # add the concat filter (segment labels start at 1, so there are n_filrs - 1 of them)
149 |     all_inputs = "".join([f"[{i}v][{i}a]" for i in range(1, n_filrs)])
150 |     filter += f"{all_inputs}concat=n={n_filrs-1}:v=1:a=1[outv][outa]"
151 | print("Filter:")
152 | print(filter)
153 |
154 | return [
155 | "ffmpeg",
156 | "-i",
157 | input_video_file,
158 | "-filter_complex",
159 | filter,
160 | "-map",
161 | "[outv]",
162 | "-map",
163 | "[outa]",
164 | "-avoid_negative_ts",
165 | "1",
166 | "-y",
167 | ]
168 |
169 |
170 | def build_ffmpeg_cmd_with_ss_to():
171 | n_filrs = len(filler_words_timings)
172 | cmd = ["ffmpeg"]
173 | remove_fillers = 0
174 | for i in range(1, n_filrs):
175 | # stagger the start and end time of the video and audio filters
176 | # so that we take the "non-filler" portion of the video
177 | start_time = filler_words_timings[i - 1][1] # end of last filler word
178 | end_time = filler_words_timings[i][0] # start of next filler word
179 |
180 | if start_time >= end_time:
181 | remove_fillers += 1
182 | continue
183 |
184 | # add the start and end time to the ffmpeg command
185 | cmd += [
186 | "-ss",
187 | str(start_time) + "s",
188 | "-to",
189 | str(end_time) + "s",
190 | "-i",
191 | input_video_file,
192 | ]
193 |
194 |     # report and subtract the filler-word gaps that were skipped
195 |     print(f"Found {remove_fillers} inconsistent-timing filler words.")
196 | n_filrs -= remove_fillers
197 |
198 | # add the concat filter
199 | all_inputs = "".join([f"[{i}:v][{i}:a]" for i in range(n_filrs - 1)])
200 | filter = f"{all_inputs}concat=n={n_filrs-1}:v=1:a=1[outv][outa]"
201 |
202 | cmd += [
203 | "-filter_complex",
204 | filter,
205 | "-map",
206 | "[outv]",
207 | "-map",
208 | "[outa]",
209 | "-avoid_negative_ts",
210 | "1",
211 | "-y",
212 | "-loglevel",
213 | "error",
214 | ]
215 | return cmd
216 |
217 |
218 | # build the ffmpeg command
219 | ffmpeg_cmd = build_ffmpeg_cmd_with_ss_to()
220 |
221 | output_video_file = os.path.splitext(input_video_file)[0] + "_cleaned.mp4"
222 |
223 | # run ffmpeg to remove the filler words
224 | print("Removing the filler words from the video...")
225 | subprocess.run([*ffmpeg_cmd, output_video_file])
226 |
227 | print("Done.")
228 |
--------------------------------------------------------------------------------
/transcribe_from_video_whisper.py:
--------------------------------------------------------------------------------
1 | # this script will transcribe the audio from an input video file
2 | # the output will be a JSON file with the transcription
3 | # use argparse to get the input video file and the output text file
4 | # use whisper from openai to transcribe the audio
5 | #
6 | # Usage:
7 | #   python transcribe_from_video_whisper.py <input_video_file>
8 | #
9 | # The output JSON file will be saved in the same directory as the input video file
10 | #
11 | # Example:
12 | #   python transcribe_from_video_whisper.py "input_video.mp4"
13 | #
14 | # The output JSON file will have the name "input_video.json"
15 |
16 | import argparse
17 | import json
18 | import subprocess
19 | import os
22 | from faster_whisper import WhisperModel
23 |
24 | # get the input video file and the output text file
25 | parser = argparse.ArgumentParser()
26 | parser.add_argument("input_video_file", help="input video file")
27 | args = parser.parse_args()
28 |
29 | # get the input video file name and the output text file name
30 | input_video_file = args.input_video_file
31 |
32 | # get the input video file name without the extension
33 | input_video_file_name = os.path.splitext(input_video_file)[0]
34 |
35 | # get the input video file name without the extension and without the path
36 | input_video_file_name_without_path = os.path.basename(input_video_file_name)
37 |
38 | # transcribe the audio from the input video file
39 | # the output will be a JSON file with the transcription
40 | # use whisper from openai to transcribe the audio
41 | # the output JSON file will be saved in the same directory as the input video file
42 | # the output JSON file will have the name "input_video.json"
43 |
44 | # get the audio from the input video file
45 | # the output will be a wav file with the same name as the input video file
46 | # the output wav file will be saved in the same directory as the input video file
47 |
54 | # get the output wav file name
55 | output_wav_file_name = input_video_file_name_without_path + ".wav"
56 |
57 | # get the output wav file name with the path
58 | output_wav_file_name_with_path = os.path.join(
59 | os.path.dirname(input_video_file), output_wav_file_name
60 | )
61 |
62 | print("converting video to audio...")
63 | # execute the command to extract the audio from the input video file
64 | # the output will be a wav file with the same name as the input video file
65 | # the output wav file will be saved in the same directory as the input video file
66 | subprocess.run(
67 | [
68 | "ffmpeg",
69 | "-i",
70 | input_video_file,
71 | "-vn",
72 | "-ac",
73 | "1",
74 | "-ar",
75 | "16000",
76 | "-loglevel",
77 | "quiet",
78 | "-copyts",
79 | "-y",
80 | output_wav_file_name_with_path,
81 | ]
82 | )
83 |
84 | # check if the output wav file exists
85 | if not os.path.exists(output_wav_file_name_with_path):
86 | print(
87 | 'Error: the output wav file does not exist "'
88 | + output_wav_file_name_with_path
89 | + '"'
90 | )
91 | exit(1)
92 |
93 | print("transcribing audio...")
94 | model = WhisperModel("base")
95 | # hack the model to produce filler words by adding them as an input prompt
96 | segments, transcriptionInfo = model.transcribe(
97 | output_wav_file_name_with_path,
98 | initial_prompt="So uhm, yeaah. Uh, um. Uhh, Umm. Like, Okay, ehm, uuuh.",
99 | word_timestamps=True,
100 | suppress_blank=True,
101 | )
102 |
103 | punctuation_marks = "\"'.。,,!!??::”)]}、"
104 |
105 | new_segments = []
106 | # split punctuation from words into new items
107 | for segment in segments:
108 | new_words = []
109 | for word in segment.words:
110 | wordStr = word.word.strip()
111 | if len(wordStr) < 1:
112 | continue
113 | if wordStr[-1] in punctuation_marks:
114 | punctuation = wordStr[-1]
115 | new_words.append(
116 | {
117 | "word": wordStr[:-1].strip(),
118 | "start": word.start,
119 | "end": word.end,
120 | "probability": word.probability,
121 | }
122 | )
123 | new_words.append(
124 | {
125 | "word": punctuation,
126 | "start": word.end,
127 | "end": word.end,
128 | "probability": word.probability,
129 | }
130 | )
131 | else:
132 | new_words.append(
133 | {
134 | "word": word.word,
135 | "start": word.start,
136 | "end": word.end,
137 | "probability": word.probability,
138 | }
139 | )
140 | new_segment = {"words": new_words}
141 | new_segments.append(new_segment)
142 |
143 | # print(json.dumps(result, indent=4))
144 |
145 | # get the output json file name
146 | output_json_file_name = input_video_file_name_without_path + ".json"
147 |
148 | # get the output json file name with the path
149 | output_json_file_name_with_path = os.path.join(
150 | os.path.dirname(input_video_file), output_json_file_name
151 | )
152 |
153 | # write the output json file where the output format is:
154 | # {
155 | # "results": {
156 | # "transcripts": [{
157 | # "transcript": "the transcript"
158 | # }],
159 | # "items": [
160 | # {
161 | # "alternatives": [
162 | # {
163 | # "content": "the word",
164 | # "confidence": 0.0
165 | # }
166 | # ],
167 | # "start_time": 0.0,
168 | # "end_time": 0.0,
169 | # "type": "pronunciation"
170 | # },
171 | # ...
172 | # ]
173 | # }
174 | # }
175 | #
176 | # the input format from whisper is:
177 | # {
178 | # "segments": [
179 | # {
180 | # "words": [
181 | # {
182 | # "word": "the word",
183 | # "start": 0.0,
184 | # "end": 0.0,
185 | # "probability": 0.0
186 | # },
187 | # ...
188 | # ]
189 | # },
190 | # ...
191 | # ]
192 | # }
193 | #
194 | # translate from whisper format to output format
195 | with open(output_json_file_name_with_path, "w") as outfile:
196 | json.dump(
197 | {
198 | "results": {
199 | "transcripts": [
200 | {
201 | "transcript": " ".join(
202 | [
203 | word["word"].strip()
204 | for segment in new_segments
205 | for word in segment["words"]
206 | ]
207 | ),
208 | }
209 | ],
210 | "items": [
211 | {
212 | "alternatives": [
213 | {
214 | "content": word["word"],
215 | "confidence": word["probability"],
216 | },
217 | ],
218 | "start_time": word["start"],
219 | "end_time": word["end"],
220 | "confidence": word["probability"],
221 | "type": (
222 | "pronunciation"
223 | if word["word"] not in punctuation_marks
224 | else "punctuation"
225 | ),
226 | }
227 | for segment in new_segments
228 | for word in segment["words"]
229 | ],
230 | }
231 | },
232 | outfile,
233 | indent=2,
234 | )
235 |
236 | # cleanup the output wav file
237 | os.remove(output_wav_file_name_with_path)
238 |
--------------------------------------------------------------------------------
/summary_chapters_blog.py:
--------------------------------------------------------------------------------
1 | # Description: This script takes a JSON file as input and outputs a summary of the video and the
2 | # chapters for adding to the video description on YouTube.
3 | #
4 | # Usage:
5 | #   python summary_chapters_blog.py <input_json_file> [--generate_summary] [--generate_chapters] \
6 | #       [--generate_blog] [--print_prompts] [--trim_length N]
7 | #
8 | # Example:
9 | #   python summary_chapters_blog.py "input_json.json"
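#
# Requires an OpenAI API key; the openai library reads it from the
# OPENAI_API_KEY environment variable by default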
10 |
11 | import argparse
12 | import json
13 |
14 | import openai
15 |
16 |
17 | # get the input video file and the output text file
18 | parser = argparse.ArgumentParser()
19 | parser.add_argument("input_json_file", help="input json transcription file")
20 | # non positional arguments for generating summary and chapters
21 | parser.add_argument("--generate_summary", action="store_true", help="generate summary")
22 | parser.add_argument(
23 | "--generate_chapters", action="store_true", help="generate chapters"
24 | )
25 | parser.add_argument("--generate_blog", action="store_true", help="generate blog")
26 | parser.add_argument("--print_prompts", action="store_true", help="print prompts")
27 | parser.add_argument("--trim_length", type=int, default=100, help="trim length")
28 | parser.add_argument(
29 |     "--whisper_cpp_json", action="store_true", help="is this a whisper.cpp json file?"
30 | )
31 | # optional arguments for generating summary and chapters
32 | parser.add_argument("--summary_prompt", type=str, default="", help="prompt to use for summary")
33 | args = parser.parse_args()
34 |
35 | # get the input video file name and the output text file name
36 | input_json_file = args.input_json_file
37 |
38 | # read the input JSON file
39 | # print("Parsing the input JSON file...")
40 | with open(input_json_file) as f:
41 | data = json.load(f)
42 |
43 | # combine words into sentences and keep the timings, using the start time of the first word
44 | # and the end time of the last word.
45 | # sentences are separated by a `punctuation` type item in the JSON file.
46 | # collect sentences in a list of lists of items from the JSON file.
47 | sentences = []
48 |
49 | if not args.whisper_cpp_json:
50 | sentence = []
51 | for item in data["results"]["items"]:
52 | # if the item is a punctuation, then it's the end of the sentence
53 | if item["type"] == "punctuation" and item["alternatives"][0]["content"] in [
54 | ".",
55 | "?",
56 | "!",
57 | ]:
58 | # add an 'end_time' to the punctuation item by using the end time of the last word
59 | item["end_time"] = (
60 | sentence[-1]["end_time"] if len(sentence) > 0 else item["start_time"]
61 | )
62 |
63 | # add the punctuation to the sentence
64 | sentence.append(item)
65 |
66 | # add the sentence to the list of sentences
67 | sentences.append(sentence)
68 |
69 | # start a new sentence
70 | sentence = []
71 | else:
72 | # filter out the filler words
73 | if item["type"] == "pronunciation" and item["alternatives"][0][
74 | "content"
75 | ].lower() in ["um", "uh", "so", "hmm", "like"]:
76 | continue
77 |
78 | # filter out punctuation
79 | if item["type"] == "punctuation":
80 | continue
81 |
82 | # add the word to the sentence
83 | sentence.append(item)
84 |
85 | # get the timings of the sentences
86 | sentences_timings = []
87 | for sentence in sentences:
88 | # get the start time of the sentence
89 | start_time = float(sentence[0]["start_time"])
90 |
91 | # get the end time of the sentence
92 | end_time = float(sentence[-1]["end_time"])
93 |
94 | # add the timings to the list of timings
95 | sentences_timings.append((start_time, end_time))
96 |
97 |
98 | def convert_seconds_to_mmss(seconds):
99 | return f"{int(seconds // 60):02d}:{int(seconds % 60):02d}"
100 |
101 |
102 | def build_summary(trim=True, remove_filler_words=True):
103 |     # build a summary list from the sentences and their timings
104 | summary = []
105 |     if not args.whisper_cpp_json:
106 | for sentence, timings in zip(sentences, sentences_timings):
107 |             # get the pronunciations from the sentence
108 |             pronunciations = [
109 |                 item["alternatives"][0]["content"].strip()
110 |                 for item in sentence
111 |                 if item["type"] == "pronunciation"
112 |             ]
113 | 
114 |             if remove_filler_words:
115 |                 # remove the filler words from the sentence
116 |                 pronunciations = [
117 |                     word
118 |                     for word in pronunciations
119 |                     if word.lower() not in ["um", "uh", "so", "hmm", "like"]
120 |                 ]
121 | 
122 |             # get the sentence text
123 |             sentence_text = " ".join(pronunciations) + "."
124 |
125 | if trim:
126 |                 # trim the sentence text to at most --trim_length characters
127 | sentence_text = sentence_text[: args.trim_length]
128 |
129 | # get the sentence start and end timings
130 | sentence_start_time, sentence_end_time = timings
131 |
132 | # convert the timings to strings in the format MM:SS
133 |             sentence_start_time = convert_seconds_to_mmss(sentence_start_time)
134 |             sentence_end_time = convert_seconds_to_mmss(sentence_end_time)
135 |
136 | # add the sentence to the summary
137 | summary.append(
138 | {
139 | "text": sentence_text,
140 | "start_time": sentence_start_time,
141 | "end_time": sentence_end_time,
142 | }
143 | )
144 | else:
145 | for sentence in data["transcription"]:
146 | # get the sentence text
147 | sentence_text = sentence["text"]
148 |
149 | if trim:
150 |                 # trim the sentence text to at most --trim_length characters
151 | sentence_text = sentence_text[: args.trim_length]
152 |
153 | # get the sentence start and end timings
154 | sentence_start_time = sentence["timestamps"]["from"]
155 | sentence_end_time = sentence["timestamps"]["to"]
156 |
157 | # add the sentence to the summary
158 | summary.append(
159 | {
160 | "text": sentence_text,
161 | "start_time": sentence_start_time,
162 | "end_time": sentence_end_time,
163 | }
164 | )
165 |
166 | return summary
167 |
168 |
169 | if args.generate_summary:
170 | # build a prompt for OpenAI generation:
171 | prompt = "transcript for the video:\n"
172 | prompt += "---\n"
173 | for sentence in build_summary(trim=args.trim_length > 0):
174 | prompt += f"{sentence['text']}\n"
175 | prompt += "---\n"
176 | if args.summary_prompt is not None and args.summary_prompt != "":
177 | prompt += args.summary_prompt
178 | else:
179 | prompt += (
180 | "write a short summary description paragraph for the above video on YouTube.\n"
181 | )
182 | prompt += "Summary for the video:\n"
183 |
184 | if args.print_prompts:
185 | print(prompt)
186 |
187 | history = [{"role": "user", "content": prompt}]
188 |
189 | # send a request to the OpenAI API (model gpt-3.5-turbo) to generate the summary
190 | # print("Sending a request to the OpenAI API to generate the summary...")
191 | print("Generating the summary...")
192 | response = openai.ChatCompletion.create(
193 | model="gpt-3.5-turbo-16k",
194 | messages=history,
195 | )
196 |
197 | # get the generated summary
198 | generated_summary = response["choices"][0]["message"]["content"]
199 | history += [{"role": "assistant", "content": generated_summary}]
200 |
201 | # print the generated summary
202 | print("----------------------")
203 | print(generated_summary)
204 | print("----------------------")
205 |
206 | if args.generate_chapters:
207 | prompt = "transcript for the video:\n"
208 | prompt += "---\n"
209 | for sentence in build_summary(trim=True):
210 | prompt += (
211 | f"[{sentence['start_time']} - {sentence['end_time']}] {sentence['text']}\n"
212 | )
213 | prompt += "---\n"
214 |     prompt += (
215 |         "write up to 10 high-level chapters for the video on YouTube in the format: "
216 |         + "'MM:SS <chapter title>.'\n"
217 |     )
218 | prompt += "Chapters for the video:\n"
219 |
220 | if args.print_prompts:
221 | print(prompt)
222 |
223 | history = [{"role": "user", "content": prompt}]
224 |
225 | # send a request to the OpenAI API (model gpt-3.5-turbo) to generate the chapters
226 | print("Sending a request to the OpenAI API to generate the chapters...")
227 | response = openai.ChatCompletion.create(
228 | model="gpt-3.5-turbo",
229 | messages=history,
230 | )
231 |
232 | # get the generated chapters
233 | generated_chapters = response["choices"][0]["message"]["content"]
234 | history += [{"role": "assistant", "content": generated_chapters}]
235 |
236 | # print the generated chapters
237 | print("----------------------")
238 | print(generated_chapters)
239 | print("----------------------")
240 |
241 | if args.generate_blog:
242 | prompt = "transcript for the video:\n"
243 | prompt += "---\n"
244 | for sentence in build_summary(trim=False):
245 | prompt += f"{sentence['text']}\n"
246 | prompt += "---\n"
247 | prompt += "write a blog post of at least 500 words for the above video. write the title and then the post body.\n"
248 | prompt += "Title of the blog post:\n"
249 |
250 | if args.print_prompts:
251 | print(prompt)
252 |
253 | history = [{"role": "user", "content": prompt}]
254 |
255 | # send a request to the OpenAI API (model gpt-3.5-turbo) to generate the blog post
256 | print("Sending a request to the OpenAI API to generate the blog post...")
257 | response = openai.ChatCompletion.create(
258 | model="gpt-3.5-turbo",
259 | messages=history,
260 | )
261 |
262 | # get the generated blog post
263 | generated_blog = response["choices"][0]["message"]["content"]
264 | history += [{"role": "assistant", "content": generated_blog}]
265 |
266 | # print the generated blog post
267 | print("----------------------")
268 | print(generated_blog)
269 | print("----------------------")
270 |
271 | print("Done.")
272 |
--------------------------------------------------------------------------------