├── .gitignore
├── LICENSE
├── README.md
├── main.py
└── requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
*.mp4
*.pdf
*.jpg
/frames
/videos
/slides

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2021 Sidharth Anand, Yashaswi Yenugu

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Lecture2Slides

## About

Convert lecture videos to slides in one line. The utility takes a directory containing your lecture videos as input and outputs a directory of PDF files, one per lecture, containing that lecture's slides. (You can download the videos from Google Drive even if you only have view-only permissions; search online for how.)


The utility captures a slide only when it detects that the slide has changed, rather than capturing every frame, so your PDF will be very close to the actual slides used. (If you find that slides are repeated because they were written on, try reducing the threshold as detailed below.) You can also automatically run OCR on the resulting PDF files to make your slides searchable and copy/paste-able.

## Features

- Convert a video file into a PDF of the respective slides
- Multiprocessing for parallel conversion of the videos in a folder
- Progress indication and approximate ETA
- Run automatic OCR on slides and save a text layer to the resulting PDF

## Running

This program requires [python](https://www.python.org/downloads/) to run. Additionally, you need to have [OpenCV](https://opencv.org/releases/) installed and your path configured correctly.

- Clone the repo
- Install requirements using `pip install -r requirements.txt`
- Run the program `python main.py `, passing the path to the folder containing your videos as the positional argument

OCR is disabled by default. If you want to run OCR, you must also install [tesseract](https://github.com/tesseract-ocr/tesseract) and [GhostScript](https://ghostscript.com/) and add them to your path as well.

- To run the program with OCR simply use the `--ocr` flag (`python main.py --ocr `)

## Options

The program provides several command line options to customize the execution:

    -h, --help              Print this help text and exit

    -t, --threshold         Similarity threshold to add slide

    -p, --processes         Number of parallel processes

    -s, --save-initial      Use this option if you find the first
                            slide from the video is missing

    -sl, --left             The left coordinate of the slide in the video
    -st, --top              The top coordinate of the slide in the video
    -sr, --right            The right coordinate of the slide in the video
    -sb, --bottom           The bottom coordinate of the slide in the video

    -o, --output            The folder to store the slides in

    -f, --frequency         How many seconds elapse before a frame is
                            processed

    --ocr                   Run OCR to make the resulting PDF searchable

### Note
- The slide coordinates are configured by default for the standard presentation size of a 720p Google Meet recording.
- Increasing the frequency value (more seconds between processed frames) improves performance.
- If you find slides being repeated due to writing, reduce the threshold. We find that the sweet spot is between 0.82 and 0.85. (Do not set it below 0.8, as it will omit almost every slide. Conversely, setting it above 0.9 will turn even tiny changes into a new slide.)

## Future updates

Stay tuned for the following updates

- Automatic video download and conversion from a Google Drive folder link

--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import os
import shutil
import glob
import time
import typing
import sys
import subprocess

import cv2
import img2pdf
import argparse

from multiprocessing import Pool, cpu_count
from math import ceil
from pathlib import Path

from skimage.metrics import structural_similarity
from tqdm import tqdm


def save_frame(frame, path, frame_id):
    # Name each capture with a timestamp and the index of the source frame
    filename = path + "/frame_" + \
        str(int(time.time())) + '_' + str(int(frame_id)) + ".jpg"
    cv2.imwrite(filename, frame)


def get_video_paths(root_dir: str):
    if not os.path.exists(root_dir):
        print("Directory not found, please enter valid directory..")
        sys.exit(1)

    # Recursively collect every .mp4 file under root_dir
    paths = []
    for rootDir, directory, filenames in os.walk(root_dir):
        for filename in filenames:
            if filename.lower().endswith('.mp4'):
                paths.append(os.path.join(rootDir, filename))

    return paths


def extract_slides_from_vid(video_path: str, threshold: float, save_initial: bool, capture_frequency: int, slide_bounds: typing.List[int], temp_captures_path: str, output_path: str, position: int):

    video_name = os.path.splitext(os.path.basename(video_path))[0]

    video_folder = os.path.join(temp_captures_path, video_name)

    pdf_path = os.path.join(output_path, video_name) + '.pdf'

    if not os.path.exists(video_folder):
        os.mkdir(video_folder)

    capture = cv2.VideoCapture(video_path)

    frame_rate = capture.get(cv2.CAP_PROP_FPS)
    frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    # Frames to skip between comparisons; at least one, so that a missing
    # FPS value cannot cause a division by zero below
    frame_skip = max(1, int(capture_frequency * frame_rate))
    eval_count = int(frame_count / frame_skip)

    prev_frame = None

    bar = tqdm(total=eval_count, position=position)

    try:
        while capture.isOpened():
            frame_id = capture.get(cv2.CAP_PROP_POS_FRAMES)

            ret, frame = capture.read()

            if not ret:
                break

            # Crop to the slide region: rows top:bottom, columns left:right
            frame = frame[slide_bounds[1]:slide_bounds[3],
                          slide_bounds[0]:slide_bounds[2], :]

            if prev_frame is not None:
                prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
                current_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

                (score, diff) = structural_similarity(
                    prev_gray, current_gray, full=True)

                # A similarity score below the threshold means a new slide
                if score < threshold:
                    save_frame(frame, video_folder, frame_id)

            elif save_initial:
                save_frame(frame, video_folder, frame_id)

            prev_frame = frame

            bar.update()

            # Jump ahead to the next frame to compare
            capture.set(cv2.CAP_PROP_POS_FRAMES, min(
                frame_id + frame_skip, frame_count))

        capture.release()
        bar.close()
        bar.clear()

        # glob returns files in arbitrary order, so sort the captures by
        # frame index to keep the slides in video order
        images = sorted(
            glob.glob(os.path.join(video_folder, '*.jpg')),
            key=lambda p: int(os.path.splitext(p)[0].rsplit('_', 1)[1]))

        # img2pdf cannot convert an empty list, so skip videos with no captures
        if images:
            with open(pdf_path, "wb") as f:
                f.write(img2pdf.convert(images))

    except KeyboardInterrupt:
        capture.release()
        bar.close()
        bar.clear()

    shutil.rmtree(video_folder)


def extract_slides_from_batch(process_data: dict):

    threshold = process_data['threshold']
    save_initial = process_data['save_initial']
    slide_bounds = process_data['slide_bounds']
    capture_frequency = process_data['capture_frequency']
    output_path = process_data['output_path']
    temp_captures_path = process_data['temp_captures_path']
    position = process_data['process_id']

    for video in process_data['video_paths']:
        extract_slides_from_vid(
            video, threshold, save_initial, capture_frequency, slide_bounds, temp_captures_path, output_path, position)


def lecture2slides(root_dir: str, threshold: float, processes: int, save_initial: bool, slide_bounds: typing.List[int], output_path: str, capture_frequency: int, run_ocr: bool) -> None:
    if not os.path.exists(root_dir):
        print('Could not find the folder:', root_dir)
        return

    video_paths = get_video_paths(root_dir=root_dir)

    if len(video_paths) == 0:
        print("Found 0 videos. Please enter a directory with videos..")
        return

    print("Found {} videos..".format(len(video_paths)))

    if processes > cpu_count():
        print("Number of processes greater than system capacity..")
        processes = cpu_count()
        print("Defaulting to {} parallel processes..".format(processes))

    processes = min(processes, len(video_paths))

    vids_per_process = ceil(len(video_paths)/processes)

    # Split the videos into one batch per process
    split_paths = []
    for i in range(0, len(video_paths), vids_per_process):
        split_paths.append(video_paths[i:i+vids_per_process])

    temp_captures_path = './frames'

    if not os.path.exists(temp_captures_path):
        os.mkdir(temp_captures_path)

    if not os.path.exists(output_path):
        os.mkdir(output_path)

    split_data = []
    for process_id, batch in enumerate(split_paths):

        process_data = {
            "process_id": process_id,
            "video_paths": batch,
            "threshold": threshold,
            "save_initial": save_initial,
            "slide_bounds": slide_bounds,
            "capture_frequency": capture_frequency,
            "temp_captures_path": temp_captures_path,
            "output_path": output_path,
        }

        split_data.append(process_data)

    # Create a pool which can execute more than one process in parallel
    pool = Pool(processes=processes)

    try:
        # Map the function
        print("Started {} processes..".format(processes))
        pool.map(extract_slides_from_batch, split_data)

        # Wait until all parallel processes are done and then execute main script
        pool.close()
        pool.join()

        if run_ocr:
            print('Running OCR on pdf outputs...')

            for rootDir, directory, filenames in os.walk(output_path):
                for filename in filenames:
                    file_ext = os.path.splitext(filename)[1]

                    if file_ext == '.pdf':
                        # Use the directory being walked so PDFs in subfolders resolve
                        pdf_file = os.path.join(rootDir, filename)
                        cmd = ["ocrmypdf", pdf_file, pdf_file]
                        # run() waits for the command, so returncode is populated
                        proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

                        if proc.returncode == 6:
                            print("Skipped document because it already contained text", filename)
                        elif proc.returncode == 0:
                            print("OCR complete for", filename)
                        else:
                            print("OCR failed for", filename)

    except KeyboardInterrupt:
        pool.close()
        pool.join()


if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument('root', type=str,
                        help='The path to the folder containing videos')

    parser.add_argument('-t', '--threshold', default=0.82,
                        type=float, help='Similarity threshold to add slide')

    parser.add_argument(
        '-p', "--processes", required=False, type=int, default=cpu_count(), help="Number of parallel processes"
    )

    parser.add_argument('-s', '--save-initial', default=False,
                        help='Save the first frame. (Defaults to false)', action='store_true')

    parser.add_argument('-sl', '--left', default=95, type=int,
                        help='Left coordinate of slide in video')
    parser.add_argument('-st', '--top', default=70, type=int,
                        help='Top coordinate of slide in video')
    parser.add_argument('-sr', '--right', default=865,
                        type=int, help='Right coordinate of slide in video')
    parser.add_argument('-sb', '--bottom', default=650,
                        type=int, help='Bottom coordinate of slide in video')

    parser.add_argument('-o', '--output', type=str, help='The output path')

    parser.add_argument('-f', '--frequency', type=int,
                        default=10, help='Seconds between processed frames')

    parser.add_argument('--ocr', default=False, help='Run OCR on the resulting PDF', action='store_true')

    args = parser.parse_args()

    if not args.output:
        args.output = 'slides'

    lecture2slides(args.root, args.threshold, args.processes, args.save_initial, [
                   args.left, args.top, args.right, args.bottom], args.output, args.frequency, args.ocr)

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
opencv-python
img2pdf
argparse
tqdm
scikit-image
ocrmypdf
--------------------------------------------------------------------------------
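
The sampling and batching arithmetic in main.py can be sketched on its own, without OpenCV, to show how `--frequency` and `--processes` translate into work. This is only an illustration under stated assumptions: the helper names `sampled_frame_positions` and `split_into_batches` are hypothetical and do not exist in the repository, and the one-hour, 30 fps video is an assumed example.

```python
from math import ceil
from typing import List


def sampled_frame_positions(frame_count: int, frame_rate: float,
                            capture_frequency: int) -> List[int]:
    """Frame indices that get compared when one frame is processed every
    `capture_frequency` seconds (hypothetical helper, not in main.py)."""
    # Mirrors frame_skip in main.py, clamped so the step is never zero
    frame_skip = max(1, int(capture_frequency * frame_rate))
    return list(range(0, frame_count, frame_skip))


def split_into_batches(video_paths: List[str], processes: int) -> List[List[str]]:
    """Chunk the videos so each worker gets at most ceil(n / processes)
    of them (hypothetical helper, not in main.py)."""
    if not video_paths:
        return []
    # Never use more workers than there are videos
    processes = max(1, min(processes, len(video_paths)))
    per_process = ceil(len(video_paths) / processes)
    return [video_paths[i:i + per_process]
            for i in range(0, len(video_paths), per_process)]


if __name__ == "__main__":
    # Assumed example: a one-hour lecture recorded at 30 fps
    frame_count = 60 * 60 * 30
    for seconds in (1, 10):
        compared = len(sampled_frame_positions(frame_count, 30.0, seconds))
        print(f"-f {seconds}: {compared} frames compared")

    # Assumed example: seven videos shared between three processes
    videos = [f"lecture_{i}.mp4" for i in range(7)]
    print([len(batch) for batch in split_into_batches(videos, 3)])
```

With these assumed numbers, raising `-f` from 1 to 10 cuts the number of compared frames from 3600 to 360, which is why a larger frequency value runs faster. The seven videos end up in batches of 3, 3 and 1.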