├── .gitignore
├── LICENSE
├── README.md
├── main.py
└── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
1 | *.mp4
2 | *.pdf
3 | *.jpg
4 | /frames
5 | /videos
6 | /slides


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2021 Sidharth Anand, Yashaswi Yenugu
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Lecture2Slides
 2 | 
 3 | ## About
 4 | 
 5 | Convert lecture videos to slides in one line. The program takes a directory containing your lecture videos and outputs a directory of PDF files, one per lecture, containing the slides. (You can download the videos from Google Drive even if you only have view-only permissions; search online for how.)
 6 | 
 7 | 
 8 | The utility captures a frame only when it detects that the slide has changed, rather than capturing every frame, so your PDF will be very close to the actual slides used. (If you find that slides are being repeated because they were written on, try reducing the threshold as detailed below.) You can also automatically run OCR on the resulting PDF files to make your slides searchable and copy/paste-able.
 9 | 
10 | ## Features
11 | 
12 | - Convert a video file into a .PDF of the respective slides
13 | - Multiprocessing for parallel conversion of videos in a folder
14 | - Progress Indication and approximate ETA
15 | - Run automatic OCR on slides and save a text layer to the resulting PDF
16 | 
17 | ## Running
18 | 
19 | This program requires [Python](https://www.python.org/downloads/) to run. Additionally, you need to have [OpenCV](https://opencv.org/releases/) installed and your path configured correctly.
20 | 
21 | - Clone the repo
22 | - Install requirements using `pip install -r requirements.txt`
23 | - Run the program `python main.py <videos_folder_name>`
24 | 
25 | OCR is disabled by default. If you want to run OCR, you must also install [tesseract](https://github.com/tesseract-ocr/tesseract) and [GhostScript](https://ghostscript.com/) and add them to your path.
26 | 
27 |  - To run the program with OCR simply use the `--ocr` flag (`python main.py --ocr <videos_folder_name>`)
28 | 
29 | ## Options
30 | 
31 | The program provides many command line options to customize the execution
32 | 
33 |     -h, --help                           Print this help text and exit
34 | 
35 |     -t, --threshold                      Similarity threshold to add slide
36 | 
37 |     -p, --processes                      Number of parallel processes
38 | 
39 |     -s, --save-initial                   Use this option if you find the first
40 |                                          slide from the video is missing
41 | 
42 |     -sl, --left                          The left coordinate of the slide in the video
43 |     -st, --top                           The top coordinate of the slide in the video
44 |     -sr, --right                         The right coordinate of the slide in the video
45 |     -sb, --bottom                        The bottom coordinate of the slide in the video
46 | 
47 |     -o, --output                         The folder to store the slides in
48 | 
49 |     -f, --frequency                      How many seconds elapse before a frame is
50 |                                          processed
51 |     
52 |     --ocr                                Run OCR to make the resulting PDF searchable
53 | 
54 | ### Note
55 | - The slide coordinates are configured by default for the standard presentation size for a 720p Google Meet recording.
56 | - Increasing the frequency value (more seconds between processed frames) improves performance, since fewer frames are compared
57 | - If you find slides being repeated due to writing, reduce the threshold. We find that the sweet spot is between 0.82 and 0.85. (Do not set it below 0.8, as it will omit almost every slide. Conversely, setting it above 0.9 will turn even tiny changes into a new slide.)
58 | 
59 | ## Future updates
60 | 
61 | Stay tuned for the following updates
62 | 
63 |  - Automatic video download and conversion from a Google Drive folder link
64 | 
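
The threshold note above can be illustrated without any video. The sketch below assumes a list of made-up similarity scores between consecutive sampled frames and applies the same `score < threshold` rule that `main.py` uses. The helper `new_slide_indices` and the scores are hypothetical and exist only for illustration.

```python
from typing import List


def new_slide_indices(scores: List[float], threshold: float) -> List[int]:
    """Return the indices of sampled frames that would be saved as new slides.

    scores[i] is the similarity between sampled frame i and frame i + 1,
    so a score below the threshold marks frame i + 1 as a new slide.
    """
    return [i + 1 for i, score in enumerate(scores) if score < threshold]


# Made-up scores: 0.99 = same slide, 0.87 = handwriting added, 0.40 = slide change
scores = [0.99, 0.87, 0.99, 0.40, 0.99]

print(new_slide_indices(scores, 0.82))  # only the real slide change is kept
print(new_slide_indices(scores, 0.90))  # the annotated frame is captured too
```

With these invented numbers, the lower threshold ignores the handwriting while the higher one treats it as a new slide, which is the behaviour the note describes.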


--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import shutil
  3 | import glob
  4 | import time
  5 | import typing
  6 | import sys
  7 | import subprocess
  8 | 
  9 | import cv2
 10 | import img2pdf
 11 | import argparse
 12 | 
 13 | from multiprocessing import Pool, cpu_count
 14 | from math import ceil
 15 | from pathlib import Path
 16 | 
 17 | from skimage.metrics import structural_similarity
 18 | from tqdm import tqdm
 19 | 
 20 | def save_frame(frame, path, frame_id):
 21 |     filename = path + "/frame_" + \
 22 |         str(int(time.time())) + '_' + str(int(frame_id)) + ".jpg"
 23 |     cv2.imwrite(filename, frame)
 24 | 
 25 | 
 26 | def get_video_paths(root_dir: str):
 27 |     if not os.path.exists(root_dir):
 28 |         print("Directory not found, please enter valid directory..")
 29 |         sys.exit(1)
 30 | 
 31 |     paths = []
 32 |     for rootDir, directory, filenames in os.walk(root_dir):
 33 |         for filename in filenames:
 34 |             if filename.lower().endswith(('.mp4')):
 35 |                 paths.append(os.path.join(rootDir, filename))
 36 | 
 37 |     return paths
 38 | 
 39 | 
 40 | def extract_slides_from_vid(video_path: str, threshold: float, save_initial: bool, capture_frequency: int, slide_bounds: typing.List[int], temp_captures_path: str, output_path: str, position: int):
 41 | 
 42 |     video_name = os.path.splitext(os.path.basename(video_path))[0]
 43 | 
 44 |     video_folder = os.path.join(temp_captures_path, video_name)
 45 | 
 46 |     pdf_path = os.path.join(output_path, video_name) + '.pdf'
 47 | 
 48 |     if not os.path.exists(video_folder):
 49 |         os.mkdir(video_folder)
 50 | 
 51 |     capture = cv2.VideoCapture(video_path)
 52 | 
 53 |     frame_rate = capture.get(cv2.CAP_PROP_FPS)
 54 |     frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
 55 |     frame_skip = max(1, int(capture_frequency * frame_rate))  # never 0, even if the FPS is reported as 0
 56 |     eval_count = int(frame_count / frame_skip)
 57 | 
 58 |     prev_frame = None
 59 | 
 60 |     bar = tqdm(total=eval_count, position=position)
 61 | 
 62 |     try:
 63 |         while capture.isOpened():
 64 |             frame_id = capture.get(cv2.CAP_PROP_POS_FRAMES)
 65 | 
 66 |             ret, frame = capture.read()
 67 | 
 68 |             if not ret:
 69 |                 break
 70 | 
 71 |             frame = frame[slide_bounds[1]:slide_bounds[3],
 72 |                         slide_bounds[0]:slide_bounds[2], :]
 73 | 
 74 |             if prev_frame is not None:
 75 |                 prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
 76 |                 current_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
 77 | 
 78 |                 (score, diff) = structural_similarity(
 79 |                     prev_gray, current_gray, full=True)
 80 | 
 81 |                 if score < threshold:
 82 |                     save_frame(frame, video_folder, frame_id)
 83 | 
 84 |             elif save_initial:
 85 |                 save_frame(frame, video_folder, frame_id)
 86 | 
 87 |             prev_frame = frame
 88 | 
 89 |             bar.update()
 90 | 
 91 |             capture.set(cv2.CAP_PROP_POS_FRAMES, min(
 92 |                 frame_id + frame_skip, frame_count))
 93 | 
 94 |         capture.release()
 95 |         bar.close()
 96 |         bar.clear()
 97 | 
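 98 |         with open(pdf_path, "wb") as f:  # glob order is arbitrary, so sort the captures by frame id
 99 |             f.write(img2pdf.convert(sorted(glob.glob(video_folder + '/*.jpg'), key=lambda p: int(os.path.splitext(p)[0].rsplit('_', 1)[1]))))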
100 |         
101 |     except KeyboardInterrupt:
102 |         capture.release()
103 |         bar.close()
104 |         bar.clear()
105 | 
106 |     shutil.rmtree(video_folder)
107 | 
108 | 
109 | def extract_slides_from_batch(process_data: dict):
110 | 
111 |     threshold = process_data['threshold']
112 |     save_initial = process_data['save_initial']
113 |     slide_bounds = process_data['slide_bounds']
114 |     capture_frequency = process_data['capture_frequency']
115 |     output_path = process_data['output_path']
116 |     temp_captures_path = process_data['temp_captures_path']
117 |     position = process_data['process_id']
118 | 
119 |     for video in process_data['video_paths']:
120 |         extract_slides_from_vid(
121 |             video, threshold, save_initial, capture_frequency, slide_bounds, temp_captures_path, output_path, position)
122 | 
123 | 
124 | def lecture2slides(root_dir: str, threshold: float, processes: int, save_initial: bool, slide_bounds: typing.List[int], output_path: str, capture_frequency: int, run_ocr: bool) -> None:
125 |     if not os.path.exists(root_dir):
126 |         print('Could not find the folder:', root_dir)
127 |         return
128 | 
129 |     video_paths = get_video_paths(root_dir=root_dir)
130 | 
131 |     if len(video_paths) == 0:
132 |         print("Found 0 videos. Please enter a directory with videos..")
133 |         return
134 | 
135 |     print("Found {} videos..".format(len(video_paths)))
136 | 
137 |     if processes > cpu_count():
138 |         print("Number of processes greater than system capacity..")
139 |         processes = cpu_count()
140 |         print("Defaulting to {} parallel processes..".format(processes))
141 | 
142 |     processes = min(processes, len(video_paths))
143 | 
144 |     vids_per_process = ceil(len(video_paths)/processes)
145 | 
146 |     split_paths = []
147 |     for i in range(0, len(video_paths), vids_per_process):
148 |         split_paths.append(video_paths[i:i+vids_per_process])
149 | 
150 |     temp_captures_path = './frames'
151 | 
152 |     if not os.path.exists(temp_captures_path):
153 |         os.mkdir(temp_captures_path)
154 | 
155 |     if not os.path.exists(output_path):
156 |         os.mkdir(output_path)
157 | 
158 |     split_data = []
159 |     for process_id, batch in enumerate(split_paths):
160 | 
161 |         process_data = {
162 |             "process_id": process_id,
163 |             "video_paths": batch,
164 |             "threshold": threshold,
165 |             "save_initial": save_initial,
166 |             "slide_bounds": slide_bounds,
167 |             "capture_frequency": capture_frequency,
168 |             "temp_captures_path": temp_captures_path,
169 |             "output_path": output_path,
170 |         }
171 | 
172 |         split_data.append(process_data)
173 | 
174 |     # Create a pool which can execute more than one process in parallel
175 |     pool = Pool(processes=processes)
176 | 
177 |     try:
178 |         # Map the function
179 |         print("Started {} processes..".format(processes))
180 |         pool.map(extract_slides_from_batch, split_data)
181 | 
182 |         # Wait until all parallel processes are done and then execute main script
183 |         pool.close()
184 |         pool.join()
185 | 
186 |         if run_ocr:
187 |             print('Running OCR on pdf outputs...')
188 | 
189 |             for rootDir, directory, filenames in os.walk(output_path):
190 |                 for filename in filenames:
191 |                     file_ext = os.path.splitext(filename)[1]
192 | 
193 |                     if file_ext == '.pdf':
194 |                         pdf_file = os.path.join(rootDir, filename)
195 |                         cmd = ["ocrmypdf", pdf_file, pdf_file]
196 |                         # run() waits for the process to exit, so returncode is set (with Popen it stayed None)
197 |                         proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
198 | 
199 |                         if proc.returncode == 6:
200 |                             print("Skipped document because it already contained text", filename)
201 |                         elif proc.returncode == 0:
202 |                             print("OCR complete for", filename)
203 | 
204 |     except KeyboardInterrupt:
205 |         pool.close()
206 |         pool.join()
207 | 
208 | 
209 | if __name__ == '__main__':
210 |     parser = argparse.ArgumentParser()
211 | 
212 |     parser.add_argument('root', type=str,
213 |                         help='The path to the folder containing videos')
214 | 
215 |     parser.add_argument('-t', '--threshold', default=0.82,
216 |                         type=float, help='Similarity threshold to add slide')
217 | 
218 |     parser.add_argument(
219 |         '-p', "--processes", required=False, type=int, default=cpu_count(), help="Number of parallel processes"
220 |     )
221 | 
222 |     parser.add_argument('-s', '--save-initial', default=False,
223 |                         help='Save the first frame. (Defaults to false)', action='store_true')
224 | 
225 |     parser.add_argument('-sl', '--left', default=95, type=int,
226 |                         help='Left coordinate of slide in video')
227 |     parser.add_argument('-st', '--top', default=70, type=int,
228 |                         help='Top coordinate of slide in video')
229 |     parser.add_argument('-sr', '--right', default=865,
230 |                         type=int, help='Right coordinate of slide in video')
231 |     parser.add_argument('-sb', '--bottom', default=650,
232 |                         type=int, help='Bottom coordinate of slide in video')
233 | 
234 |     parser.add_argument('-o', '--output', type=str, help='The output path')
235 | 
236 |     parser.add_argument('-f', '--frequency', type=int,
237 |                         default=10, help='Inverse of slide capture frame rate')
238 | 
239 |     parser.add_argument('--ocr', default=False, help='Run OCR on the resulting PDF', action='store_true')
240 | 
241 |     args = parser.parse_args()
242 | 
243 |     if not args.output:
244 |         args.output = 'slides'
245 | 
246 |     lecture2slides(args.root, args.threshold, args.processes, args.save_initial, [
247 |                    args.left, args.top, args.right, args.bottom], args.output, args.frequency, args.ocr)
248 | 
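
The sampling arithmetic in `extract_slides_from_vid` can be sketched on its own, without OpenCV. The helper `sampled_frame_ids` below is hypothetical. It assumes that seeking lands exactly on the requested frame, which real codecs do not always guarantee, and it clamps the skip to at least one frame.

```python
from typing import List


def sampled_frame_ids(frame_rate: float, frame_count: int, frequency: int) -> List[int]:
    """List the frame indices read when one frame is processed every `frequency` seconds."""
    # Guard against a zero skip, for example when the reported frame rate is 0
    frame_skip = max(1, int(frequency * frame_rate))
    return list(range(0, frame_count, frame_skip))


# A hypothetical one-minute clip at 30 fps (1800 frames)
print(sampled_frame_ids(30.0, 1800, 10))      # one frame every 10 seconds
print(len(sampled_frame_ids(30.0, 1800, 2)))  # a smaller frequency value means more frames to compare
```

A larger `--frequency` value therefore reduces the number of similarity comparisons, at the cost of possibly missing slides that are shown only briefly.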

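
The way `lecture2slides` divides videos among worker processes can also be shown in isolation. This is a minimal sketch of the same ceiling-based chunking. The function name `split_into_batches` and the file names are made up, and the empty-list guard is an addition that the original code handles by returning early.

```python
from math import ceil
from typing import List


def split_into_batches(paths: List[str], processes: int) -> List[List[str]]:
    """Split paths into contiguous batches, at most one batch per process."""
    if not paths:
        return []
    # Never use more processes than there are videos
    processes = max(1, min(processes, len(paths)))
    per_batch = ceil(len(paths) / processes)
    return [paths[i:i + per_batch] for i in range(0, len(paths), per_batch)]


videos = ["lec{}.mp4".format(i) for i in range(1, 6)]
print(split_into_batches(videos, 2))
print(len(split_into_batches(videos, 4)))
```

Note that five videos with four requested processes yield only three batches, because each batch holds up to two videos, so one worker would stay idle. A round-robin split would be one alternative if that matters.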

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | opencv-python
2 | img2pdf
3 | argparse
4 | tqdm
5 | scikit-image
6 | ocrmypdf


--------------------------------------------------------------------------------