├── GO.sh
├── README.md
├── create-slide-urls.py
├── create-slide-video.py
└── download-slides.sh


/GO.sh:
--------------------------------------------------------------------------------
#!/bin/sh
# Generate the URL list, make the output directory, then download every slide.
python3 create-slide-urls.py && mkdir -p slides && ./download-slides.sh


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# slideslive-downloader
download slideslive presentations (video + slides) + 🎉👍🙌👌✨👏**SYNCED SLIDES VIDEO**👏✨👌🙌👍🎉

## what is it
I wanted to download ICLR 2020 videos, so I hacked this together. It should work for any presentation hosted on slideslive.com

## how to use
- Some values are hardcoded (presentation ID, slide count, XML filename, output video size); change them as appropriate.
- I download the main video directly from the webpage in the browser by finding the mp4 file in the page source.
- The Python scripts use progressbar2 and opencv; `download-slides.sh` uses `parallel` (GNU parallel syntax) and `wget`. Look at the source and install whatever is missing.

e.g. Beyond “tabula rasa” in reinforcement learning: agents that remember, adapt, and generalize
https://slideslive.com/38926829/beyond-tabula-rasa-in-reinforcement-learning-agents-that-remember-adapt-and-generalize

slide URL: https://d2ygwrecguqg66.cloudfront.net/data/presentations/38926829/slides/big/0-0001.jpg

slide XML: https://d2ygwrecguqg66.cloudfront.net/data/presentations/38926829/38926829.xml
(contains sync times for slides)


1. download the mp4 from the page
2. find the slide URL in the page source
3. find the slide XML in the page source and download it; it contains the sync times for the slides
4. modify `create-slide-urls.py` with the correct `slide_count` and `url`
5. run `./GO.sh`
6. modify `create-slide-video.py` with the correct XML file
7. run `python3 create-slide-video.py` to turn the slide images into a video that is synced with the main mp4
8. now you can use something like the DaVinci Resolve video editor to place the videos side-by-side and render the result

done!

### notes
http://www.betr-rl.ml/2020/program/


--------------------------------------------------------------------------------
/create-slide-urls.py:
--------------------------------------------------------------------------------
# Number of slides in the presentation (hardcoded; change per presentation)
slide_count = 1102

with open('slide-urls.txt', 'w') as f:
    # Slides are numbered from 1 and zero-padded to four digits: 0-0001.jpg, 0-0002.jpg, ...
    for i in range(1, slide_count + 1):
        filename = f"0-{i:04d}.jpg"
        url = f"https://d2ygwrecguqg66.cloudfront.net/data/presentations/38926829/slides/big/{filename}"
        f.write(url)
        f.write('\n')


--------------------------------------------------------------------------------
/create-slide-video.py:
--------------------------------------------------------------------------------
import cv2
import operator
import progressbar
import xml.etree.ElementTree as ET

XML_FILE = '38926829.xml'   # sync XML downloaded from the presentation page
FRAME_SIZE = (1024, 576)    # (width, height) of the output video


class Slide:
    def __init__(self, order_id, time_sec, time, slide_name):
        self.order_id = order_id
        self.time_sec = time_sec
        self.time = time
        self.slide_name = slide_name

    def __str__(self):
        return f"orderId: {self.order_id} timeSec: {self.time_sec} time: {self.time} slideName: {self.slide_name}"

    def __repr__(self):
        return self.__str__()


slides = []

for child in ET.parse(XML_FILE).getroot().iter('slide'):
    order_id = int(child.find('orderId').text)
    time_sec = int(child.find('timeSec').text)
    time = int(child.find('time').text)
    slide_name = child.find('slideName').text

    slides.append(Slide(order_id, time_sec, time, slide_name))

slides = sorted(slides, key=operator.attrgetter('order_id'))

fourcc = cv2.VideoWriter_fourcc(*'mp4v')
# 1 frame per second, so each written frame is one second of video
out = cv2.VideoWriter('slides.mp4', fourcc, 1.0, FRAME_SIZE)

# Each slide is shown until the next one starts. The XML only gives start
# times, so the last slide has no known duration and is not written.
for i in progressbar.progressbar(range(len(slides) - 1)):
    frame_count = slides[i + 1].time_sec - slides[i].time_sec
    path = f"slides/{slides[i].slide_name}.jpg"
    frame = cv2.imread(path)
    if frame is None:
        raise FileNotFoundError(f"could not read {path}")
    # Frames whose size differs from FRAME_SIZE may be dropped silently by VideoWriter
    if (frame.shape[1], frame.shape[0]) != FRAME_SIZE:
        frame = cv2.resize(frame, FRAME_SIZE)
    for _ in range(frame_count):
        out.write(frame)

out.release()


--------------------------------------------------------------------------------
/download-slides.sh:
--------------------------------------------------------------------------------
#!/bin/sh
# Download every URL in slide-urls.txt into slides/, 16 downloads at a time.
parallel -j 16 wget -P slides/ < slide-urls.txt


--------------------------------------------------------------------------------
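The sync logic in `create-slide-video.py` (each slide is shown for the gap between its `timeSec` and the next slide's `timeSec`) can be checked without OpenCV. The following is a minimal sketch, not part of the repository: the function name `slide_durations`, the `video_length_sec` parameter, the `<videoContent>` root element, and the sample values are all assumptions made for illustration. Only the `slide`, `orderId`, `timeSec`, and `slideName` element names come from the script above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; a real sync XML from slideslive may differ in
# everything except the element names that create-slide-video.py reads.
SAMPLE_XML = """
<videoContent>
  <slide><orderId>1</orderId><timeSec>12</timeSec><slideName>0-0002</slideName></slide>
  <slide><orderId>0</orderId><timeSec>0</timeSec><slideName>0-0001</slideName></slide>
  <slide><orderId>2</orderId><timeSec>42</timeSec><slideName>0-0003</slideName></slide>
</videoContent>
"""


def slide_durations(xml_text, video_length_sec=None):
    """Return [(slide_name, seconds_on_screen), ...] in orderId order.

    If video_length_sec is given (an assumed extra input, not in the XML),
    the last slide is held until the end of the video; otherwise it is
    omitted, as in create-slide-video.py.
    """
    root = ET.fromstring(xml_text.strip())
    slides = sorted(
        (
            int(s.find("orderId").text),
            int(s.find("timeSec").text),
            s.find("slideName").text,
        )
        for s in root.iter("slide")
    )

    durations = []
    # Pair every slide with its successor; the gap is its time on screen.
    for (_, start, name), (_, next_start, _) in zip(slides, slides[1:]):
        durations.append((name, next_start - start))

    if slides and video_length_sec is not None:
        _, last_start, last_name = slides[-1]
        durations.append((last_name, max(0, video_length_sec - last_start)))

    return durations


if __name__ == "__main__":
    for name, seconds in slide_durations(SAMPLE_XML, video_length_sec=60):
        print(f"{name}: {seconds}s")
```

At 1 frame per second, each duration is also the number of frames to write for that slide. Passing the length of the main mp4 as `video_length_sec` is one way to keep the last slide on screen so both videos end together.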