├── html
│   ├── archives
│   │   └── .keep
│   ├── replay-web-page
│   │   └── .keep
│   └── embed
│       ├── index.html
│       └── index.js
├── .gitignore
├── Dockerfile
├── start-sandbox.sh
├── start-dev.sh
├── pull-replay-web-page.sh
├── sandbox
│   └── index.html
├── nginx.conf
├── README.md
└── LICENSE


/html/archives/.keep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/html/replay-web-page/.keep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
*.warc
*.warc.gz
*.wacz
fly.toml
.DS_Store
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
FROM nginx:latest
COPY ./html/ /usr/share/nginx/html/
COPY ./nginx.conf /etc/nginx/conf.d/nginx.conf
--------------------------------------------------------------------------------
/start-sandbox.sh:
--------------------------------------------------------------------------------
# Serves the files present in the "sandbox" folder, for testing purposes.
cd sandbox;
python3 -m http.server;
--------------------------------------------------------------------------------
/start-dev.sh:
--------------------------------------------------------------------------------
# Build Docker image and run the container as single use
docker build . -t wacz-exhibitor-local;
docker run --rm -p 8080:8080 wacz-exhibitor-local;
--------------------------------------------------------------------------------
/pull-replay-web-page.sh:
--------------------------------------------------------------------------------
# Pulls a pinned version (2.3.8) of replayweb.page (https://replayweb.page)
curl https://cdn.jsdelivr.net/npm/replaywebpage@2.3.8/sw.js > html/replay-web-page/sw.js
curl https://cdn.jsdelivr.net/npm/replaywebpage@2.3.8/ui.js > html/replay-web-page/ui.js
--------------------------------------------------------------------------------
/html/embed/index.html:
--------------------------------------------------------------------------------
WARC.GZ / WACZ HTML Embed
--------------------------------------------------------------------------------
/sandbox/index.html:
--------------------------------------------------------------------------------
wacz-exhibitor sandbox

wacz-exhibitor sandbox


Test: embedding a wacz-exhibitor <iframe>

--------------------------------------------------------------------------------
/nginx.conf:
--------------------------------------------------------------------------------
#
# Default NGINX configuration file for wacz-exhibitor.
#
# Inspiration:
# - https://www.nginx.com/blog/smart-efficient-byte-range-caching-nginx/
# - https://kevincox.ca/2021/06/04/http-range-caching/
# - https://github.com/KyleAMathews/docker-nginx/blob/master/nginx.conf
#
proxy_cache_path /var/cache/nginx keys_zone=rangecache:10m inactive=60m;

server {
  listen 8080;
  listen [::]:8080;

  root /usr/share/nginx/html/;

  # Range request caching setup (slice-by-slice approach)
  slice 1m;
  proxy_cache_key $host$uri$is_args$args$slice_range;
  proxy_set_header Range $slice_range;
  proxy_http_version 1.1;
  proxy_cache_valid 200 206 1h;
  proxy_cache rangecache;

  # Gzip compression setup
  gzip on;
  gzip_http_version 1.0; # Minimum HTTP version for gzip compression
  gzip_comp_level 5;
  gzip_min_length 256;
  gzip_proxied any;
  gzip_vary on;
  gzip_types
    application/javascript
    application/json
    text/css
    text/plain;
  # text/html is always compressed by HttpGzipModule

  # Intended CSP:
  # "Everything's allowed within the
```

See also: [Live Demo](https://warcembed-demo.lil.tools), [Blog post](https://lil.law.harvard.edu/blog/2022/09/15/opportunities-and-challenges-of-client-side-playback/ "Blog post on lil.law.harvard.edu - Web Archiving: Opportunities and Challenges of Client-Side Playback")

Perma Tools

---

## Summary
- [Concept](#concept)
- [Routes](#routes)
- [Deployment](#deployment)
- [Local development](#local-development)
- [Communicating with the embedded archive](#communicating-with-the-embedded-archive)
- [Changelog](/CHANGELOG.md)

---

## Concept

### "It's a wrapper"
`wacz-exhibitor` serves an HTML document containing a pre-configured instance of [replayweb.page](https://replayweb.page/), [webrecorder's client-side web archives playback system](https://webrecorder.net/), pointing at a proxied version of the requested WARC/WACZ file.

The playback will only start if said HTML document is embedded in a cross-origin `
```

### /*.[wacz|warc|warc.gz]

### Role
Pulls, caches and serves a given `.warc`, `.warc.gz` or `.wacz` file, with full support for range requests.

First looks for the given path + file in the local [`/archives/` folder](/html/archives/); if it is not found there, tries to proxy it from the remote server defined in `nginx.conf`.

[☝️ Back to summary](#summary)

---

## Deployment
This project consists of a single `Dockerfile` derived from [the official NGINX Docker image](https://hub.docker.com/_/nginx), which can be deployed on any Docker-compatible machine.

### Example
The following example describes the process of deploying `wacz-exhibitor` on [fly.io](https://fly.io), a platform-as-a-service provider.
1. Edit `nginx.conf`: see the comments starting with `EDIT:` in that file for instructions.
2. Install the [`flyctl`](https://fly.io/docs/hands-on/install-flyctl/) client and [sign in](https://fly.io/docs/hands-on/sign-in/), if not already done.
3. Initialize and deploy the project by running the `flyctl launch` command _(use `flyctl deploy` for subsequent deploys)_.
4. `wacz-exhibitor` is now live and visible on the [`fly.io` dashboard](https://fly.io/dashboard).
5. We highly recommend setting up a **custom domain and SSL certificate**. This can be done directly from the `fly.io` dashboard. Ideally, the target domain should be a subdomain of the website on which `wacz-exhibitor` iframes are going to be embedded: for example, `www.domain.ext` embedding an `