├── Dockerfile
├── README.md
├── check.py
└── screenshots
    ├── after.png
    └── before.png

/Dockerfile:
--------------------------------------------------------------------------------
FROM zenika/alpine-chrome:with-node
USER root
RUN apk add --no-cache python3 py3-pip && \
    npm install --production single-file-cli && \
    pip3 install shaarli-client apprise && \
    ln -s /usr/src/app/node_modules/single-file-cli/single-file /usr/local/bin/single-file && \
    mkdir /archives && \
    echo "*/60 * * * * /usr/bin/python3 /opt/check.py >> /var/log/shaarli-archiver.log 2>&1" >> /var/spool/cron/crontabs/root
VOLUME /archives
COPY check.py /opt/
WORKDIR /archives
CMD crond -f
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Shaarli Archiver

[Shaarli](https://github.com/shaarli/Shaarli) has no native local archiving feature.

[SingleFile](https://github.com/gildas-lormeau/SingleFile) provides an easy way to archive a web page as a single HTML file (with its pictures embedded!).

This container image combines the power of both!

## How does it work?

- this container queries your Shaarli instance every hour
- it searches your Shaarli for links carrying a specific tag that you define (e.g. `to_archive`)
- if bookmarks with that tag are found, SingleFile archives each link as a single HTML file under `/archives` on the container filesystem (mount that folder!)
- processed bookmarks are then edited:
  - the description gets a link to the archive (e.g. `file:///home/user/archives/1234_20200101_120000.html` or `https://archive.example.com/1234_20200101_120000.html`)
  - the tag `shaarli-archiver` is added, making archived bookmarks easy to find
- an (optional) notification is sent to Pushover (via the [apprise](https://github.com/caronc/apprise) library)
- when all links are processed, the dedicated and unique tag is deleted

## How a bookmark looks before processing

![Bookmark before processing](https://raw.githubusercontent.com/sebw/shaarli-archiver/master/screenshots/before.png)

## How it looks after processing

![Bookmark after processing](https://raw.githubusercontent.com/sebw/shaarli-archiver/master/screenshots/after.png)

The "Archived on..." line is a link pointing to `ARCHIVE_URL/<linkID>_<archivalDate>.html`.

## Run the container

`SHAARLI_URL` is the address of your Shaarli instance (e.g. `https://shaarli.example.com`).

`SHAARLI_TAG` is the dedicated and unique tag that triggers the archiving.

`SHAARLI_TOKEN` is the token found in your Shaarli under Tools > Configure your Shaarli > REST API secret.

`ARCHIVE_URL` is the base URL where you will expose your archives (e.g.
`file:///home/user/archives`, `https://archive.example.com` or `https://archive.example.com/subfolder`). Leave out the trailing slash, since a `/` is inserted before the archive file name.

`PUSHOVER_USER` (optional) is your Pushover user key, if you want to be notified when a link is processed.

`PUSHOVER_TOKEN` (optional) is your Pushover application token; set it together with `PUSHOVER_USER` to get notified when a link is processed.

```bash
sudo docker run -d \
  --name=shaarli-archiver \
  -e SHAARLI_URL=https://shaarli.example.com \
  -e SHAARLI_TOKEN=abcdef \
  -e SHAARLI_TAG=to_archive \
  -e ARCHIVE_URL=https://archive.example.com \
  -e PUSHOVER_USER=abc \
  -e PUSHOVER_TOKEN=xyz \
  -v /some/local/folder/archives:/archives \
  ghcr.io/sebw/shaarli-archiver:0.4
```

## Exposing your archives

If you want to expose your archives, you can run the `nginx` container image with the same local folder mounted read-only:

```bash
docker run -d --restart unless-stopped --name shaarli-archiver-site -p 80:80 -v /some/local/folder/archives:/usr/share/nginx/html:ro nginx
```

## Build the container yourself

```bash
git clone https://github.com/sebw/shaarli-archiver
cd shaarli-archiver
docker build . \
    -t shaarli-archiver:0.4
```

## Troubleshooting

### Checking the logs

```bash
docker exec -it shaarli-archiver tail -f /var/log/shaarli-archiver.log
```

### Running the check manually

```bash
docker exec -it shaarli-archiver sh
# then, inside the container shell:
/usr/bin/python3 /opt/check.py
```

--------------------------------------------------------------------------------
/check.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3

import shaarli_client.client as c
import json
import os
import sys
import subprocess
import apprise
from datetime import datetime

shaarli_url = os.environ.get('SHAARLI_URL')
shaarli_token = os.environ.get('SHAARLI_TOKEN')
shaarli_tag = os.environ.get('SHAARLI_TAG')
archive_url = os.environ.get('ARCHIVE_URL')

pushover_user = os.environ.get('PUSHOVER_USER')
pushover_token = os.environ.get('PUSHOVER_TOKEN')

now = datetime.now()
archive_date = now.strftime("%Y%m%d_%H%M%S")
archive_date_readable = now.strftime("%Y-%m-%d %H:%M:%S")

# Connect to your instance
response = c.ShaarliV1Client(shaarli_url, shaarli_token)

# Find the links with the dedicated tag
answer = response.get_links({'searchtags': shaarli_tag, 'limit': 'all'})

links = json.loads(answer.text)

if not links:
    print(archive_date_readable + " - Nothing to process...")
else:
    for link in links:
        bookmark_id = link['id']
        bookmark_url = link['url']
        bookmark_title = link['title']
        bookmark_description = link['description']
        bookmark_private = link['private']
        bookmark_tags = link['tags']

        # Archive file name: <linkID>_<YYYYmmdd_HHMMSS>.html
        output_file = str(bookmark_id) + "_" + archive_date + ".html"

        print(archive_date_readable + " - Archiving bookmark ID " + str(bookmark_id) + " " + bookmark_url + " at " + archive_url + "/" + output_file)
        try:
            # Archive the page with SingleFile (headless Chromium) into /archives
            process = subprocess.run(['/usr/local/bin/single-file', '--browser-executable-path', '/usr/bin/chromium-browser', '--output-directory', '/archives/', '--filename-template', output_file, '--browser-args', '["--no-sandbox"]', bookmark_url], stdout=subprocess.PIPE, universal_newlines=True)

            # Do not link to an archive that was never written
            if not os.path.isfile(os.path.join('/archives', output_file)):
                raise RuntimeError("SingleFile did not create " + output_file + " (exit code " + str(process.returncode) + ")")

            # Update the description with the archive link
            params = {
                "description": bookmark_description + "\n\n---\n\n__*[Archived on " + archive_date_readable + "](" + archive_url + "/" + output_file + ")*__",
                "private": bookmark_private,
                "tags": bookmark_tags + ['shaarli-archiver'],
                "title": bookmark_title,
                "url": bookmark_url
            }

            print("Updating bookmark with link to archive")
            response.put_link(bookmark_id, params)

            # Pushover needs both the user key and the application token
            if pushover_user and pushover_token:
                print("Notifying Pushover")
                apobj = apprise.Apprise()
                apobj.add('pover://' + pushover_user + '@' + pushover_token)

                apobj.notify(
                    body='URL ' + bookmark_url + ' has been processed',
                    title='Shaarli Archiver',
                )
        except Exception as e:
            # Exit before deleting the tag so the remaining links are retried on the next run
            sys.exit("Something failed when trying to process " + bookmark_url + ": " + str(e))

    # Delete the tag when all links have been processed
    print("Deleting tag " + shaarli_tag)
    response.delete_tag(shaarli_tag, params=False)
--------------------------------------------------------------------------------
/screenshots/after.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sebw/shaarli-archiver/339175fc87df74308681d609e00cfdbd8dd05d6b/screenshots/after.png
--------------------------------------------------------------------------------
/screenshots/before.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sebw/shaarli-archiver/339175fc87df74308681d609e00cfdbd8dd05d6b/screenshots/before.png
--------------------------------------------------------------------------------
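The archive naming and the "Archived on..." footer that `check.py` builds with string concatenation can be sketched as small pure functions, which makes the format easy to check without a Shaarli instance or Chromium. This is a minimal sketch, not code from the repository: the helper names `archive_filename`, `archive_link`, `annotate_description` and `updated_tags` are hypothetical, and unlike `check.py` the sketch strips a trailing slash from `ARCHIVE_URL` and avoids adding the `shaarli-archiver` tag twice.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical helpers mirroring the string building done in check.py.
# Assumption: ARCHIVE_URL may or may not end with "/" (check.py does not strip it).


def archive_filename(bookmark_id: int, when: datetime) -> str:
    """Return '<linkID>_<YYYYmmdd_HHMMSS>.html', the name check.py passes to SingleFile."""
    return f"{bookmark_id}_{when:%Y%m%d_%H%M%S}.html"


def archive_link(archive_url: str, filename: str) -> str:
    """Join ARCHIVE_URL and the file name with exactly one slash."""
    return archive_url.rstrip("/") + "/" + filename


def annotate_description(description: str, link: str, when: datetime) -> str:
    """Append the Markdown 'Archived on ...' footer that check.py writes."""
    readable = when.strftime("%Y-%m-%d %H:%M:%S")
    return f"{description}\n\n---\n\n__*[Archived on {readable}]({link})*__"


def updated_tags(tags: list[str], marker: str = "shaarli-archiver") -> list[str]:
    """Return a copy of the tags with the marker appended once (check.py appends unconditionally)."""
    return list(tags) if marker in tags else list(tags) + [marker]


if __name__ == "__main__":
    when = datetime(2020, 1, 1, 12, 0, 0)
    name = archive_filename(1234, when)
    link = archive_link("https://archive.example.com/", name)
    print(name)  # 1234_20200101_120000.html
    print(link)  # https://archive.example.com/1234_20200101_120000.html
    print(annotate_description("My notes", link, when))
    print(updated_tags(["to_archive"]))  # ['to_archive', 'shaarli-archiver']
```

The example file name matches the `1234_20200101_120000.html` pattern shown in the README; keeping these pieces free of network and subprocess calls is what lets them be checked in isolation.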