├── __init__.py ├── tox.ini ├── .gitignore ├── requirements.txt ├── test_flickrmirrorer.py ├── README.md └── flickrmirrorer.py /__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /tox.ini: -------------------------------------------------------------------------------- 1 | [flake8] 2 | max-line-length = 120 3 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | /.venv/ 2 | /.cache/ 3 | /.idea/ 4 | *.pyc 5 | .DS_Store 6 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | flickrapi ~= 2.0 2 | python-dateutil 3 | requests 4 | -------------------------------------------------------------------------------- /test_flickrmirrorer.py: -------------------------------------------------------------------------------- 1 | import unittest 2 | 3 | from flickrmirrorer import get_photo_datetime 4 | 5 | class Tests(unittest.TestCase): 6 | def test_unparseable_title_timestamp(self): 7 | timestamp = get_photo_datetime({ 8 | 'datetakenunknown': '1', 9 | 'datetaken': '2014-10-01 13:45:37', 10 | 'title': 'flaskpost' 11 | }) 12 | 13 | # Fall back on datetaken if we can't parse the date from the title 14 | self.assertEqual(timestamp.isoformat(), "2014-10-01T13:45:37") 15 | 16 | def test_plain_title_timestamp(self): 17 | timestamp = get_photo_datetime({ 18 | 'datetakenunknown': '1', 19 | 'datetaken': '2014-10-01 13:45:37', 20 | 'title': '20151130_135610' 21 | }) 22 | self.assertEqual(timestamp.isoformat(), "2015-11-30T13:56:10") 23 | 24 | def test_known_timestamp(self): 25 | timestamp = get_photo_datetime({ 26 | 'datetakenunknown': '0', 27 | 'datetaken': '2015-11-02 12:35:07' 28 | }) 29 | 
self.assertEqual(timestamp.isoformat(), "2015-11-02T12:35:07") 30 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Overview 2 | ======== 3 | A small command-line python script that creates a local backup of your 4 | Flickr data. It mirrors images, video metadata, titles, description, tags, 5 | albums and collections. 6 | 7 | Available at https://github.com/markdoliner/flickrmirrorer 8 | 9 | Note that if you just want to download your Flickr data once you can use 10 | the "Request my Flickr data" button at the bottom of 11 | https://www.flickr.com/account — this script is intended for keeping a 12 | local copy of your Flickr data updated on an ongoing basis. 13 | 14 | Usage 15 | ===== 16 | The script was developed on Linux. It should work on other Unixy operating 17 | systems like macOS, hopefully without changes. It could probably be made 18 | to work on Microsoft Windows with minor changes. 19 | 20 | One time setup: 21 | 22 | ``` 23 | git clone https://github.com/markdoliner/flickrmirrorer 24 | cd flickrmirrorer 25 | python3 -m venv .venv 26 | .venv/bin/pip install -r requirements.txt 27 | ``` 28 | 29 | Then run this to backup your Flickr data: 30 | 31 | ``` 32 | .venv/bin/python flickrmirrorer.py /mnt/backup/flickr/ 33 | ``` 34 | 35 | (Replace `/mnt/backup/flickr` with the path to your backup) 36 | 37 | The first time you run this command, it will open your web browser and request permission from Flickr. 38 | 39 | See `--help` for options. 40 | 41 | 42 | Features 43 | ======== 44 | The script allows you to mirror only photos, only videos, or both. See 45 | the `--ignore-videos` and `--ignore-photos` command line options. 46 | 47 | Your local backup can be cleaned automatically, so that files that were 48 | deleted in Flickr are deleted locally. Deletion is disabled by default. See 49 | the `--delete-unknown` command line option. 
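As a rough illustration of what `--delete-unknown` amounts to, the sketch below lists the entries in a backup directory that were not part of the latest Flickr listing. It is a simplified sketch rather than the script's exact code: the helper name `find_unknown_entries` is hypothetical, and it only reports candidates instead of deleting anything.

```python
import os


def find_unknown_entries(rootdir, known):
    """Return the names in rootdir that are not in the 'known' set.

    Hypothetical helper for illustration only: 'known' is assumed to be
    the set of basenames produced by the most recent mirroring run.
    """
    if not os.path.isdir(rootdir):
        # Nothing has been mirrored yet, so there is nothing to clean up.
        return set()
    # Anything on disk that Flickr no longer reports is a deletion candidate.
    return set(os.listdir(rootdir)) - set(known)
```

Reporting the candidates first and deleting them in a separate step makes it easy to keep deletion opt-in, which matches the default described above.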
50 | 51 | The script displays a summary of its actions if `--statistics` is passed on 52 | the command line. 53 | 54 | Requirements 55 | ============ 56 | 57 | (These are covered by running `pip install -r requirements.txt` as mentioned above) 58 | 59 | * python 3 60 | * python dateutil 61 | * python flickrapi library 2.0 or newer. 62 | * Homepage: https://stuvel.eu/software/flickrapi/ 63 | * python requests 64 | 65 | Running via Cron 66 | ================ 67 | Running this script regularly via cron is a good way to keep your backup 68 | up to date. On Linux you can use `crontab -e` to configure per-user cron jobs: 69 | 70 | ``` 71 | # Run Flickr photo mirroring script. 72 | # Sleep between 0 and 4 hours to distribute load on Flickr's API servers. 73 | 0 3 * * 2 sleep $((`bash -c 'echo $RANDOM'` \% 14400)) && /home/my_user/flickrmirrorer/.venv/bin/python /home/my_user/flickrmirrorer/flickrmirrorer.py --quiet /mnt/backup/flickr/ 74 | ``` 75 | 76 | When using per-user cron jobs you shouldn't need to do anything special to 77 | allow the script to authenticate. However, if you run it as a system-wide 78 | cron job and it runs as a user other than yourself then you will 79 | need to take additional steps to make sure the cron user is able to 80 | authenticate. The steps are something like this: 81 | 82 | 1. Run the script as yourself the first time around. It should open 83 | your web browser and request permission. 84 | 2. After granting permission an authorization token is stored in 85 | `~/.flickr/oauth-tokens.sqlite` 86 | 3. 
Copy this file to the home directory of the cron user: 87 | ``` 88 | sudo mkdir -p /root/.flickr/ 89 | sudo cp ~/.flickr/oauth-tokens.sqlite /root/.flickr/oauth-tokens.sqlite 90 | ``` 91 | 92 | 93 | Output 94 | ====== 95 | The script creates this directory hierarchy: 96 | 97 | ``` 98 | dest_dir 99 | dest_dir/photostream/ 100 | dest_dir/photostream/12345.jpg 101 | dest_dir/photostream/12345.jpg.metadata 102 | dest_dir/photostream/12346.jpg 103 | dest_dir/photostream/12346.jpg.metadata 104 | dest_dir/photostream/12347.jpg 105 | dest_dir/photostream/12347.jpg.metadata 106 | dest_dir/Not in any album/ 107 | dest_dir/Not in any album/12345.jpg -> ../photostream/12345.jpg 108 | dest_dir/Albums/ 109 | dest_dir/Albums/Waterfalls - 6789/ 110 | dest_dir/Albums/Waterfalls - 6789/1_12346.jpg -> ../../photostream/12346.jpg 111 | dest_dir/Albums/Waterfalls - 6789/2_12347.jpg -> ../../photostream/12347.jpg 112 | dest_dir/Collections/ 113 | dest_dir/Collections/Nature - 2634-98761234/Waterfalls - 6789 -> ../../Albums/Waterfalls - 6789 114 | dest_dir/Collections/Nature - 2634-98761234/Mountains - 6790 -> ../../Albums/Mountains - 6790 115 | ``` 116 | 117 | The metadata files contain JSON data dumped from the Flickr API. 118 | It's not the prettiest thing in the world... but it does contain 119 | all the necessary data in case you want to recover from it. 120 | 121 | Album directories contain symlinks to the files in the photostream, and 122 | collection directories contain symlinks to the album directories. The 123 | symlink names in albums are numbered so as to preserve the order. 124 | 125 | Routine status is printed to stdout by default. 126 | 127 | Errors are printed to stderr. 128 | 129 | To see more options run with the `--help` flag. 130 | 131 | 132 | A note about videos 133 | =================== 134 | The Flickr API does not support downloading original video files. If this 135 | script encounters videos in your photostream, it asks you to download them 136 | (you must be logged in to your Flickr account). 
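For reference, the manual download links the script prints for videos appear to follow the pattern sketched below. This is a minimal sketch based on the URL used in the script's own error messages; the function name `manual_video_download_url` is hypothetical, and Flickr may change the endpoint at any time.

```python
def manual_video_download_url(photo_id):
    """Build the page a logged-in user can visit to save a video by hand.

    Hypothetical helper: the URL pattern is taken from the script's
    error message, not from any documented Flickr API.
    """
    return 'https://www.flickr.com/video_download.gne?id=%s' % photo_id


# Example with a made-up photo ID.
print(manual_video_download_url('12345'))
```

Files saved this way belong in the `photostream` directory so that later runs can find them.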
137 | 138 | 139 | Running unit tests 140 | ================== 141 | Run `python -m unittest` 142 | 143 | 144 | TODO 145 | ==== 146 | * Handle download errors better: 147 | * Add retry logic. 148 | * Continue trying to download other photos. 149 | * Stop running only if there are many download errors. 150 | * Mirror comments 151 | * Store order of photos in photostream 152 | * Store order of albums in collections 153 | 154 | 155 | Changes 156 | ======= 157 | 2023-12-27 158 | - Drop support for Python 2. 159 | - Change tests to use standard Python unittest library instead of pytest. 160 | - Update documentation to suggest using a venv. 161 | 162 | 2018-06-02 163 | - Support for nested collections and empty collections. 164 | 165 | 2017-01-02 166 | - Don't warn about downloading videos if they've already been downloaded. 167 | - Unknown files are no longer deleted by default. 168 | - Added new command line option `--delete-unknown` 169 | - Added new command line option `--ignore-photos` 170 | - Added new command line option `--ignore-videos` 171 | - Print statistics even if script is killed by CTRL+C. 172 | -------------------------------------------------------------------------------- /flickrmirrorer.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # A small command-line python script that creates a local backup of your 4 | # Flickr data. It mirrors images, titles, description, tags, albums and 5 | # collections. 6 | # 7 | # Available at https://github.com/markdoliner/flickrmirrorer 8 | # 9 | # Licensed as follows (this is the 2-clause BSD license, aka 10 | # "Simplified BSD License" or "FreeBSD License"): 11 | # 12 | # Copyright (c) 13 | # Ciprian Radu, 2016 14 | # Johan Walles, 2016 15 | # Mark Doliner, 2012-2023 16 | # Mattias Holmlund, 2013 17 | # Steve Cassidy, 2016 18 | # Victor Engmark, 2016 19 | # All rights reserved. 
20 | # 21 | # Redistribution and use in source and binary forms, with or without 22 | # modification, are permitted provided that the following conditions are met: 23 | # - Redistributions of source code must retain the above copyright notice, 24 | # this list of conditions and the following disclaimer. 25 | # - Redistributions in binary form must reproduce the above copyright notice, 26 | # this list of conditions and the following disclaimer in the documentation 27 | # and/or other materials provided with the distribution. 28 | # 29 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 30 | # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 31 | # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 32 | # ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE 33 | # LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 34 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 35 | # SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 36 | # INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 37 | # CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 38 | # ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 39 | # POSSIBILITY OF SUCH DAMAGE. 40 | 41 | import argparse 42 | import datetime 43 | import dateutil.parser 44 | import errno 45 | import glob 46 | import math 47 | import os 48 | import requests 49 | import shutil 50 | import signal 51 | import sys 52 | import time 53 | import urllib.parse 54 | import webbrowser 55 | 56 | try: 57 | # We try importing simplejson first because it's faster than json 58 | # in python 2.7 and lower 59 | import simplejson as json 60 | except ImportError: 61 | import json 62 | 63 | try: 64 | import flickrapi 65 | except ImportError: 66 | sys.stderr.write('Error importing flickrapi python library. 
Is it installed?\n') 67 | sys.exit(1) 68 | 69 | API_KEY = '9c5c431017e712bde232a2f142703bb2' 70 | API_SECRET = '7c024f6e7a36fc03' 71 | 72 | PLEASE_GRANT_AUTHORIZATION_MSG = """ 73 | Please authorize Flickr Mirrorer to read your photos, titles, tags, etc. 74 | 75 | 1. Visit %s 76 | 2. Click "OK, I'LL AUTHORIZE IT" 77 | 3. Copy and paste the code here and press 'return' 78 | 79 | """ 80 | 81 | NUM_PHOTOS_PER_BATCH = 500 82 | 83 | 84 | class VideoDownloadError(Exception): 85 | def __str__(self): 86 | return '%s' % self.args[0] 87 | 88 | 89 | def _ensure_dir_exists(path): 90 | """Create the directory 'path' if it does not exist. 91 | Calls sys.exit(1) if any directory could not be created.""" 92 | try: 93 | os.makedirs(path) 94 | except OSError as ex: 95 | if ex.errno != errno.EEXIST: 96 | sys.stderr.write('Error creating destination directory %s: %s\n' 97 | % (path, ex.strerror)) 98 | sys.exit(1) 99 | 100 | 101 | def _ensure_dir_doesnt_exist(path): 102 | """Remove the directory 'path' and all contents if it exists. 103 | Calls sys.exit(1) if the directory or any contents could not be removed.""" 104 | try: 105 | shutil.rmtree(path) 106 | except OSError as ex: 107 | if ex.errno != errno.ENOENT: 108 | sys.stderr.write('Error removing %s: %s\n' % (path, ex.strerror)) 109 | sys.exit(1) 110 | 111 | 112 | def _validate_json_response(rsp): 113 | """Exits the script with an error if the response is a failure. 114 | 115 | Args: 116 | rsp (dict): A parsed JSON response from the Flickr API. 117 | """ 118 | if rsp['stat'] != 'ok': 119 | sys.stderr.write('API request failed: Error %(code)s: %(message)s\n' % rsp) 120 | sys.exit(1) 121 | 122 | 123 | def get_photo_datetime(photo): 124 | """Return date a photo was taken. 125 | 126 | Obtained from: 127 | 1. 'datetaken' unless 'datetakenunknown' 128 | 2. Parsed from photo title 'YYYYMMDD_HHmmss' 129 | 3. 'datetaken' anyway; it's available even if unknown, so we just 130 | go with whatever Flickr made up for us. 
131 | 132 | Returns: 133 | datetime.datetime 134 | """ 135 | if photo['datetakenunknown'] == "0": 136 | return dateutil.parser.parse(photo['datetaken']) 137 | 138 | try: 139 | parsed = datetime.datetime.strptime(photo['title'], '%Y%m%d_%H%M%S') 140 | if parsed.year > 2000 and parsed < datetime.datetime.now(): 141 | return parsed 142 | except ValueError: 143 | # Unable to parse photo title as datetime 144 | pass 145 | 146 | return dateutil.parser.parse(photo['datetaken']) 147 | 148 | 149 | class FlickrMirrorer(object): 150 | dest_dir = None 151 | photostream_dir = None 152 | tmp_filename = None 153 | flickr = None 154 | 155 | def __init__(self, args): 156 | self.dest_dir = args.destdir 157 | self.verbosity = args.verbosity 158 | self.print_statistics = args.statistics 159 | self.include_views = args.include_views 160 | self.ignore_photos = args.ignore_photos 161 | self.ignore_videos = args.ignore_videos 162 | self.delete_unknown = args.delete_unknown 163 | 164 | self.photostream_dir = os.path.join(self.dest_dir, 'photostream') 165 | self.albums_dir = os.path.join(self.dest_dir, 'Albums') 166 | self.collections_dir = os.path.join(self.dest_dir, 'Collections') 167 | self.tmp_filename = os.path.join(self.dest_dir, 'tmp') 168 | 169 | # Statistics 170 | self.deleted_photos = 0 171 | self.modified_photos = 0 172 | self.new_photos = 0 173 | self.modified_albums = 0 174 | self.modified_collections = 0 175 | 176 | # Register a SIGINT (Ctrl-C) handler 177 | signal.signal(signal.SIGINT, self._sig_int_handler) 178 | 179 | # Create flickrapi instance 180 | self.flickr = flickrapi.FlickrAPI(api_key=API_KEY, secret=API_SECRET, format='parsed-json') 181 | 182 | def run(self): 183 | try: 184 | self._run_helper() 185 | finally: 186 | self._cleanup() 187 | 188 | def _run_helper(self): 189 | # Authenticate 190 | # The user-friendly way to do this is with this command: 191 | # self.flickr.authenticate_via_browser(perms='read') 192 | # However, the nature of this script is such that we 
don't want 193 | # to rely on people running it somewhere with a web browser 194 | # installed. So use the manual authentication process. A 195 | # reasonable compromise might be to try browser auth first and 196 | # if it fails then fall back to manual auth. Really flickrapi 197 | # should do that for us. Or at least print the URL to the 198 | # console. 199 | if not self.flickr.token_valid(perms='read'): 200 | self.flickr.get_request_token(oauth_callback='oob') 201 | authorize_url = self.flickr.auth_url(perms='read') 202 | webbrowser.open_new_tab(authorize_url) 203 | 204 | verifier = input(PLEASE_GRANT_AUTHORIZATION_MSG % authorize_url) 205 | 206 | self.flickr.get_access_token(verifier) 207 | 208 | if self.ignore_photos and self.ignore_videos: 209 | sys.stderr.write( 210 | 'There is nothing to do because photos and videos are ignored. ' 211 | 'Please choose to mirror at least photos or videos.\n') 212 | return 213 | 214 | self._verbose('Photos will be %s' % ('ignored' if self.ignore_photos else 'mirrored')) 215 | self._verbose('Videos will be %s' % ('ignored' if self.ignore_videos else 'mirrored')) 216 | self._verbose('Unknown files in %s will%s be deleted' % ( 217 | self.dest_dir, '' if self.delete_unknown else ' not')) 218 | 219 | # Create destination directory 220 | _ensure_dir_exists(self.dest_dir) 221 | 222 | # Fetch photos 223 | self._download_all_photos() 224 | 225 | # Create albums and collections 226 | self._mirror_albums() 227 | self._create_not_in_any_album_dir() 228 | self._mirror_collections() 229 | 230 | self._print_statistics() 231 | 232 | def _print_statistics(self): 233 | if not self.print_statistics: 234 | return 235 | print('New photos / videos: %d' % self.new_photos) 236 | print('Deleted photos / videos: %d' % self.deleted_photos) 237 | print('Modified photos / videos: %d' % self.modified_photos) 238 | print('Modified albums: %d' % self.modified_albums) 239 | print('Modified collections: %d' % self.modified_collections) 240 | 241 | def 
_download_all_photos(self): 242 | """Download all our pictures and metadata. 243 | If you have a lot of photos then this function will take a while.""" 244 | 245 | self._verbose('Mirroring all photos and videos in photostream') 246 | 247 | _ensure_dir_exists(self.photostream_dir) 248 | 249 | new_files = set() 250 | 251 | current_page = 1 252 | 253 | metadata_fields = ('description,license,date_upload,date_taken,owner_name,icon_server,original_format,' 254 | 'last_update,geo,tags,machine_tags,o_dims,media') 255 | 256 | if self.include_views: 257 | metadata_fields += ',views' 258 | 259 | download_errors = [] 260 | while True: 261 | rsp = self.flickr.people_getPhotos( 262 | user_id='me', 263 | extras=metadata_fields, 264 | per_page=NUM_PHOTOS_PER_BATCH, 265 | page=current_page, 266 | ) 267 | _validate_json_response(rsp) 268 | 269 | photos = rsp['photos']['photo'] 270 | for photo in photos: 271 | if (photo['media'] == 'photo' and not self.ignore_photos) or ( 272 | photo['media'] == 'video' and not self.ignore_videos): 273 | try: 274 | new_files |= self._download_photo(photo) 275 | except VideoDownloadError as e: 276 | download_errors.append(e) 277 | 278 | if current_page >= rsp['photos']['pages']: 279 | # We've reached the end of the photostream. Stop looping. 280 | break 281 | 282 | current_page += 1 283 | 284 | # Error out if there were exceptions 285 | if download_errors: 286 | sys.stderr.write( 287 | 'The Flickr API does not allow downloading original video files.\n' 288 | 'Please save the files listed below to the %s directory.\n' 289 | 'Note: You must be logged into your Flickr account in order to download ' 290 | 'your full resolution videos!\n' % self.photostream_dir) 291 | for error in download_errors: 292 | sys.stderr.write(' %s\n' % error) 293 | sys.exit(1) 294 | 295 | # Error out if we didn't fetch any photos 296 | if not new_files: 297 | sys.stderr.write('Error: The Flickr API returned an empty list of photos. 
' 298 | 'Bailing out without deleting any local copies in case this is an anomaly.\n') 299 | sys.exit(1) 300 | 301 | # Divide by 2 because we want to ignore the photo metadata files 302 | # for the purposes of our statistics. 303 | self.deleted_photos = self._delete_unknown_files(self.photostream_dir, new_files, 'file') / 2 304 | 305 | def _download_photo(self, photo): 306 | """Fetch and save a media item (photo or video) and the metadata 307 | associated with it. 308 | 309 | Returns a python set containing the filenames for the data. 310 | """ 311 | url = self._get_photo_url(photo) 312 | photo_basename = self._get_photo_basename(photo) 313 | photo_filename = os.path.join(self.photostream_dir, photo_basename) 314 | metadata_basename = '%s.metadata' % photo_basename 315 | metadata_filename = '%s.metadata' % photo_filename 316 | 317 | # Sanity check 318 | if os.path.isdir(photo_filename) or os.path.islink(photo_filename): 319 | sys.stderr.write('Error: %s exists but is not a file. This is not allowed.\n' % photo_filename) 320 | sys.exit(1) 321 | 322 | # Sanity check 323 | if os.path.isdir(metadata_filename) or os.path.islink(metadata_filename): 324 | sys.stderr.write('Error: %s exists but is not a file. This is not allowed.\n' % metadata_filename) 325 | sys.exit(1) 326 | 327 | # Download photo if it doesn't exist locally or if the metadata 328 | # file exists and the lastupdate timestamp has changed. 329 | # TODO: Should ideally also set should_download_photo to True if 330 | # not os.path.exists(metadata_filename), but that doesn't work 331 | # correctly for videos because the metadata file won't have been 332 | # created when the video file was created because the video was 333 | # downloaded out of band by the user. 334 | should_download_photo = not os.path.exists(photo_filename) 335 | if not should_download_photo: 336 | # Download photo if lastupdate timestamp has changed. 
337 | try: 338 | with open(metadata_filename) as json_file: 339 | metadata = json.load(json_file) 340 | should_download_photo |= metadata['lastupdate'] != photo['lastupdate'] 341 | except IOError as ex: 342 | if ex.errno != errno.ENOENT: 343 | sys.stderr.write('Error reading %s: %s\n' % (metadata_filename, ex)) 344 | sys.exit(1) 345 | 346 | if should_download_photo: 347 | if not os.path.exists(photo_filename): 348 | self.new_photos += 1 349 | else: 350 | self.modified_photos += 1 351 | 352 | self._progress('Fetching %s' % photo_basename) 353 | request = requests.get(url, stream=True) 354 | if not request.ok: 355 | if photo['media'] == 'video': 356 | raise VideoDownloadError( 357 | 'Manual download required (video may have changed): ' 358 | 'https://www.flickr.com/video_download.gne?id=%s' % photo['id']) 359 | 360 | sys.stderr.write( 361 | 'Error: Failed to fetch %s: %s: %s\n' 362 | % (url, request.status_code, request.reason)) 363 | sys.exit(1) 364 | 365 | # Write to temp file then rename to avoid incomplete files 366 | # in case of failure part-way through. 367 | with open(self.tmp_filename, 'wb') as tmp_file: 368 | # Use 1 MiB chunks. 
369 | for chunk in request.iter_content(2**20): 370 | tmp_file.write(chunk) 371 | os.rename(self.tmp_filename, photo_filename) 372 | else: 373 | self._verbose('Skipping %s because we already have it' 374 | % photo_basename) 375 | 376 | # Write metadata 377 | if self._write_json_if_different(metadata_filename, photo): 378 | self._progress('Updated metadata for %s' % photo_basename) 379 | else: 380 | self._verbose( 381 | 'Skipping metadata for %s because we already have it' % 382 | photo_basename) 383 | 384 | photo_datetime = get_photo_datetime(photo) 385 | self._set_timestamp_if_different(photo_datetime, photo_filename) 386 | self._set_timestamp_if_different(photo_datetime, metadata_filename) 387 | 388 | return {photo_basename, metadata_basename} 389 | 390 | def _mirror_albums(self): 391 | """Create a directory for each album, and create symlinks to the 392 | files in the photostream.""" 393 | self._verbose('Mirroring albums') 394 | 395 | album_dirs = set() 396 | 397 | # Fetch albums 398 | rsp = self.flickr.photosets_getList() 399 | _validate_json_response(rsp) 400 | if rsp['photosets']: 401 | for album in rsp['photosets']['photoset']: 402 | album_dirs |= self._mirror_album(album) 403 | 404 | self._delete_unknown_files(self.albums_dir, album_dirs, 'album') 405 | 406 | def _mirror_album(self, album): 407 | album_basename = self._get_album_dirname(album['id'], album['title']['_content']) 408 | album_dir = os.path.join(self.albums_dir, album_basename) 409 | 410 | # Fetch list of photos 411 | photos = [] 412 | 413 | num_pages = int(math.ceil(float(album['photos']) / NUM_PHOTOS_PER_BATCH)) 414 | for current_page in range(1, num_pages + 1): 415 | # Fetch photos in this album 416 | rsp = self.flickr.photosets_getPhotos( 417 | photoset_id=album['id'], 418 | extras='original_format,media', 419 | per_page=NUM_PHOTOS_PER_BATCH, 420 | page=current_page, 421 | ) 422 | _validate_json_response(rsp) 423 | 424 | for photo in rsp['photoset']['photo']: 425 | if (photo['media'] == 
'photo' and not self.ignore_photos) or ( 426 | photo['media'] == 'video' and not self.ignore_videos): 427 | photos += [photo] 428 | 429 | # Include list of photo IDs in metadata, so we can tell if photos 430 | # were added or removed from the album when mirroring in the future. 431 | album['photos'] = [photo['id'] for photo in photos] 432 | 433 | if (not self.include_views) and 'count_views' in album: 434 | del album['count_views'] 435 | 436 | # Add a version number to the album metadata. This gives us an 437 | # easy way to invalidate the local copy and cause the album to 438 | # be recreated, if needed. More specifically this causes the 439 | # albums to be recreated now that I've fixed the bug where 440 | # symlinks to videos were broken. 441 | album['flickrmirrorer_album_metadata_version'] = 2 442 | 443 | metadata_filename = os.path.join(album_dir, 'metadata') 444 | 445 | # TODO: Should ensure local album directory accurately reflects the 446 | # remote album data even if the metadata hasn't changed (important in 447 | # case the local album data has been tampered with). 448 | if not os.path.exists(album_dir) or self._is_file_different(metadata_filename, album): 449 | # Metadata changed, might be due to updated list of photos. 450 | self._progress('Updating album %s' % album['title']['_content']) 451 | self.modified_albums += 1 452 | 453 | # Delete and recreate the album 454 | _ensure_dir_doesnt_exist(album_dir) 455 | _ensure_dir_exists(album_dir) 456 | 457 | # Create symlinks for each photo, prefixed with a number so that 458 | # the local alphanumeric sort order matches the order on Flickr. 
459 | digits = len(str(len(photos))) 460 | for i, photo in enumerate(photos): 461 | photo_basename = self._get_photo_basename(photo) 462 | photo_fullname = os.path.join(self.photostream_dir, photo_basename) 463 | photo_relname = os.path.relpath(photo_fullname, album_dir) 464 | symlink_basename = '%s_%s' % (str(i+1).zfill(digits), photo_basename) 465 | symlink_filename = os.path.join(album_dir, symlink_basename) 466 | os.symlink(photo_relname, symlink_filename) 467 | 468 | # Write metadata 469 | self._write_json_if_different(metadata_filename, album) 470 | 471 | else: 472 | self._verbose('Album %s is up-to-date' % album['title']['_content']) 473 | 474 | return {album_basename} 475 | 476 | def _create_not_in_any_album_dir(self): 477 | """Create a directory for photos that aren't in any album, and 478 | create symlinks to the files in the photostream.""" 479 | 480 | self._verbose('Creating local directory for photos not in any album') 481 | 482 | album_dir = os.path.join(self.dest_dir, 'Not in any album') 483 | 484 | # TODO: Ideally we would inspect the existing directory and 485 | # make sure it's correct, but that's a lot of work. For now 486 | # just recreate the album. Fixing this would also allow us to 487 | # log _progress() messages when the album has changed. 488 | _ensure_dir_doesnt_exist(album_dir) 489 | _ensure_dir_exists(album_dir) 490 | 491 | current_page = 1 492 | while True: 493 | # Fetch list of photos that aren't in any album 494 | rsp = self.flickr.photos_getNotInSet( 495 | extras='original_format,media', 496 | per_page=NUM_PHOTOS_PER_BATCH, 497 | page=current_page, 498 | ) 499 | _validate_json_response(rsp) 500 | photos = [] 501 | for photo in rsp['photos']['photo']: 502 | if (photo['media'] == 'photo' and not self.ignore_photos) or ( 503 | photo['media'] == 'video' and not self.ignore_videos): 504 | photos += [photo] 505 | if not photos: 506 | # We've reached the end of the photostream. Stop looping. 
507 | break 508 | 509 | for photo in photos: 510 | photo_basename = self._get_photo_basename(photo) 511 | photo_fullname = os.path.join(self.photostream_dir, photo_basename) 512 | photo_relname = os.path.relpath(photo_fullname, album_dir) 513 | symlink_filename = os.path.join(album_dir, photo_basename) 514 | os.symlink(photo_relname, symlink_filename) 515 | 516 | current_page += 1 517 | 518 | def _mirror_collections(self): 519 | """Create a directory for each collection, and create symlinks to the 520 | albums.""" 521 | self._verbose('Mirroring collections') 522 | 523 | collection_dirs = set() 524 | 525 | # Fetch collections 526 | rsp = self.flickr.collections_getTree() 527 | _validate_json_response(rsp) 528 | if rsp['collections']: 529 | for collection in rsp['collections']['collection']: 530 | collection_dirs |= self._mirror_collection(self.collections_dir, collection) 531 | 532 | self._delete_unknown_files(self.collections_dir, collection_dirs, 'collection') 533 | 534 | def _mirror_collection(self, parent_dir, collection): 535 | """ 536 | Args: 537 | parent_dir (str): The full path to the directory where this 538 | collection should be written. 539 | collection (dict): The collection metadata dict as returned 540 | by the flickr.collections.getTree API call. 541 | """ 542 | collection_basename = self._get_collection_dirname(collection['id'], collection['title']) 543 | collection_dir = os.path.join(parent_dir, collection_basename) 544 | 545 | metadata_filename = os.path.join(collection_dir, 'metadata') 546 | 547 | if not os.path.exists(collection_dir) or self._is_file_different(metadata_filename, collection): 548 | # Metadata changed, might be due to updated list of albums. 
        self._progress('Updating collection %s' % collection['title'])
        self.modified_collections += 1

        # Delete and recreate the collection
        _ensure_dir_doesnt_exist(collection_dir)
        _ensure_dir_exists(collection_dir)

        # Create symlinks for each album
        for album in collection.get('set') or []:
            album_basename = self._get_album_dirname(album['id'], album['title'])
            album_fullname = os.path.join(self.albums_dir, album_basename)
            album_relname = os.path.relpath(album_fullname, collection_dir)
            symlink_filename = os.path.join(collection_dir, album_basename)
            os.symlink(album_relname, symlink_filename)

        # Collections can contain infinitely nested collections.
        for child_collection in collection.get('collection') or []:
            self._mirror_collection(collection_dir, child_collection)

        # Write metadata
        self._write_json_if_different(metadata_filename, collection)

        return {collection_basename}

    def _get_photo_url(self, photo):
        mediatype = photo['media']

        if mediatype == 'photo':
            return 'https://farm%(farm)s.staticflickr.com/%(server)s/%(id)s_%(originalsecret)s_o.%(originalformat)s' \
                % photo

        if mediatype == 'video':
            # URL created according to these instructions:
            # http://code.flickr.net/2009/03/02/videos-in-the-flickr-api-part-deux/
            owner = self.flickr.token_cache.token.user_nsid
            return 'http://www.flickr.com/photos/%s/%s/play/orig/%s/' % (
                owner, photo['id'], photo['originalsecret'])

        sys.stderr.write('Error: Unsupported media type "%s":\n' % mediatype)
        sys.stderr.write(json.dumps(photo, indent=2) + '\n')
        sys.exit(1)

    def _get_photo_basename(self, photo):
        mediatype = photo['media']

        if mediatype == 'photo':
            return '%s.%s' % (photo['id'], photo['originalformat'])

        if mediatype == 'video':
            # TODO: If Flickr begins including the file extension in the
            # video metadata then this code should be changed to behave
            # like the photo case, above.
            # The photo metadata for videos does not indicate the file
            # extension. If we've already saved the video locally then
            # we can get the basename from the local file.
            for f in glob.iglob(os.path.join(self.photostream_dir, photo['id']) + '*'):
                if not f.endswith('metadata'):
                    return os.path.basename(f)

            # Otherwise, make an HTTP HEAD request to get the response
            # headers we'd see when trying to download the photo. This
            # URL gets redirected to the CDN with a URL that includes
            # the video's original name.
            # TODO: Note that this started failing on 2016-06-25. It
            # seems to be impossible to download original video files
            # via the Flickr API now. The best we can do is show the
            # user a download URL and ask them to download. For a little
            # more context see:
            # https://www.flickr.com/groups/51035612836@N01/discuss/72157671986445591/72157673833636861
            # https://groups.yahoo.com/neo/groups/yws-flickr/conversations/topics/9610
            # https://groups.yahoo.com/neo/groups/yws-flickr/conversations/topics/9617
            head = requests.head(self._get_photo_url(photo), allow_redirects=True)
            if head.status_code != 200:
                raise VideoDownloadError(
                    'Manual download required: '
                    'https://www.flickr.com/video_download.gne?id=%s' % photo['id'])

            return os.path.basename(urllib.parse.urlparse(head.url).path)

        sys.stderr.write('Error: Unsupported media type "%s":\n' % mediatype)
        sys.stderr.write(json.dumps(photo, indent=2) + '\n')
        sys.exit(1)

    @staticmethod
    def _get_album_dirname(id_, title):
        safe_title = urllib.parse.quote(title.encode('utf-8'), " ',")
        # The ID is included in the name to avoid collisions when there
        # are two albums with the same name.
        return '%s - %s' % (safe_title, id_)

    @staticmethod
    def _get_collection_dirname(id_, title):
        safe_title = urllib.parse.quote(title.encode('utf-8'), " ',")
        # The ID is included in the name to avoid collisions when there
        # are two collections with the same name.
        return '%s - %s' % (safe_title, id_)

    @staticmethod
    def _is_file_different(filename, data):
        """Return True if the contents of the file 'filename' differ
        from 'data'. Otherwise return False."""
        try:
            with open(filename) as json_file:
                orig_data = json.load(json_file)
            return orig_data != data
        except IOError as ex:
            if ex.errno != errno.ENOENT:
                sys.stderr.write('Error reading %s: %s\n' % (filename, ex))
                sys.exit(1)
            return True

    def _set_timestamp_if_different(self, photo_datetime, filename):
        """Set the access and modified times of a file to the specified
        datetime.

        Args:
            photo_datetime (datetime.datetime)
        """
        try:
            timestamp = time.mktime(photo_datetime.timetuple())
            if timestamp != os.path.getmtime(filename):
                os.utime(filename, (timestamp, timestamp))
        except OverflowError:
            self._progress('Error updating timestamp for: %s' % filename)

    def _write_json_if_different(self, filename, data):
        """Write the given data to the specified filename, but only if it's
        different from what is currently there. Return true if the file was
        written.

        We use this function mostly to avoid changing the timestamps on
        metadata files."""
        if not self._is_file_different(filename, data):
            # Data has not changed--do nothing.
            return False

        # Write to temp file then rename to avoid incomplete files
        # in case of failure part-way through.
        with open(self.tmp_filename, 'w') as json_file:
            json.dump(data, json_file)
        os.rename(self.tmp_filename, filename)
        return True

    def _delete_unknown_files(self, rootdir, known, knowntype):
        """If the delete_unknown option is used, delete all files and
        directories in rootdir except the known files.

        knowntype is only used for the log message.

        Returns the number of deleted entries.
        """
        # return early if the rootdir doesn't exist
        if not os.path.isdir(rootdir):
            return 0

        # delete only if the --delete-unknown was specified.
        if not self.delete_unknown:
            return 0

        delete_count = 0
        curr_entries = os.listdir(rootdir)

        unknown_entries = set(curr_entries) - set(known)
        for unknown_entry in unknown_entries:
            fullname = os.path.join(rootdir, unknown_entry)
            self._progress('Deleting unknown %s: %s' % (knowntype, unknown_entry))
            delete_count += 1

            try:
                if os.path.isdir(fullname):
                    shutil.rmtree(fullname)
                else:
                    os.remove(fullname)
            except OSError as ex:
                sys.stderr.write('Error deleting %s: %s\n' % (fullname, ex.strerror))
                sys.exit(1)

        return delete_count

    def _verbose(self, msg):
        if self.verbosity >= 2:
            print(msg)

    def _progress(self, msg):
        if self.verbosity >= 1:
            print(msg)

    def _cleanup(self):
        # Remove a temp file, if one exists
        try:
            os.remove(self.tmp_filename)
        except OSError as ex:
            if ex.errno != errno.ENOENT:
                sys.stderr.write('Error deleting temp file %s: %s\n' % (self.tmp_filename, ex.strerror))

    def _sig_int_handler(self, signum, frame):
        # User exited with CTRL+C
        print('')
        self._print_statistics()
        sys.exit()


def main():
    parser = argparse.ArgumentParser(
        description='Create a local mirror of your flickr data.')

    parser.add_argument(
        'destdir',
        help='the path to where the mirror shall be stored')

    parser.add_argument(
        '-v', '--verbose',
        dest='verbosity', action='store_const', const=2,
        default=1,
        help='print progress information to stdout')

    parser.add_argument(
        '-q', '--quiet',
        dest='verbosity', action='store_const', const=0,
        help='print nothing to stdout if the mirror succeeds')

    parser.add_argument(
        '-s', '--statistics', action='store_const',
        default=False, const=True,
        help='print transfer-statistics at the end')

    parser.add_argument(
        '--ignore-views', action='store_const',
        dest='include_views', default=True, const=False,
        help='do not include views-counter in metadata')

    parser.add_argument(
        '--ignore-photos', action='store_const',
        dest='ignore_photos', default=False, const=True,
        help='do not mirror photos')

    parser.add_argument(
        '--ignore-videos', action='store_const',
        dest='ignore_videos', default=False, const=True,
        help='do not mirror videos')

    parser.add_argument(
        '--delete-unknown', action='store_const',
        dest='delete_unknown', default=False, const=True,
        help='delete unrecognized files in the destination directory. '
             'Warning: if you choose to ignore photos or videos, they will be deleted!')

    args = parser.parse_args()

    mirrorer = FlickrMirrorer(args)
    mirrorer.run()


if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        # User exited with CTRL+C
        # Print a newline to leave the console in a prettier state.
        print('')

--------------------------------------------------------------------------------
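The write-only-if-different behaviour of `_write_json_if_different` can be tried in isolation. The block below is a minimal sketch and not the script's own code: the standalone function name `write_json_if_different` is hypothetical, and using `tempfile.mkstemp` with `os.replace` (instead of the script's single `self.tmp_filename` and `os.rename`) is an assumption made so the example is self-contained.

```python
import json
import os
import tempfile


def write_json_if_different(filename, data):
    """Write data as JSON to filename unless the file already holds equal data.

    Returns True if the file was (re)written, False if it was left untouched.
    Hypothetical standalone helper, for illustration only.
    """
    try:
        with open(filename) as json_file:
            if json.load(json_file) == data:
                # Same content: leave the file (and its mtime) alone.
                return False
    except FileNotFoundError:
        pass  # No existing file, so it counts as different.

    # Assumption: create the temp file next to the target so that the
    # final rename stays on one filesystem.
    fd, tmp_filename = tempfile.mkstemp(dir=os.path.dirname(filename) or '.')
    try:
        with os.fdopen(fd, 'w') as json_file:
            json.dump(data, json_file)
        # Replace the target in one step so readers never see a partial file.
        os.replace(tmp_filename, filename)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_filename):
            os.remove(tmp_filename)
        raise
    return True
```

Calling the function twice with the same data writes the file only the first time, which is why metadata timestamps stay stable between runs that find no changes.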