├── __init__.py
├── tox.ini
├── .gitignore
├── requirements.txt
├── test_flickrmirrorer.py
├── README.md
└── flickrmirrorer.py


/__init__.py:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/tox.ini:
--------------------------------------------------------------------------------
1 | [flake8]
2 | max-line-length = 120
3 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | /.venv/
2 | /.cache/
3 | /.idea/
4 | *.pyc
5 | .DS_Store
6 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | flickrapi ~= 2.0
2 | python-dateutil
3 | requests
4 | 


--------------------------------------------------------------------------------
/test_flickrmirrorer.py:
--------------------------------------------------------------------------------
 1 | import unittest
 2 | 
 3 | from flickrmirrorer import get_photo_datetime
 4 | 
 5 | class Tests(unittest.TestCase):
 6 |     def test_unparseable_title_timestamp(self):
 7 |         timestamp = get_photo_datetime({
 8 |             'datetakenunknown': '1',
 9 |             'datetaken': '2014-10-01 13:45:37',
10 |             'title': 'flaskpost'
11 |         })
12 | 
13 |         # Fall back on datetaken if we can't parse the date from the title
14 |         self.assertEqual(timestamp.isoformat(), "2014-10-01T13:45:37")
15 | 
16 |     def test_plain_title_timestamp(self):
17 |         timestamp = get_photo_datetime({
18 |             'datetakenunknown': '1',
19 |             'datetaken': '2014-10-01 13:45:37',
20 |             'title': '20151130_135610'
21 |         })
22 |         self.assertEqual(timestamp.isoformat(), "2015-11-30T13:56:10")
23 | 
24 |     def test_known_timestamp(self):
25 |         timestamp = get_photo_datetime({
26 |             'datetakenunknown': '0',
27 |             'datetaken': '2015-11-02 12:35:07'
28 |         })
29 |         self.assertEqual(timestamp.isoformat(), "2015-11-02T12:35:07")
30 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | Overview
  2 | ========
  3 | A small command-line python script that creates a local backup of your
  4 | Flickr data. It mirrors images, video metadata, titles, description, tags,
  5 | albums and collections.
  6 | 
  7 | Available at https://github.com/markdoliner/flickrmirrorer
  8 | 
  9 | Note that if you just want to download your Flickr data once you can use
 10 | the "Request my Flickr data" button at the bottom of
 11 | https://www.flickr.com/account — this script is intended for keeping a
 12 | local copy of your Flickr data updated on an ongoing basis.
 13 | 
 14 | Usage
 15 | =====
 16 | The script was developed on Linux. It should work on other Unixy operating
 17 | systems like macOS, hopefully without changes. It could probably be made
 18 | to work on Microsoft Windows with minor changes.
 19 | 
 20 | One time setup:
 21 | 
 22 | ```
 23 | git clone https://github.com/markdoliner/flickrmirrorer
 24 | cd flickrmirrorer
 25 | python3 -m venv .venv
 26 | .venv/bin/pip install -r requirements.txt
 27 | ```
 28 | 
 29 | Then run this to back up your Flickr data:
 30 | 
 31 | ```
 32 | .venv/bin/python flickrmirrorer.py /mnt/backup/flickr/
 33 | ```
 34 | 
 35 | (Replace `/mnt/backup/flickr` with the path to your backup)
 36 | 
 37 | The first time you run this command, it will open your web browser and request permission from Flickr.
 38 | 
 39 | See `--help` for options.
 40 | 
 41 | 
 42 | Features
 43 | ========
 44 | The script allows you to mirror only photos, only videos, or both. See
 45 | the `--ignore-videos` and `--ignore-photos` command line options.
 46 | 
 47 | Your local backup can be cleaned automatically, so that files that were
 48 | deleted in Flickr are deleted locally. Deletion is disabled by default. See
 49 | the `--delete-unknown` command line option.
 50 | 
 51 | The script displays a summary of its actions if `--statistics` is passed on
 52 | the command line.
 53 | 
 54 | Requirements
 55 | ============
 56 | 
 57 | (These are covered by running `pip install -r requirements.txt` as mentioned above)
 58 | 
 59 | * python 3
 60 | * python dateutil
 61 | * python flickrapi library 2.0 or newer.
 62 |   * Homepage: https://stuvel.eu/software/flickrapi/
 63 | * python requests
 64 | 
 65 | Running via Cron
 66 | ================
 67 | Running this script regularly via cron is a good way to keep your backup
 68 | up to date. On Linux you can use `crontab -e` to configure per-user cron jobs:
 69 | 
 70 | ```
 71 | # Run Flickr photo mirroring script.
 72 | # Sleep between 0 and 4 hours to distribute load on Flickr's API servers.
 73 | 0 3 * * 2  sleep $((`bash -c 'echo $RANDOM'` \% 14400)) && /home/my_user/flickrmirrorer/.venv/bin/python /home/my_user/flickrmirrorer/flickrmirrorer.py --quiet /mnt/backup/flickr/
 74 | ```
 75 | 
 76 | When using per-user cron jobs you shouldn't need to do anything special to
 77 | allow the script to authenticate. However, if you run it as a system-wide
 78 | cron job and it runs as a user other than yourself then you will
 79 | need to take additional steps to make sure the cron user is able to
 80 | authenticate. The steps are something like this:
 81 | 
 82 | 1. Run the script as yourself the first time around. It should open
 83 |    your web browser and request permission.
 84 | 2. After granting permission an authorization token is stored in
 85 |    `~/.flickr/oauth-tokens.sqlite`
 86 | 3. Copy this file to the home directory of the cron user:
 87 |    ```
 88 |    sudo mkdir -p /root/.flickr/
 89 |    sudo cp ~/.flickr/oauth-tokens.sqlite /root/.flickr/oauth-tokens.sqlite
 90 |    ```
 91 | 
 92 | 
 93 | Output
 94 | ======
 95 | The script creates this directory hierarchy:
 96 | 
 97 | ```
 98 | dest_dir
 99 | dest_dir/photostream/
100 | dest_dir/photostream/12345.jpg
101 | dest_dir/photostream/12345.jpg.metadata
102 | dest_dir/photostream/12346.jpg
103 | dest_dir/photostream/12346.jpg.metadata
104 | dest_dir/photostream/12347.jpg
105 | dest_dir/photostream/12347.jpg.metadata
106 | dest_dir/Not in any album/
107 | dest_dir/Not in any album/12345.jpg -> ../photostream/12345.jpg
108 | dest_dir/Albums/
109 | dest_dir/Albums/Waterfalls - 6789/
110 | dest_dir/Albums/Waterfalls - 6789/1_12346.jpg -> ../../photostream/12346.jpg
111 | dest_dir/Albums/Waterfalls - 6789/2_12347.jpg -> ../../photostream/12347.jpg
112 | dest_dir/Collections/
113 | dest_dir/Collections/Nature - 2634-98761234/Waterfalls - 6789 -> ../../Albums/Waterfalls - 6789
114 | dest_dir/Collections/Nature - 2634-98761234/Mountains - 6790  -> ../../Albums/Mountains - 6790
115 | ```
116 | 
117 | The metadata files contain JSON data dumped from the Flickr API.
118 | It's not the prettiest thing in the world... but it does contain
119 | all the necessary data in case you want to recover from it.
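
As a rough illustration of reading these files back, the sketch below loads every `*.metadata` file in a photostream directory and pulls out two fields. The helper name `summarize_metadata` is hypothetical and not part of the script. It assumes the files are plain JSON objects and that `title` and `datetaken` are present, which are fields the script itself reads from the API response; other fields depend on what Flickr returns.

```python
import json
import os


def summarize_metadata(photostream_dir):
    """Return (filename, title, datetaken) for each *.metadata file, sorted by filename."""
    rows = []
    for name in sorted(os.listdir(photostream_dir)):
        if not name.endswith('.metadata'):
            continue
        with open(os.path.join(photostream_dir, name)) as metadata_file:
            metadata = json.load(metadata_file)
        # 'title' and 'datetaken' are assumed to be present; .get() returns
        # None instead of raising if a file lacks them.
        rows.append((name, metadata.get('title'), metadata.get('datetaken')))
    return rows
```

For example, `summarize_metadata('/mnt/backup/flickr/photostream')` would return one tuple per mirrored photo or video.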
120 | 
121 | The album and collection directories contain symlinks to the files in
122 | the photostream. The symlink names in albums are numbered so as to
123 | preserve the order.
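
A minimal sketch of how the numbered symlinks could be used to recover album order is shown below. The helper name `album_photos_in_order` is hypothetical. It assumes the numeric prefixes are zero-padded to a common width within an album (as `_mirror_album` does with `zfill`), so that a plain alphabetical sort matches the order on Flickr.

```python
import os


def album_photos_in_order(album_dir):
    """Return the photostream files an album's symlinks point to, in album order."""
    targets = []
    for name in sorted(os.listdir(album_dir)):
        path = os.path.join(album_dir, name)
        if not os.path.islink(path):
            # Skip the album's own 'metadata' file and anything else
            # that is not a symlink.
            continue
        # Resolve the relative symlink to an absolute photostream path.
        targets.append(os.path.realpath(path))
    return targets
```

Calling it on `dest_dir/Albums/Waterfalls - 6789` from the listing above would return the resolved paths of `12346.jpg` and `12347.jpg`, in that order.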
124 | 
125 | Routine status is printed to stdout by default.
126 | 
127 | Errors are printed to stderr.
128 | 
129 | To see more options run with the `--help` flag.
130 | 
131 | 
132 | A note about videos
133 | ===================
134 | The Flickr API does not support downloading original video files. If this
135 | script encounters videos in your photostream, it asks you to download them
136 | (you must be logged in to your Flickr account).
137 | 
138 | 
139 | Running unit tests
140 | ==================
141 | Run `python -m unittest`
142 | 
143 | 
144 | TODO
145 | ====
146 | * Handle download errors better:
147 |   * Add retry logic.
148 |   * Continue trying to download other photos.
149 |   * Stop running only if there are many download errors.
150 | * Mirror comments
151 | * Store order of photos in photostream
152 | * Store order of albums in collections
153 | 
154 | 
155 | Changes
156 | =======
157 | 2023-12-27
158 | - Drop support for Python 2.
159 | - Change tests to use standard Python unittest library instead of pytest.
160 | - Update documentation to suggest using a venv.
161 | 
162 | 2018-06-02
163 | - Support for nested collections and empty collections.
164 | 
165 | 2017-01-02
166 | - Don't warn about downloading videos if they've already been downloaded.
167 | - Unknown files are no longer deleted by default.
168 | - Added new command line option `--delete-unknown`
169 | - Added new command line option `--ignore-photos`
170 | - Added new command line option `--ignore-videos`
171 | - Print statistics even if script is killed by CTRL+C.
172 | 


--------------------------------------------------------------------------------
/flickrmirrorer.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | 
  3 | # A small command-line python script that creates a local backup of your
  4 | # Flickr data. It mirrors images, titles, description, tags, albums and
  5 | # collections.
  6 | #
  7 | # Available at https://github.com/markdoliner/flickrmirrorer
  8 | #
  9 | # Licensed as follows (this is the 2-clause BSD license, aka
 10 | # "Simplified BSD License" or "FreeBSD License"):
 11 | #
 12 | # Copyright (c)
 13 | #   Ciprian Radu, 2016
 14 | #   Johan Walles, 2016
 15 | #   Mark Doliner, 2012-2023
 16 | #   Mattias Holmlund, 2013
 17 | #   Steve Cassidy, 2016
 18 | #   Victor Engmark, 2016
 19 | # All rights reserved.
 20 | #
 21 | # Redistribution and use in source and binary forms, with or without
 22 | # modification, are permitted provided that the following conditions are met:
 23 | # - Redistributions of source code must retain the above copyright notice,
 24 | #   this list of conditions and the following disclaimer.
 25 | # - Redistributions in binary form must reproduce the above copyright notice,
 26 | #   this list of conditions and the following disclaimer in the documentation
 27 | #   and/or other materials provided with the distribution.
 28 | #
 29 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 30 | # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 31 | # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 32 | # ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
 33 | # LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 34 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 35 | # SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 36 | # INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 37 | # CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 38 | # ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 39 | # POSSIBILITY OF SUCH DAMAGE.
 40 | 
 41 | import argparse
 42 | import datetime
 43 | import dateutil.parser
 44 | import errno
 45 | import glob
 46 | import math
 47 | import os
 48 | import requests
 49 | import shutil
 50 | import signal
 51 | import sys
 52 | import time
 53 | import urllib.parse
 54 | import webbrowser
 55 | 
 56 | try:
 57 |     # We try importing simplejson first because it's faster than json
 58 |     # in python 2.7 and lower
 59 |     import simplejson as json
 60 | except ImportError:
 61 |     import json
 62 | 
 63 | try:
 64 |     import flickrapi
 65 | except ImportError:
 66 |     sys.stderr.write('Error importing flickrapi python library. Is it installed?\n')
 67 |     sys.exit(1)
 68 | 
 69 | API_KEY = '9c5c431017e712bde232a2f142703bb2'
 70 | API_SECRET = '7c024f6e7a36fc03'
 71 | 
 72 | PLEASE_GRANT_AUTHORIZATION_MSG = """
 73 | Please authorize Flickr Mirrorer to read your photos, titles, tags, etc.
 74 | 
 75 | 1. Visit %s
 76 | 2. Click "OK, I'LL AUTHORIZE IT"
 77 | 3. Copy and paste the code here and press 'return'
 78 | 
 79 | """
 80 | 
 81 | NUM_PHOTOS_PER_BATCH = 500
 82 | 
 83 | 
 84 | class VideoDownloadError(Exception):
 85 |     def __str__(self):
 86 |         return '%s' % self.args[0]
 87 | 
 88 | 
 89 | def _ensure_dir_exists(path):
 90 |     """Create the directory 'path' if it does not exist.
 91 |     Calls sys.exit(1) if any directory could not be created."""
 92 |     try:
 93 |         os.makedirs(path)
 94 |     except OSError as ex:
 95 |         if ex.errno != errno.EEXIST:
 96 |             sys.stderr.write('Error creating destination directory %s: %s\n'
 97 |                              % (path, ex.strerror))
 98 |             sys.exit(1)
 99 | 
100 | 
101 | def _ensure_dir_doesnt_exist(path):
102 |     """Remove the directory 'path' and all contents if it exists.
103 |     Calls sys.exit(1) if the directory or any contents could not be removed."""
104 |     try:
105 |         shutil.rmtree(path)
106 |     except OSError as ex:
107 |         if ex.errno != errno.ENOENT:
108 |             sys.stderr.write('Error removing %s: %s\n' % (path, ex.strerror))
109 |             sys.exit(1)
110 | 
111 | 
112 | def _validate_json_response(rsp):
113 |     """Exits the script with an error if the response is a failure.
114 | 
115 |     Args:
116 |        rsp (dict): A parsed JSON response from the Flickr API.
117 |     """
118 |     if rsp['stat'] != 'ok':
119 |         sys.stderr.write('API request failed: Error %(code)s: %(message)s\n' % rsp)
120 |         sys.exit(1)
121 | 
122 | 
123 | def get_photo_datetime(photo):
124 |     """Return date a photo was taken.
125 | 
126 |     Obtained from:
127 |     1. 'datetaken' unless 'datetakenunknown'
128 |     2. Parsed from photo title 'YYYYMMDD_HHmmss'
129 |     3. 'datetaken' anyway; it's available even if unknown, so we just
130 |        go with whatever Flickr made up for us.
131 | 
132 |     Returns:
133 |         datetime.datetime
134 |     """
135 |     if photo['datetakenunknown'] == "0":
136 |         return dateutil.parser.parse(photo['datetaken'])
137 | 
138 |     try:
139 |         parsed = datetime.datetime.strptime(photo['title'], '%Y%m%d_%H%M%S')
140 |         if parsed.year > 2000 and parsed < datetime.datetime.now():
141 |             return parsed
142 |     except ValueError:
143 |         # Unable to parse photo title as datetime
144 |         pass
145 | 
146 |     return dateutil.parser.parse(photo['datetaken'])
147 | 
148 | 
149 | class FlickrMirrorer(object):
150 |     dest_dir = None
151 |     photostream_dir = None
152 |     tmp_filename = None
153 |     flickr = None
154 | 
155 |     def __init__(self, args):
156 |         self.dest_dir = args.destdir
157 |         self.verbosity = args.verbosity
158 |         self.print_statistics = args.statistics
159 |         self.include_views = args.include_views
160 |         self.ignore_photos = args.ignore_photos
161 |         self.ignore_videos = args.ignore_videos
162 |         self.delete_unknown = args.delete_unknown
163 | 
164 |         self.photostream_dir = os.path.join(self.dest_dir, 'photostream')
165 |         self.albums_dir = os.path.join(self.dest_dir, 'Albums')
166 |         self.collections_dir = os.path.join(self.dest_dir, 'Collections')
167 |         self.tmp_filename = os.path.join(self.dest_dir, 'tmp')
168 | 
169 |         # Statistics
170 |         self.deleted_photos = 0
171 |         self.modified_photos = 0
172 |         self.new_photos = 0
173 |         self.modified_albums = 0
174 |         self.modified_collections = 0
175 | 
176 |         # Register a SIGINT (Ctrl-C) handler
177 |         signal.signal(signal.SIGINT, self._sig_int_handler)
178 | 
179 |         # Create flickrapi instance
180 |         self.flickr = flickrapi.FlickrAPI(api_key=API_KEY, secret=API_SECRET, format='parsed-json')
181 | 
182 |     def run(self):
183 |         try:
184 |             self._run_helper()
185 |         finally:
186 |             self._cleanup()
187 | 
188 |     def _run_helper(self):
189 |         # Authenticate
190 |         # The user-friendly way to do this is with this command:
191 |         #     self.flickr.authenticate_via_browser(perms='read')
192 |         # However, the nature of this script is such that we don't want
193 |         # to rely on people running it somewhere with a web browser
194 |         # installed. So use the manual authentication process. A
195 |         # reasonable compromise might be to try browser auth first and
196 |         # if it fails then fall back to manual auth. Really flickrapi
197 |         # should do that for us. Or at least print the URL to the
198 |         # console.
199 |         if not self.flickr.token_valid(perms='read'):
200 |             self.flickr.get_request_token(oauth_callback='oob')
201 |             authorize_url = self.flickr.auth_url(perms='read')
202 |             webbrowser.open_new_tab(authorize_url)
203 | 
204 |             verifier = input(PLEASE_GRANT_AUTHORIZATION_MSG % authorize_url)
205 | 
206 |             self.flickr.get_access_token(verifier)
207 | 
208 |         if self.ignore_photos and self.ignore_videos:
209 |             sys.stderr.write(
210 |                 'There is nothing to do because photos and videos are ignored. '
211 |                 'Please choose to mirror at least photos or videos.\n')
212 |             return
213 | 
214 |         self._verbose('Photos will be %s' % ('ignored' if self.ignore_photos else 'mirrored'))
215 |         self._verbose('Videos will be %s' % ('ignored' if self.ignore_videos else 'mirrored'))
216 |         self._verbose('Unknown files in %s will%s be deleted' % (
217 |             self.dest_dir, '' if self.delete_unknown else ' not'))
218 | 
219 |         # Create destination directory
220 |         _ensure_dir_exists(self.dest_dir)
221 | 
222 |         # Fetch photos
223 |         self._download_all_photos()
224 | 
225 |         # Create albums and collections
226 |         self._mirror_albums()
227 |         self._create_not_in_any_album_dir()
228 |         self._mirror_collections()
229 | 
230 |         self._print_statistics()
231 | 
232 |     def _print_statistics(self):
233 |         if not self.print_statistics:
234 |             return
235 |         print('New photos / videos: %d' % self.new_photos)
236 |         print('Deleted photos / videos: %d' % self.deleted_photos)
237 |         print('Modified photos / videos: %d' % self.modified_photos)
238 |         print('Modified albums: %d' % self.modified_albums)
239 |         print('Modified collections: %d' % self.modified_collections)
240 | 
241 |     def _download_all_photos(self):
242 |         """Download all our pictures and metadata.
243 |         If you have a lot of photos then this function will take a while."""
244 | 
245 |         self._verbose('Mirroring all photos and videos in photostream')
246 | 
247 |         _ensure_dir_exists(self.photostream_dir)
248 | 
249 |         new_files = set()
250 | 
251 |         current_page = 1
252 | 
253 |         metadata_fields = ('description,license,date_upload,date_taken,owner_name,icon_server,original_format,'
254 |                            'last_update,geo,tags,machine_tags,o_dims,media')
255 | 
256 |         if self.include_views:
257 |             metadata_fields += ',views'
258 | 
259 |         download_errors = []
260 |         while True:
261 |             rsp = self.flickr.people_getPhotos(
262 |                 user_id='me',
263 |                 extras=metadata_fields,
264 |                 per_page=NUM_PHOTOS_PER_BATCH,
265 |                 page=current_page,
266 |             )
267 |             _validate_json_response(rsp)
268 | 
269 |             photos = rsp['photos']['photo']
270 |             for photo in photos:
271 |                 if (photo['media'] == 'photo' and not self.ignore_photos) or (
272 |                         photo['media'] == 'video' and not self.ignore_videos):
273 |                     try:
274 |                         new_files |= self._download_photo(photo)
275 |                     except VideoDownloadError as e:
276 |                         download_errors.append(e)
277 | 
278 |             if current_page >= rsp['photos']['pages']:
279 |                 # We've reached the end of the photostream. Stop looping.
280 |                 break
281 | 
282 |             current_page += 1
283 | 
284 |         # Error out if there were exceptions
285 |         if download_errors:
286 |             sys.stderr.write(
287 |                 'The Flickr API does not allow downloading original video files.\n'
288 |                 'Please save the files listed below to the %s directory.\n'
289 |                 'Note: You must be logged into your Flickr account in order to download '
290 |                 'your full resolution videos!\n' % self.photostream_dir)
291 |             for error in download_errors:
292 |                 sys.stderr.write('  %s\n' % error)
293 |             sys.exit(1)
294 | 
295 |         # Error out if we didn't fetch any photos
296 |         if not new_files:
297 |             sys.stderr.write('Error: The Flickr API returned an empty list of photos. '
298 |                              'Bailing out without deleting any local copies in case this is an anomaly.\n')
299 |             sys.exit(1)
300 | 
301 |         # Divide by 2 because we want to ignore the photo metadata files
302 |         # for the purposes of our statistics.
303 |         self.deleted_photos = self._delete_unknown_files(self.photostream_dir, new_files, 'file') // 2
304 | 
305 |     def _download_photo(self, photo):
306 |         """Fetch and save a media item (photo or video) and the metadata
307 |         associated with it.
308 | 
309 |         Returns a python set containing the filenames for the data.
310 |         """
311 |         url = self._get_photo_url(photo)
312 |         photo_basename = self._get_photo_basename(photo)
313 |         photo_filename = os.path.join(self.photostream_dir, photo_basename)
314 |         metadata_basename = '%s.metadata' % photo_basename
315 |         metadata_filename = '%s.metadata' % photo_filename
316 | 
317 |         # Sanity check
318 |         if os.path.isdir(photo_filename) or os.path.islink(photo_filename):
319 |             sys.stderr.write('Error: %s exists but is not a file. This is not allowed.\n' % photo_filename)
320 |             sys.exit(1)
321 | 
322 |         # Sanity check
323 |         if os.path.isdir(metadata_filename) or os.path.islink(metadata_filename):
324 |             sys.stderr.write('Error: %s exists but is not a file. This is not allowed.\n' % metadata_filename)
325 |             sys.exit(1)
326 | 
327 |         # Download photo if it doesn't exist locally or if the metadata
328 |         # file exists and the lastupdate timestamp has changed.
329 |         # TODO: Should ideally also set should_download_photo to True if
330 |         # not os.path.exists(metadata_filename), but that doesn't work
331 |         # correctly for videos because the metadata file won't have been
332 |         # created when the video file was created because the video was
333 |         # downloaded out of band by the user.
334 |         should_download_photo = not os.path.exists(photo_filename)
335 |         if not should_download_photo:
336 |             # Download photo if lastupdate timestamp has changed.
337 |             try:
338 |                 with open(metadata_filename) as json_file:
339 |                     metadata = json.load(json_file)
340 |                 should_download_photo |= metadata['lastupdate'] != photo['lastupdate']
341 |             except IOError as ex:
342 |                 if ex.errno != errno.ENOENT:
343 |                     sys.stderr.write('Error reading %s: %s\n' % (metadata_filename, ex))
344 |                     sys.exit(1)
345 | 
346 |         if should_download_photo:
347 |             if not os.path.exists(photo_filename):
348 |                 self.new_photos += 1
349 |             else:
350 |                 self.modified_photos += 1
351 | 
352 |             self._progress('Fetching %s' % photo_basename)
353 |             request = requests.get(url, stream=True)
354 |             if not request.ok:
355 |                 if photo['media'] == 'video':
356 |                     raise VideoDownloadError(
357 |                         'Manual download required (video may have changed): '
358 |                         'https://www.flickr.com/video_download.gne?id=%s' % photo['id'])
359 | 
360 |                 sys.stderr.write(
361 |                     'Error: Failed to fetch %s: %s: %s\n'
362 |                     % (url, request.status_code, request.reason))
363 |                 sys.exit(1)
364 | 
365 |             # Write to temp file then rename to avoid incomplete files
366 |             # in case of failure part-way through.
367 |             with open(self.tmp_filename, 'wb') as tmp_file:
368 |                 # Use 1 MiB chunks.
369 |                 for chunk in request.iter_content(2**20):
370 |                     tmp_file.write(chunk)
371 |             os.rename(self.tmp_filename, photo_filename)
372 |         else:
373 |             self._verbose('Skipping %s because we already have it'
374 |                           % photo_basename)
375 | 
376 |         # Write metadata
377 |         if self._write_json_if_different(metadata_filename, photo):
378 |             self._progress('Updated metadata for %s' % photo_basename)
379 |         else:
380 |             self._verbose(
381 |                 'Skipping metadata for %s because we already have it' %
382 |                 photo_basename)
383 | 
384 |         photo_datetime = get_photo_datetime(photo)
385 |         self._set_timestamp_if_different(photo_datetime, photo_filename)
386 |         self._set_timestamp_if_different(photo_datetime, metadata_filename)
387 | 
388 |         return {photo_basename, metadata_basename}
389 | 
390 |     def _mirror_albums(self):
391 |         """Create a directory for each album, and create symlinks to the
392 |         files in the photostream."""
393 |         self._verbose('Mirroring albums')
394 | 
395 |         album_dirs = set()
396 | 
397 |         # Fetch albums
398 |         rsp = self.flickr.photosets_getList()
399 |         _validate_json_response(rsp)
400 |         if rsp['photosets']:
401 |             for album in rsp['photosets']['photoset']:
402 |                 album_dirs |= self._mirror_album(album)
403 | 
404 |         self._delete_unknown_files(self.albums_dir, album_dirs, 'album')
405 | 
406 |     def _mirror_album(self, album):
407 |         album_basename = self._get_album_dirname(album['id'], album['title']['_content'])
408 |         album_dir = os.path.join(self.albums_dir, album_basename)
409 | 
410 |         # Fetch list of photos
411 |         photos = []
412 | 
413 |         num_pages = int(math.ceil(float(album['photos']) / NUM_PHOTOS_PER_BATCH))
414 |         for current_page in range(1, num_pages + 1):
415 |             # Fetch photos in this album
416 |             rsp = self.flickr.photosets_getPhotos(
417 |                 photoset_id=album['id'],
418 |                 extras='original_format,media',
419 |                 per_page=NUM_PHOTOS_PER_BATCH,
420 |                 page=current_page,
421 |             )
422 |             _validate_json_response(rsp)
423 | 
424 |             for photo in rsp['photoset']['photo']:
425 |                 if (photo['media'] == 'photo' and not self.ignore_photos) or (
426 |                         photo['media'] == 'video' and not self.ignore_videos):
427 |                     photos += [photo]
428 | 
429 |         # Include list of photo IDs in metadata, so we can tell if photos
430 |         # were added or removed from the album when mirroring in the future.
431 |         album['photos'] = [photo['id'] for photo in photos]
432 | 
433 |         if (not self.include_views) and 'count_views' in album:
434 |             del album['count_views']
435 | 
436 |         # Add a version number to the album metadata. This gives us an
437 |         # easy way to invalidate the local copy and cause the album to
438 |         # be recreated, if needed. More specifically this causes the
439 |         # albums to be recreated now that I've fixed the bug where
440 |         # symlinks to videos were broken.
441 |         album['flickrmirrorer_album_metadata_version'] = 2
442 | 
443 |         metadata_filename = os.path.join(album_dir, 'metadata')
444 | 
445 |         # TODO: Should ensure local album directory accurately reflects the
446 |         # remote album data even if the metadata hasn't changed (important in
447 |         # case the local album data has been tampered with).
448 |         if not os.path.exists(album_dir) or self._is_file_different(metadata_filename, album):
449 |             # Metadata changed, might be due to updated list of photos.
450 |             self._progress('Updating album %s' % album['title']['_content'])
451 |             self.modified_albums += 1
452 | 
453 |             # Delete and recreate the album
454 |             _ensure_dir_doesnt_exist(album_dir)
455 |             _ensure_dir_exists(album_dir)
456 | 
457 |             # Create symlinks for each photo, prefixed with a number so that
458 |             # the local alphanumeric sort order matches the order on Flickr.
459 |             digits = len(str(len(photos)))
460 |             for i, photo in enumerate(photos):
461 |                 photo_basename = self._get_photo_basename(photo)
462 |                 photo_fullname = os.path.join(self.photostream_dir, photo_basename)
463 |                 photo_relname = os.path.relpath(photo_fullname, album_dir)
464 |                 symlink_basename = '%s_%s' % (str(i+1).zfill(digits), photo_basename)
465 |                 symlink_filename = os.path.join(album_dir, symlink_basename)
466 |                 os.symlink(photo_relname, symlink_filename)
467 | 
468 |             # Write metadata
469 |             self._write_json_if_different(metadata_filename, album)
470 | 
471 |         else:
472 |             self._verbose('Album %s is up-to-date' % album['title']['_content'])
473 | 
474 |         return {album_basename}
475 | 
476 |     def _create_not_in_any_album_dir(self):
477 |         """Create a directory for photos that aren't in any album, and
478 |         create symlinks to the files in the photostream."""
479 | 
480 |         self._verbose('Creating local directory for photos not in any album')
481 | 
482 |         album_dir = os.path.join(self.dest_dir, 'Not in any album')
483 | 
484 |         # TODO: Ideally we would inspect the existing directory and
485 |         # make sure it's correct, but that's a lot of work. For now
486 |         # just recreate the album. Fixing this would also allow us to
487 |         # log _progress() messages when the album has changed.
488 |         _ensure_dir_doesnt_exist(album_dir)
489 |         _ensure_dir_exists(album_dir)
490 | 
491 |         current_page = 1
492 |         while True:
493 |             # Fetch list of photos that aren't in any album
494 |             rsp = self.flickr.photos_getNotInSet(
495 |                 extras='original_format,media',
496 |                 per_page=NUM_PHOTOS_PER_BATCH,
497 |                 page=current_page,
498 |             )
499 |             _validate_json_response(rsp)
500 |             if not rsp['photos']['photo']:
501 |                 # We've reached the end of the photostream. Stop looping.
502 |                 break
503 |             photos = []
504 |             for photo in rsp['photos']['photo']:
505 |                 if (photo['media'] == 'photo' and not self.ignore_photos) or (
506 |                         photo['media'] == 'video' and not self.ignore_videos):
507 |                     photos.append(photo)
508 | 
509 |             for photo in photos:
510 |                 photo_basename = self._get_photo_basename(photo)
511 |                 photo_fullname = os.path.join(self.photostream_dir, photo_basename)
512 |                 photo_relname = os.path.relpath(photo_fullname, album_dir)
513 |                 symlink_filename = os.path.join(album_dir, photo_basename)
514 |                 os.symlink(photo_relname, symlink_filename)
515 | 
516 |             current_page += 1
517 | 
518 |     def _mirror_collections(self):
519 |         """Create a directory for each collection, and create symlinks to the
520 |         albums."""
521 |         self._verbose('Mirroring collections')
522 | 
523 |         collection_dirs = set()
524 | 
525 |         # Fetch collections
526 |         rsp = self.flickr.collections_getTree()
527 |         _validate_json_response(rsp)
528 |         if rsp['collections']:
529 |             for collection in rsp['collections']['collection']:
530 |                 collection_dirs |= self._mirror_collection(self.collections_dir, collection)
531 | 
532 |         self._delete_unknown_files(self.collections_dir, collection_dirs, 'collection')
533 | 
534 |     def _mirror_collection(self, parent_dir, collection):
535 |         """Mirror one collection, including its albums and child collections.
536 | 
537 |         Args:
538 |             parent_dir (str): Full path to the directory where this
539 |                 collection should be written.
540 |             collection (dict): Metadata as returned by flickr.collections.getTree.
541 |         """
542 |         collection_basename = self._get_collection_dirname(collection['id'], collection['title'])
543 |         collection_dir = os.path.join(parent_dir, collection_basename)
544 | 
545 |         metadata_filename = os.path.join(collection_dir, 'metadata')
546 | 
547 |         if not os.path.exists(collection_dir) or self._is_file_different(metadata_filename, collection):
548 |             # Metadata changed, might be due to updated list of albums.
549 |             self._progress('Updating collection %s' % collection['title'])
550 |             self.modified_collections += 1
551 | 
552 |             # Delete and recreate the collection
553 |             _ensure_dir_doesnt_exist(collection_dir)
554 |             _ensure_dir_exists(collection_dir)
555 | 
556 |             # Create symlinks for each album
557 |             for album in collection.get('set') or []:
558 |                 album_basename = self._get_album_dirname(album['id'], album['title'])
559 |                 album_fullname = os.path.join(self.albums_dir, album_basename)
560 |                 album_relname = os.path.relpath(album_fullname, collection_dir)
561 |                 symlink_filename = os.path.join(collection_dir, album_basename)
562 |                 os.symlink(album_relname, symlink_filename)
563 | 
564 |             # Collections can nest to arbitrary depth, so recurse into children.
565 |             for child_collection in collection.get('collection') or []:
566 |                 self._mirror_collection(collection_dir, child_collection)
567 | 
568 |             # Write metadata
569 |             self._write_json_if_different(metadata_filename, collection)
570 | 
571 |         return {collection_basename}
572 | 
573 |     def _get_photo_url(self, photo):
574 |         mediatype = photo['media']
575 | 
576 |         if mediatype == 'photo':
577 |             return 'https://farm%(farm)s.staticflickr.com/%(server)s/%(id)s_%(originalsecret)s_o.%(originalformat)s' \
578 |                 % photo
579 | 
580 |         if mediatype == 'video':
581 |             # URL created according to these instructions:
582 |             # http://code.flickr.net/2009/03/02/videos-in-the-flickr-api-part-deux/
583 |             owner = self.flickr.token_cache.token.user_nsid
584 |             return 'https://www.flickr.com/photos/%s/%s/play/orig/%s/' % (
585 |                 owner, photo['id'], photo['originalsecret'])
586 | 
587 |         sys.stderr.write('Error: Unsupported media type "%s":\n' % mediatype)
588 |         sys.stderr.write(json.dumps(photo, indent=2) + '\n')
589 |         sys.exit(1)
590 | 
591 |     def _get_photo_basename(self, photo):
592 |         mediatype = photo['media']
593 | 
594 |         if mediatype == 'photo':
595 |             return '%s.%s' % (photo['id'], photo['originalformat'])
596 | 
597 |         if mediatype == 'video':
598 |             # TODO: If Flickr begins including the file extension in the
599 |             # video metadata then this code should be changed to behave
600 |             # like the photo case, above.
601 |             # The photo metadata for videos does not indicate the file
602 |             # extension. If we've already saved the video locally then
603 |             # we can get the basename from the local file.
604 |             for f in glob.iglob(os.path.join(self.photostream_dir, photo['id']) + '*'):
605 |                 if not f.endswith('metadata'):
606 |                     return os.path.basename(f)
607 | 
608 |             # Otherwise, make an HTTP HEAD request to get the response
609 |             # headers we'd see when trying to download the photo. This
610 |             # URL gets redirected to the CDN with a URL that includes
611 |             # the video's original name.
612 |             # TODO: Note that this started failing on 2016-06-25. It
613 |             # seems to be impossible to download original video files
614 |             # via the Flickr API now. The best we can do is show the
615 |             # user a download URL and ask them to download. For a little
616 |             # more context see:
617 |             # https://www.flickr.com/groups/51035612836@N01/discuss/72157671986445591/72157673833636861
618 |             # https://groups.yahoo.com/neo/groups/yws-flickr/conversations/topics/9610
619 |             # https://groups.yahoo.com/neo/groups/yws-flickr/conversations/topics/9617
620 |             head = requests.head(self._get_photo_url(photo), allow_redirects=True)
621 |             if head.status_code != 200:
622 |                 raise VideoDownloadError(
623 |                     'Manual download required: '
624 |                     'https://www.flickr.com/video_download.gne?id=%s' % photo['id'])
625 | 
626 |             return os.path.basename(urllib.parse.urlparse(head.url).path)
627 | 
628 |         sys.stderr.write('Error: Unsupported media type "%s":\n' % mediatype)
629 |         sys.stderr.write(json.dumps(photo, indent=2) + '\n')
630 |         sys.exit(1)
631 | 
632 |     @staticmethod
633 |     def _get_album_dirname(id_, title):
634 |         safe_title = urllib.parse.quote(title.encode('utf-8'), " ',")
635 |         # The ID is included in the name to avoid collisions when there
636 |         # are two albums with the same name.
637 |         return '%s - %s' % (safe_title, id_)
638 | 
639 |     @staticmethod
640 |     def _get_collection_dirname(id_, title):
641 |         safe_title = urllib.parse.quote(title.encode('utf-8'), " ',")
642 |         # The ID is included in the name to avoid collisions when there
643 |         # are two collections with the same name.
644 |         return '%s - %s' % (safe_title, id_)
645 | 
646 |     @staticmethod
647 |     def _is_file_different(filename, data):
648 |         """Return True if the contents of the file 'filename' differ
649 |         from 'data'. Otherwise return False."""
650 |         try:
651 |             with open(filename) as json_file:
652 |                 orig_data = json.load(json_file)
653 |             return orig_data != data
654 |         except IOError as ex:
655 |             if ex.errno != errno.ENOENT:
656 |                 sys.stderr.write('Error reading %s: %s\n' % (filename, ex))
657 |                 sys.exit(1)
658 |             return True
659 | 
660 |     def _set_timestamp_if_different(self, photo_datetime, filename):
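661 |         """Set a file's access and modified times to the given datetime,
662 |         unless its modified time already matches.
663 | 
664 |         Args:
665 |             photo_datetime (datetime.datetime): Interpreted as local time.
666 |             filename (str): Path of the file to update."""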
667 |         try:
668 |             timestamp = time.mktime(photo_datetime.timetuple())
669 |             if timestamp != os.path.getmtime(filename):
670 |                 os.utime(filename, (timestamp, timestamp))
671 |         except OverflowError:
672 |             self._progress('Error updating timestamp for: %s' % filename)
673 | 
674 |     def _write_json_if_different(self, filename, data):
675 |         """Write the given data to the specified filename, but only if it's
676 |         different from what is currently there. Return True if the file was
677 |         written.
678 | 
679 |         We use this function mostly to avoid changing the timestamps on
680 |         metadata files."""
681 |         if not self._is_file_different(filename, data):
682 |             # Data has not changed--do nothing.
683 |             return False
684 | 
685 |         # Write to temp file then rename to avoid incomplete files
686 |         # in case of failure part-way through.
687 |         with open(self.tmp_filename, 'w') as json_file:
688 |             json.dump(data, json_file)
689 |         os.rename(self.tmp_filename, filename)
690 |         return True
691 | 
692 |     def _delete_unknown_files(self, rootdir, known, knowntype):
693 |         """If the delete_unknown option is used, delete all files and
694 |         directories in rootdir except the known files.
695 | 
696 |         knowntype is only used for the log message.
697 | 
698 |         Returns the number of deleted entries.
699 |         """
700 |         # return early if the rootdir doesn't exist
701 |         if not os.path.isdir(rootdir):
702 |             return 0
703 | 
704 |         # Delete only if the --delete-unknown option was specified.
705 |         if not self.delete_unknown:
706 |             return 0
707 | 
708 |         delete_count = 0
709 |         curr_entries = os.listdir(rootdir)
710 | 
711 |         unknown_entries = set(curr_entries) - set(known)
712 |         for unknown_entry in unknown_entries:
713 |             fullname = os.path.join(rootdir, unknown_entry)
714 |             self._progress('Deleting unknown %s: %s' % (knowntype, unknown_entry))
715 |             delete_count += 1
716 | 
717 |             try:
718 |                 if os.path.isdir(fullname):
719 |                     shutil.rmtree(fullname)
720 |                 else:
721 |                     os.remove(fullname)
722 |             except OSError as ex:
723 |                 sys.stderr.write('Error deleting %s: %s\n' % (fullname, ex.strerror))
724 |                 sys.exit(1)
725 | 
726 |         return delete_count
727 | 
728 |     def _verbose(self, msg):
729 |         if self.verbosity >= 2:
730 |             print(msg)
731 | 
732 |     def _progress(self, msg):
733 |         if self.verbosity >= 1:
734 |             print(msg)
735 | 
736 |     def _cleanup(self):
737 |         # Remove a temp file, if one exists
738 |         try:
739 |             os.remove(self.tmp_filename)
740 |         except OSError as ex:
741 |             if ex.errno != errno.ENOENT:
742 |                 sys.stderr.write('Error deleting temp file %s: %s\n' % (self.tmp_filename, ex.strerror))
743 | 
744 |     def _sig_int_handler(self, signum, frame):
745 |         # User exited with CTRL+C
746 |         print('')
747 |         self._print_statistics()
748 |         sys.exit()
749 | 
750 | 
751 | def main():
752 |     parser = argparse.ArgumentParser(
753 |         description='Create a local mirror of your flickr data.')
754 | 
755 |     parser.add_argument(
756 |         'destdir',
757 |         help='the path to where the mirror shall be stored')
758 | 
759 |     parser.add_argument(
760 |         '-v', '--verbose',
761 |         dest='verbosity', action='store_const', const=2,
762 |         default=1,
763 |         help='print progress information to stdout')
764 | 
765 |     parser.add_argument(
766 |         '-q', '--quiet',
767 |         dest='verbosity', action='store_const', const=0,
768 |         help='print nothing to stdout if the mirror succeeds')
769 | 
770 |     parser.add_argument(
771 |         '-s', '--statistics',
772 |         action='store_true',
773 |         help='print transfer statistics at the end')
774 | 
775 |     parser.add_argument(
776 |         '--ignore-views', action='store_false',
777 |         dest='include_views',
778 |         help='do not include the view counter in metadata')
779 | 
780 |     parser.add_argument(
781 |         '--ignore-photos', action='store_true',
782 |         dest='ignore_photos',
783 |         help='do not mirror photos')
784 | 
785 |     parser.add_argument(
786 |         '--ignore-videos', action='store_true',
787 |         dest='ignore_videos',
788 |         help='do not mirror videos')
789 | 
790 |     parser.add_argument(
791 |         '--delete-unknown', action='store_true',
792 |         dest='delete_unknown',
793 |         help='delete unrecognized files in the destination directory. '
794 |              'Warning: if you choose to ignore photos or videos, they will be deleted!')
795 | 
796 |     args = parser.parse_args()
797 | 
798 |     mirrorer = FlickrMirrorer(args)
799 |     mirrorer.run()
800 | 
801 | 
802 | if __name__ == '__main__':
803 |     try:
804 |         main()
805 |     except KeyboardInterrupt:
806 |         # User exited with CTRL+C
807 |         # Print a newline to leave the console in a prettier state.
808 |         print('')
809 | 
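
The write-to-temp-then-rename pattern used by `_write_json_if_different` can be tried in isolation. The sketch below is a standalone illustration, not the code from `flickrmirrorer.py`: the function name `write_json_if_different` is hypothetical, it creates its temp file with `tempfile.mkstemp` instead of using a fixed `self.tmp_filename`, it uses `os.replace` (Python 3.3+) instead of `os.rename`, and it treats any unreadable or corrupt existing file as "different" rather than exiting.

```python
import json
import os
import tempfile


def write_json_if_different(filename, data):
    """Write data as JSON to filename unless the file already holds equal data.

    Returns True if the file was written, False if it was left untouched.
    Hypothetical helper for illustration only.
    """
    try:
        with open(filename) as json_file:
            if json.load(json_file) == data:
                # Data has not changed, so leave the file and its mtime alone.
                return False
    except (OSError, ValueError):
        # Assumption: a missing, unreadable, or corrupt file is rewritten.
        pass

    # Create the temp file next to the destination so the final rename
    # stays on a single filesystem.
    dirname = os.path.dirname(os.path.abspath(filename))
    fd, tmp_name = tempfile.mkstemp(dir=dirname, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as tmp_file:
            json.dump(data, tmp_file)
        # Readers see either the old file or the complete new one.
        os.replace(tmp_name, filename)
    except BaseException:
        # Do not leave a partial temp file behind on failure.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return True
```

Calling the function twice with the same data writes the file once and then returns False, which is what keeps metadata timestamps stable between runs.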


--------------------------------------------------------------------------------