├── .gitignore ├── LICENSE ├── README.md ├── flickr-savr.py ├── ns.xml └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | ## App specific stuff 2 | .savr 3 | nsids 4 | .venv 5 | 6 | ## Platform stuff 7 | *~ 8 | .DS_Store 9 | 10 | # Byte-compiled / optimized / DLL files 11 | __pycache__/ 12 | *.py[cod] 13 | 14 | # C extensions 15 | *.so 16 | 17 | # Distribution / packaging 18 | .Python 19 | env/ 20 | bin/ 21 | build/ 22 | develop-eggs/ 23 | dist/ 24 | eggs/ 25 | lib/ 26 | lib64/ 27 | parts/ 28 | sdist/ 29 | var/ 30 | *.egg-info/ 31 | .installed.cfg 32 | *.egg 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .cache 43 | nosetests.xml 44 | coverage.xml 45 | 46 | # Translations 47 | *.mo 48 | 49 | # Mr Developer 50 | .mr.developer.cfg 51 | .project 52 | .pydevproject 53 | 54 | # Rope 55 | .ropeproject 56 | 57 | # Django stuff: 58 | *.log 59 | *.pot 60 | 61 | # Sphinx documentation 62 | docs/_build/ 63 | 64 | runner.sh 65 | nsid 66 | venv 67 | py3exiv2-0.6.1 68 | master.zip 69 | py3exiv2-0.6.1.tar.gz 70 | test.jpg 71 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2014 ayman 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 
| copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Flickr Savr 2 | 3 | Save photos from Flickr to your disk with metadata embedded in the 4 | photos. 5 | 6 | ## About 7 | 8 | This is a digital preservation experiment. It crawls your Flickr 9 | Photos and saves them to disk with all the Flickr metadata stored in 10 | the EXIF. The idea here is, most preservation methods keep the photos 11 | and then keep a separate database (in whatever format) elsewhere. I 12 | thought I could have all the data stored in the photo so that it is 13 | not only coupled with the primary data but could be reconstructed from 14 | arbitrary collections of photos across people. It's just an 15 | experiment...it's not complete and I make no promises. 16 | 17 | ## Version 2 18 | 19 | Version 2 makes an explicit XMP namespace (with RDF XML) and stores 20 | all the metadata neatly there. Version 1 just added well-formatted 21 | strings to the IPTC keyword array. Not only was this messier and 22 | unorganized, like that drawer in your kitchen, it had a consequence 23 | for general photo search tools. As a result, **if you used Version 1** 24 | leave that old photos dir alone and make a new one for Version 2. 25 | They are incompatible.
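Either way, photos land on disk in a date-derived directory tree under `PHOTODIR/nsid/YOURNSID/`. Here's a minimal sketch of that layout logic (it mirrors `get_date_path` in `flickr-savr.py`; the NSID and date below are made-up examples):

```
import os.path

def date_path(basepath, nsid, datetaken):
    # Flickr's datetaken comes back as "YYYY-MM-DD HH:MM:SS"
    date = datetaken.split(' ')[0]
    year, month, _day = date.split('-')
    return os.path.join(basepath, "nsid", nsid, year, month, date)

print(date_path("photos", "12345678@N00", "2014-07-04 12:30:00"))
# photos/nsid/12345678@N00/2014/07/2014-07-04
```

Each photo is then written into that directory named by its Flickr ID, with the extension taken from its original URL.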
26 | 27 | ## Running this 28 | 29 | You'll need to make your own [Flickr API Key and 30 | Secret](https://www.flickr.com/services/apps/by/ayman). You'll also 31 | need to look up your NSID. You can find it by visiting a [URL in the 32 | Flickr App 33 | Garden](https://www.flickr.com/services/api/explore/flickr.profile.getProfile) 34 | listed as *Your user ID*. 35 | 36 | You can run the script as: 37 | ``` 38 | python flickr-savr.py -b PHOTODIR YOURAPIKEY YOURAPISECRET YOURNSID 39 | ``` 40 | 41 | ## Dependencies 42 | 43 | * Python 3 44 | * [https://pypi.python.org/pypi/flickrapi](https://pypi.python.org/pypi/flickrapi) 45 | Can be pip installed. 46 | 47 | ## macOS 48 | 49 | I think you need all this stuff from [*homebrew*](https://brew.sh): 50 | 51 | ``` 52 | brew install boost-python3 gexiv2 pygobject3 py3cairo 53 | ``` 54 | 55 | And if you are using a venv and notice it can't find the `gi` package, you might need to point the PYTHONPATH at it: 56 | 57 | ``` 58 | export PYTHONPATH=/usr/local/lib/python3.9/site-packages 59 | ``` 60 | 61 | That's mine; yours might be different. 62 | 63 | ## Limits 64 | 65 | 3600 queries per hour is all that's allowed. That's 1 query per 66 | second. Each photo takes 3 queries to get its metadata...so 67 | that's a cap of 1200 photos per hour. It takes, by rough 68 | estimate, 3 seconds to query 3 times, download, and write to disk.
69 | So by most estimates you won't overrun the limit...that said, 70 | we'll sleep 200 ms between photos just to be nice. 71 | -------------------------------------------------------------------------------- /flickr-savr.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import flickrapi 3 | import gi 4 | import os 5 | import tempfile 6 | import time 7 | import urllib.error 8 | import urllib.parse 9 | import urllib.request 10 | import webbrowser 11 | gi.require_version('GExiv2', '0.10') 12 | from gi.repository import GExiv2 13 | 14 | 15 | class FlickrSavr(object): 16 | """This is a digital preservation experiment. It crawls your Flickr 17 | Photos and saves them to disk with all the Flickr metadata stored 18 | in the EXIF. The idea here is, most preservation methods keep the 19 | photos and then keep a separate database (in whatever format) 20 | elsewhere. I thought I could have all the data stored in the 21 | photo so that it is not only coupled with the primary data but 22 | could be reconstructed from arbitrary collections of photos across 23 | people. It's just an experiment...it's not complete and I make no 24 | promises. 25 | 26 | 3600 queries per hour is all that's allowed (that's 1 query per 27 | second). Each photo takes 3 queries to get its metadata...so 28 | that's a cap of 1200 photos per hour. It takes (by rough estimate) 29 | 3 seconds to query 3 times, download, and write to disk. So by 30 | most estimates you won't overrun the limit...that said, we'll sleep 31 | 200 ms between photos just to be nice.
32 | 33 | """ 34 | 35 | def __init__(self, 36 | key, 37 | secret, 38 | nsid, 39 | basepath, 40 | quiet, 41 | force): 42 | """Authenticate with Flickr, then crawl and save the account's photos. 43 | 44 | Arguments: 45 | - `key`: Flickr API key 46 | - `secret`: Flickr API secret 47 | - `nsid`: Flickr account NSID 48 | - `basepath`: base directory to store the photos in 49 | - `quiet`: suppress per-photo status output 50 | - `force`: re-download photos that already exist on disk 51 | """ 52 | # auth 53 | self.api_key = key 54 | self.api_secret = secret 55 | self.nsid = nsid 56 | self.basepath = os.path.join(basepath, "nsid", self.nsid) 57 | self.verbose = not quiet 58 | self.force = force 59 | 60 | self.photo_count = 0 61 | self.photo_page = 0 62 | self.photo_total = 0 63 | self.per_page = 500 64 | 65 | self.flickr = flickrapi.FlickrAPI(self.api_key, 66 | self.api_secret, 67 | format='parsed-json') 68 | 69 | # Only do this if we don't have a valid token already 70 | if not self.flickr.token_valid(perms='write'): 71 | # Get a request token 72 | self.flickr.get_request_token(oauth_callback='oob') 73 | # Open a browser at the authentication URL. Do this however 74 | # you want, as long as the user visits that URL. 75 | authorize_url = self.flickr.auth_url(perms='write') 76 | webbrowser.open_new_tab(authorize_url) 77 | # Get the verifier code from the user. Do this however you 78 | # want, as long as the user gives the application the code.
79 | verifier = str(input('Verifier code: ')) 80 | # Trade the request token for an access token 81 | self.flickr.get_access_token(verifier) 82 | 83 | # dir 84 | if not os.path.exists(self.basepath): 85 | os.makedirs(self.basepath) 86 | 87 | # search: the response has a 'pages' field saying how many 88 | # pages of photos there are...max is 500 photos per page 89 | extras = "%s,%s,%s,%s,%s,%s,%s,%s" % ("url_o", 90 | "geo", 91 | "tags", 92 | "machine_tags", 93 | "views", 94 | "description", 95 | "date_upload", 96 | "date_taken") 97 | 98 | photos = self.flickr.photos_search(user_id=self.nsid, 99 | per_page=self.per_page, 100 | extras=extras) 101 | photos = photos['photos'] 102 | 103 | self.photo_total = int(photos['total']) 104 | self.photo_page = 0 105 | 106 | for i in range(len(photos['photo'])): 107 | self.photo_count = i 108 | photo = photos['photo'][i] 109 | self.get_photo(photo) 110 | 111 | # pages are 1-indexed on Flickr and page 1 was fetched above 112 | for page in range(photos['pages'])[1:]: 113 | self.photo_page = page 114 | photos = self.flickr.photos_search(user_id=self.nsid, 115 | page=str(self.photo_page + 1), 116 | per_page=self.per_page, 117 | extras=extras) 118 | photos = photos['photos'] 119 | for i in range(len(photos['photo'])): 120 | self.photo_count = i 121 | photo = photos['photo'][i] 122 | try: 123 | self.get_photo(photo) 124 | except KeyboardInterrupt: 125 | print(photo) 126 | exit() 127 | finally: 128 | time.sleep(0.5) 129 | 130 | def get_photo(self, photo): 131 | """Get the data around a photo.
132 | 133 | Arguments: 134 | - `self`: 135 | - `photo`: the photo dict from a photos_search page 136 | """ 137 | 138 | # Check if we have it already 139 | url = photo['url_o'] 140 | fname = os.path.join(self.get_date_path(photo), 141 | photo['id'] + url[-4:]) 142 | exists = not self.force and os.path.isfile(fname) 143 | if exists: 144 | self.print_status_count(True) 145 | return 146 | 147 | _, fname_temp = tempfile.mkstemp() 148 | 149 | # get image 150 | # TODO: This should exit with the photo number and then put 151 | # that in a second queue. 152 | try: 153 | resp = urllib.request.urlopen(url) 154 | except urllib.error.URLError: 155 | print("Sleeping for 2 seconds...") 156 | time.sleep(2) 157 | print("Trying %s" % url) 158 | resp = urllib.request.urlopen(url) 159 | image_data = resp.read() 160 | 161 | # Open output file in binary mode, write, and close. 162 | with open(fname_temp, 'wb') as f: 163 | f.write(image_data) 164 | self.print_status_count() 165 | 166 | # get more metadata 167 | favorites = [] 168 | favs = self.flickr.photos_getFavorites(photo_id=photo['id']) 169 | time.sleep(0.25) 170 | for person in favs['photo'].get('person', []): 171 | favorites.append(person['username']) 172 | favorites.append(person['nsid']) 173 | favorites.append(person['favedate']) 174 | comments = [] 175 | comms = self.flickr.photos_comments_getList(photo_id=photo['id']) 176 | time.sleep(0.25) 177 | try: 178 | for comment in comms['comments']['comment']: 179 | comments.append(comment['author']) 180 | comments.append(comment['authorname']) 181 | comments.append(comment['datecreate']) 182 | comments.append(comment['_content']) 183 | except KeyError: 184 | pass 185 | pools = [] 186 | # TODO: this call failed...maybe try to wrap all the bad ids 187 | # somewhere for later handling 188 | pool = self.flickr.photos_getAllContexts(photo_id=photo['id']) 189 | sets = [] 190 | try: 191 | for s in pool['set']: 192 | sets.append(s['title']) 193 | sets.append(s['id']) 194 | except KeyError: 195 | pass 196 | metadata = GExiv2.Metadata() 197 | metadata.open_path(fname_temp)
198 | metadata.try_register_xmp_namespace("https://shamur.ai/bin/flickrsavr/ns", 199 | "flickrsavr") 200 | metadata.set_tag_long("Xmp.flickrsavr.id", int(photo['id'])) 201 | metadata.set_tag_string("Xmp.flickrsavr.owner", self.nsid) 202 | metadata.set_tag_string("Xmp.flickrsavr.title", photo['title']) 203 | metadata.set_tag_long("Xmp.flickrsavr.ispublic", photo['ispublic']) 204 | metadata.set_tag_long("Xmp.flickrsavr.isfriend", photo['isfriend']) 205 | metadata.set_tag_long("Xmp.flickrsavr.isfamily", photo['isfamily']) 206 | metadata.set_tag_string("Xmp.flickrsavr.description", 207 | photo['description']['_content']) 208 | metadata.set_tag_string("Xmp.flickrsavr.dateupload", 209 | photo['dateupload']) 210 | metadata.set_tag_string("Xmp.flickrsavr.datetaken", 211 | photo['datetaken']) 212 | metadata.set_tag_long("Xmp.flickrsavr.datetakengranularity", 213 | int(photo['datetakengranularity'])) 214 | metadata.set_tag_long("Xmp.flickrsavr.datetakenunknown", 215 | int(photo['datetakenunknown'])) 216 | metadata.set_tag_long("Xmp.flickrsavr.views", int(photo['views'])) 217 | metadata.set_tag_string("Xmp.flickrsavr.machine_tags", 218 | photo['machine_tags']) 219 | metadata.set_tag_string("Xmp.flickrsavr.url_o", photo['url_o']) 220 | metadata.set_tag_long("Xmp.flickrsavr.height_o", photo['height_o']) 221 | metadata.set_tag_long("Xmp.flickrsavr.width_o", photo['width_o']) 222 | if ('latitude' in photo and 'longitude' in photo): 223 | metadata.set_tag_string( 224 | "Xmp.flickrsavr.latitude", str(photo['latitude'])) 225 | metadata.set_tag_string( 226 | "Xmp.flickrsavr.longitude", str(photo['longitude'])) 227 | if ('accuracy' in photo): 228 | metadata.set_tag_long( 229 | "Xmp.flickrsavr.accuracy", int(photo['accuracy'])) 230 | if ('tags' in photo): 231 | metadata.set_tag_string("Xmp.flickrsavr.tags", photo['tags']) 232 | metadata.set_tag_multiple("Xmp.flickrsavr.favorites", favorites) 233 | metadata.set_tag_multiple("Xmp.flickrsavr.comments", comments) 234 | 
metadata.set_tag_multiple("Xmp.flickrsavr.sets", sets) 235 | metadata.save_file(fname_temp) 236 | try: 237 | os.replace(fname_temp, fname) 238 | except KeyboardInterrupt: 239 | print("Aborting, restart to resume") 240 | 241 | def get_date_path(self, photo): 242 | datetaken = photo['datetaken'] 243 | date = datetaken.split(' ')[0] 244 | parsed = date.split('-') 245 | path = os.path.join(self.basepath, parsed[0], parsed[1], date) 246 | if not os.path.exists(path): 247 | os.makedirs(path) 248 | return path 249 | 250 | def print_status(self, s): 251 | if self.verbose: 252 | print(s) 253 | return 254 | 255 | def print_status_count(self, exists=False): 256 | condition = "" 257 | if exists: 258 | condition = "(file exists)" 259 | if self.verbose: 260 | self.print_status("%s / %s %s" % 261 | (1 + self.photo_count + (self.per_page * 262 | self.photo_page), 263 | self.photo_total, 264 | condition)) 265 | return 266 | 267 | 268 | def main(): 269 | desc = 'Download a Flickr Account. See https://github.com/ayman/flickrsavr/ for setup help.' 270 | parser = argparse.ArgumentParser(prog='FlickrSavr', 271 | usage='%(prog)s [-b BASEPATH] [-f] [-q] key secret nsid', 272 | description=desc, 273 | epilog="that's how it's done.") 274 | parser.add_argument('key', help='Flickr API Key') 275 | parser.add_argument('secret', help='Flickr API Secret') 276 | parser.add_argument('nsid', help='Flickr Account NSID') 277 | parser.add_argument('-b', '--basepath', 278 | nargs=1, 279 | default=['./'], 280 | help='Base directory to use for storing files.') 281 | parser.add_argument("-f", 282 | "--force", 283 | help="Force download if file already exists. 
", 284 | action="store_true") 285 | parser.add_argument("-q", 286 | "--quiet", 287 | help="increase output verbosity", 288 | action="store_true") 289 | args = parser.parse_args() 290 | FlickrSavr(args.key, args.secret, args.nsid, args.basepath[0], 291 | args.quiet, args.force) 292 | 293 | 294 | if __name__ == "__main__": 295 | main() 296 | -------------------------------------------------------------------------------- /ns.xml: -------------------------------------------------------------------------------- 1 | 2 | 4 | ]> 5 | 9 | 10 | 11 | Photo ID 12 | Flickr Photo ID of the photo 13 | 14 | 15 | 16 | 17 | 18 | Flickr NSID 19 | Flickr NSID of the photo owner 20 | 21 | 22 | 23 | 24 | 25 | Flickr title 26 | Flickr title of the photo 27 | 28 | 29 | 30 | 31 | 32 | Flickr ispublic flag 33 | Boolean if the Flickr photo is public 34 | 35 | 36 | 37 | 38 | 39 | Flickr isfriend flag 40 | Boolean if the Flickr photo is friend visible 41 | 42 | 43 | 44 | 45 | 46 | Flickr isfamily flag 47 | Boolean if the Flickr photo is family visible 48 | 49 | 50 | 51 | 52 | 53 | Flickr description 54 | Flickr description of the photo 55 | 56 | 57 | 58 | 59 | 60 | Flickr dateupload 61 | Flickr dateupload of the photo 62 | 63 | 64 | 65 | 66 | 67 | Flickr datetaken 68 | Flickr datetaken of the photo 69 | 70 | 71 | 72 | 73 | 74 | Flickr dateuploadgranularity 75 | Flickr dateuploadgranularity of the photo 76 | 77 | 78 | 79 | 80 | 81 | Flickr datetakenunknown 82 | Flickr datetakenunknown of the photo 83 | 84 | 85 | 86 | 87 | 88 | Flickr view count 89 | Flickr views count of the photo 90 | 91 | 92 | 93 | 94 | 95 | Flickr tags 96 | Flickr tags of the photo 97 | 98 | 99 | 100 | 101 | 102 | Flickr machinetags 103 | Flickr machinetags of the photo 104 | 105 | 106 | 107 | 108 | 109 | Flickr original url 110 | Flickr original url of the photo 111 | 112 | 113 | 114 | 115 | 116 | Flickr height 117 | Flickr height of the original photo 118 | 119 | 120 | 121 | 122 | 123 | Flickr width 124 | Flickr width of 
the original photo 125 | 126 | 127 | 128 | 129 | 130 | Flickr latitude 131 | Flickr latitude of the original photo 132 | 133 | 134 | 135 | 136 | 137 | Flickr longitude 138 | Flickr longitude of the original photo 139 | 140 | 141 | 142 | 143 | 144 | Flickr GPS accuracy 145 | Flickr GPS accuracy of the original photo 146 | 147 | 148 | 149 | 150 | 151 | Flickr favorites 152 | Flickr favorites array of the photo 153 | 154 | 155 | 156 | 157 | 158 | Flickr comments 159 | Flickr comments array of the photo 160 | 161 | 162 | 163 | 164 | 165 | Flickr sets 166 | Flickr sets array of the photo 167 | 168 | 169 | 170 | 171 | 172 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | flickrapi==2.4.0 2 | PyGObject 3 | --------------------------------------------------------------------------------