├── README.md └── getpinboard.rb /README.md: -------------------------------------------------------------------------------- 1 | [adore pinboard]: http://brettterpstra.com/i-adore-pinboard/ 2 | [delicious script]: http://brettterpstra.com/delicious-spotlight-and-openmeta-tags-revisited/ 3 | [pinboard]: http://pinboard.in/ 4 | [delibar]: http://www.delibarapp.com/ 5 | [caseapps]: http://www.caseapps.com/tags/ 6 | [openmeta cli]: http://code.google.com/p/openmeta/downloads/list 7 | [historyhound]: http://www.stclairsoft.com/HistoryHound/ 8 | [webbla]: http://www.celmaro.com/webbla/ 9 | 10 | **Update [April 3rd, 2011]:** The current version, 1.0.4 at the moment, has bugfixes for running without Tags.app installed, more error handling and a new setting for locations where the date format is `dd-mm-yyyy`. If you had a previous version and run into trouble, please replace the script with the [latest](#download) and delete your `~/getpinboard.yaml` file to regenerate a new one with the additional localization setting. 11 | 12 | This script is for people who want to take advantage of Pinboard--with its full text search, easy privacy settings, accessible API, etc.--yet still want to be able to search their bookmarks in local Spotlight (and similar) searches. While it has the option to save bookmarks with a certain tag as searchable PDF files, it doesn't attempt to replicate the full spectrum of Pinboard features. It's just a way to make your remote bookmarks locally searchable, available system-wide and OpenMeta compatible. 13 | 14 | I toyed around for a long time with using safaribookmark files instead of webloc files. They let you store a larger preview image, and you can include full text from websites within the XML of the file. Lots of possibilities there. For many reasons, I decided to stick with these little webloc files.
If I want fancier images and web text, I'll use [Webbla][], and if I want comprehensive full text search I'll use [HistoryHound][], both excellent programs in their own right. I want OpenMeta and simplicity, though. If I know I'm looking for a bookmark from Pinboard, I can just go to Delibar and do some searching. The goal is to be able to include my web discoveries in larger searches on my Mac. 15 | 16 | ### Setup and Features 17 | 18 | First, put the script somewhere you can leave it, preferably somewhere in your system path. That's not a huge deal, though, because you'll be supplying an absolute path in most automation cases anyway. Once you have it situated, open Terminal and run `chmod a+x /path/to/your/script.rb`. Now you can run the script from the command line to configure and test. 19 | 20 | When you run the script the first time (do it from the command line with `/path/to/script/getpinboard.rb`), it puts a configuration file at `~/getpinboard.yaml`. It will let you know exactly where it is, and will automatically open it in your text editor. You *must* edit the configuration settings before you're ready to run it again. The configuration has options for all of the main features of the script, so these instructions are also going to be the tour. You can edit any of these options at any time. Note that the next time you run the script it will pull in up to 500 of your bookmarks, starting with the oldest. If you decide you didn't like a setting, you may want to trash those files and the database and start over. Try not to let that happen. 21 | 22 | #### Configuration options 23 | 24 | user and password (string) 25 | : Set these to your Pinboard credentials 26 | 27 | dateformat (string) 28 | : Leave this as 'US' if your local date format is `mm-dd-yyyy`. Set it to 'UK' if your date format is `dd-mm-yyyy`. 29 | 30 | target (absolute path) 31 | : This is where the webloc files will be collected. 
It works great with a Dropbox folder, but put it anywhere you like. On my system, I have my `~/Library/Caches/Metadata/Tags/Bookmark` folder (where Tags.app stores its tagged bookmarks) symlinked to `~/Dropbox/Sync/Bookmark`. That Dropbox folder is my target for the script, so I'm saving my Pinboard bookmarks to my Tags folder and still syncing them (and their OpenMeta tags) to my other computers. Further Tags integration is covered under `update_tags_db` below. 32 | : If a folder specified in the config is missing, the script will attempt to create it. 33 | 34 | db_location (absolute path) 35 | : This is the location of the bookmarks database. The filename will be `bookmarks.stash`, and it's perfectly fine for it to live in the same folder you set for `target`. 36 | 37 | pdf_location (absolute path) 38 | : If `pdf_tag` (below) isn't set to false, this is where PDFs of bookmarks with that tag will be created. This requires the latest version of [Paparazzi!](http://derailer.org/paparazzi/) (which does run fine on Snow Leopard). 39 | 40 | tag_method (integer 0-3) 41 | : This determines how the OpenMeta tags will be applied. Use 0 to disable, 1 for [Tags.app][caseapps], 2 for the [OpenMeta CLI utility][openmeta cli] or 3 for jdberry's `tag` command-line utility (Mavericks tags), which the script expects at `/usr/local/bin/tag`. 42 | 43 | always_tag (string) 44 | : I like to give bookmarks that come from bookmark services a common tag for top-level grouping. This setting defaults to "pinboard", but you can change it to anything (or leave it blank). 45 | 46 | update\_tags\_db (boolean) 47 | : If you're using Tags.app, you know you can tag a web page with it and it will remember the tags next time you visit that address. That doesn't work with external tools, though, because Tags keeps a separate database for those links. Setting this to true will let the script update the Tags database and keep everything in sync.
48 | 49 | create_thumbs (boolean) 50 | : If set to true, this feature will add custom icons to your webloc files using a screengrab of the website and the website's favicon. It looks great in icon and CoverFlow views. 51 | : This is another external requirement. To get thumbnails, you must have [setWeblocThumb](http://hasseg.org/setWeblocThumb/), a free utility for doing just that. The utility must be located at `/usr/local/bin/setWeblocThumb`. 52 | : Note that creating thumbs takes a typical 4-8KB webloc file and makes it around 160KB on average. My bookmarks folder has nearly 200MB of bookmarks in it, tagged and thumbnailed. That's OK with me, but if you want to keep it small and agile, skip the thumbs. 53 | 54 | pdf_tag (string) 55 | : The string defined here determines which bookmarks, if any, are also saved as searchable PDF files (set it to false to disable). Just use it as a tag, and Paparazzi! will capture the URL in the background to the location you set above. 56 | 57 | debug (boolean) 58 | : Leave this off (false) unless you need a little more info about what's going on. When enabled, it uses Growl and STDOUT to display progress. 59 | 60 | gzip_db (boolean) 61 | : I can't imagine the database file that this generates being large enough to worry about size, but this option will cut the disk space it requires significantly. I leave it off, but it's your choice. 62 | 63 | ### Optional additions 64 | 65 | As mentioned above, if you want to create thumbnails for your webloc files from screenshots of the web page, you'll need [setWeblocThumb](http://hasseg.org/setWeblocThumb/), a free utility for doing just that. Support for it is built into the script; just install the utility and make sure thumbnailing is enabled in the config. The script expects the utility to be located at `/usr/local/bin/setWeblocThumb`.
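Before enabling `create_thumbs` or a `tag_method`, it can help to confirm the helpers are where the script looks for them. The following is a minimal, hypothetical preflight check (not part of getpinboard.rb; the `OPTIONAL_HELPERS` table and the `helper_status` method are illustrative names) that tests the same paths the script itself checks. Paparazzi! is left out because the script addresses it by application name through AppleScript rather than by path.

```ruby
# Hypothetical preflight check, not part of getpinboard.rb: reports which
# optional helpers exist at the paths the script tests for.
OPTIONAL_HELPERS = {
  'setWeblocThumb (create_thumbs)'    => '/usr/local/bin/setWeblocThumb',
  'growlnotify (Growl notifications)' => '/usr/local/bin/growlnotify',
  'Tags.app (tag_method 1)'           => '/Applications/Tags.app',
  'openmeta CLI (tag_method 2)'       => '/usr/local/bin/openmeta',
  'tag CLI (tag_method 3)'            => '/usr/local/bin/tag'
}.freeze

# Map each label to true/false depending on whether its path exists.
def helper_status(helpers = OPTIONAL_HELPERS)
  helpers.each_with_object({}) do |(label, path), status|
    status[label] = File.exist?(path)
  end
end

# When run directly, print one line per helper.
if __FILE__ == $PROGRAM_NAME
  helper_status.each do |label, found|
    puts "#{found ? 'found  ' : 'missing'}  #{label}"
  end
end
```

Saved under any name (say, `check_helpers.rb`) and run with `ruby check_helpers.rb`, it prints "found" or "missing" for each helper; anything reported missing simply means the matching option should stay disabled.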
66 | 67 | If you want the option to save bookmarks with a certain tag as fully searchable PDF files, you'll need the latest (I use the term loosely) version of [Paparazzi!](http://derailer.org/paparazzi/). 68 | 69 | You'll also probably want [Growl](http://growl.info/) installed. I can't recall whether the command-line utility `growlnotify` is installed by default, but that's what the script uses to send notifications, and it looks for it at `/usr/local/bin/growlnotify`. The script still runs if you don't have it, and with debugging turned off it won't try to report through any other channel. 70 | 71 | ## Running the script 72 | 73 | There are a couple of options for automating the script. You can have it run at regular intervals; it stores its last update time and compares it with the Pinboard server before it bothers downloading anything. Once you're up to date on your sync, you could run it every 15-30 minutes without any trouble. The easiest way to do that is with `launchd`, and the easiest way to do *that* is with Lingon. If you don't already have it, grab it [from the Mac App Store](http://itunes.apple.com/us/app/lingon/id411211026?mt=12). It's worth the five bucks. Use the wizard to set up a schedule and run the script. 74 | 75 | What *I* do is set up [Hazel](http://www.noodlesoft.com/hazel.php) to watch the database file for [Delibar][delibar]. Delibar is my favorite app for bookmarking and searching my online bookmarks. It works wonderfully with Pinboard, and I can't recommend it highly enough. I can hit a key when I'm on a website in any browser and quickly comment, tag and save the page (either privately or publicly) using the same Cocoa interface every time. Anyway, Delibar keeps its database in `~/Library/Application Support/Delibar` and the file is named `DelibarDB.xml`. I simply watch for changes since the last match, and then run the script when one is found.
I'm sure you could accomplish something similar with Webbla, or even one of the browser plugins if it modified a local store at all when you add the bookmark. 76 | 77 | You could resort to `cron`, or run it manually once in a while, I suppose. It's far handier to have it out of mind, though, and just have your bookmarks show up in OpenMeta and Spotlight searches within minutes of bookmarking them. 78 | 79 | ## Tips 80 | 81 | 1. If you imported a ton of bookmarks from Delicious and haven't spent a lot of time "weeding" them, you've probably got a lot of dead links that you'd be better off *not* downloading. Here's a great solution: [stale.py](https://github.com/jparise/stale). It's a Python script that you run locally, and it will traverse your entire collection of links and test them for error responses. You can run it in test mode first, and then turn on the delete mode to get rid of the dead ones. Instructions are at the [bottom of the GitHub page](https://github.com/jparise/stale). 82 | 2. Use descriptions *and* tags on things you want to make sure you can find. Clip some text out of the web page or write yourself a note in the description field. These notes are transferred by the script to your Spotlight Comments for the webloc file, making them instantly searchable, in addition to the convenience of tag search. 83 | 3. Don't be shy about saving PDFs. If a page is a tutorial that you know you'll need to reference, just go for it. They don't take up much space, they can be annotated easily (seriously, have you tried [Skim](http://skim-app.sourceforge.net/)?), and they allow for full text search locally. 84 | 4. Use [Choosy](http://www.choosyosx.com/). With a local store of Spotlight-searchable bookmarks synced with Dropbox, and Choosy to determine what browser you open them in, you have cross-browser, cross-machine support for your entire bookmark collection. 85 | 5. 
The timestamp of the last check is stored in your user's preference files using the `defaults` command. For the purposes of debugging, it's sometimes useful to set that back a bit and force it to update. Just use `getpinboard.rb -r` to set it back 24 hours, or add a number after `-r` to go back that many days (e.g. `getpinboard.rb -r 7`). 86 | 87 | 88 | ## Notes and todo 89 | 90 | * The script is a one-way sync. All of the pieces are there to start working on a system that would update Pinboard with local deletions and insertions, but I just don't have a need for it. At that point, I might as well make a dedicated application with an SQLite database to do all of this, and that gets way outside the scope of what I started here. Anyway, the only reason I'm not using Webbla for more of this is that it's sandboxed from the rest of the system, and this script can work in tandem with Webbla to fix that. 91 | * Also, the local database created by this script is essentially a Ruby `Marshal` dump of your bookmark data (gzipped if `gzip_db` is on), and it can be easily read into a Ruby script and manipulated. It serves as a good backup of all of your Pinboard info, and has some other possibilities as well. Here's a quick [demo script](https://gist.github.com/899757) for outputting an HTML file, and a little tweaking could make it output a format that Safari, Firefox and Chrome can read to import bookmarks. All of the keys and values are there, so you can sift and sort any way you want. Lots of fun to be had, for the adventurous (or easily distracted). 92 | * There is currently no error handling if you don't have [Paparazzi!](http://derailer.org/paparazzi/) installed. If you don't have it and don't want it, delete that block from the AppleScript section. This will be fixed soon.
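As a sketch of what reading the database might look like, assuming it is the Marshal-dumped array of Pinboard post hashes that the script's `store` method writes (gzipped when `gzip_db` is true), this standalone Ruby loads `bookmarks.stash` and prints a minimal Netscape-style bookmark file, the HTML layout browsers commonly import. The script name `stash_to_html.rb` and the methods `load_stash`, `esc` and `to_netscape_html` are hypothetical; this is not the linked demo script.

```ruby
require 'zlib'
require 'time'
require 'erb'

# Read bookmarks.stash. getpinboard.rb writes a Marshal dump of an array of
# post hashes, gzipped when gzip_db is true, so try gzip first and fall back.
def load_stash(path)
  data = begin
    Zlib::GzipReader.open(path) { |gz| gz.read }
  rescue Zlib::GzipFile::Error
    File.binread(path)
  end
  Marshal.load(data)
end

# HTML-escape any value (nil becomes an empty string).
def esc(value)
  ERB::Util.html_escape(value.to_s)
end

# Render the posts as a Netscape-style bookmark file. Only the href,
# description, extended, tag and time keys from the Pinboard API are used.
def to_netscape_html(bookmarks)
  lines = [
    '<!DOCTYPE NETSCAPE-Bookmark-file-1>',
    '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
    '<TITLE>Bookmarks</TITLE>',
    '<H1>Bookmarks</H1>',
    '<DL><p>'
  ]
  bookmarks.each do |b|
    added = b['time'] ? Time.parse(b['time']).to_i : 0 # Unix timestamp
    tags = b['tag'].to_s.split(' ').join(',')          # space-separated -> comma-separated
    lines << "    <DT><A HREF=\"#{esc(b['href'])}\" ADD_DATE=\"#{added}\" TAGS=\"#{esc(tags)}\">#{esc(b['description'])}</A>"
    note = b['extended'].to_s.strip
    lines << "    <DD>#{esc(note)}" unless note.empty?
  end
  lines << '</DL><p>'
  lines.join("\n") + "\n"
end

# Only runs when given a path: ruby stash_to_html.rb /path/to/bookmarks.stash
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts to_netscape_html(load_stash(File.expand_path(ARGV[0])))
end
```

For example, `ruby stash_to_html.rb ~/Dropbox/Sync/Bookmark/bookmarks.stash > pinboard.html` reads the database from the default `db_location`. Trying gzip first mirrors the script's own `load` method, so the same sketch works whichever way `gzip_db` was set.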
93 | 94 | ## Uninstalling 95 | 96 | If you need to uninstall the script, remove it from whatever you're using to schedule its activity, delete the script, and then delete these two files: 97 | 98 | * `~/getpinboard.yaml` (in your User's home folder) 99 | * `~/Library/Preferences/com.brettterpstra.PinboardTagger.plist` (in your User's Library/Preferences folder) 100 | 101 | Hope the script comes in handy for somebody; feel free to let me know if you have any trouble with it. 102 | -------------------------------------------------------------------------------- /getpinboard.rb: -------------------------------------------------------------------------------- 1 | #!/usr/bin/ruby 2 | ############################################################################## 3 | ### getpinboard.rb by Brett Terpstra, 2014 4 | ### Retrieves Pinboard.in bookmarks, saves as local .webloc files for 5 | ### Spotlight searching. 6 | ### 7 | ### Optionally adds OpenMeta tags and/or saves page as PDF (see config below) 8 | ### Use -r [NUMBER OF DAYS] to reset the "lastcheck" timestamp (primarily 9 | ### for debugging). 10 | ### 11 | ### This script is released to the public, modify at will but please leave 12 | ### credit 13 | ############################################################################## 14 | 15 | def pick_editor 16 | editors = ['TextMate','Espresso','MacVim','Coda','TextEdit'] 17 | ps = %x{ps Ao comm|grep .app|awk '{match($0,/([^\\/]+).app/); print substr($0,RSTART,RLENGTH)}'}.split("\n") 18 | editors.each {|editor| 19 | return editor if ps.include?(editor+".app") 20 | } 21 | return "TextEdit" 22 | end 23 | 24 | require 'yaml' 25 | configfile = "#{ENV['HOME']}/getpinboard.yaml" 26 | 27 | # If config file `getpinboard.yaml` doesn't exist in the user's home folder, 28 | # create it, exit the script and open the config for editing. 29 | # Otherwise, read the config and go for it. 30 | if File.exists?
configfile 31 | $conf = File.open(configfile) {|f| YAML.load(f) } 32 | if $conf['debug'] 33 | puts "Read config from #{configfile}..." 34 | $conf.each {|k,v| 35 | puts k+": "+v.to_s 36 | } 37 | end 38 | else 39 | if File.exists?('/Applications/Tags.app') 40 | default_tagger = 1 41 | elsif File.exists?('/usr/local/bin/openmeta') 42 | default_tagger = 2 43 | else 44 | default_tagger = 0 45 | end 46 | comments = <<-GAMEOVER 47 | --- 48 | user: pinboarduser 49 | password: pinboardpass 50 | # (string) Pinboard user and password 51 | # 52 | dateformat: US 53 | # (string) US (12-31-2011) or UK (31-12-2011) 54 | # 55 | target: #{ENV['HOME']}/Dropbox/Sync/Bookmark 56 | # (absolute path) Location for webloc files 57 | # 58 | db_location: #{ENV['HOME']}/Dropbox/Sync/Bookmark 59 | # (absolute path) Location for the database. Can be the same as target. 60 | # 61 | pdf_tag: pdfit 62 | # (string) If this tag exists on a bookmark, save a PDF (false to disable) 63 | # requires latest version of Paparazzi! (http://derailer.org/paparazzi/) 64 | # 65 | pdf_location: #{ENV['HOME']}/Dropbox/Sync/WebPDF 66 | # (absolute path) Location for PDF files, if pdf_tag option above is set and triggered 67 | # 68 | tag_method: #{default_tagger} 69 | # (integer) OpenMeta tagging method, 0 to disable, 1 for Tags.app, 2 for openmeta, 70 | # 3 for jdberry's tag CLI (Mavericks) 71 | # 72 | always_tag: pinboard 73 | # (string) A tag to add to all saved bookmarks. Set to '' for none 74 | # 75 | update_tags_db: #{default_tagger == 1 ? 'true' : 'false'} 76 | # (true/false) Sync Tags.app bookmark database. If you use Tags.app, use this 77 | # 78 | create_thumbs: #{File.exists?('/usr/local/bin/setWeblocThumb') ? 'true' : 'false'} 79 | # (true/false) Create thumbnail icons for webloc files.
80 | # requires setWeblocThumb 81 | # 82 | debug: false 83 | # (true/false) Only turn on if needed, adds additional status messages and responses 84 | gzip_db: false 85 | # (true/false) Saves some space, if you really need it 86 | GAMEOVER 87 | 88 | File.open(configfile, 'w') {|f| 89 | f.puts(comments) 90 | } 91 | editor = pick_editor 92 | puts "Initial configuration file written to #{configfile}, opening in #{editor}." 93 | %x{open -a "#{editor}.app" "#{configfile}"} 94 | # %x{osascript -e 'tell app "Finder" to reveal POSIX file "#{configfile}"'} 95 | exit 96 | end 97 | 98 | %w[fileutils set net/https zlib rexml/document time base64 cgi stringio shellwords].each do |filename| 99 | require filename 100 | end 101 | 102 | # = plist 103 | # 104 | # Copyright 2006-2010 Ben Bleything and Patrick May 105 | # Distributed under the MIT License 106 | # 107 | 108 | module Plist ; end 109 | 110 | # === Create a plist 111 | # You can dump an object to a plist in one of two ways: 112 | # 113 | # * Plist::Emit.dump(obj) 114 | # * obj.to_plist 115 | # * This requires that you mixin the Plist::Emit module, which is already done for +Array+ and +Hash+. 116 | # 117 | # The following Ruby classes are converted into native plist types: 118 | # Array, Bignum, Date, DateTime, Fixnum, Float, Hash, Integer, String, Symbol, Time, true, false 119 | # * +Array+ and +Hash+ are both recursive; their elements will be converted into plist nodes inside the <array> and <dict> containers (respectively). 120 | # * +IO+ (and its descendants) and +StringIO+ objects are read from and their contents placed in a <data> element. 121 | # * User classes may implement +to_plist_node+ to dictate how they should be serialized; otherwise the object will be passed to Marshal.dump and the result placed in a <data> element. 122 | # 123 | # For detailed usage instructions, refer to USAGE[link:files/docs/USAGE.html] and the methods documented below. 124 | module Plist::Emit 125 | # Helper method for injecting into classes.
Calls Plist::Emit.dump with +self+. 126 | def to_plist(envelope = true) 127 | return Plist::Emit.dump(self, envelope) 128 | end 129 | 130 | # Helper method for injecting into classes. Calls Plist::Emit.save_plist with +self+. 131 | def save_plist(filename) 132 | Plist::Emit.save_plist(self, filename) 133 | end 134 | 135 | # The following Ruby classes are converted into native plist types: 136 | # Array, Bignum, Date, DateTime, Fixnum, Float, Hash, Integer, String, Symbol, Time 137 | # 138 | # Write us (via RubyForge) if you think another class can be coerced safely into one of the expected plist classes. 139 | # 140 | # +IO+ and +StringIO+ objects are encoded and placed in <data> elements; other objects are Marshal.dump'ed unless they implement +to_plist_node+. 141 | # 142 | # The +envelope+ parameter dictates whether or not the resultant plist fragment is wrapped in the normal XML/plist header and footer. Set it to false if you only want the fragment. 143 | def self.dump(obj, envelope = true) 144 | output = plist_node(obj) 145 | 146 | output = wrap(output) if envelope 147 | 148 | return output 149 | end 150 | 151 | # Writes the serialized object's plist to the specified filename. 152 | def self.save_plist(obj, filename) 153 | File.open(filename, 'wb') do |f| 154 | f.write(obj.to_plist) 155 | end 156 | end 157 | 158 | private 159 | def self.plist_node(element) 160 | output = '' 161 | 162 | if element.respond_to? :to_plist_node 163 | output << element.to_plist_node 164 | else 165 | case element 166 | when Array 167 | if element.empty? 168 | output << "<array/>\n" 169 | else 170 | output << tag('array') { 171 | element.collect {|e| plist_node(e)} 172 | } 173 | end 174 | when Hash 175 | if element.empty?
176 | output << "<dict/>\n" 177 | else 178 | inner_tags = [] 179 | 180 | element.keys.sort.each do |k| 181 | v = element[k] 182 | inner_tags << tag('key', CGI::escapeHTML(k.to_s)) 183 | inner_tags << plist_node(v) 184 | end 185 | 186 | output << tag('dict') { 187 | inner_tags 188 | } 189 | end 190 | when true, false 191 | output << "<#{element}/>\n" 192 | when Time 193 | output << tag('date', element.utc.strftime('%Y-%m-%dT%H:%M:%SZ')) 194 | when Date # also catches DateTime 195 | output << tag('date', element.strftime('%Y-%m-%dT%H:%M:%SZ')) 196 | when String, Symbol, Integer, Float # Integer also matches the legacy Fixnum/Bignum classes 197 | output << tag(element_type(element), CGI::escapeHTML(element.to_s)) 198 | when IO, StringIO 199 | element.rewind 200 | contents = element.read 201 | # note that apple plists are wrapped at a different length than 202 | # what ruby's base64 wraps by default. 203 | # I used #encode64 instead of #b64encode (which allows a length arg) 204 | # because b64encode is b0rked and ignores the length arg. 205 | data = "\n" 206 | Base64::encode64(contents).gsub(/\s+/, '').scan(/.{1,68}/o) { data << $& << "\n" } 207 | output << tag('data', data) 208 | else 209 | output << comment( 'The <data> element below contains a Ruby object which has been serialized with Marshal.dump.' ) 210 | data = "\n" 211 | Base64::encode64(Marshal.dump(element)).gsub(/\s+/, '').scan(/.{1,68}/o) { data << $& << "\n" } 212 | output << tag('data', data ) 213 | end 214 | end 215 | 216 | return output 217 | end 218 | 219 | def self.comment(content) 220 | return "<!-- #{content} -->\n" 221 | end 222 | 223 | def self.tag(type, contents = '', &block) 224 | out = nil 225 | 226 | if block_given?
227 | out = IndentedString.new 228 | out << "<#{type}>" 229 | out.raise_indent 230 | 231 | out << block.call 232 | 233 | out.lower_indent 234 | out << "</#{type}>" 235 | else 236 | out = "<#{type}>#{contents.to_s}</#{type}>\n" 237 | end 238 | 239 | return out.to_s 240 | end 241 | 242 | def self.wrap(contents) 243 | output = '' 244 | 245 | output << '<?xml version="1.0" encoding="UTF-8"?>' + "\n" 246 | output << '<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">' + "\n" 247 | output << '<plist version="1.0">' + "\n" 248 | 249 | output << contents 250 | 251 | output << '</plist>' + "\n" 252 | 253 | return output 254 | end 255 | 256 | def self.element_type(item) 257 | case item 258 | when String, Symbol 259 | 'string' 260 | 261 | when Integer # also matches the legacy Fixnum/Bignum classes 262 | 'integer' 263 | 264 | when Float 265 | 'real' 266 | 267 | else 268 | raise "Don't know about this data type... something must be wrong!" 269 | end 270 | end 271 | private 272 | class IndentedString #:nodoc: 273 | attr_accessor :indent_string 274 | 275 | def initialize(str = "\t") 276 | @indent_string = str 277 | @contents = '' 278 | @indent_level = 0 279 | end 280 | 281 | def to_s 282 | return @contents 283 | end 284 | 285 | def raise_indent 286 | @indent_level += 1 287 | end 288 | 289 | def lower_indent 290 | @indent_level -= 1 if @indent_level > 0 291 | end 292 | 293 | def <<(val) 294 | if val.is_a?
Array 295 | val.each do |f| 296 | self << f 297 | end 298 | else 299 | # if it's already indented, don't bother indenting further 300 | unless val =~ /\A#{@indent_string}/ 301 | indent = @indent_string * @indent_level 302 | 303 | @contents << val.gsub(/^/, indent) 304 | else 305 | @contents << val 306 | end 307 | 308 | # it already has a newline, don't add another 309 | @contents << "\n" unless val =~ /\n$/ 310 | end 311 | end 312 | end 313 | end 314 | 315 | # we need to add this so sorting hash keys works properly 316 | class Symbol #:nodoc: 317 | def <=> (other) 318 | self.to_s <=> other.to_s 319 | end 320 | end 321 | 322 | class Array #:nodoc: 323 | include Plist::Emit 324 | end 325 | 326 | class Hash #:nodoc: 327 | include Plist::Emit 328 | end 329 | 330 | # === Load a plist file 331 | # This is the main point of the library: 332 | # 333 | # r = Plist::parse_xml( filename_or_xml ) 334 | module Plist 335 | # Note that I don't use these two elements much: 336 | # 337 | # + Date elements are returned as DateTime objects. 338 | # + Data elements are implemented as Tempfiles 339 | # 340 | # Plist::parse_xml will blow up if it encounters a data element. 341 | # If you encounter such an error, or if you have a Date element which 342 | # can't be parsed into a Time object, please send your plist file to 343 | # plist@hexane.org so that I can implement the proper support. 
344 | def Plist::parse_xml( filename_or_xml ) 345 | listener = Listener.new 346 | #parser = REXML::Parsers::StreamParser.new(File.new(filename), listener) 347 | parser = StreamParser.new(filename_or_xml, listener) 348 | parser.parse 349 | listener.result 350 | end 351 | 352 | class Listener 353 | #include REXML::StreamListener 354 | 355 | attr_accessor :result, :open 356 | 357 | def initialize 358 | @result = nil 359 | @open = Array.new 360 | end 361 | 362 | 363 | def tag_start(name, attributes) 364 | @open.push PTag::mappings[name].new 365 | end 366 | 367 | def text( contents ) 368 | @open.last.text = contents if @open.last 369 | end 370 | 371 | def tag_end(name) 372 | last = @open.pop 373 | if @open.empty? 374 | @result = last.to_ruby 375 | else 376 | @open.last.children.push last 377 | end 378 | end 379 | end 380 | 381 | class StreamParser 382 | def initialize( plist_data_or_file, listener ) 383 | if plist_data_or_file.respond_to? :read 384 | @xml = plist_data_or_file.read 385 | elsif File.exists? plist_data_or_file 386 | @xml = File.read( plist_data_or_file ) 387 | else 388 | @xml = plist_data_or_file 389 | end 390 | 391 | @listener = listener 392 | end 393 | 394 | TEXT = /([^<]+)/ 395 | XMLDECL_PATTERN = /<\?xml\s+(.*?)\?>*/um 396 | DOCTYPE_PATTERN = /\s*<!DOCTYPE\s+(.*?)(\[|>)/um 397 | COMMENT_START = /\A<!--/um 398 | COMMENT_END = /.*?-->/um 399 | 400 | 401 | def parse 402 | plist_tags = PTag::mappings.keys.join('|') 403 | start_tag = /<(#{plist_tags})([^>]*)>/i 404 | end_tag = /<\/(#{plist_tags})[^>]*>/i 405 | 406 | require 'strscan' 407 | 408 | @scanner = StringScanner.new( @xml ) 409 | until @scanner.eos?
410 | if @scanner.scan(COMMENT_START) 411 | @scanner.scan(COMMENT_END) 412 | elsif @scanner.scan(XMLDECL_PATTERN) 413 | elsif @scanner.scan(DOCTYPE_PATTERN) 414 | elsif @scanner.scan(start_tag) 415 | @listener.tag_start(@scanner[1], nil) 416 | if (@scanner[2] =~ /\/$/) 417 | @listener.tag_end(@scanner[1]) 418 | end 419 | elsif @scanner.scan(TEXT) 420 | @listener.text(@scanner[1]) 421 | elsif @scanner.scan(end_tag) 422 | @listener.tag_end(@scanner[1]) 423 | else 424 | raise "Unimplemented element" 425 | end 426 | end 427 | end 428 | end 429 | 430 | class PTag 431 | @@mappings = { } 432 | def PTag::mappings 433 | @@mappings 434 | end 435 | 436 | def PTag::inherited( sub_class ) 437 | key = sub_class.to_s.downcase 438 | key.gsub!(/^plist::/, '' ) 439 | key.gsub!(/^p/, '') unless key == "plist" 440 | 441 | @@mappings[key] = sub_class 442 | end 443 | 444 | attr_accessor :text, :children 445 | def initialize 446 | @children = Array.new 447 | end 448 | 449 | def to_ruby 450 | raise "Unimplemented: " + self.class.to_s + "#to_ruby on #{self.inspect}" 451 | end 452 | end 453 | 454 | class PList < PTag 455 | def to_ruby 456 | children.first.to_ruby if children.first 457 | end 458 | end 459 | 460 | class PDict < PTag 461 | def to_ruby 462 | dict = Hash.new 463 | key = nil 464 | 465 | children.each do |c| 466 | if key.nil? 
467 | key = c.to_ruby 468 | else 469 | dict[key] = c.to_ruby 470 | key = nil 471 | end 472 | end 473 | 474 | dict 475 | end 476 | end 477 | 478 | class PKey < PTag 479 | def to_ruby 480 | CGI::unescapeHTML(text || '') 481 | end 482 | end 483 | 484 | class PString < PTag 485 | def to_ruby 486 | CGI::unescapeHTML(text || '') 487 | end 488 | end 489 | 490 | class PArray < PTag 491 | def to_ruby 492 | children.collect do |c| 493 | c.to_ruby 494 | end 495 | end 496 | end 497 | 498 | class PInteger < PTag 499 | def to_ruby 500 | text.to_i 501 | end 502 | end 503 | 504 | class PTrue < PTag 505 | def to_ruby 506 | true 507 | end 508 | end 509 | 510 | class PFalse < PTag 511 | def to_ruby 512 | false 513 | end 514 | end 515 | 516 | class PReal < PTag 517 | def to_ruby 518 | text.to_f 519 | end 520 | end 521 | 522 | require 'date' 523 | class PDate < PTag 524 | def to_ruby 525 | DateTime.parse(text) 526 | end 527 | end 528 | 529 | require 'base64' 530 | class PData < PTag 531 | def to_ruby 532 | data = Base64.decode64(text.gsub(/\s+/, '')) 533 | 534 | begin 535 | return Marshal.load(data) 536 | rescue Exception => e 537 | io = StringIO.new 538 | io.write data 539 | io.rewind 540 | return io 541 | end 542 | end 543 | end 544 | end 545 | 546 | 547 | module Plist 548 | VERSION = '3.1.0' 549 | end 550 | 551 | class Net::HTTP 552 | alias_method :old_initialize, :initialize 553 | def initialize(*args) 554 | old_initialize(*args) 555 | @ssl_context = OpenSSL::SSL::SSLContext.new 556 | @ssl_context.verify_mode = OpenSSL::SSL::VERIFY_NONE 557 | end 558 | end 559 | 560 | class Utils 561 | # escape text for use in an AppleScript string 562 | def e_as(str) 563 | str.to_s.gsub(/(?=["\\])/, '\\').gsub(/\`/,'') 564 | end 565 | # use Growl to display messages 566 | # checks for existence of growlnotify 567 | def growl_notify(message,sticky = false) 568 | flags = sticky ? '-s ' : '' 569 | if File.exists? "/usr/local/bin/growlnotify" 570 | app = File.exists?('/Applications/Tags.app') ? 
"Tags.app" : "Finder.app" 571 | %x{/usr/local/bin/growlnotify -a "#{app}" #{flags}-m "#{message}"} 572 | end 573 | end 574 | # calls growl_notify and outputs to STDOUT if $conf['debug'] is true 575 | def debug_msg(message,sticky = true) 576 | if $conf['debug'] 577 | growl_notify(message,sticky) 578 | STDOUT.puts message 579 | end 580 | end 581 | # if script is called with -r [DAYS] parameter, change the lastcheck timestamp (written in UTC to match the trailing Z) 582 | def reset_last_check(days) 583 | reset_time = (Time.now - (60 * 60 * 24 * days.to_i)).utc.strftime('%Y-%m-%dT%H:%M:%SZ') 584 | %x{defaults write com.brettterpstra.PinboardTagger lastcheck #{reset_time}} 585 | debug_msg("Reset last check to #{reset_time}") 586 | end 587 | end 588 | 589 | class Pinboard 590 | attr_accessor :user, :pass, :existing_bookmarks, :new_bookmarks 591 | def initialize 592 | # Make storage directory if needed 593 | FileUtils.mkdir_p($conf['db_location'],:mode => 0755) unless File.exists? $conf['db_location'] 594 | unless $conf['pdf_tag'] == false || $conf['pdf_tag'] == 'false' 595 | # create PDF directory if needed 596 | FileUtils.mkdir_p($conf['pdf_location'],:mode => 0755) unless File.exists?
$conf['pdf_location'] 597 | end 598 | # load existing bookmarks database 599 | @existing_bookmarks = self.read_bookmarks 600 | end 601 | # Store a Marshal dump of a hash 602 | def store obj = @existing_bookmarks, file_name = $conf['db_location']+'/bookmarks.stash', options={:gzip => $conf['gzip_db'] } 603 | marshal_dump = Marshal.dump(obj) 604 | file = File.new(file_name,'w') 605 | file = Zlib::GzipWriter.new(file) unless options[:gzip] == false 606 | file.write marshal_dump 607 | file.close 608 | return obj 609 | end 610 | # Load the Marshal dump to a hash 611 | def load file_name 612 | begin 613 | file = Zlib::GzipReader.open(file_name) 614 | rescue Zlib::GzipFile::Error 615 | file = File.open(file_name, 'r') 616 | ensure 617 | obj = Marshal.load file.read 618 | file.close 619 | return obj 620 | end 621 | end 622 | # Set up credentials for Pinboard.in 623 | def set_auth(user,pass) 624 | @user = user 625 | @pass = pass 626 | end 627 | 628 | def new_bookmarks 629 | return self.unique_bookmarks 630 | end 631 | 632 | def existing_bookmarks 633 | @existing_bookmarks 634 | end 635 | # compares local last_check timestamp (stored in `defaults`) to last update stamp from Pinboard 636 | def needs_update? 
637 | latest = get_xml('/v1/posts/update') 638 | latest_update = latest.elements['update'].attributes['time'] 639 | latest_time = Time.parse(latest_update) 640 | unless %x{defaults domains|grep 'com.brettterpstra.PinboardTagger'} == '' 641 | last_check = %x{defaults read com.brettterpstra.PinboardTagger lastcheck} 642 | last_time = Time.parse(last_check) 643 | return latest_time > last_time 644 | else 645 | %x{defaults write com.brettterpstra.PinboardTagger lastcheck '2011-04-02T19:37:24Z'} 646 | return true 647 | end 648 | end 649 | # retrieves the XML output from the Pinboard API 650 | def get_xml(api_call) 651 | xml = '' 652 | http = Net::HTTP.new('api.pinboard.in', 443) 653 | http.use_ssl = true 654 | http.start do |conn| 655 | request = Net::HTTP::Get.new(api_call) 656 | request.basic_auth @user,@pass 657 | response = conn.request(request) 658 | response.value 659 | xml = response.body 660 | end 661 | return REXML::Document.new(xml) 662 | end 663 | # converts Pinboard API output to an array of post hashes (one per post element, keyed by attribute name) 664 | def bookmarks_to_array(doc) 665 | bookmarks = [] 666 | doc.elements.each('posts/post') do |ele| 667 | post = {} 668 | ele.attributes.each {|key,val| 669 | post[key] = val 670 | } 671 | bookmarks.push(post) 672 | end 673 | return bookmarks 674 | end 675 | # compares bookmark array to existing bookmarks to find new urls 676 | def unique_bookmarks 677 | bookmarks = self.bookmarks_to_array(self.get_xml('/v1/posts/all')) 678 | unless @existing_bookmarks.nil? 679 | old_hrefs = Set.new(@existing_bookmarks.map { |x| x['href'] }) # a Set makes each lookup constant-time 680 | bookmarks.reject! { |s| old_hrefs.include? s['href'] } 681 | end 682 | return bookmarks 683 | end 684 | # wrapper for load 685 | def read_bookmarks 686 | # if the file exists, read it 687 | if File.exists?
$conf['db_location']+'/bookmarks.stash' 688 | return self.load $conf['db_location']+'/bookmarks.stash' 689 | else # new database 690 | return [] 691 | end 692 | end 693 | end 694 | 695 | pb = Pinboard.new 696 | util = Utils.new 697 | 698 | pb.set_auth($conf['user'], $conf['password']) 699 | new_bookmarks = pb.new_bookmarks 700 | 701 | if ARGV[0] == '-r' 702 | if ARGV[1] =~ /^\d+$/ 703 | util.reset_last_check(ARGV[1]) 704 | elsif ARGV[1].nil? 705 | util.reset_last_check(1) 706 | else 707 | STDOUT.puts "Invalid reset argument." 708 | STDOUT.puts "Use '-r [NUMBER OF DAYS]'." 709 | end 710 | exit 711 | end 712 | update = pb.needs_update? 713 | if update 714 | message = "Found #{new_bookmarks.count} unindexed bookmarks" 715 | message += ". Exiting." if new_bookmarks.count == 0 716 | util.debug_msg(message,false) 717 | else 718 | util.debug_msg("Pinboard update timestamp is older than local. Exiting.",false) 719 | end 720 | exit if new_bookmarks.count == 0 || !update 721 | counter = 0 722 | if $conf['update_tags_db'] 723 | tags_db = File.join("#{ENV['HOME']}/Library/Application Support/Tags/Bookmarks.plist") 724 | FileUtils.cp(tags_db,tags_db+'.bak') 725 | plist = Plist::parse_xml(tags_db) 726 | end 727 | new_bookmarks.each {|bookmark| 728 | break if counter > 499 # cap the process at 500 bookmarks, resume later 729 | url = bookmark['href'] 730 | title = bookmark['description'] 731 | cleantitle = title.gsub(/[^A-Za-z0-9 '"_\.\-]+/i, '-').gsub(/^\./,'').strip[0..50] 732 | 733 | # Debug crap 734 | # FileUtils.rm($conf['target']+'/'+cleantitle+'.webloc') if File.exists?($conf['target']+'/'+cleantitle+'.webloc') 735 | 736 | unless File.exists?($conf['target']+'/'+cleantitle+'.webloc') 737 | comment = bookmark['extended'].strip 738 | tags = bookmark['tag'].split(' ') 739 | tags.push($conf['always_tag']) if $conf['always_tag'] && !$conf['always_tag'] != '' 740 | tags_app_tags = tags.join('","') 741 | om_tags = tags.join(' ') 742 | mav_tags = tags.join(',') 743 | dateformat = 
"%m-%d-%Y %I:%M%p" 744 | dateformat = "%d-%m-%Y %I:%M%p" if $conf['dateformat'] && $conf['dateformat'] =~ /uk/i 745 | date = Time.parse(bookmark['time']).strftime(dateformat) 746 | util.debug_msg("Grabbing #{title}, tagging with \"#{tags_app_tags}\"",false) 747 | tagscommand = $conf['tag_method'] == 1 ? %Q{tell application "Tags" to apply tags {"#{tags_app_tags}"} to files} : "return" 748 | begin 749 | osa_script =< 0 then 755 | if #{$conf['tag_method']} = 1 then 756 | #{tagscommand} {POSIX path of (webloc as string)} 757 | else if #{$conf['tag_method']} = 2 and exists (POSIX file "/usr/local/bin/openmeta") then 758 | do shell script "/usr/local/bin/openmeta -p '" & POSIX path of (webloc as string) & "' -a #{om_tags}" 759 | else if #{$conf['tag_method']} = 3 and exists (POSIX file "/usr/local/bin/tag") then 760 | do shell script "/usr/local/bin/tag -a '#{mav_tags}' " & quoted form of (POSIX path of (webloc as string)) 761 | end if 762 | end if 763 | if "#{$conf['create_thumbs']}" = "true" and exists (POSIX file "/usr/local/bin/setWeblocThumb") then 764 | do shell script "/usr/local/bin/setWeblocThumb " & quoted form of (POSIX path of (webloc as string)) 765 | end if 766 | if {"#{tags_app_tags}"} contains "#{$conf['pdf_tag']}" and "#{$conf['pdf_tag']}" is not "false" then 767 | tell application "Paparazzi!" 768 | launch hidden 769 | set minsize to {1024, 768} 770 | capture "#{url}" min size minsize 771 | repeat while busy 772 | -- To wait until the page is loaded. 
        end repeat
        save as PDF in POSIX path of "#{$conf['pdf_location']}/#{util.e_as cleantitle}.pdf"
        quit
      end tell
      if #{$conf['tag_method']} > 0 then
        if #{$conf['tag_method']} = 1 then
          #{tagscommand} {(POSIX path of "#{$conf['pdf_location']}/#{util.e_as cleantitle}.pdf")}
        else if #{$conf['tag_method']} = 2 and exists (POSIX file "/usr/local/bin/openmeta") then
          do shell script "/usr/local/bin/openmeta -p " & quoted form of (POSIX path of "#{$conf['pdf_location']}/#{util.e_as cleantitle}.pdf") & " -a #{om_tags}"
        else if #{$conf['tag_method']} = 3 and exists (POSIX file "/usr/local/bin/tag") then
          do shell script "/usr/local/bin/tag -a '#{mav_tags}' " & quoted form of (POSIX path of "#{$conf['pdf_location']}/#{util.e_as cleantitle}.pdf")
        end if
      end if
    end if
    return POSIX path of (webloc as string)
  end if
end tell
return POSIX path of (alias (("#{$conf['target']}/" & "#{util.e_as cleantitle}" as string) & ".webloc"))
APPLESCRIPT
ENDOSASCRIPT
      bookmark['local_path'] = %x{#{osa_script}}.strip
      unless bookmark['local_path'] == '' || bookmark['local_path'] == 'AppleScript'
        pb.existing_bookmarks.push(bookmark)
        plist[url] = {"title"=>title, "tags"=>tags, "filename"=>cleantitle+'.webloc'} if $conf['update_tags_db'] && !plist.nil?
        counter += 1
      end
    end
  else
    util.debug_msg("File exists: "+cleantitle,false)
    bookmark['local_path'] = $conf['target']+'/'+cleantitle+'.webloc'
    pb.existing_bookmarks.push(bookmark)
  end

  File.open(tags_db, 'w') { |io| io << plist.to_plist } if $conf['update_tags_db'] && !plist.nil?
  pb.store
}
# record Pinboard's update time so the next run can skip when nothing changed
latest = pb.get_xml('/v1/posts/update')
latest_update = latest.elements['update'].attributes['time']
%x{defaults write com.brettterpstra.PinboardTagger lastcheck #{latest_update}}
util.growl_notify("Added #{counter} new bookmarks", false) if counter > 0
--------------------------------------------------------------------------------
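The local database logic in `store`, `load` and `unique_bookmarks` (a Marshal dump, gzip-compressed by default, read back with a fallback to plain Marshal data, then filtered by `href`) can be tried outside the script. The following is a minimal standalone sketch, not part of getpinboard.rb: the method names `stash_write`, `stash_read` and `unindexed` are hypothetical, and it swaps the script's `Array#include?` lookup for a `Set` so each href check is constant-time.

```ruby
require 'zlib'
require 'set'
require 'tmpdir'

# Hypothetical helper: write obj as a Marshal dump, gzip-compressed unless gzip: false
def stash_write(obj, path, gzip: true)
  data = Marshal.dump(obj)
  if gzip
    Zlib::GzipWriter.open(path) { |gz| gz.write(data) }
  else
    File.binwrite(path, data)
  end
  obj
end

# Hypothetical helper: read the dump back; a file without a gzip header
# raises Zlib::GzipFile::Error, so fall back to reading raw Marshal bytes
def stash_read(path)
  data = begin
    Zlib::GzipReader.open(path) { |gz| gz.read }
  rescue Zlib::GzipFile::Error
    File.binread(path)
  end
  Marshal.load(data)
end

# Hypothetical helper: keep only fetched posts whose href is not stored yet
def unindexed(fetched, existing)
  seen = existing.map { |post| post['href'] }.to_set
  fetched.reject { |post| seen.include?(post['href']) }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'bookmarks.stash')
  stash_write([{ 'href' => 'https://pinboard.in/' }], path)
  existing = stash_read(path)
  fetched = [{ 'href' => 'https://pinboard.in/' }, { 'href' => 'https://example.com/' }]
  p unindexed(fetched, existing).map { |post| post['href'] }
  # prints ["https://example.com/"]
end
```

Because the reader detects compression from the file header rather than from a setting, changing `gzip_db` between runs does not make an existing database unreadable.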