├── .gitignore ├── README.txt ├── sync.sh ├── static ├── media │ ├── hands-on-kafka │ │ ├── topic.png │ │ ├── dualtopic.png │ │ └── hierarchy.png │ ├── puppet-internal-ca │ │ └── infra.png │ └── kafka-materialized-views │ │ ├── log-compaction-events.png │ │ └── log-compaction-architecture.png ├── files │ ├── 2014-01-14-twitter-trending-riemann.clj │ ├── 2014-01-14-twitter-trending-firehose.rb │ ├── 2016-12-17-atomic-database.clj │ ├── 2014-11-06-clojure-interfaces.clj │ ├── 2014-01-14-twitter-trending.html │ └── 2016-12-17-atomic-database.html ├── errors │ ├── 50x.html │ └── 404.html └── style │ ├── main.css │ └── syntax.css ├── layouts ├── _default │ └── single.html ├── partials │ ├── footer.html │ └── header.html └── index.html ├── config.toml └── content └── post ├── 003-removing-duplicate-gems.org ├── 016-map-territory-a-story-of-visibility.md ├── 012-a-leiningen-plugin-for-jenkins.md ├── 013-firehose-storage-at-paper-li.md ├── 010-two-posts-one-engine.md ├── 005-introducing-tron.md ├── 015-nice-looking-jquery-with-clojurescript.md ├── 029-heads-up-for-clojure-library-writers.md ├── 024-beyond-ssl-client-cert-authentication-authorization.md ├── 020-in-defense-of-immutability.md ├── 018-using-riemann-to-monitor-python-apps.md ├── 017-poor-man-s-pattern-matching-in-clojure.md ├── 001-puppet-extlookup-and-yamlvar.md ├── 027-easy-clojure-logging-set-up-with-logconfig.md ├── 008-a-bit-of-protocol.md ├── 033-hands-on-kafka-dynamic-dns.md ├── 007-a-wrapping-macro.md ├── 034-atomic-database.md ├── 022-real-time-twitter-trending-on-a-budget-with-riemann.md ├── 031-pid-tracking-in-modern-init-systems.md ├── 025-why-were-there-gotos-in-apple-software-in-the-first-place.md ├── 002-openbsd-pf-limits-and-extending-metrics.org ├── 006-clojure-wrappers.md ├── 004-some-more-thoughts-on-monitoring.md ├── 032-simple-materialized-views-in-kafka-and-clojure.md ├── 023-poor-man-s-dependency-injection-in-clojure.md ├── 021-solving-logging-in-60-lines-of-haskell.md ├── 019-neat-trick-using-puppet-as-your-internal-ca.md ├── 009-the-death-of-the-configuration-file.md ├── 014-weekend-project-ghetto-rpc-with-redis-ruby-and-clojure.md └── 026-diving-into-the-python-pickle-format.md /.gitignore: -------------------------------------------------------------------------------- 1 | /public 2 | -------------------------------------------------------------------------------- /README.txt: -------------------------------------------------------------------------------- 1 | My personal blog. 
Processed by hugo: http://gohugo.io 2 | -------------------------------------------------------------------------------- /sync.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | rsync -avz public/ blog:/var/www/data/ 4 | -------------------------------------------------------------------------------- /static/media/hands-on-kafka/topic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pyr/blog/master/static/media/hands-on-kafka/topic.png -------------------------------------------------------------------------------- /static/media/hands-on-kafka/dualtopic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pyr/blog/master/static/media/hands-on-kafka/dualtopic.png -------------------------------------------------------------------------------- /static/media/hands-on-kafka/hierarchy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pyr/blog/master/static/media/hands-on-kafka/hierarchy.png -------------------------------------------------------------------------------- /static/media/puppet-internal-ca/infra.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pyr/blog/master/static/media/puppet-internal-ca/infra.png -------------------------------------------------------------------------------- /static/media/kafka-materialized-views/log-compaction-events.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pyr/blog/master/static/media/kafka-materialized-views/log-compaction-events.png -------------------------------------------------------------------------------- /static/media/kafka-materialized-views/log-compaction-architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pyr/blog/master/static/media/kafka-materialized-views/log-compaction-architecture.png -------------------------------------------------------------------------------- /layouts/_default/single.html: -------------------------------------------------------------------------------- 1 | {{ partial "header.html" . }} 2 |
3 |   <h1>{{ .Title }}</h1>
4 | </header>
5 | 
6 | <div class="content">
7 |   {{ .Content }}
8 | {{ partial "footer.html" . }} 9 | -------------------------------------------------------------------------------- /layouts/partials/footer.html: -------------------------------------------------------------------------------- 1 | 8 | 9 | 10 | -------------------------------------------------------------------------------- /static/files/2014-01-14-twitter-trending-riemann.clj: -------------------------------------------------------------------------------- 1 | ; -*- mode: clojure; -*- 2 | ; vim: filetype=clojure 3 | 4 | (logging/init) 5 | (instrumentation {:enabled? false}) 6 | (udp-server) 7 | (tcp-server) 8 | (periodically-expire 1) 9 | 10 | (let [store (index) 11 | trending (top 10 :metric (tag "top" store) store)] 12 | (streams 13 | (by :service (moving-time-window 3600 (smap folds/sum trending))))) 14 | -------------------------------------------------------------------------------- /config.toml: -------------------------------------------------------------------------------- 1 | title = "Spootnik.org" 2 | baseURL = "https://spootnik.org/" 3 | languageCode = "en-us" 4 | PygmentsCodeFences = true 5 | pygmentsuseclasses = true 6 | rssLimit = 100 7 | 8 | [permalinks] 9 | post = "entries/:year/:month/:day/:slug" 10 | 11 | [author] 12 | name = "Pierre-Yves Ritschard" 13 | email = "pyr@spootnik.org" 14 | 15 | [params] 16 | description = "Random ramblings on infrastructure and functional development" 17 | 18 | -------------------------------------------------------------------------------- /content/post/003-removing-duplicate-gems.org: -------------------------------------------------------------------------------- 1 | #+title: Removing duplicate gems 2 | #+date: 2011-02-18 3 | 4 | Found myself typing this in a shell: 5 | 6 | #+BEGIN_SRC bash 7 | gem list --local |\ 8 | egrep '.*(.*,.*)' |\ 9 | sed 's/^\([^ ]*\) ([^,]*,\(.*\))/\1\2/' |\ 10 | sed 's/,//g' |\ 11 | awk '{for (i = 2; i <=NF ;i++) {printf "sudo gem uninstall %s --version=%s\n", $1, $i}}' 12 | #+END_SRC 13 | 14 | Sometimes perl or ruby with -e is just faster 15 | -------------------------------------------------------------------------------- /static/errors/50x.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Spootnik.org - Internal server error 6 | 7 | 8 | 9 | 10 | 11 | 12 |
13 |     <h1>Internal Server Error</h1>
14 |     <p>Please retry at a later time or head over to the <a href="/">home page</a>.</p>
15 | 
16 | 17 | 18 | -------------------------------------------------------------------------------- /static/errors/404.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Spootnik.org - Page not found 6 | 7 | 8 | 9 | 10 | 11 | 12 |
13 |     <h1>404 - Page not found</h1>
14 |     <p>We could not find the content you are looking for. Please head to the
15 |     <a href="/">home page</a>.</p>
16 | 
17 | 18 | 19 | -------------------------------------------------------------------------------- /static/files/2014-01-14-twitter-trending-firehose.rb: -------------------------------------------------------------------------------- 1 | require 'tweetstream' 2 | require 'riemann/client' 3 | 4 | TweetStream.configure do |config| 5 | config.consumer_key = 'xxx' 6 | config.consumer_secret = 'xxx' 7 | config.oauth_token = 'xxx' 8 | config.oauth_token_secret = 'xxx' 9 | config.auth_method = :oauth 10 | end 11 | 12 | riemann = Riemann::Client.new 13 | 14 | 15 | TweetStream::Client.new.sample do |status| 16 | tags = status.text.scan(/\s#([[:alnum:]]+)/).map{|x| x.first.downcase} 17 | 18 | tags.each do |tag| 19 | puts "emitting #{tag}" 20 | riemann << { 21 | service: "#{tag}", 22 | metric: 1.0, 23 | tags: ["twitter"], 24 | ttl: 3600 25 | } 26 | end 27 | end 28 | -------------------------------------------------------------------------------- /content/post/016-map-territory-a-story-of-visibility.md: -------------------------------------------------------------------------------- 1 | #+title: Map & Territory: A story of visibility 2 | #+date: 2013-04-22 3 | #+slug: map-territory-a-story-of-visibility 4 | 5 | Many thanks to the \#devops crew for hosting devops days Paris ! I loved 6 | the diversity of the talks, the ignite format is really interesting. 7 | 8 | What struck me most is how devops is now reaching its intended audience, 9 | i.e: everyone involved in delivering quality software and services. It 10 | was really nice to see people from all types of companies and all types 11 | of positions. 12 | 13 | Here are the slides for the talk I gave: 14 | 15 | 16 | 17 | -------------------------------------------------------------------------------- /layouts/index.html: -------------------------------------------------------------------------------- 1 | {{ partial "header.html" . }} 2 |
3 | <h1>{{ .Site.Author.name }}</h1>
4 | <h2>{{ .Site.Params.Description }}</h2>
5 | 10 | 
11 | 12 | 
13 | <h2>Articles</h2>
14 | 21 | 
22 | {{ partial "footer.html" . }} 23 | -------------------------------------------------------------------------------- /layouts/partials/header.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | {{ .Title }} 6 | 7 | 8 | 9 | 10 | 11 | {{- with .OutputFormats.Get "RSS" }} 12 | 13 | 14 | {{- end }} 15 | 16 | 17 | 18 | -------------------------------------------------------------------------------- /content/post/012-a-leiningen-plugin-for-jenkins.md: -------------------------------------------------------------------------------- 1 | #+title: A Leiningen plugin for Jenkins 2 | #+date: 2012-07-18 3 | 4 | ![test all the things](http://i.imgur.com/ZaBcN.jpg) 5 | 6 | I use jenkins extensively to test apps and clojure is my go-to language 7 | for plenty of use cases now. [leiningen](http://leiningen.org) is a very 8 | nice build tool that lets you get up and running very quickly, even for 9 | java only projects. It relies on maven under the covers. Its only 10 | drawback used to be the lack of support in jenkins which left you with 11 | two choices: 12 | 13 | - Add a pre-build step that would get leiningen 14 | - Produce pom.xml files to build projects with maven 15 | 16 | I'm happy to announce that there is now a third option, 17 | [jenkins-leiningen](https://github.com/pyr/jenkins-leiningen) which 18 | makes leiningen integration in jenkins much easier. You will only need 19 | to push the leiningen standalone jar on your build machine and then 20 | provide the necessary leiningen build steps in jenkins. 21 | -------------------------------------------------------------------------------- /content/post/013-firehose-storage-at-paper-li.md: -------------------------------------------------------------------------------- 1 | +++ 2 | title = "Firehose storage at Paper.li" 3 | slug = "firehose-storage-at-paperli" 4 | date = "2012-07-26" 5 | +++ 6 | 7 | I was very happy to give a talk on the experience we had with cassandra 8 | over at paper.li for the first Paris-Cassandra meetup (if you're 9 | interested, go follow [@pariscassa](https://twitter.com/pariscassa)). 10 | 11 | While not diving too deep, I tried to give a battlefield overview of how 12 | we came to the decision to go for cassandra, what we do with it and a 13 | few tips should you choose it, very much in the same vein than my 14 | [previous post on clojure](/entries/2012/07/04/another-year-of-clojure). 15 | 16 | Here are the slides for the talk: 17 | 18 | 20 | Since no-one didn't speak french during the meetup, the talk was given 21 | in french, recorded and streamed thanks to google plus hangouts (which 22 | is quite impressive I have to say). If you grok french, the video can be 23 | found here: 24 | 25 | -------------------------------------------------------------------------------- /content/post/010-two-posts-one-engine.md: -------------------------------------------------------------------------------- 1 | #+title: Two posts, one engine 2 | #+date: 2012-07-02 3 | 4 | What does a good nerd do when feeling the urge to write on a topic ? 5 | Why, update his blogging tool, of course. I've actually loved static 6 | site generators since the days where I had one based on make and m4, I 7 | also want a lot of simplicity from the tool I use to blog. 8 | 9 | I started out with posterous because I didn't want to bother with 10 | hosting details from a blog, being a technology ascetic, I also often 11 | want it to match my current tools of choice. 
I also cannot go to extreme
12 | lengths on my own since I'm a design noob.
13 | 
14 | Anyhow, that's why I've gradually changed from:
15 | 
16 | - [posterous](http://posterous.com) (I felt the need to host on my
17 |   own)
18 | - [octopress](http://octopress.org) (I don't use ruby much anymore)
19 | - [o-blog](http://renard.github.com/o-blog)
20 | 
21 | Since I started using emacs a lot again, I switched to org-mode for my
22 | GTD-like workflow and o-blog seemed like a good fit. With great tools
23 | like [twitter bootstrap](http://twitter.github.com/bootstrap) to help
24 | the design impaired like me, it's now very easy to get a decent looking
25 | site up and running from a simple emacs org file.
26 | 
--------------------------------------------------------------------------------
/content/post/005-introducing-tron.md:
--------------------------------------------------------------------------------
1 | #+title: Introducing: TRON
2 | #+date: 2011-07-11
3 | 
4 | I just uploaded a small library to github (and clojars). It's a
5 | generalisation of what I use for recurrent tasks in my clojure
6 | programs; the fact that the recent pragprog [article](http://pragprog.com/magazines/2011-07/create-unix-services-with-clojure) had a
7 | hand-rolled version of it too convinced me it was worth putting
8 | together in a lib.
9 | 
10 | The library provides an easy mechanism to register processes for later
11 | execution, either recurrent or punctual. It is called TRON,
12 | replacing CRON's C - which stands for command, as in: command run on - with a T for task.
13 | 
14 | Here is a short excerpt of what it can do:
15 | 
16 | ```clojure
17 | (ns sandbox
18 |   (:require tron))
19 | 
20 | (defn- periodic [] (println "periodic"))
21 | (defn- ponctual [] (println "ponctual"))
22 | 
23 | ;; Run the ponctual function 10 seconds from now
24 | (tron/once ponctual 10000)
25 | 
26 | ;; Run the periodic function every second
27 | (tron/periodically :foo periodic 1000)
28 | 
29 | ;; Cancel the periodic run 5 seconds from now
30 | (tron/once #(tron/cancel :foo) 5000)
31 | ```
32 | 
33 | The code is hosted on github: https://github.com/pyr/tron, the full
34 | annotated source can be found [here](http://spootnik.org/files/tron.html) and the artifacts are already
35 | on clojars (see [here](http://clojars.org/tron)).
36 | The library still needs a better way of expressing delays, which will
37 | be worked on, and might benefit from macros allowing you to embed the
38 | body to be executed later. All in due time.
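In the meantime, a plain macro already gets close; here is a minimal
sketch built on the `tron/once` call shown above (the name `once-in` is
my own, not part of the library):

```clojure
;; Wrap a body in a thunk and hand it to tron/once
;; for delayed execution.
(defmacro once-in
  [delay-ms & body]
  `(tron/once (fn [] ~@body) ~delay-ms))

;; Prints "later" ten seconds from now:
(once-in 10000 (println "later"))
```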
39 | -------------------------------------------------------------------------------- /static/style/main.css: -------------------------------------------------------------------------------- 1 | body { 2 | margin: 40px auto; 3 | max-width: 800px; 4 | color: #444; 5 | padding: 0 10px; 6 | font-family: "Open Sans", sans-serif; 7 | font-weight: 350; 8 | /* font-size: 0.9rem; */ 9 | font-size: 1.1em; 10 | line-height: 1.6; 11 | } 12 | 13 | header { 14 | margin: 0 auto; 15 | padding: 1em; 16 | padding-bottom: 0; 17 | text-align: center; 18 | border-bottom: 1px solid #c1000d; 19 | max-width: 800px; 20 | position: relative; 21 | } 22 | 23 | header.content { 24 | padding-left: 2.5em; 25 | padding-right: 2.5em; 26 | } 27 | 28 | .header-menu { 29 | display: block; 30 | list-style-type: none; 31 | text-align: center; 32 | } 33 | 34 | .header-menu > li { display: inline-block; padding: 0 10px; } 35 | 36 | .content { 37 | margin: 0 auto; 38 | padding: 0 2em; 39 | max-width: 800px; 40 | margin-bottom: 50px; 41 | line-height: 1.6em; 42 | } 43 | 44 | header h1 { 45 | margin: 0.2em 0; 46 | font-size: 2em; 47 | font-weight: 400; 48 | } 49 | 50 | blockquote { 51 | border-left: 5px solid #9cb6d8; 52 | text-style: italic; 53 | margin: 1.5em 10px; 54 | padding: 10px; 55 | color: #9cb6d8; 56 | } 57 | 58 | blockquote > p { 59 | line-height: 1.4; 60 | text-align: justify; 61 | font-size: 1.2em; 62 | font-style: italic; 63 | } 64 | 65 | footer { 66 | padding: 1em; 67 | margin: 0 auto; 68 | text-align: center; 69 | border-top: 1px solid #c1000d; 70 | max-width: 750px; 71 | position: relative; 72 | font-size: 90%; 73 | } 74 | 75 | pre { background: #272822; padding: 5px; color: #f8f8f2; } 76 | pre > code { font-size: 92%; } 77 | h1,h2,h3 { line-height:1.2; color: #c1000d; } 78 | hr { color: #c1000d; } 79 | a { text-decoration: none; color: #3b8bba; } 80 | a:visited { color: #265778; } 81 | ul.all { list-style-type: none; } 82 | header h2 { font-weight: 300; color: #ccc; padding: 0; margin-top: 0; } 83 | header ul { vertical-align: bottom; margin-bottom: 0; font-size: 90%; } 84 | -------------------------------------------------------------------------------- /static/files/2016-12-17-atomic-database.clj: -------------------------------------------------------------------------------- 1 | (ns game.score 2 | "Utilities to record and look up " 3 | (:require [clojure.edn :as edn])) 4 | 5 | (defn make-score-db 6 | "Build a database of high scores" 7 | [] 8 | (atom nil)) 9 | 10 | (def compare-scores 11 | "A function which keeps the highest numerical value. 12 | Handles nil previous values." 13 | (fnil max 0)) 14 | 15 | (defn record-score! 16 | "Record a score for user, store only if higher than 17 | previous or no previous score exists" 18 | [scores user score] 19 | (swap! scores update user compare-scores score)) 20 | 21 | (defn user-high-score 22 | "Lookup highest score for user, may yield nil" 23 | [scores user] 24 | (get @scores user)) 25 | 26 | (defn high-score 27 | "Lookup absolute highest score, may yield nil 28 | when no scores have been recorded" 29 | [scores] 30 | (last (sort-by val @scores))) 31 | 32 | (defn dump-to-path 33 | "Store a value's representation to a given path" 34 | [path value] 35 | (spit path (pr-str value))) 36 | 37 | (defn load-from-path 38 | "Load a value from its representation stored in a given path. 
39 | When reading fails, yield nil" 40 | [path] 41 | (try 42 | (edn/read-string (slurp path)) 43 | (catch Exception _))) 44 | 45 | (defn persist-fn 46 | "Yields an atom watch-fn that dumps new states to a path" 47 | [path] 48 | (fn [_ _ _ state] 49 | (dump-to-path path state))) 50 | 51 | (defn file-backed-atom 52 | "An atom that loads its initial state from a file and persists each new state 53 | to the same path" 54 | [path] 55 | (let [init (load-from-path path) 56 | state (atom init)] 57 | (add-watch state :persist-watcher (persist-fn path)) 58 | state)) 59 | 60 | (comment 61 | (def scores (file-backed-atom "/tmp/scores.db")) 62 | (high-score scores) ;; => nil 63 | (user-high-score scores :a) ;; => nil 64 | (record-score! scores :a 2) ;; => {:a 2} 65 | (record-score! scores :b 3) ;; => {:a 2 :b 3} 66 | (record-score! scores :b 1) ;; => {:a 2 :b 3} 67 | (record-score! scores :a 4) ;; => {:a 4 :b 3} 68 | (user-high-score scores :a) ;; => 4 69 | (high-score scores) ;; => [:a 4] 70 | ) 71 | -------------------------------------------------------------------------------- /content/post/015-nice-looking-jquery-with-clojurescript.md: -------------------------------------------------------------------------------- 1 | #+date: 2012-11-22 2 | #+title: Nice looking JQuery with Clojurescript 3 | 4 | I did a bit of frontend work for internal tools recently and chose to go 5 | with clojurescript, for obvious reasons. 6 | 7 | Like clojure, clojurescript supports macros which let you express common 8 | lengthy idioms. I use the [jayq](https://github.com/ibdknox/jayq) 9 | library to interact with the browser since JQuery is the de-facto 10 | standard and battle tested. 11 | 12 | A standard call to a JSON get in JQuery looks like this: 13 | 14 | ```javascript 15 | $.getJSON('ajax/test.json', function(data) { 16 | doSomethingWith(data); 17 | }); 18 | ``` 19 | 20 | Which is really a call to: 21 | 22 | ```javascript 23 | $.ajax({ 24 | url: 'ajax/test.json', 25 | dataType: 'json', 26 | success: function(data) { doSomethingWith(data); } 27 | }); 28 | ``` 29 | 30 | Now when using `jayq` this would translate to: 31 | 32 | ```clojure 33 | (ajax "ajax/test.json" 34 | {:dataType "json" 35 | :success (fn [data] (do-something-with data))}) 36 | ``` 37 | 38 | This is just as simple, but a bit lacking in terms of readability, with 39 | a simple, add to this that you might want to check the `done` status of 40 | the resulting future and you end up with: 41 | 42 | ```clojure 43 | (let [ftr (ajax "ajax/test.json" 44 | {:dataType "json" 45 | :success (fn [data] (do-something-with data))})] 46 | (.done ftr (fn [] (refresh-view)))) 47 | ``` 48 | 49 | Thankfully, with macros we can make this much prettier, with these two 50 | simple macros: 51 | 52 | ```clojure 53 | (defmacro when-done 54 | [ftr & body] 55 | `(.done ~ftr (fn [] ~@body))) 56 | 57 | (defmacro with-json 58 | [sym url & body] 59 | `(jayq.core/ajax 60 | ~url 61 | {:dataType "json" 62 | :success (fn [data#] (let [~sym (cljs.core/js->clj data#)] ~@body))})) 63 | ``` 64 | 65 | We can now write: 66 | 67 | ```clojure 68 | (when-done (with-json data "ajax/test.json" (do-something-with data)) 69 | (refresh-view)) 70 | ``` 71 | 72 | Which clearly turns down the suck on the overall aspect of your callback 73 | code. 74 | 75 | There's a lot more to consider of course, as evidenced by [this 76 | PR](https://github.com/ibdknox/jayq/pull/24), but it's a good showcase 77 | of clojurescript's abilities. 
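As a sanity check, the expansion can be inspected on a Clojure REPL
(clojurescript macros expand on the JVM side); `when-done` reduces to
exactly the callback call we started from:

```clojure
(macroexpand-1 '(when-done ftr (refresh-view)))
;; => (.done ftr (clojure.core/fn [] (refresh-view)))
```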
78 | -------------------------------------------------------------------------------- /content/post/029-heads-up-for-clojure-library-writers.md: -------------------------------------------------------------------------------- 1 | #+title: Heads up for Clojure library writers 2 | #+date: 2014-11-03 3 | 4 | Clojure 1.7 is around the corner, we're already at [version 5 | 1.7.0-alpha3](http://search.maven.org/#artifactdetails%7Corg.clojure%7Cclojure%7C1.7.0-alpha3%7Cjar). 6 | Fortunately, the iterative approach of clojure taken since 1.3 means 7 | that upgrading from one version to the next usually only needs a change 8 | in your `project.clj` file (as well as working interop accross versions, 9 | which is always nice). 10 | 11 | There are some neat changes in 1.7, the most notable being the addition 12 | of transducers. I recommend reading through the introduction to 13 | transducers at 14 | and the 15 | video from strange loop at 16 | . 17 | 18 | A much smaller addition to 1.7 is the introduction of the `update` 19 | function in `clojure.core`. `update` is directly equivalent to 20 | `update-in` but operates on a single key. 21 | 22 | When you wrote: 23 | 24 | ``` clojure 25 | (-> input-map 26 | (update-in [:my-counter-key] inc)) 27 | ``` 28 | 29 | You will now be able to write: 30 | 31 | ``` clojure 32 | (-> input-map 33 | (update :my-counter-key inc)) 34 | ``` 35 | 36 | This has been a long-wanted change and brings `update` on par with 37 | `get`, and `assoc` which have their `-in` suffixed equivalents. 38 | 39 | One direct consequence of the change is that if you have a namespace 40 | that exposes an `update` function, you will need to deal with the fact 41 | that it will now clash with `clojure.core/update` since `clojure.core` 42 | is referred by default in all namespaces. 43 | 44 | You have two strategies to deal with that fact: 45 | 46 | - Rename the function (which can be a bit intrusive) 47 | - Prevent `clojure.core/update` from being referred in your namespace 48 | 49 | For the second strategy, you will only need to use the following form in 50 | your namespace declaration: 51 | 52 | ``` clojure 53 | (ns my.namespace 54 | (:require [...]) 55 | (:refer-clojure :exclude [update])) 56 | ``` 57 | 58 | If you don't, your library consumers will have to deal with messages 59 | such as: 60 | 61 | ``` clojure 62 | WARNING: update already refers to: #'clojure.core/update in namespace: foo.core, being replaced by: #'foo.core/update 63 | WARNING: update already refers to: #'clojure.core/update in namespace: user, being replaced by: #'foo.core/update 64 | ``` 65 | -------------------------------------------------------------------------------- /content/post/024-beyond-ssl-client-cert-authentication-authorization.md: -------------------------------------------------------------------------------- 1 | #+title: Beyond SSL client cert authentication: authorization 2 | #+date: 2014-01-26 3 | 4 | In a [previous 5 | article](/entries/2013/05/30/neat-trick-using-puppet-as-your-internal-ca), 6 | I tried to make the case for using a private certificate authority to 7 | authenticate access to internal tools with SSL client certificates. 8 | 9 | This approach is perfect to secure access to the likes of 10 | [kibana](http://www.elasticsearch.org/overview/kibana), 11 | [riemann-dash](https://github.com/aphyr/riemann-dash), 12 | [graphite](http://graphite.wikidot.com) or similar tools. 
13 | 14 | If you start depending more and more on client-side certificates, you're 15 | bound to reach the point when you need to tackle authorization as well. 16 | 17 | While well-known, it is perfectly feasible to do so while keeping your 18 | private CA as a single source of internal user management. 19 | 20 | I will be assuming a private CA authenticates clients for sites 21 | accessing `app.priv.example.com` and that three SSL client certificates 22 | exist: `alice.users.example.com`, `bob.users.example.com`, 23 | `charlie.users.example.com` (as mentionned above, see 24 | [here](/entries/2013/05/30/neat-trick-using-puppet-as-your-internal-ca) 25 | for a quick way to get up and running). 26 | 27 | Now since our certificates bear the names of clients what we need to do 28 | is retrieve the certificate's name. Assuming that you have a web 29 | application exposed through HTTP which [nginx](http://nginx.org) proxies 30 | over to, here are the relevant bits that need to be added. 31 | 32 | ``` 33 | proxy_set_header X-Client-Verify $ssl_client_verify; 34 | proxy_set_header X-Client-DN $ssl_client_s_dn; 35 | proxy_set_header X-SSL-Issuer $ssl_client_i_dn; 36 | ``` 37 | 38 | Let's go over them one by one: 39 | 40 | - `$ssl_client_verify`: Can be set to **SUCCESS**, **FAILED** or 41 | **NONE**. 42 | - `$ssl_client_s_dn`: Will be set to the Subject DN of the client 43 | cert. 44 | - `$ssl_client_i_dn`: Will be set to the Issuer DN of the client cert. 45 | 46 | As far as configuration is concerned, this is all that is needed. There 47 | are more variables that you can tap into if necessary refer to the nginx 48 | [http\_ssl module 49 | documentation](http://nginx.org/en/docs/http/ngx_http_ssl_module.html) 50 | for an exhaustive list. If you rely on the apache webserver, similar 51 | environment variables are available as documented 52 | [here](http://httpd.apache.org/docs/2.2/mod/mod_ssl.html). 53 | 54 | Within applications, you'll receive the identity of clients in this 55 | format and can thus be retrieved with a regexp: 56 | 57 | ``` 58 | CN=bob.users.example.com 59 | ``` 60 | 61 | It's now dead simple to tie in to your application. Here is a simple 62 | ring middleware which attaches the calling user to incoming requests. 
63 | 64 | ```clojure 65 | (defn wrap-ssl-client-auth [handler] 66 | (fn [request] 67 | (let [ssl_cn (get-in request [:headers "X-Client-DN"]) 68 | [_ user] (re-find #"CN=(.*)\.users\.example\.com$" ssl_cn)] 69 | (handler (assoc request :user user))))) 70 | ``` 71 | -------------------------------------------------------------------------------- /static/style/syntax.css: -------------------------------------------------------------------------------- 1 | .hll { background-color: #49483e } 2 | .c { color: #75715e } /* Comment */ 3 | .err { color: #960050; background-color: #1e0010 } /* Error */ 4 | .k { color: #66d9ef } /* Keyword */ 5 | .l { color: #ae81ff } /* Literal */ 6 | .n { color: #f8f8f2 } /* Name */ 7 | .o { color: #f92672 } /* Operator */ 8 | .p { color: #f8f8f2 } /* Punctuation */ 9 | .cm { color: #75715e } /* Comment.Multiline */ 10 | .cp { color: #75715e } /* Comment.Preproc */ 11 | .c1 { color: #75715e } /* Comment.Single */ 12 | .cs { color: #75715e } /* Comment.Special */ 13 | .ge { font-style: italic } /* Generic.Emph */ 14 | .gs { font-weight: bold } /* Generic.Strong */ 15 | .kc { color: #66d9ef } /* Keyword.Constant */ 16 | .kd { color: #66d9ef } /* Keyword.Declaration */ 17 | .kn { color: #f92672 } /* Keyword.Namespace */ 18 | .kp { color: #66d9ef } /* Keyword.Pseudo */ 19 | .kr { color: #66d9ef } /* Keyword.Reserved */ 20 | .kt { color: #66d9ef } /* Keyword.Type */ 21 | .ld { color: #e6db74 } /* Literal.Date */ 22 | .m { color: #ae81ff } /* Literal.Number */ 23 | .s { color: #e6db74 } /* Literal.String */ 24 | .na { color: #a6e22e } /* Name.Attribute */ 25 | .nb { color: #f8f8f2 } /* Name.Builtin */ 26 | .nc { color: #a6e22e } /* Name.Class */ 27 | .no { color: #66d9ef } /* Name.Constant */ 28 | .nd { color: #a6e22e } /* Name.Decorator */ 29 | .ni { color: #f8f8f2 } /* Name.Entity */ 30 | .ne { color: #a6e22e } /* Name.Exception */ 31 | .nf { color: #a6e22e } /* Name.Function */ 32 | .nl { color: #f8f8f2 } /* Name.Label */ 33 | .nn { color: #f8f8f2 } /* Name.Namespace */ 34 | .nx { color: #a6e22e } /* Name.Other */ 35 | .py { color: #f8f8f2 } /* Name.Property */ 36 | .nt { color: #f92672 } /* Name.Tag */ 37 | .nv { color: #f8f8f2 } /* Name.Variable */ 38 | .ow { color: #f92672 } /* Operator.Word */ 39 | .w { color: #f8f8f2 } /* Text.Whitespace */ 40 | .mf { color: #ae81ff } /* Literal.Number.Float */ 41 | .mh { color: #ae81ff } /* Literal.Number.Hex */ 42 | .mi { color: #ae81ff } /* Literal.Number.Integer */ 43 | .mo { color: #ae81ff } /* Literal.Number.Oct */ 44 | .sb { color: #e6db74 } /* Literal.String.Backtick */ 45 | .sc { color: #e6db74 } /* Literal.String.Char */ 46 | .sd { color: #e6db74 } /* Literal.String.Doc */ 47 | .s2 { color: #e6db74 } /* Literal.String.Double */ 48 | .se { color: #ae81ff } /* Literal.String.Escape */ 49 | .sh { color: #e6db74 } /* Literal.String.Heredoc */ 50 | .si { color: #e6db74 } /* Literal.String.Interpol */ 51 | .sx { color: #e6db74 } /* Literal.String.Other */ 52 | .sr { color: #e6db74 } /* Literal.String.Regex */ 53 | .s1 { color: #e6db74 } /* Literal.String.Single */ 54 | .ss { color: #e6db74 } /* Literal.String.Symbol */ 55 | .bp { color: #f8f8f2 } /* Name.Builtin.Pseudo */ 56 | .vc { color: #f8f8f2 } /* Name.Variable.Class */ 57 | .vg { color: #f8f8f2 } /* Name.Variable.Global */ 58 | .vi { color: #f8f8f2 } /* Name.Variable.Instance */ 59 | .il { color: #ae81ff } /* Literal.Number.Integer.Long */ 60 | 61 | .gh { } /* Generic Heading & Diff Header */ 62 | .gu { color: #75715e; } /* Generic.Subheading & Diff Unified/Comment? 
*/
63 | .gd { color: #f92672; } /* Generic.Deleted & Diff Deleted */
64 | .gi { color: #a6e22e; } /* Generic.Inserted & Diff Inserted */
65 | 
--------------------------------------------------------------------------------
/content/post/020-in-defense-of-immutability.md:
--------------------------------------------------------------------------------
1 | #+title: In defense of immutability
2 | #+date: 2013-11-22
3 | 
4 | I thought I'd whip together a short blog post as a reaction to Mark
5 | Burgess' ([@markburgess\_osl](https://twitter.com/markburgess_osl))
6 | keynote at the Cloudstack Collaboration Conference.
7 | 
8 | I thoroughly enjoyed the keynote but was a bit taken aback by one of the
9 | points that was made. It can be summed up in a quote that has since been
10 | floating around:
11 | 
12 | 
13 | Mark digressed and dissed the notion that there is such a thing as
14 | **immutable infrastructure** and that it should be a goal to strive for.
15 | He supported this argument by saying that applications that live and
16 | perform a function or deliver a service cannot be immutable, or they
17 | would not be running.
18 | 
19 | I think a few things are misguided in Mark's analysis of what is meant
20 | by immutable infrastructure.
21 | 
22 | ### Immutable infrastructure semantics
23 | 
24 | What people refer to when promoting immutable infrastructure is
25 | predominantly the promotion of immutable data structures to represent and
26 | converge infrastructure. In no way does it mean that systems are viewed
27 | as immobile.
28 | 
29 | It is also a realization that many parts of infrastructure can now be
30 | treated as fully stateless automatons, for which lifecycle management
31 | can partly happen at a different level than previously: namely, by
32 | replacing instances instead of reconverging the state of an existing
33 | system.
34 | 
35 | ### Immutability promotes persistent data structures and a better audit trail
36 | 
37 | Since immutable data structures enforce building new copies of data on
38 | change, instead of silently mutating state, they provide a better avenue
39 | to create a complete history of state for data.
40 | 
41 | One of the main things immutable data provides is a consistent view of
42 | data at a certain point in time - which again, does not limit its
43 | ability to evolve over time. This property is key in building simple
44 | audit trails.
45 | 
46 | For one thing, if immutability were such a limiting factor, I don't think
47 | so many programming languages would be built around it.
48 | 
49 | ### Immutable infrastructure does not conflict with configuration management
50 | 
51 | Many people now turning to this new way to think about lifecycle
52 | management of infrastructure and systems as a whole come from years of
53 | experience with configuration management and in no way are trying to get
54 | rid of it. It is more a reflection on what conf management is today,
55 | where it happens and how it could evolve.
56 | 
57 | ### TL;DR
58 | 
59 | - Immutable data structures can help improve the way we describe
60 |   infrastructure and system components
61 | - Nobody thinks of systems as pure (in the functional sense)
62 |   functions, environment matters
63 | - Immutable infrastructure refers to a consistent view at a certain
64 |   point in time, changes mean new copies
65 | - Configuration management still has a predominant place when striving
66 |   for immutable infrastructure
67 | 
68 | I'll now go read Mark's book which he pitched rather well, except for
69 | this minor nitpick (ok that and the cheapshot at lambda calculus, but I
70 | won't cover that!) :-)
71 | 
--------------------------------------------------------------------------------
/content/post/018-using-riemann-to-monitor-python-apps.md:
--------------------------------------------------------------------------------
1 | #+title: Using Riemann to monitor python apps
2 | #+date: 2013-05-21
3 | 
4 | Another quick blog post today to show-case usage of riemann for in-app
5 | reporting. This small blurb will push out a metric for each wrapped route
6 | with the metric name you provide.
7 | 
8 | When handlers raise exceptions, a metric is sent out as well, with the
9 | exception's message as description.
10 | 
11 | ```python
12 | import socket
13 | import time
14 | import bernhard
15 | from functools import wraps
16 | # [...]
17 | 
18 | def wrap_riemann(metric,
19 |                  client=bernhard.Client(),
20 |                  tags=['python']):
21 |     def riemann_decorator(f):
22 |         @wraps(f)
23 |         def decorated_function(*args, **kwargs):
24 | 
25 |             host = socket.gethostname()
26 |             started = time.time()
27 |             try:
28 |                 response = f(*args, **kwargs)
29 |             except Exception as e:
30 |                 client.send({'host': host,
31 |                              'service': metric + "-exceptions",
32 |                              'description': str(e),
33 |                              'tags': tags + ['exception'],
34 |                              'state': 'critical',
35 |                              'metric': 1})
36 |                 raise
37 | 
38 |             duration = (time.time() - started) * 1000
39 |             client.send({'host': host,
40 |                          'service': metric + "-time",
41 |                          'tags': tags + ['duration'],
42 |                          'metric': duration})
43 |             return response
44 |         return decorated_function
45 |     return riemann_decorator
46 | ```
47 | 
48 | Provided you have a [flask](http://flask.pocoo.org) app, for instance, you
49 | could then use the wrapper in the following way:
50 | 
51 | ```python
52 | app = Flask(__name__)
53 | riemann = bernhard.Client()
54 | 
55 | @app.route('/users')
56 | @wrap_riemann('list-users', client=riemann)
57 | def list_users():
58 |     # [...]
59 | 
60 | @app.route('/users/<user_id>', methods=['DELETE'])
61 | @wrap_riemann('delete-user', client=riemann)
62 | def delete_users(user_id):
63 |     # [...]
64 | ``` 65 | 66 | In riemann we can easily massage these events to give us points worth 67 | looking at: 68 | 69 | - A per route and overall mean request time 70 | - A per route and overall per second exception gauge 71 | 72 | ```clojure 73 | ;; start-up servers 74 | (tcp-server :host "0.0.0.0") 75 | (udp-server :host "0.0.0.0") 76 | 77 | (def graph (graphite {:host "your.graphite.host"})) 78 | (def index (default {:state "ok" :ttl 60} (update-index (index)))) 79 | 80 | (periodically-expire 5 index) 81 | 82 | (streams 83 | 84 | ;; we're interested in events coming from the python web app 85 | (tagged "python" 86 | 87 | ;; aggregate duration event for each interval of one second 88 | ;; then compute the mean before sending of to indexer and grapher 89 | (tagged "duration" 90 | 91 | ;; We split by service to get one metric per route 92 | (by [:service] 93 | (fixed-time-window 1 94 | (combine folds/mean index graph))) 95 | 96 | ;; unsplitted, we'll get an overall metric 97 | (with {:service "overall-mean-duration"} 98 | (fixed-time-window 1 99 | (combine folds/mean index graph)))) 100 | 101 | ;; sum exceptions per second 102 | (tagged "exception" 103 | (by [:service] 104 | (rate 1 index graph)) 105 | (with {:service "overall-exceptions"} 106 | (rate 1 index graph))))) 107 | ``` 108 | 109 | Obviously this wrapper would work just as well for any python function. 110 | As a ways 111 | 112 | I extended a bit on this idea and released a cleaned up version here: 113 | . 114 | -------------------------------------------------------------------------------- /content/post/017-poor-man-s-pattern-matching-in-clojure.md: -------------------------------------------------------------------------------- 1 | #+title: Poor man's pattern matching in clojure 2 | #+date: 2013-05-21 3 | 4 | A quick tip which helped me out in a few situations. I'd be inclined to 5 | point people to [core.match](https://github.com/clojure/core.match) for 6 | any matching needs, but the fact that it doesn't play well with 7 | clojure's ahead-of-time (`AOT`) compilation requires playing dirty 8 | dynamic namespace loading tricks to use it. 9 | 10 | A common case I stumbled upon is having a list of homogenous records - 11 | say, coming from a database, or an event stream - and needing to take 12 | specific action based on the value of several keys in the records. 13 | 14 | Take for instance an event stream which would contain homogenous records 15 | of with the following structure: 16 | 17 | ```clojure 18 | [{:user "bob" 19 | :action :create-ticket 20 | :status :success 21 | :message "succeeded" 22 | :timestamp #inst "2013-05-23T18:19:39.623-00:00"} 23 | {:user "bob" 24 | :action :update-ticket 25 | :status :failure 26 | :message "insufficient rights" 27 | :timestamp #inst "2013-05-23T18:19:40.623-00:00"} 28 | {:user "bob" 29 | :action :delete-ticket 30 | :status :success 31 | :message "succeeded" 32 | :timestamp #inst "2013-05-23T18:19:41.623-00:00"}] 33 | ``` 34 | 35 | Now, say you need do do a simple thing based on the output of the value 36 | of both `:action` and `:status`. 37 | 38 | The first reflex would be to do this within a `for` or `doseq`: 39 | 40 | ```clojure 41 | (for [{:keys [action status] :as event}] 42 | (cond 43 | (and (= action :create-ticket) (= status :success)) (handle-cond-1 event) 44 | (and (= action :update-ticket) (= status :success)) (handle-cond-2 event) 45 | (and (= action :delete-ticket) (= status :failure)) (handle-cond-3 event))) 46 | ``` 47 | 48 | This is a bit cumbersome. 
A first step would be to use the fact that 49 | clojure seqs and maps can be matched, by narrowing down the initial 50 | event to the matchable content. `juxt` can help in this situation, here 51 | is its doc for reference. 52 | 53 | > Takes a set of functions and returns a fn that is the juxtaposition of 54 | > those fns. The returned fn takes a variable number of args, and 55 | > returns a vector containing the result of applying each fn to the args 56 | > (left-to-right). 57 | 58 | I suggest you play around with `juxt` on the repl to get comfortable 59 | with it, here is the example usage we're interested in: 60 | 61 | ```clojure 62 | (let [narrow-keys (juxt :action :status)] 63 | (narrow-keys {:user "bob" 64 | :action :update-ticket 65 | :status :failure 66 | :message "insufficient rights" 67 | :timestamp #inst "2013-05-23T18:19:40.623-00:00"})) 68 | => [:update-ticket :failure] 69 | ``` 70 | 71 | Given that function, we can now rewrite our condition handling code in a 72 | much more succint way: 73 | 74 | ```clojure 75 | (let [narrow-keys (juxt :action :status)] 76 | (for [event events] 77 | (case (narrow-keys event) 78 | [:create-ticket :success] (handle-cond-1 event) 79 | [:update-ticket :failure] (handle-cond-2 event) 80 | [:delete-ticket :success] (handle-cond-3 event)))) 81 | ``` 82 | 83 | Now with this method, we have a perfect candidate for a multimethod: 84 | 85 | ```clojure 86 | (defmulti handle-event (juxt :action :status)) 87 | (defmethod handle-event [:create-ticket :success] 88 | [event] 89 | ...) 90 | (defmethod handle-event [:update-ticket :failure] 91 | [event] 92 | ...) 93 | (defmethod handle-event [:delete-ticket :success] 94 | [event] 95 | ...) 96 | ``` 97 | 98 | Of course, for more complex cases and wildcard handling, I suggest 99 | taking a look at [core.match](https://github.com/clojure/core.match). 100 | -------------------------------------------------------------------------------- /content/post/001-puppet-extlookup-and-yamlvar.md: -------------------------------------------------------------------------------- 1 | +++ 2 | date = "2011-02-05T14:42:57+01:00" 3 | title = "Puppet, extlookup, and yamlvar" 4 | draft = false 5 | +++ 6 | 7 | When you dive in a complex project, even though you try to be as good as 8 | possible with documentation, sometime you just miss things. 9 | 10 | I've had a good opportunity to realize this just recently while working 11 | on puppet. A common head scratcher in puppet is finding a way to keep 12 | system, technology and platform specifics separate. 
I will probably
13 | write at length on this particular subject a bit later, but to give a
14 | quick explanation, just consider this scenario:
15 | 
16 | - You host web servers for the same vhosts in two different locations
17 | - You manage them with the same puppet instance
18 | - They need a unique configuration
19 | - Some details such as their name server change
20 | 
21 | Puppet allows you to get to the point where you can just write this:
22 | 
23 | ```puppet
24 | class webserver {
25 |   include unix
26 |   include nginx
27 | 
28 |   nginx::upstream { rails_app:
29 |     servers => ["127.0.0.1:8000", "127.0.0.1:8001"]
30 |   }
31 |   nginx::upstream_vhost { "www.example.com":
32 |     upstream => rails_app,
33 |     listen => 80
34 |   }
35 |   nginx::static_vhost { "static.example.com":
36 |     root => "/srv/www/static.example.com",
37 |     listen => 80
38 |   }
39 | }
40 | ```
41 | 
42 | This is great, generic and allows you to deploy configurations to many
43 | machines, so it's a bit of a hassle to have to fall back to case
44 | statements just for details like DNS. So I wrote yamlvar, a small
45 | parser function which resolves values from a YAML file based on node facts:
46 | ```ruby
47 | module Puppet::Parser::Functions
48 |   require 'yaml'
49 | 
50 |   newfunction(:yamlvar, :type => :rvalue) do |args|
51 |     defval = args[1]
52 |     yaml_path = args[2] || lookupvar('yamlvar_data')
53 | 
54 |     unless yaml_path or (yaml_path = lookupvar('yamlvar_data'))
55 |       raise Puppet::ParseError, "No configuration for yamlvar"
56 |     end
57 | 
58 |     begin
59 |       data = YAML.load_file(yaml_path)
60 |     rescue
61 |       raise Puppet::ParseError, "Cannot read yaml data"
62 |     end
63 | 
64 |     unless data['order']
65 |       data['order'] = %w(fqdn operatingsystem)
66 |     end
67 | 
68 |     # resolve the preference list of facts in this context
69 |     prefs = data['order'].map{|x| lookupvar(x)}.select{|x| x}
70 |     prefs << 'default'
71 | 
72 |     val = data['values'][args[0]] || defval
73 |     unless val
74 |       raise Puppet::ParseError, "Cannot find #{args[0]}"
75 |     end
76 | 
77 |     # map strings to hashes
78 |     if val.is_a? Hash
79 |       # find the first key present in both our preference list and the hash
80 |       key = (prefs & val.keys).first
81 | 
82 |       # this cannot happen
83 |       raise Puppet::ParseError, "Key is nil ?" unless key
84 | 
85 |       val = val[key]
86 |     end
87 | 
88 |     raise Puppet::ParseError, "Cannot find val for #{args[0]}" unless val
89 |     val
90 |   end
91 | end
92 | ```
93 | 
94 | This allows me to define clever local variables like these:
95 | 
96 | ```yaml
97 | ---
98 | order: [ fqdn, operatingsystem ]
99 | values:
100 |   ns_server:
101 |     www01.remote.example.com: "10.1.1.1"
102 |     default: "10.1.2.1"
103 | ```
104 | 
105 | To make the lookup order concrete: a node whose fqdn fact is
106 | www01.remote.example.com resolves ns_server to "10.1.1.1"; any other node gets the default, "10.1.2.1".
107 | And call them from classes like this:
108 | 
109 | ```puppet
110 | class unix {
111 |   [...]
112 |   $ns_server = yamlvar(ns_server)
113 |   file { "/etc/resolv.conf":
114 |     owner => root,
115 |     group => $root_group,
116 |     mode => 0644,
117 |     content => template("unix/resolv.conf.erb")
118 |   }
119 |   [...]
120 | }
121 | ```
122 | 
123 | Well this is all fine and dandy until I came upon this:
124 | [complex data and puppet](http://www.devco.net/archives/2009/08/31/complex_data_and_puppet.php).
125 | That's right, @ripienaar already did the work and guess what?
126 | As [puppet 2.6.1 and extlookup](http://www.devco.net/archives/2010/09/14/puppet_261_and_extlookup.php) explains, it's already in puppet 2.6.1!
127 | -------------------------------------------------------------------------------- /content/post/027-easy-clojure-logging-set-up-with-logconfig.md: -------------------------------------------------------------------------------- 1 | #+title: Easy clojure logging set-up with logconfig 2 | #+date: 2014-10-15 3 | 4 | \*TL;DR\*: I love 5 | [clojure.tools.logging](https://github.com/clojure/tools.logging), but 6 | setting JVM logging up can be a bit frustrating, I wrote 7 | [logconfig](https://github.com/pyr/logconfig) to help. 8 | 9 | When I started clojure development (about 5 years ago now), I was new to 10 | the JVM - having no real Java background. My first clojure projects 11 | where long running, data consuming tasks and thus logging was a 12 | consideration from the start. The least I could say is that navigating 13 | the available logging options and understanding how to configure each 14 | framework was daunting. 15 | 16 | ### JVM logging 101 17 | 18 | Once you get around to understanding how logging works on the JVM, it 19 | makes a log of sense, for those not familiar with the concepts, here is 20 | a quick recap - I will be explaining this in the context of 21 | [log4j](http://logging.apache.org/log4j/1.2/), but the same holds for 22 | [slf4j](http://www.slf4j.org), [logback](http://logback.qos.ch) and 23 | other frameworks: 24 | 25 | - Logging frameworks can be configured inside or outside the 26 | application. 27 | - The common method is for logging to be configured outside, with a 28 | specific configuration file. 29 | - User-provided classes can be added to the JVM to format (through 30 | layout) or write (through appenders) logs in a different manner. 31 | 32 | This proves really useful, since you might need to ship logs as 33 | JSON-formatted payloads to integrate with your 34 | [logstash](http://logstash.org) infrastructure for instance, you might 35 | even rely on sending logs over the network, without the original 36 | application writer having had to worry about these use-cases. 37 | 38 | ### The meat of the problem 39 | 40 | While having the possibility of configuring logging in such a way, it's 41 | not a use case many people have, and spreading an application's 42 | configuration through-out several files does not facilitate starting 43 | out. 44 | 45 | I think [elasticsearch](http://elasticsearch.org) is a project which 46 | gets things right, allowing logging to be configured from the same file 47 | than the rest of the service, only exposing the most common options. 48 | 49 | ### Introducing logconfig 50 | 51 | [logconfig](https://github.com/pyr/logconfig), which is available on 52 | clojars (at version **0.7.1** at the time of writing), provides you with 53 | a simple way of taking care of that problem, it does the following 54 | things: 55 | 56 | - Provide a way to configure log4j from a clojure map. 57 | - Allow overriding of the configuration for people wanting to provide 58 | their own log4j.properties config. 59 | - Support both enhanced patterns and JSON event as layouts, enabling 60 | easy integration with logstash. 61 | - Append files with a time based rolling policy 62 | - Optional console output (for people using runit or debug purposes). 63 | 64 | A nice side-effect of relying on logconfig is the reduced coordinates 65 | matrix: 66 | 67 | ``` clojure 68 | ;; before 69 | :dependencies [... 
70 | [commons-logging/commons-logging "1.2"] 71 | [org.slf4j/slf4j-log4j12 "1.7.7"] 72 | [net.logstash.log4j/jsonevent-layout "1.7"] 73 | [log4j/apache-log4j-extras "1.2.17"] 74 | [log4j/log4j "1.2.17" 75 | :exclusions [javax.mail/mail 76 | javax.jms/jms 77 | com.sun.jdmk/jmxtools 78 | com.sun.jmx/jmxri]]] 79 | ;; after 80 | :dependencies [... 81 | [org.spootnik/logconfig "0.7.1"]] 82 | ``` 83 | 84 | ### Sample use-case: fleet 85 | 86 | [fleet,](https://github.com/pyr/fleet) our command and control framework 87 | at [exoscale](https://exoscale.ch) is configured through a YAML file, 88 | the file is read and contains several sections: `transport`, `codec`, 89 | `scenarios`, `http`, `security` and `logging`. 90 | 91 | ``` yaml 92 | logging: 93 | console: true 94 | files: 95 | - "/var/log/fleet.log" 96 | security: 97 | ca-priv: "doc/ca/ca.key" 98 | certdir: "doc/ca" 99 | suffix: "pem" 100 | scenarios: 101 | path: "doc/scenarios" 102 | http: 103 | port: 8080 104 | origins: 105 | - "http://example.com" 106 | ``` 107 | 108 | The `logging` key in the YAML file is expected to adhere to logconfig's 109 | format and will be fed to logconfig. Users relying on existing 110 | log4j.properties configuration can also set `external` to true in the 111 | YAML config and provide their log4j configuration through the standard 112 | JVM properties. 113 | 114 | Both [cyanite](https://github.com/pyr/cyanite) and 115 | [pithos](https://github.com/exoscale/pithos) now also rely on this 116 | mechanism. 117 | 118 | I hope this can be useful to other developers building services, apps 119 | and daemons in clojure, the full documentation for the API is available 120 | here: , check-out the project at 121 | . 122 | -------------------------------------------------------------------------------- /content/post/008-a-bit-of-protocol.md: -------------------------------------------------------------------------------- 1 | #+title: A bit of protocol 2 | #+date: 2011-08-12 3 | 4 | ### Protocols and mixins 5 | 6 | I recently had to implement something in clojure I've done many times in 7 | ruby, which involved using protocols. I thought it would be a nice 8 | example of comparing class re-opening in ruby and protocol extension in 9 | clojure. 10 | 11 | ### The problem 12 | 13 | I use cassandra, and in many places, cassandra needs to work with UUID 14 | types a lot. When exposing results over `JSON`, this is often a problem 15 | since standard serializers don't support these types. 16 | 17 | What we want to do in ruby and clojure is simple: 18 | 19 | ```ruby 20 | require 'simple_uuid' 21 | require 'json' 22 | 23 | # This fails 24 | {:uuid => SimpleUUID::UUID.new }.to_json 25 | ``` 26 | 27 | This code fails because the `json` module looks for a `to_json` method 28 | in each object. Failing to do so, it calls `Object#to_s.to_json`. Now 29 | this would work fine if `to_s` gave a good textual representation of a 30 | UUID, but it returns the byte array for that UUID. 
31 | 32 | ```clojure 33 | (ns foo 34 | (:use clojure.data.json) 35 | (:import java.util.UUID)) 36 | 37 | ; This fails 38 | (println (json-str {:uuid (UUID/randomUUID)})) 39 | ``` 40 | 41 | In clojure we are informed that `java.util.UUID` doesn't respond to 42 | `write-json` 43 | 44 | \#\# Fixing the problem in ruby 45 | 46 | How to fix this in ruby is no problem, and widely known, since the 47 | `simple_uuid` gem provides a `to_guid` method which returns the textual 48 | representation, it's as easy as: 49 | 50 | ```ruby 51 | require 'simple_uuid' 52 | require 'json' 53 | 54 | module SimpleUUID 55 | class UUID 56 | def to_json *args 57 | "\"#{to_guid}\"" 58 | end 59 | end 60 | end 61 | 62 | puts({:uuid => SimpleUUID::UUID.new}.to_json) 63 | ``` 64 | 65 | This was simple enough, reopening the module then class is allowed - and 66 | to some extent, encouraged - in ruby. We just added a `to_json` method 67 | which is what the `JSON` module looks for when walking through objects. 68 | 69 | ### Fixing the problem in clojure 70 | 71 | clojure has the ability to provide so-called **protocols**, similar to 72 | java **interfaces**. Protocols are defined with `defprotocol` and 73 | implemented anywhere. Here is the appropriate bit from 74 | `clojure.data.json` 75 | 76 | ```clojure 77 | ;;; JSON PRINTER 78 | 79 | (defprotocol Write-JSON 80 | (write-json [object out escape-unicode?] 81 | "Print object to PrintWriter out as JSON")) 82 | ``` 83 | 84 | This defines that `write-json` will be dispatched based on class to an 85 | appropriate writer. The clojure page on protocols[^1] has all the 86 | detailed information, but I'll focus on the `extend` part here, which 87 | allows to extend a type with new protocol implementations. `extend` 88 | expects a type then pairs of protocol names to maps, the map containing 89 | function name to implementation mappings. 90 | 91 | Protocol functions always have the object they need to operate on as 92 | their first argument, here the function takes two additional arguments 93 | 94 | - `out` which is the output stream the representation should be pushed 95 | to 96 | - `escape-unicode?` which determines whether unicode characters should 97 | be escaped. 98 | 99 | Following that logic, the implementation can now be written like this: 100 | 101 | ```clojure 102 | (ns somewhere 103 | (:import java.util.UUID) 104 | (:use clojure.data.json)) 105 | 106 | (defn write-json-uuid [obj out escape-unicode?] 107 | (binding [*out* out] 108 | (pr (.toString obj)))) 109 | 110 | (extend UUID Write-JSON 111 | {:write-json write-json-uuid}) 112 | 113 | (println (json-str {:uuid (UUID/randomUUID)})) 114 | ``` 115 | 116 | ### Writing the protocol extension 117 | 118 | The actual function `write-json-uuid` is quite simple, I initially wrote 119 | it as: 120 | 121 | ```clojure 122 | (defn write-json-uuid [obj out escape-unicode?] 123 | (.print out (pr-str (.toString obj)))) 124 | ``` 125 | 126 | But it seems a bit overkill to go to the trouble of writing to a string, 127 | then pushing that string out to the **writer** object. 128 | 129 | ### Dynamic bindings 130 | 131 | A small digression is needed here, clojure has **dynamic** symbols, 132 | defined like so: `(def ^{:dynamic true} *my-dyn-symbol*)` The enclosing 133 | stars are a convention but widely used. 134 | 135 | Dynamic symbols can be manipulated with `binding`, which operates like 136 | `let` but the bindings will follow the rest of the execution enclosed, 137 | not just the function's context. 
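A tiny illustration of that behavior (the var and function names here
are made up for the example):

```clojure
(def ^{:dynamic true} *greeting* "hello")

;; Reads the dynamic symbol at call time, not at definition time.
(defn greet [] (println *greeting*))

(greet)                                  ;; prints: hello
(binding [*greeting* "bonjour"] (greet)) ;; prints: bonjour
```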
138 | 139 | Clojure uses the `*out*` symbol everywhere to denote the current output 140 | stream, many functions operate on it, `pr` is among them. 141 | 142 | By binding `*out*` to the stream that was given as argument to 143 | `write-json`, `pr` can simply be called on the function. 144 | 145 | ### Closing words 146 | 147 | The most common dispatching idiom in clojure is `defmethod=/=defmulti`, 148 | but protocols also provide a very fast and useful way to implement 149 | polymorphism in clojure. It's also nice to note that the implementation 150 | wasn't longer in clojure than ruby. 151 | 152 | [^1]: 153 | -------------------------------------------------------------------------------- /static/files/2014-11-06-clojure-interfaces.clj: -------------------------------------------------------------------------------- 1 | (ns redis.transient 2 | (:refer-clojure :exclude [read-string]) 3 | (:require [clojure.edn :refer [read-string]]) 4 | (:require [taoensso.carmine :as r :refer [wcar]])) 5 | 6 | (defn ->mapentry 7 | [k v] 8 | ^{:type :redis :prefix "[" :suffix "]"} 9 | (reify 10 | clojure.lang.Indexed 11 | (nth [this i] (nth [k v] i)) 12 | (nth [this i default] (nth [k v] i default)) 13 | clojure.lang.Seqable 14 | (seq [this] (list k v)) 15 | clojure.lang.Counted 16 | (count [this] 2) 17 | clojure.lang.IMapEntry 18 | (getKey [this] k) 19 | (getValue [this] v) 20 | (key [this] k) 21 | (val [this] v))) 22 | 23 | (defn hash->transient 24 | [spec k] 25 | ^{:type :redis :prefix "{" :suffix "}" :sep "," :tuple? true} 26 | (reify 27 | clojure.lang.ILookup 28 | (valAt [this subk] 29 | (when-let [res (wcar spec (r/hget k (pr-str subk)))] 30 | (read-string res))) 31 | (valAt [this subk default] 32 | (or (.valAt this subk) default)) 33 | clojure.lang.ITransientMap 34 | (assoc [this subk v] 35 | (wcar spec (r/hset k (pr-str subk) (pr-str v))) 36 | this) 37 | (without [this subk] 38 | (wcar spec (r/hdel k (pr-str subk))) 39 | this) 40 | clojure.lang.IFn 41 | (invoke [this subk] 42 | (.valAt this subk)) 43 | clojure.lang.Counted 44 | (count [this] 45 | (count (partition 2 (wcar spec (r/hgetall k))))) 46 | clojure.lang.Seqable 47 | (seq [this] 48 | (for [[k v] (partition 2 (wcar spec (r/hgetall k)))] 49 | (->mapentry (read-string k) 50 | (read-string v)))))) 51 | 52 | (defn list->transient 53 | [spec k] 54 | ^{:type :redis :prefix "[" :suffix "]"} 55 | (reify 56 | clojure.lang.Counted 57 | (count [this] 58 | (wcar spec (r/llen k))) 59 | clojure.lang.Seqable 60 | (seq [this] 61 | (map read-string (wcar spec (r/lrange k 0 -1)))) 62 | clojure.lang.ITransientCollection 63 | (conj [this v] 64 | (wcar spec (r/lpush k (pr-str v))) 65 | this) 66 | clojure.lang.ITransientVector 67 | (assocN [this index v] 68 | (wcar spec (r/lset k index v)) 69 | this) 70 | (pop [this] 71 | (wcar spec (r/lpop k)) 72 | this))) 73 | 74 | (defn set->transient 75 | [spec k] 76 | ^{:type :redis :prefix "#{" :suffix "}"} 77 | (reify 78 | clojure.lang.Counted 79 | (count [this] 80 | (wcar spec (r/scard k))) 81 | clojure.lang.Seqable 82 | (seq [this] 83 | (map read-string (wcar spec (r/smembers k)))) 84 | clojure.lang.ITransientCollection 85 | (conj [this v] 86 | (wcar spec (r/sadd k (pr-str v))) 87 | this) 88 | clojure.lang.ITransientSet 89 | (disjoin [this v] 90 | (wcar spec (r/srem k (pr-str v))) 91 | this) 92 | clojure.lang.IFn 93 | (invoke [this member] 94 | (when (.contains this member) 95 | member)) 96 | (contains [this v] 97 | (let [member (wcar spec (r/sismember k (pr-str v)))] 98 | (pos? 
member))) 99 | (get [this v] 100 | (when (.contains this v) 101 | v)))) 102 | 103 | (defn instance->transient 104 | [spec] 105 | ^{:type :redis :prefix "{" :suffix "}" :sep "," :tuple? true} 106 | (reify 107 | clojure.lang.ILookup 108 | (valAt [this k] 109 | (let [k (pr-str k) 110 | type (wcar spec (r/type k))] 111 | (condp = type 112 | "string" (read-string (wcar spec (r/get k))) 113 | "hash" (hash->transient spec k) 114 | "list" (list->transient spec k) 115 | "set" (set->transient spec k) 116 | "none" nil 117 | (throw (ex-info "unsupported redis type" {:type type}))))) 118 | (valAt [this k default] 119 | (or (.valAt this k) default)) 120 | clojure.lang.Counted 121 | (count [this] 122 | (count (wcar spec (r/keys "*")))) 123 | clojure.lang.Seqable 124 | (seq [this] 125 | (let [keys (wcar spec (r/keys "*"))] 126 | (for [k keys] 127 | (->mapentry (read-string k) (.valAt this (read-string k)))))) 128 | clojure.lang.ITransientMap 129 | (assoc [this k v] 130 | (let [k (pr-str k)] 131 | (cond 132 | (set? v) (doseq [member v] 133 | (wcar spec (r/sadd k (pr-str member)))) 134 | (map? v) (doseq [[subk v] v] 135 | (wcar spec (r/hset k (pr-str subk) (pr-str v)))) 136 | (sequential? v) (doseq [e v] 137 | (wcar spec (r/lpush k (pr-str e)))) 138 | :else (wcar spec (r/set k (pr-str v))))) 139 | this) 140 | (without [this k] 141 | (wcar spec (r/del (pr-str k))) 142 | this))) 143 | 144 | (defmethod print-method :redis 145 | [obj ^java.io.Writer writer] 146 | (let [{:keys [prefix suffix sep tuple?]} (meta obj)] 147 | (.write writer prefix) 148 | (when (pos? (count obj)) 149 | (loop [[item & items] (seq obj)] 150 | (if tuple? 151 | (do 152 | (print-method (key item) writer) 153 | (.write writer " ") 154 | (print-method (val item) writer)) 155 | (print-method item writer)) 156 | (when (seq items) 157 | (.write writer (str sep " ")) 158 | (recur items)))) 159 | (.write writer suffix))) 160 | -------------------------------------------------------------------------------- /content/post/033-hands-on-kafka-dynamic-dns.md: -------------------------------------------------------------------------------- 1 | #+title: Hands on Kafka: Dynamic DNS 2 | #+date: 2015-04-23 3 | 4 | I [recently 5 | wrote](/entries/2015/03/10/simple-materialized-views-in-kafka-and-clojure) 6 | about kafka [log 7 | compaction](https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction) 8 | and the use cases it allows. The article focused on simple key-value 9 | storage and did not address going beyond this. In practice, values 10 | associated with keys often need more than just bare values. 11 | 12 | To see how log compaction can still be leveraged with more complex 13 | types, we will see how to approach maintaining the state of a list in 14 | kafka through the lens of a dynamic DNS setup. 15 | 16 | ### DNS: the 30 second introduction 17 | 18 | I assume my readers are familiar with the architecture of the *Domain 19 | Name System* (**DNS**). To summarize, DNS revolves around the notions of 20 | zones, separated by dots which follow a tree like hierachy starting at 21 | the right-most zone. 22 | 23 | ![Hierarchy](/media/hands-on-kafka/hierarchy.png) 24 | 25 | Each zone is responsible for maintaining a list of records. Records each 26 | have a type and an associated payload. Here's a non-exhaustive list of 27 | record types: 28 | 29 | Record Content 30 | --------- --------------------------------------------------------- 31 | `SOA` Start of Authority. Provides zone details and timeouts. 32 | `NS` Delegates zone to other nameservers. 
33 | `A` Maps a record to an IPv4 address. 34 | `AAAA` Maps a record to an IPv6 address. 35 | `CNAME` Aliases a record to another. 36 | `MX` Mail server responsibility for a record. 37 | `SRV` Arbitrary service responsibility for a record. 38 | `TXT` Arbitrary text associated with record. 39 | `PTR` Maps an IP record with a zone. 40 | 41 | Given this hierarchy and properties, DNS can be abstracted to a hash 42 | table, keyed by zone. Value contents can be considered lists. 43 | 44 | ```javascript 45 | { 46 | "exoscale.com": [ 47 | {record: "api", type: "A", content: "10.0.0.1"}, 48 | {record: "www", type: "A", content: "10.0.0.2"}, 49 | ], 50 | "google.com": [ 51 | {record: "www", type: "A", content: "10.1.0.1"} 52 | ] 53 | } 54 | ``` 55 | 56 | In reality, zone contents are stored in zone files, whose content look 57 | roughly like this: 58 | 59 | ``` 60 | $TTL 86400 61 | $ORIGIN example.com. 62 | @ 1D IN SOA ns1.example.com. hostmaster.example.com. ( 63 | 2015042301 ; serial 64 | 3H ; refresh 65 | 15 ; retry 66 | 1w ; expire 67 | 3h ; minimum 68 | ) 69 | IN NS ns1.example.com. ; nameserver 70 | IN NS ns2.example.com. ; nameserver 71 | IN MX 10 mail.example.com. ; mail provider 72 | ; server host definitions 73 | ns1 IN A 10.0.0.1 74 | ns1 IN A 10.0.0.2 75 | mail IN A 10.0.0.10 76 | www IN A 10.0.0.10 77 | api IN CNAME www 78 | ``` 79 | 80 | Based on our mock list content above, generating a correct DNS zone file 81 | is a simple process. 82 | 83 | ### Dynamic DNS motivation 84 | 85 | Dynamic DNS updates greatly help when doing any of the following: 86 | 87 | - Automated zone synchronisation based on configuration management. 88 | - Automated zone synchronisation based on IaaS content. 89 | - Authorized and authenticated programmatic access to zone contents. 90 | 91 | Most name servers support fast reloads and convergence of configuration, 92 | but still require generating zone files on the fly and reloading 93 | configuration. Kafka can be a very valid choice to maintain a stream of 94 | changes to zones. 95 | 96 | ### Storing zone changes in Kafka 97 | 98 | Updates to DNS zone usually trickle in as invidual record changes. An 99 | evident candidate for topic keys is the actual zone name. As far as 100 | changes are concerned it makes sense to store the individual record 101 | changes, not the whole zone on each change. Kafka payloads could thus be 102 | standard operations on lists: 103 | 104 | Operation Effect 105 | ----------- ----------------- -- 106 | `ADD` Create a record 107 | `SET` Update a record 108 | `DEL` Delete a record 109 | 110 | Each operation modifies the state of the list and reading from the head 111 | of a log for a particular key ensures that a correct, up to date version 112 | of a zone can be recreated: 113 | 114 | ![Topic](/media/hands-on-kafka/topic.png) 115 | 116 | ### Taking advantage of log compaction 117 | 118 | While this is fully functional, the only correct compaction method for 119 | the above approach is time based, and requires reading from the head of 120 | the log. A simple way to address this issue is to create a second topic, 121 | meant to hold full zone snapshots, associated with the offset at which 122 | the snapshot was done. This allows to use log compaction on the snapshot 123 | topic. 124 | 125 | With this approach, starting a consumer from scratch only requires two 126 | operations: 127 | 128 | - Read the snapshot log from its head. 
129 | - Read the update log, only considering entries which are more recent
130 | than the associated snapshot time.
131 |
132 | ![Dual Topic](/media/hands-on-kafka/dualtopic.png)
133 |
134 | For this approach to work, a single property must remain true: snapshots
135 | emitted on the snapshot topic should be more frequent than the
136 | expiration on the update topic.
137 |
138 | ### Similar use-cases
139 |
140 | Beyond **DNS**, this approach is valid for all standard compound types
141 | and their operations:
142 |
143 | - **Stacks**: `push`, `pop`
144 | - **Lists**: `add`, `del`, `set`
145 | - **Maps**: `set`, `unset`
146 | - **Sets**: `add`, `del`
147 |
148 |
-------------------------------------------------------------------------------- /content/post/007-a-wrapping-macro.md: --------------------------------------------------------------------------------
1 | #+title: A wrapping macro
2 | #+date: 2011-08-06
3 |
4 | ### A bit of sugar
5 |
6 | The **wrap-with** function described in my last post[^1] is useful, but
7 | you still end up having to write closures, which might be confusing to
8 | people who just want to write simple wrappers.
9 |
10 | Fortunately, clojure provides the ability to enhance the language with
11 | syntactic sugar for use cases such as this one.
12 |
13 | ### A word of warning
14 |
15 | I'm obviously going to talk about macros in this article. I still think
16 | one has to postpone writing macros as much as possible, to avoid
17 | creating code that feels too **magic** to the outside reader.
18 |
19 | There are two use cases where resorting to macros is idiomatic; we'll
20 | explore the first one here:
21 |
22 | - macros which help define symbols, usually prefixed with `def`
23 | - macros which wrap access to a resource within a closure, usually
24 | named `with-`
25 |
26 | The first kind of macro is used all the time by the clojure
27 | programmer: defn[^2]. Yep, that's right: the idiomatic way to declare
28 | functions in clojure is a macro that wraps a call to def.
29 |
30 | The second kind's most popular example is with-open[^3], which encloses
31 | access to a resource and ensures that it gets closed. The
32 | **with-resource** calls have become common idioms in clojure libraries
33 | and provide a great equivalent to the similar ruby block idiom. This
34 | type of macro will be described in a later post, though.
35 |
36 | ### Macro terminology
37 |
38 | Macros need access to all kinds of resources and reading them might be
39 | hard on the eyes at first. Several people have written on the subject of
40 | macros, and books have been written that go into great detail on the
41 | subject, so I'll just go with a cheat sheet:
42 |
43 | 1. The body of a macro is usually quoted
44 |
45 | Macros are expanded into code, hence you must provide the
46 | **s-exprs** you want executed by quoting them, otherwise they
47 | will be evaluated at expansion time rather than at execution time.
48 |
49 | Beware that there are two types of quoting available in clojure:
50 |
51 | - Standard quoting, using '
52 | - Backtick quoting, using \`, which resolves symbols in the current
53 | namespace, and is generally used for macros
54 |
55 | 2. 
Accessing data from within the quoted **s-exprs**
56 |
57 | There are two ways to access data from within a quoted list of
58 | expressions:
59 |
60 | - **unquote**: which takes the value of a symbol and replaces it
61 | in the expanded list, **\~expr**
62 | - **unquote-splicing**: which takes the value of a symbol pointing
63 | to a list and expands it spliced, **\~@expr**
64 |
65 | 3. The canonical unless example
66 |
67 | Unless is the most common example macro described; let's see how it
68 | is written:
69 |
70 | ```clojure
71 | (defmacro unless [test & exprs]
72 | `(if (not ~test)
73 | (do ~@exprs)))
74 | ```
75 |
76 | Short but dense! The code reads like this:
77 |
78 | - Define an unless macro which takes an arbitrary number of
79 | arguments, the first one being bound to **test**, the rest to a
80 | list called **exprs**
81 | - Test the truthiness of **test**
82 | - Execute the expressions in a **do** block
83 |
84 | ### Wrapping up
85 |
86 | Building on our previous function **wrap-with**, we can then help people
87 | write wrapper functions more easily:
88 |
89 | ```clojure
90 | (defmacro defwrapper [wrapper-name handler bindings & exprs]
91 | `(def ~wrapper-name
92 | (fn [~handler]
93 | (fn ~bindings
94 | (do ~@exprs)))))
95 | ```
96 |
97 | This is somewhat inelegant since we still need to supply a symbol which
98 | is going to be bound to the handler. We can wrap it up using our
99 | previous function:
100 |
101 | ```clojure
102 | (defn to-be-wrapped [payload]
103 | (assoc payload :reply :ok))
104 |
105 | (defwrapper wrap-add-foo handler [payload]
106 | (handler (assoc payload :foo :bar)))
107 |
108 | (wrap-with to-be-wrapped [wrap-add-foo])
109 | ```
110 |
111 | ### Room for improvement
112 |
113 | Now let's play a bit of magic: how about creating a macro which rebinds
114 | a symbol altogether:
115 |
116 | ```clojure
117 | (defmacro wrap-around [handler bindings & exprs]
118 | `(let [x# ~handler
119 | meta# (meta (var ~handler))]
120 | (def ~handler
121 | (fn ~bindings
122 | (let [~handler x#]
123 | (do ~@exprs))))
124 | (alter-meta! (var ~handler) merge meta#)))
125 | ```
126 |
127 | Notice the last call to alter-meta![^4] which preserves the initial
128 | var's metadata, such as **:tag** or **:arglists**. Now here are the
129 | macros in context:
130 |
131 | ```clojure
132 | (wrap-around send-command [payload]
133 | (send-command (assoc payload :foo :bar)))
134 |
135 | ;; store elapsed time in the response
136 | (wrap-around send-command [payload]
137 | (let [start (System/nanoTime)]
138 | (assoc (send-command payload) :elapsed (- (System/nanoTime) start))))
139 | ```
140 |
141 | ### Closing words
142 |
143 | This is just a peek into the power of macros in clojure, and it was a
144 | fun journey getting to the bottom of the last macro. However the last
145 | form complicates reading to some extent and should thus be avoided if
146 | possible.
147 |
148 | [^1]:
149 |
150 | [^2]:
151 |
152 | [^3]:
153 |
154 | [^4]:
155 |
-------------------------------------------------------------------------------- /content/post/034-atomic-database.md: --------------------------------------------------------------------------------
1 | #+title: Building an atomic database with clojure
2 | #+date: 2016-12-17
3 |
4 | Atoms provide a way to hold onto a value in clojure and perform
5 | thread-safe transitions on that value. In a world of immutability, they
6 | are the closest equivalent to other languages' notion of variables you
7 | will encounter in your daily clojure programming.
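To illustrate what such a transition looks like before diving in - a made-up counter, not part of the example developed below:

```clojure
(def counter (atom 0))

;; swap! atomically applies a pure function to the current value
(swap! counter inc)   ;; => 1
(swap! counter + 10)  ;; => 11

;; deref - or the @ reader macro - reads the current value
@counter              ;; => 11
```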
8 |
9 | Storing with an atom
10 | ====================
11 |
12 | One of the frequent uses of atoms is to hold onto maps used as some sort
13 | of cache. Let's say our program stores a per-user high score for a game.
14 |
15 | To store high scores in memory, atoms allow us to implement things very
16 | quickly:
17 |
18 | ```clojure
19 | (ns game.scores
20 | "Utilities to record and look up game high scores")
21 |
22 | (defn make-score-db
23 | "Build a database of high-scores"
24 | []
25 | (atom nil))
26 |
27 | (def compare-scores
28 | "A function which keeps the highest numerical value.
29 | Handles nil previous values."
30 | (fnil max 0))
31 |
32 | (defn record-score!
33 | "Record a score for user, store only if higher than
34 | previous or no previous score exists"
35 | [scores user score]
36 | (swap! scores update user compare-scores score))
37 |
38 | (defn user-high-score
39 | "Lookup highest score for user, may yield nil"
40 | [scores user]
41 | (get @scores user))
42 |
43 | (defn high-score
44 | "Lookup absolute highest score, may yield nil
45 | when no scores have been recorded"
46 | [scores]
47 | (last (sort-by val @scores)))
48 | ```
49 |
50 | In the above we have put together a very simple record mechanism.
51 | Storing the atom returned by `make-score-db` with `defonce` would
52 | additionally keep scores across namespace reevaluations. Ideally this
53 | should be provided as a component, but for the purposes of this post we
54 | will keep things as simple as possible.
55 |
56 | Using the namespace works as expected:
57 |
58 | ```clojure
59 | (def scores (make-score-db))
60 | (high-score scores) ;; => nil
61 | (user-high-score scores :a) ;; => nil
62 | (record-score! scores :a 2) ;; => {:a 2}
63 | (record-score! scores :b 3) ;; => {:a 2 :b 3}
64 | (record-score! scores :b 1) ;; => {:a 2 :b 3}
65 | (record-score! scores :a 4) ;; => {:a 4 :b 3}
66 | (user-high-score scores :a) ;; => 4
67 | (high-score scores) ;; => [:a 4]
68 | ```
69 |
70 | Atom persistence
71 | ================
72 |
73 | This is all old news to most. What I want to showcase here is how the
74 | `add-watch` functionality on top of atoms can help serialize atoms
75 | like these.
76 |
77 | First let's consider the following:
78 |
79 | - We want to store our `high-scores` state to disk
80 | - The content of `high-scores` contains no unprintable values
81 |
82 | It is thus straightforward to write a serializer and deserializer for
83 | such a map:
84 |
85 | ```clojure
86 | (ns game.serialization
87 | "Serialization utilities"
88 | (:require [clojure.edn :as edn]))
89 |
90 | (defn dump-to-path
91 | "Store a value's representation to a given path"
92 | [path value]
93 | (spit path (pr-str value)))
94 |
95 | (defn load-from-path
96 | "Load a value from its representation stored in a given path.
97 | When reading fails, yield nil"
98 | [path]
99 | (try
100 | (edn/read-string (slurp path))
101 | (catch Exception _)))
102 | ```
103 |
104 | This also works as expected:
105 |
106 | ```clojure
107 | (dump-to-path "/tmp/scores.db"
108 | {:a 0 :b 3 :c 3 :d 4}) ;; => nil
109 | (load-from-path "/tmp/scores.db") ;; => {:a 0 :b 3 :c 3 :d 4}
110 | ```
111 |
112 | With these two separate namespaces, we are now left figuring out how to
113 | persist our high-score database. To be as faithful as possible, we will
114 | avoid techniques such as doing regular snapshots. 
Instead we will reach 115 | out to `add-watch` which has the following signature `(add-watch 116 | reference key fn)` and documentation: 117 | 118 | > Adds a watch function to an agent/atom/var/ref reference. The watch fn 119 | > must be a fn of 4 args: a key, the reference, its old-state, its 120 | > new-state. Whenever the reference's state might have been changed, any 121 | > registered watches will have their functions called. The watch fn will 122 | > be called synchronously, on the agent's thread if an agent, before any 123 | > pending sends if agent or ref. Note that an atom's or ref's state may 124 | > have changed again prior to the fn call, so use old/new-state rather 125 | > than derefing the reference. Note also that watch fns may be called 126 | > from multiple threads simultaneously. Var watchers are triggered only 127 | > by root binding changes, not thread-local set!s. Keys must be unique 128 | > per reference, and can be used to remove the watch with remove-watch, 129 | > but are otherwise considered opaque by the watch mechanism. 130 | 131 | Our job is thus to write a 4 argument function of the atom itself, a key 132 | to identify the watcher, the previous and new state. 133 | 134 | To persist each state transition to a file, we can use our 135 | `dump-to-path` function above as follows: 136 | 137 | ```clojure 138 | (defn persist-fn 139 | "Yields an atom watch-fn that dumps new states to a path" 140 | [path] 141 | (fn [_ _ _ state] 142 | (dump-to-path path state))) 143 | 144 | (defn file-backed-atom 145 | "An atom that loads its initial state from a file and persists each new state 146 | to the same path" 147 | [path] 148 | (let [init (load-from-path path) 149 | state (atom init)] 150 | (add-watch state :persist-watcher (persist-fn path)) 151 | state)) 152 | ``` 153 | 154 | Wrapping up 155 | =========== 156 | 157 | The examples above can now be exercized using our new `file-backed-atom` 158 | function: 159 | 160 | ```clojure 161 | (def scores (file-backed-atom "/tmp/scores.db")) 162 | (high-score scores) ;; => nil 163 | (user-high-score scores :a) ;; => nil 164 | (record-score! scores :a 2) ;; => {:a 2} 165 | (record-score! scores :b 3) ;; => {:a 2 :b 3} 166 | (record-score! scores :b 1) ;; => {:a 2 :b 3} 167 | (record-score! scores :a 4) ;; => {:a 4 :b 3} 168 | (user-high-score scores :a) ;; => 4 169 | (high-score scores) ;; => [:a 4] 170 | ``` 171 | 172 | The code presented here is available [here](/files/2016-12-17-atomic-database.html) 173 | 174 | -------------------------------------------------------------------------------- /content/post/022-real-time-twitter-trending-on-a-budget-with-riemann.md: -------------------------------------------------------------------------------- 1 | #+date: 2014-01-14 2 | #+title: Real-time Twitter trending on a budget with riemann 3 | 4 | I recently stumbled upon this article 5 | 6 | by Michael Noll which explains a strategy for computing twitter trends 7 | with Storm. 8 | 9 | I love Storm, but not everyone has a cluster already, and I think 10 | computing tops is a problem that lends itself well to single node 11 | computing since datasets often are very simple (in the twitter trend 12 | case we store a tuple of hashtag and time of insertion) and thus can fit 13 | in a single box's memory capacity while being able to service many 14 | events per second. 
15 | 16 | It turns out, riemann is a great tool for tackling this type of problem 17 | and is able to handle a huge amount of events per second while keeping a 18 | small and concise configuration. 19 | 20 | It goes without saying that Storm will be a better performer when you 21 | are trying to compute a vast amount of data (for instance, the real 22 | twitter firehose). 23 | 24 | ### Accumulating tweets 25 | 26 | In this example we will compute twitter trends from a sample of the 27 | firehose, as provided by twitter. The 28 | [tweetstream](https://github.com/tweetstream/tweetstream) ruby library 29 | provides a very easy way to process the "sample hose" and here is a 30 | small script which extracts hash tags from tweets and publishes them to 31 | a local riemann instance: 32 | 33 | ```ruby 34 | require 'tweetstream' 35 | require 'riemann/client' 36 | 37 | TweetStream.configure do |config| 38 | config.consumer_key = 'xxx' 39 | config.consumer_secret = 'xxx' 40 | config.oauth_token = 'xxx' 41 | config.oauth_token_secret = 'xxx' 42 | config.auth_method = :oauth 43 | end 44 | 45 | riemann = Riemann::Client.new 46 | 47 | TweetStream::Client.new.sample do |status| 48 | status.text.scan(/\s#([[:alnum:]]+)/).map{|x| x.first.downcase}.each do |tag| 49 | riemann << {service: tag, metric: 1.0, tags: ["twitter"], ttl: 3600} 50 | end 51 | end 52 | ``` 53 | 54 | For each tweet in the firehose we emit a riemann event tagged with 55 | `twitter` and a metric of 1.0, the service is the tag which was found. 56 | 57 | ### Computing trends in riemann 58 | 59 | The rationale for computing trends is as follows: 60 | 61 | - Keep a moving time window of an hour 62 | - Compute per-tag counts 63 | - Sort by computed count, then by time 64 | - Keep the top N events 65 | 66 | Riemann provides several facilities out of the box which can be used to 67 | implement this, most noticeably: 68 | 69 | - The `top` stream which separates events in two streams: top & bottom 70 | - The `moving-time-window` stream 71 | 72 | With recent changes in riemann's `top` function we can use this simple 73 | configuration to compute trends: 74 | 75 | ```clojure 76 | (let [store (index) 77 | trending (top 10 (juxt :metric :time) (tag "top" store) store)] 78 | (streams 79 | (by :service (moving-time-window 3600 (smap folds/sum trending))))) 80 | ``` 81 | 82 | Let's break down what happens in this configuration. 83 | 84 | - We create an index and a `trending` stream which keeps the top 10 85 | trending hashtags, we'll get back to this one later. 86 | - For each incoming event, we split on service (the hashtag), and then 87 | sum all occurences in the last hour 88 | - This generate an event whose metric is the number of occurences in 89 | an hour which gets sent to trending 90 | 91 | Now let's look a bit more in-depth at what is provided by the `trending` 92 | stream. We are using the 4-arity version of `top`, so in this case: 93 | 94 | - We want to compute the top 10 (first argument) 95 | - We compare and sort events using the `(juxt :metric :time)` 96 | function. `juxt` yields a vector, which is the result of applying 97 | its arguments to its input. 
For an input event 98 | `{:metric 1.0 :time 2}` our function will yield `[1.0 2]`, we 99 | leverage the fact that vectors implement the `Comparable` interface 100 | and thus will correctly sort event by metric, then time 101 | - We send events belonging to the top 10 to the stream 102 | `(tag "top" store)` 103 | - We send events not belonging to the top 10 or bumped from the top 10 104 | to the stream `store` 105 | 106 | ### Fetching results 107 | 108 | Running twitter-hose.rb against such a configuration we can now query 109 | the index to retrieve. With the ruby `riemann-client` gem we just 110 | retrieve the indexed elements tagged with **top**: 111 | 112 | ```ruby 113 | require 'riemann/client' 114 | require 'pp' 115 | 116 | client = Riemann::Client.new 117 | pp client['tagged "top"'] 118 | ``` 119 | 120 | ### Going further 121 | 122 | It might be interesting to play with a better comparison function than 123 | `(juxt :metric :time)`, it would be interesting to compute a decay 124 | factor from the time and apply it to the metric and let comparisons be 125 | done on this output. 126 | 127 | The skeleton of such a function could be: 128 | 129 | ```clojure 130 | (def decay-factor xxx) 131 | 132 | (defn decaying [{:keys [metric time] :as event}] 133 | (let [ (unix-time)] 134 | (- metric (* ((unix-time) - time) decay-factor)))) 135 | ``` 136 | 137 | This would allow expiring old trends quicker. 138 | 139 | The full code for this example is available at: 140 | 141 | [/files/2014-01-14-twitter-trending.html](/files/2014-01-14-twitter-trending.html) 142 | 143 | ### Other applications 144 | 145 | When transferring that problem domain to the typical datasets riemann 146 | handles, the top stream can be a great way to find outliers in a 147 | production environment, in terms of CPU consumption, bursts of log 148 | types. 149 | 150 | ### Toy scaling strategies 151 | 152 | I'd like to advise implementers to look beyond riemann for scaling top 153 | extraction from streams, as tools like Storm are great for these use 154 | cases. 155 | 156 | But in jest, I'll mention that since the 157 | [riemann-kafka](https://github.com/pyr/riemann-kafka) plugin - by yours 158 | truly - allows producing and consuming to and from kafka queues, 159 | intermediate riemann cores could compute local tops and send the 160 | aggregated results over to a central riemann instance which would then 161 | determine the overall top. 162 | 163 | I hope this gives you a good glimpse of what riemann can provide beyond 164 | simple threshold alerts. 165 | -------------------------------------------------------------------------------- /content/post/031-pid-tracking-in-modern-init-systems.md: -------------------------------------------------------------------------------- 1 | #+title: PID tracking in modern init systems 2 | #+date: 2014-11-09 3 | 4 | Wherever your init daemon preference may go to, if you're involved in 5 | writing daemons you're currently faced with the following options: 6 | 7 | - Start your software in the foreground and let something handle it 8 | - Write a [huge 9 | kludge](https://github.com/exceliance/haproxy/blob/master/scripts/haproxy.init.debian) 10 | of shell script which tries to keep track of the daemons PID 11 | - Work hand in hand with an init system which can be hinted at the 12 | service's PID 13 | 14 | Why is tracking daemon process IDs hard ? 
Because as a parent, you don't
15 | have many options to watch how your child processes evolve; the only
16 | reliable option is for the children to send back their PID in
17 | some way.
18 |
19 | Traditionally this has been done with PID files, which are usually
20 | subject to being left behind. **upstart** has a tracking mechanism which
21 | is notorious for its ability to lose track of the real child PID and
22 | which leaves init in a completely screwed up state, with the
23 | side-effect of hanging machines during shutdown.
24 |
25 | Additionally, to provide efficient sequencing of daemon startups,
26 | traditional init scripts resort to sleeping for arbitrary periods to
27 | ensure daemons are started, while having no guarantee as to the actual
28 | readiness of the service.
29 |
30 | In this article we'll explore two ways of enabling daemons to coexist
31 | with **upstart** and **systemd**.
32 |
33 | ### Upstart and `expect stop`
34 |
35 | **upstart** has four modes of launching daemons:
36 |
37 | - A simple mode where the daemon is expected to run in the foreground
38 | - `expect fork` which expects the daemon process to fork once
39 | - `expect daemon` which expects the daemon process to fork twice
40 | - `expect stop` which waits for any child process to stop itself
41 |
42 | When using `expect stop`, by watching children for SIGSTOP signals,
43 | **upstart** is able to reliably determine which PID the daemon lives
44 | under. When launched from **upstart**, the `UPSTART_JOB` environment
45 | variable is set, which means that it suffices to check for it. This also
46 | gives a good indication to the daemon that it should stay in the
47 | foreground:
48 |
49 | ```c
50 | const char *upstart_job = getenv("UPSTART_JOB");
51 |
52 |
53 | if (upstart_job != NULL)
54 | raise(SIGSTOP); /* wait for upstart to start us up again */
55 | else {
56 |
57 | switch ((pid = fork())) {
58 | case -1:
59 | /* handle error */
60 | exit(1);
61 | case 0:
62 | /* we're in the child, keep going */
63 | break;
64 | default:
65 | /* we're in the parent, exit */
66 | return 0;
67 | }
68 |
69 | setsid();
70 | close(2);
71 | close(1);
72 | close(0);
73 |
74 | if (open("/dev/null", O_RDWR) != 0)
75 | err(1, "cannot open /dev/null as stdin");
76 | if (dup(0) != 1)
77 | err(1, "cannot open /dev/null as stdout");
78 | if (dup(0) != 2)
79 | err(1, "cannot open /dev/null as stderr");
80 | }
81 | ```
82 |
83 | This is actually all there is to it as far as **upstart** is concerned.
84 | **Upstart** will catch the signal and register the PID it came from as
85 | the daemon. This way there is no risk of **upstart** losing track of the
86 | correct PID.
87 |
88 | Let's test our daemon manually:
89 |
90 | ```bash
91 | $ env UPSTART_JOB=t $HOME/mydaemon
92 | $ ps auxw | grep mydaemon
93 | pyr 22702 0.0 0.0 22044 1576 ? Ts 21:21 0:00 /home/pyr/mydaemon
94 | ```
95 |
96 | The interesting bit here is that the reported state contains `T` for
97 | stopped. We can now resume execution by issuing:
98 |
99 | ```bash
100 | kill -CONT 22702
101 | ```
102 |
103 | Now configuring your daemon in upstart just needs:
104 |
105 | ```bash
106 | expect stop
107 | respawn
108 | exec /home/pyr/mydaemon
109 | ```
110 |
111 | ### Systemd and `sd_notify`
112 |
113 | **Systemd** provides a similar facility for daemons, although it goes a
114 | bit further at the expense of increased complexity.
115 |
116 | **Systemd**'s approach is to expose a UNIX datagram socket to daemons
117 | for feedback purposes. 
The payload is composed of line-separated key/value
118 | pairs, where keys may be one of:
119 |
120 | - `READY`: indicate whether the service is ready to operate.
121 | - `STATUS`: update the status to display in systemctl's output.
122 | - `ERRNO`: in the case of failure, hint at the reason for failure.
123 | - `BUSERROR`: DBUS style error hints.
124 | - `MAINPID`: indicate which PID the daemon runs as.
125 | - `WATCHDOG`: when using the watchdog features of **systemd**, this
126 | signal will reset the watchdog timestamp.
127 |
128 | A word of advice: if you plan on using the error notification mechanism,
129 | it is advisable to pre-allocate a static buffer in order to be able to
130 | send out messages even in out-of-memory situations.
131 |
132 | Like **upstart**, **systemd** sets an environment variable,
133 | `NOTIFY_SOCKET`, to allow conditional behavior. It's up to the daemon to
134 | include its PID in the message payload, so it doesn't matter whether
135 | forking happens or not.
136 |
137 | **Systemd** is happy with both forking and foreground-running daemons; a
138 | simple recommendation could be to only daemonize when neither
139 | `UPSTART_JOB` nor `NOTIFY_SOCKET` are present:
140 |
141 | ```c
142 | const char *upstart_job = getenv("UPSTART_JOB");
143 | const char *notify_socket = getenv("NOTIFY_SOCKET");
144 |
145 | /* ... */
146 |
147 | if (upstart_job != NULL)
148 | raise(SIGSTOP); /* wait for upstart to start us up again */
149 | else if (notify_socket != NULL)
150 | sd_notifyf(0, "READY=1\nSTATUS=ready\nMAINPID=%lu\n",
151 | (unsigned long)getpid());
152 | else
153 | /* daemonize ... */
154 | ```
155 |
156 | The use of `sd_notify` requires linking against `libsystemd`. If you
157 | want to keep dependencies to a minimum, you also have the possibility of
158 | crafting the payload directly and sending a single datagram to the
159 | socket stored in the `NOTIFY_SOCKET` environment variable. Here's an
160 | implementation from Vincent Bernat's LLDPD:
161 |
162 |
163 | To configure your **systemd** *unit*, you'll now need to mark your job
164 | as having the type `notify`:
165 |
166 | ```ini
167 | [Unit]
168 | Description=My daemon
169 | Documentation=man:mydaemon(8)
170 | After=network.target
171 |
172 | [Service]
173 | Type=notify
174 | NotifyAccess=main
175 | ExecStart=/home/pyr/mydaemon
176 |
177 | [Install]
178 | WantedBy=multi-user.target
179 | ```
180 |
-------------------------------------------------------------------------------- /content/post/025-why-were-there-gotos-in-apple-software-in-the-first-place.md: --------------------------------------------------------------------------------
1 | #+title: Why were there gotos in apple software in the first place?
2 | #+date: 2014-02-25
3 |
4 | A recent vulnerability in iOS and Mac OS boils down to a double goto
5 | resulting in critical ssl verification code becoming unreachable.
6 |
7 | ```c
8 | hashOut.data = hashes + SSL_MD5_DIGEST_LEN;
9 | hashOut.length = SSL_SHA1_DIGEST_LEN;
10 | if ((err = SSLFreeBuffer(&hashCtx)) != 0)
11 | goto fail;
12 | if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)
13 | goto fail;
14 | if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
15 | goto fail;
16 | if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
17 | goto fail;
18 | if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
19 | goto fail;
20 | goto fail;
21 | if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
22 | goto fail;
23 |
24 | /* ... 
*/
25 |
26 | fail:
27 | SSLFreeBuffer(&signedHashes);
28 | SSLFreeBuffer(&hashCtx);
29 | return err;
30 | ```
31 |
32 | Since the code at the fail label returns err - which is still 0 when the
33 | duplicated goto is reached - no error is reported in normal conditions,
34 | making the lack of verification silent.
35 |
36 | ### But gotos are bad, lol
37 |
38 | With all the talk about goto being bad (if you haven't, read **Edsger
39 | Dijkstra**'s famous [go to considered
40 | harmful](http://www.u.arizona.edu/~rubinson/copyright_violations/Go_To_Considered_Harmful.html)),
41 | it's a wonder it could still be found in production code. In this short
42 | post I'd like to point out that while `goto` is generally a code smell,
43 | it has one very valid and important use in ANSI C: exception handling.
44 |
45 | Let's look at a simple function that makes use of **goto exception
46 | handling**:
47 |
48 | ```c
49 | char *
50 | load_file(const char *name, off_t *len)
51 | {
52 | struct stat st;
53 | off_t size;
54 | char *buf = NULL;
55 | int fd;
56 |
57 | if ((fd = open(name, O_RDONLY)) == -1)
58 | return (NULL);
59 | if (fstat(fd, &st) != 0)
60 | goto fail;
61 | size = st.st_size;
62 | if ((buf = calloc(1, size + 1)) == NULL)
63 | goto fail;
64 | if (read(fd, buf, size) != size)
65 | goto fail;
66 | close(fd);
67 |
68 | *len = size + 1;
69 | return (buf);
70 |
71 | fail:
72 | if (buf != NULL)
73 | free(buf);
74 | close(fd);
75 | return (NULL);
76 | }
77 | ```
78 |
79 | Here goto serves a few purposes:
80 |
81 | - keep the code intent clear
82 | - reduce condition branching
83 | - allow graceful failure handling
84 |
85 | While this excerpt is short, it would already be much more awkward to
86 | repeat the failure handling code in the body of each `if` statement
87 | testing for error conditions.
88 |
89 | A more complex example shows how multiple types of "exceptions" can be
90 | handled with `goto` without falling into the spaghetti code trap. 
90 |
91 | ```c
92 | void
93 | ssl_read(int fd, short event, void *p)
94 | {
95 | struct bufferevent *bufev = p;
96 | struct conn *s = bufev->cbarg;
97 | int ret;
98 | int ssl_err;
99 | short what;
100 | size_t len;
101 | char rbuf[IBUF_READ_SIZE];
102 | int howmuch = IBUF_READ_SIZE;
103 |
104 | what = EVBUFFER_READ;
105 |
106 | if (event == EV_TIMEOUT) {
107 | what |= EVBUFFER_TIMEOUT;
108 | goto err;
109 | }
110 |
111 | if (bufev->wm_read.high != 0)
112 | howmuch = MIN(sizeof(rbuf), bufev->wm_read.high);
113 |
114 | ret = SSL_read(s->s_ssl, rbuf, howmuch);
115 | if (ret <= 0) {
116 | ssl_err = SSL_get_error(s->s_ssl, ret);
117 |
118 | switch (ssl_err) {
119 | case SSL_ERROR_WANT_READ:
120 | goto retry;
121 | case SSL_ERROR_WANT_WRITE:
122 | goto retry;
123 | default:
124 | if (ret == 0)
125 | what |= EVBUFFER_EOF;
126 | else {
127 | ssl_error("ssl_read");
128 | what |= EVBUFFER_ERROR;
129 | }
130 | goto err;
131 | }
132 | }
133 |
134 | if (evbuffer_add(bufev->input, rbuf, ret) == -1) {
135 | what |= EVBUFFER_ERROR;
136 | goto err;
137 | }
138 |
139 | ssl_bufferevent_add(&bufev->ev_read, bufev->timeout_read);
140 |
141 | len = EVBUFFER_LENGTH(bufev->input);
142 | if (bufev->wm_read.low != 0 && len < bufev->wm_read.low)
143 | return;
144 | if (bufev->wm_read.high != 0 && len > bufev->wm_read.high) {
145 | struct evbuffer *buf = bufev->input;
146 | event_del(&bufev->ev_read);
147 | evbuffer_setcb(buf, bufferevent_read_pressure_cb, bufev);
148 | return;
149 | }
150 |
151 | if (bufev->readcb != NULL)
152 | (*bufev->readcb)(bufev, bufev->cbarg);
153 | return;
154 |
155 | retry:
156 | ssl_bufferevent_add(&bufev->ev_read, bufev->timeout_read);
157 | return;
158 |
159 | err:
160 | (*bufev->errorcb)(bufev, what, bufev->cbarg);
161 | }
162 | ```
163 |
164 | One could wonder why functions aren't used in lieu of goto statements
165 | in this context; it boils down to two things: context and efficiency.
166 |
167 | Since the canonical use case of goto is a separate termination path
168 | that handles cleanup, it needs context - i.e. local variables - that
169 | would have to be carried over to a cleanup function; this would make for
170 | a proliferation of awkward, special-purpose functions.
171 |
172 | Additionally, functions create additional stack frames, which in some
173 | scenarios may be a concern, especially in the context of kernel
174 | programming and critical-path functions.
175 |
176 | ### The take-away
177 |
178 | While there is a general sentiment that the goto statement should be
179 | avoided, which is mostly valid, it's not a hard rule and there is, in C,
180 | a very valid use case that only goto provides.
181 |
182 | In the case of the apple code, the error did not stem from the use of
183 | the goto statement but from an unfortunate typo.
184 |
185 | It's interesting to note that **Edsger Dijkstra** wrote his original
186 | piece at a time when conditional and loop constructs such as
187 | if/then/else and while were not available in mainstream languages such
188 | as Basic. He later clarified his initial statement, saying:
189 |
190 | > Please don't fall into the trap of believing that I am terribly
191 | > dogmatical about \[the goto statement\]. I have the uncomfortable
192 | > feeling that others are making a religion out of it, as if the
193 | > conceptual problems of programming could be solved by a single trick,
194 | > by a simple form of coding discipline.
195 |
196 | Words of wisdom. 
197 | -------------------------------------------------------------------------------- /content/post/002-openbsd-pf-limits-and-extending-metrics.org: -------------------------------------------------------------------------------- 1 | #+title: OpenBSD pf: limits and extending metrics 2 | #+date: 2011-02-17 3 | #+author: Pierre-Yves Ritschard 4 | 5 | When running pf firewalls and loadbalancers with relayd, some metrics are critical. One thing that might not be obvious when looking at pf is that the maximum number of sessions a firewall can handle is fixed, as evidenced from looking at the output of =pfctl -si= 6 | 7 | #+BEGIN_EXAMPLE 8 | Status: Enabled for 10 days 00:39:47 Debug: err 9 | 10 | State Table Total Rate 11 | current entries 6 12 | searches 7970148 9.2/s 13 | inserts 60227 0.1/s 14 | removals 60221 0.1/s 15 | Counters 16 | match 67984 0.1/s 17 | bad-offset 0 0.0/s 18 | fragment 1 0.0/s 19 | short 167 0.0/s 20 | normalize 0 0.0/s 21 | memory 0 0.0/s 22 | bad-timestamp 0 0.0/s 23 | congestion 0 0.0/s 24 | ip-option 0 0.0/s 25 | proto-cksum 0 0.0/s 26 | state-mismatch 34 0.0/s 27 | state-insert 0 0.0/s 28 | state-limit 0 0.0/s 29 | src-limit 0 0.0/s 30 | synproxy 0 0.0/s 31 | #+END_EXAMPLE 32 | 33 | Now this can be compared to the limit values you have set which can be queried through =pfctl -sm= 34 | 35 | #+BEGIN_EXAMPLE 36 | states hard limit 200000 37 | src-nodes hard limit 10000 38 | frags hard limit 5000 39 | tables hard limit 1000 40 | table-entries hard limit 200000 41 | #+END_EXAMPLE 42 | 43 | In this particular example, the limit won't be reached for a while! 44 | 45 | By default pf has a very low state limit to cope with machines with very small amounts of RAM, OpenBSD still runs on VAX where 32m of RAM is a lot. On a default install only 10k states are allowed, for a production firewall this is likely to be too small really soon. To raise this limit use the =set limit { states }=. My advice would be to start at 10000 and to raise appropriately if need be. 46 | 47 | Beware that once the state limit is reached, no new states will be allowed to keep old ones functioning leading to a difficult debugging situation. 48 | 49 | As a bonus, here is a simple github gist which implements statistics collection for collectd, allowing you to graph state consumption and alert when the limit is nearly reached. 50 | 51 | #+BEGIN_SRC c 52 | /* 53 | * Copyright (c) 2010 Pierre-Yves Ritschard 54 | * 55 | * Permission to use, copy, modify, and distribute this software for any 56 | * purpose with or without fee is hereby granted, provided that the above 57 | * copyright notice and this permission notice appear in all copies. 58 | * 59 | * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES 60 | * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF 61 | * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR 62 | * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES 63 | * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN 64 | * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF 65 | * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 
66 | */
67 |
68 | #include <sys/types.h>
69 | #include <sys/ioctl.h>
70 | #include <sys/socket.h>
71 |
72 | #include <net/if.h>
73 | #include <net/pfvar.h>
74 |
75 | #include <err.h>
76 | #include <errno.h>
77 | #include <fcntl.h>
78 | #include <stdio.h>
79 | #include <stdlib.h>
80 | #include <string.h>
81 | #include <unistd.h>
82 |
83 | #ifndef TEST
84 | #include "collectd.h"
85 | #include "plugin.h"
86 | #else
87 | typedef u_int64_t counter_t;
88 | #endif
89 |
90 | #define PF_SOCKET "/dev/pf"
91 |
92 | struct pfdata {
93 | int pd_dev;
94 | };
95 |
96 | static struct pfdata pd;
97 |
98 | static int pf_init(void);
99 | static int pf_read(void);
100 | static void submit_counter(const char *, const char *, counter_t);
101 |
102 | void
103 | submit_counter(const char *type, const char *inst, counter_t val)
104 | {
105 | #ifndef TEST
106 | value_t values[1];
107 | value_list_t vl = VALUE_LIST_INIT;
108 |
109 | values[0].counter = val;
110 | vl.values = values;
111 | vl.values_len = 1;
112 |
113 | strlcpy(vl.host, hostname_g, sizeof(vl.host));
114 | strlcpy(vl.plugin, "pf", sizeof(vl.plugin));
115 | strlcpy(vl.type, type, sizeof(vl.type));
116 | strlcpy(vl.type_instance, inst, sizeof(vl.type_instance));
117 | plugin_dispatch_values(&vl);
118 | #else
119 | printf("%s.%s: %llu\n", type, inst, (unsigned long long)val);
120 | #endif
121 | }
122 |
123 | int
124 | pf_init(void)
125 | {
126 | struct pf_status status;
127 |
128 | memset(&pd, '\0', sizeof(pd));
129 |
130 | if ((pd.pd_dev = open(PF_SOCKET, O_RDWR)) == -1) {
131 | return (-1);
132 | }
133 | if (ioctl(pd.pd_dev, DIOCGETSTATUS, &status) == -1) {
134 | return (-1);
135 | }
136 | close(pd.pd_dev);
137 | if (!status.running)
138 | return (-1);
139 |
140 | return (0);
141 | }
142 |
143 | int
144 | pf_read(void)
145 | {
146 | int i;
147 | struct pf_status status;
148 |
149 | char *cnames[] = PFRES_NAMES;
150 | char *lnames[] = LCNT_NAMES;
151 | char *names[] = { "searches", "inserts", "removals" };
152 |
153 | if ((pd.pd_dev = open(PF_SOCKET, O_RDWR)) == -1) {
154 | return (-1);
155 | }
156 | if (ioctl(pd.pd_dev, DIOCGETSTATUS, &status) == -1) {
157 | return (-1);
158 | }
159 | close(pd.pd_dev);
160 | for (i = 0; i < PFRES_MAX; i++)
161 | submit_counter("pf_counters", cnames[i], status.counters[i]);
162 | for (i = 0; i < LCNT_MAX; i++)
163 | submit_counter("pf_limits", lnames[i], status.lcounters[i]);
164 | for (i = 0; i < FCNT_MAX; i++)
165 | submit_counter("pf_state", names[i], status.fcounters[i]);
166 | for (i = 0; i < SCNT_MAX; i++)
167 | submit_counter("pf_source", names[i], status.scounters[i]);
168 | return (0);
169 | }
170 |
171 | #ifdef TEST
172 | int
173 | main(int argc, char *argv[])
174 | {
175 | if (pf_init())
176 | err(1, "pf_init");
177 | if (pf_read())
178 | err(1, "pf_read");
179 | return (0);
180 | }
181 | #else
182 | void module_register(void) {
183 | plugin_register_init("pf", pf_init);
184 | plugin_register_read("pf", pf_read);
185 | }
186 | #endif
187 | #+END_SRC
188 |
-------------------------------------------------------------------------------- /content/post/006-clojure-wrappers.md: --------------------------------------------------------------------------------
1 | #+date: 2011-08-04
2 | #+title: Clojure Wrappers
3 |
4 | ### Functions in the wild
5 |
6 | Functional programming rears its head in the most unusual places in
7 | software design. Take the web stack interfaces (rack[^1], plack[^2],
8 | WSGI[^3]): they all implement a very functional view of what a web
9 | application is: a function that takes a map as input and returns a map
10 | as output.
11 |
12 | Middleware then builds on that abstraction, composing all the way down
13 | to the actual handler call. 
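To make that map-in, map-out contract concrete, here is a minimal sketch - a hypothetical ring-style handler, not tied to any of the stacks above:

```clojure
;; The whole web application is a single function: it receives the
;; request as a map and returns the response as a map.
(defn handler
  [request]
  {:status  200
   :headers {"Content-Type" "text/plain"}
   :body    (str "you asked for " (:uri request))})

(handler {:request-method :get :uri "/hello"})
;; => {:status 200, :headers {"Content-Type" "text/plain"},
;;     :body "you asked for /hello"}
```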
14 | 15 | The graylog2 extension mechanism is a good example of this, the bulk of 16 | which is really simple: 17 | 18 | ```ruby 19 | def _call(env) 20 | begin 21 | # Call the app we are monitoring 22 | @app.call(env) 23 | rescue Exception => err 24 | # An exception has been raised. Send to Graylog2! 25 | send_to_graylog2(err) 26 | 27 | # Raise the exception again to pass backto app. 28 | raise 29 | end 30 | end 31 | ``` 32 | 33 | All the code does is wrap application call in an exception catching 34 | block and returning the original result. Providing such a composition 35 | interface enables doing two useful things for middleware applications: 36 | 37 | - Taking action before the handler is called, for instance parsing 38 | json data as arguments. 39 | - Taking action after the handler was called, in this case catching 40 | exceptions. 41 | 42 | ### Composition in clojure 43 | 44 | The russian doll approach taken in rack was a natural fit for clojure's 45 | web stack, ring[^4]. What I want to show here is how easy it is to write 46 | a simple wrapping layer for any type of function, enabling building 47 | simple input and output filters for any type of logic. 48 | 49 | ### The basics 50 | 51 | Let's say we have a simple function interacting with a library, taking a 52 | map as parameter, yielding an operation status map back: 53 | 54 | ```clojure 55 | (defn send-command 56 | "send a command" 57 | [payload] 58 | (-> payload 59 | serialize ; translate into a format for on-wire 60 | send-sync ; send command and wait for answer 61 | deserialize)) ; translate result back as map 62 | ``` 63 | 64 | Now let's say we need the following filters: 65 | 66 | - We need certain keys in the command map sent out 67 | - We want to provide defaults for the reply map 68 | - We want to time command execution for statistics usage 69 | 70 | The functions are easy to write: 71 | 72 | ```clojure 73 | (defn filter-required [payload] 74 | (let [required [:user :operation]] 75 | (when (some nil? (map payload required)) 76 | (throw (Exception. "invalid payload")))) 77 | payload) 78 | 79 | (defn filter-defaults [response] 80 | (let [defaults {:status :unknown, :user :guest}] 81 | (merge defaults response))) 82 | 83 | (defn time-command [payload] 84 | (let [start-ts (System/nanoTime) 85 | response (send-command payload) 86 | end-ts (System/nanoTime)] 87 | (merge response {:elapsed (- end-ts start-ts)}))) 88 | ``` 89 | 90 | Now all that is required is linking those functions together. A very 91 | naive approach would be to go the imperative way, with let: 92 | 93 | ```clojure 94 | (defn linking-wrappers [payload] 95 | (let [payload (filter-required payload) 96 | payload (filter-defaults payload) 97 | response (time-command payload)] 98 | response)) 99 | ``` 100 | 101 | ### Evolving towards a wrapper interface 102 | 103 | Thinking about it in a more functional way, it becomes clear that this 104 | is just threading the payload through functions. Clojure even has a nice 105 | macro that does just that. 106 | 107 | ```clojure 108 | (defn composing-wrappers [payload] 109 | (-> payload filter-required filter-defaults time-command)) 110 | ``` 111 | 112 | This is already very handy, but needs a bit of work when we want to move 113 | the filters around, or if we wanted to be able to provide the filters as 114 | a list, even though using **loop** and **recur** it seems feasible. 
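As a side note, the threading macro merely rewrites the expression into nested calls, which `macroexpand` makes visible:

```clojure
(macroexpand '(-> payload filter-required filter-defaults time-command))
;; => (time-command (filter-defaults (filter-required payload)))
```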
115 | 116 | One of the gripes of such an approach is that you need two types of 117 | middleware functions, those that happen before and those that happen 118 | after an action, writing a generic timing filter that can be plugged in 119 | anywhere would involve writing two filter functions! 120 | 121 | The other gripe is that there is no way to bypass the execution of the 122 | filter chain, except by throwing exceptions, what we want is to wrap 123 | around the command call to be able to interfere with the processing. 124 | 125 | Looking back on the **rack** approach, we see that the call to the 126 | actual rack handler is enclosed within the middleware, doing the same in 127 | clojure would involve returning a function wrapping the original call, 128 | which is exactly what has been done for ring[^5], by the way: 129 | 130 | ```clojure 131 | (defn filter-required [handler] 132 | (fn [payload] 133 | (let [required [:user :operation]] 134 | (if (some nil? (map payload required)) 135 | {:status :fail :message "invalid payload"} 136 | (handler payload))))) 137 | 138 | (defn filter-defaults [handler] 139 | (fn [payload] 140 | (let [defaults {:status :unknown, :user :guest}] 141 | (handler (merge defaults payload))))) 142 | 143 | (defn time-command [handler] 144 | (fn [payload] 145 | (let [start-ts (System/nanoTime) 146 | response (handler payload) 147 | end-ts (System/nanoTime)] 148 | (merge response {:elapsed (- end-ts start-ts)})))) 149 | ``` 150 | 151 | Reusing the threading operator, building the composed handler is now 152 | dead easy: 153 | 154 | ```clojure 155 | (def composed (-> send-command 156 | time-command 157 | filter-defaults 158 | filter-required)) 159 | ``` 160 | 161 | ### Tying it all together 162 | 163 | We have now reached the point where composition is very easy, at the 164 | expense of a bit of overhead when writing wrappers. 165 | 166 | The last enhancement that could really help is being able to provide a 167 | list of functions to decorate a function with which would yield the 168 | composed handler. 169 | 170 | We cannot apply to **->** since it is a macro, so we call **loop** 171 | and **recur** to the rescue: 172 | 173 | ```clojure 174 | (defn wrap-with [handler all-decorators] 175 | (loop [cur-handler handler 176 | decorators all-decorators] 177 | (if decorators 178 | (recur ((first decorators) cur-handler) (next decorators)) 179 | cur-handler))) 180 | ``` 181 | 182 | Or as **scottjad** noted: 183 | 184 | ```clojure 185 | (defn wrap-with [handler all-decorators] 186 | (reduce #(%2 %1) handler all-decorators)) 187 | ``` 188 | 189 | Now, you see this function has no knowledge at all of the logic of 190 | handlers, making it very easy to reuse in a many places, writing 191 | composed functions is now as easy as: 192 | 193 | ```clojure 194 | (def wrapped-command 195 | (wrap-with send-command [time-command filter-defaults filter-required])) 196 | ``` 197 | 198 | I hope this little walkthrough helps you navigate more easily through 199 | projects such as ring, compojure, and the likes. You'll see that in many 200 | places using such a mechanism allows elegant test composition. 
201 | 202 | [^1]: 203 | 204 | [^2]: 205 | 206 | [^3]: 207 | 208 | [^4]: 209 | 210 | [^5]: 211 | -------------------------------------------------------------------------------- /content/post/004-some-more-thoughts-on-monitoring.md: -------------------------------------------------------------------------------- 1 | #+title: Some more thoughts on monitoring 2 | #+date: 2011-07-08 3 | 4 | Lately, monitoring has been a trending topic from the devops crowd. 5 | [ripienaar](http://www.devco.net/archives/2011/03/19/thinking_about_monitoring_frameworks.php) 6 | and [Jason Dixon](http://obfuscurity.com/2011/07/Monitoring-Sucks-Do-Something-About-It) amongst other have voiced what 7 | many are thinking. They've done a good job describing what's wrong and 8 | what sort of tool the industry needs. They also express clearly the 9 | need to part from a monolithic supervision solution and monolithic 10 | graphing solution. 11 | 12 | I'll take my shot at expressing what I feel is wrong in the current tools: 13 | 14 | ## Why won't you cover 90% of use cases? 15 | 16 | Supervision is hard, each production is different, and complex 17 | business logic must be tested, so indeed, a monitoring tool must be 18 | able to be extended easily, that's a given and every supervision tool 19 | got this right. But why on earth should tests that every production 20 | will need be implemented as extensions ? Let's take a look at the 21 | expected value which is the less intrusive way to check for a 22 | machine's load average: 23 | 24 | - The nagios core engine determines that an snmp check must be run for a machine 25 | - Fork, execute a shell which execs the check_snmp command 26 | - Feed the right arguments to snmpget 27 | 28 | You think I am kidding ? I am not. Of course each machine needing a 29 | check will need to go through this steps. So for as few as 20 machines 30 | requiring supervision at each check interval 60 processes would be 31 | spawned. 60 processes spawned for what ? Sending 20 udp packets, 32 | waiting for a packet in return. Same goes for TCP, ICMP, and many 33 | more. 34 | 35 | But it gets better ! Want to check more than one SNMP OIDs on the same 36 | machine ? The same process happens for every OID, which means that if 37 | you have 38 | 39 | Now consider the common use case, what does a supervision and graphing 40 | engine do most of its time: 41 | 42 | - Poll ICMP 43 | - Poll TCP - sometimes sending or expecting a payload, say for HTTP or 44 | SMTP checks 45 | - Poll SNMP 46 | 47 | So for a simple setup of 20 machines, checking these simple services, 48 | you could be well into the thousands of process spawning *every* check 49 | interval *per machine*. If you have a reasonable interval, say 30 50 | seconds or a minute. 51 | 52 | Add to that some custom hand written scripts in perl, python, ruby - 53 | or worse, bash - to check for business logic and you end up having to 54 | sacrifice a large machine (or cloud instance) for simple checks. 55 | 56 | That would be my number one requirement for a clean monitoring system: 57 | Cover the simple use cases ! Better yet, do it asynchronously ! 58 | Because for the common use case, all monitoring needs to do is wait on 59 | I/O. Every language has nice interfaces for handling evented I/O the 60 | core of a poller should be evented. 
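To give a rough idea of what this means in practice, here is a small, hypothetical sketch of an evented TCP check on the JVM - names and result format are made up for illustration:

```clojure
(import '(java.net InetSocketAddress)
        '(java.nio.channels AsynchronousSocketChannel CompletionHandler))

(defn check-tcp!
  "Evented TCP check: report! is called with the outcome, and no
   process or thread is tied up while waiting for the connection."
  [report! host port]
  (let [chan (AsynchronousSocketChannel/open)]
    (.connect chan (InetSocketAddress. host (int port)) nil
              (reify CompletionHandler
                (completed [_ _ _]
                  (report! {:service (str host ":" port) :state "ok"})
                  (.close chan))
                (failed [_ e _]
                  (report! {:service (str host ":" port) :state "critical"})
                  (.close chan))))))

;; hundreds of these can be in flight from a single thread:
(check-tcp! println "127.0.0.1" 80)
```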
61 | 62 | There are of course a few edge cases which make it hard to use that 63 | technique, ICMP coming to mind since it requires root access on UNIX 64 | boxes, but either privilege separation or a root process for ICMP 65 | checks can mitigate that difficulty. 66 | 67 | ## Why is alerting supposed to be different than graphing ? 68 | 69 | Except from some less than ideal solutions – looking at you 70 | [Zabbix](http://zabbix.com) - Supervision and Graphing are most of the time two 71 | separate tool suites, which means that in many cases, the same metrics 72 | are polled several times. The typical web shop now has a cacti and 73 | nagios installation, standard metrics such as available disk space 74 | will be polled by cacti and then by nagios (in many cases through an 75 | horrible private mechanism such as nrpe). 76 | 77 | Functionally speaking the tasks to be completed are rather simple: 78 | 79 | - Polling a list of data-points 80 | - Ability to create compound data-points based on polled values 81 | - Alerting on data-point thresholds or conditions 82 | -- Storing time-series of data-points 83 | 84 | These four tasks are all that is needed for a complete monitoring and 85 | graphing solution. Of course this is only the core of the solution and 86 | other features are needed, but as far as data is concerned these four 87 | tasks are sufficient. 88 | 89 | ## How many times will we have to reinvent SNMP 90 | 91 | I'll give you that, SNMP sucks, the S in the name - simple - is a 92 | blatant lie. In fact, for people running in the cloud, a collector 93 | such as [Collectd](http://collectd.org) might be a better option. But the fact that 94 | every monitoring application "vendor" has a private non 95 | inter-operable collecting agent is distressing to say the least. 96 | 97 | SNMP can rarely be totally avoided and when possible should be relied 98 | upon. Well thought out, easily extensible collectors are nice 99 | additions but most solutions are clearly inferior to SNMP and added 100 | stress on machines through sequential, process spawning solutions. 101 | 102 | ## A broken thermometer does not mean your healthy 103 | 104 | (LLDP, CDP, SNMP) are very useful to make sure assumptions you make on 105 | a production environment match the reality, they should never be the 106 | basis of decisions or considered exhaustive. 107 | 108 | A simple analogy, using discovery based monitoring solutions is 109 | equivalent to saying you store your machine list in a DNS zone file. 110 | It should be true, there should be mechanisms to ensure it is true, 111 | but might get out of sync over time: it cannot be treated as a source 112 | of truth. 113 | 114 | ## Does everyone need a horizontally scalable solution ? 115 | 116 | I appreciate the fact that every one wants the next big tool to be 117 | horizontally scalable, to distribute checks geographically. The thing 118 | is, most people need this because a single machine or instance’s 119 | limits are very easily reached with today’s solutions. A single 120 | process evented check engine, with an embedded interpretor allowing 121 | simple business logic checks should be small enough to allow matching 122 | most peoples needs. 123 | 124 | This is not to say, once the single machine limit is reached, a 125 | distributed mode should not be available for larger installations. 
But the current trend seems to recommend using AMQP-type transports,
127 | which while still more economical than nagios' approach will put
128 | an unnecessary strain on single-machine setups and also raise the bar
129 | of prerequisites for a working installation.
130 |
131 | Now as far as storage is concerned, there are enough options out there
132 | to choose from which make it easy to scale storage. Time-series and
133 | data-points are perfect candidates for non-relational databases and
134 | should be leveraged in this case. For single-machine setups, RRD-type
135 | databases should also be usable.
136 |
137 | ## Keep it decoupled
138 |
139 | The above points can be addressed by using decoupled software. Cacti
140 | for instance is a great visualization interface but has a strongly
141 | coupled poller and storage engine, making it very cumbersome to change
142 | parts of its functionality (for instance replacing the RRD storage
143 | part).
144 |
145 | Even though I believe in making single-machine setups easy to use,
146 | each part should be easily exported elsewhere or replaced. Production
147 | setups are complex and demanding, each having their specific
148 | prerequisites and preferences.
149 |
150 | Some essential parts stand out as easily decoupled:
151 |
152 | - Data-point pollers
153 | - Data-point storage engine
154 | - Visualization interface
155 | - Alerting
156 |
157 | ## Current options
158 |
159 | There are plenty of tools which, even though they need a lot of work to
160 | be made to work together, still provide a "good enough" feeling;
161 | amongst those, I have been happy to work with:
162 |
163 | - **Nagios**: The lesser of many evils
164 | - **Collectd**: Nice poller which can be used from nagios for alerting
165 | - **[Graphite](http://graphite.wikidot.com)**: Nice grapher which is inter-operable with collectd
166 | - **[OpenTSDB](http://opentsdb.net)**: Seems like a step in the right direction but requires a complex stack to be set up.
167 |
168 | ## Final Words
169 |
170 | Now of course if all that time spent writing articles was spent
171 | coding, we might get closer to a good solution. I will do my best to
172 | unslack(); and get busy coding.
173 |
174 |
--------------------------------------------------------------------------------
/content/post/032-simple-materialized-views-in-kafka-and-clojure.md:
--------------------------------------------------------------------------------
1 | #+title: Simple materialized views in Kafka and Clojure
2 | #+date: 2015-03-10
3 |
4 | > A hands-on dive into [Apache Kafka](http://kafka.apache.org) to build
5 | > a scalable and fault-tolerant persistence layer.
6 |
7 | With its most recent release, [Apache Kafka](http://kafka.apache.org)
8 | introduced a couple of interesting changes, not least of which is [Log
9 | Compaction](http://kafka.apache.org/documentation.html#compaction). In
10 | this article we will walk through a simple use case which takes
11 | advantage of it.
12 |
13 | ### Log compaction: the five-minute introduction.
14 |
15 | I won't extensively detail what log compaction is, since it's been
16 | thoroughly described. I encourage readers not familiar with the concept
17 | or Apache Kafka in general to go through these articles which give a
18 | great overview of the system and its capabilities:
19 |
20 | -
21 | -
22 | -
23 |
24 | In this article we will explore how to build a simple materialized view
25 | from the contents of a compacted kafka log.
A working version of the
26 | approach described here can be found at
27 | and may be used as a companion while reading the article.
28 |
29 | If you're interested in materialized views, I warmly recommend looking
30 | into [Apache Samza](http://samza.apache.org) and this [Introductory
31 | blog-post](http://blog.confluent.io/2015/03/04/turning-the-database-inside-out-with-apache-samza/)
32 | by Martin Kleppmann.
33 |
34 | ### Overall architecture
35 |
36 | For the purpose of this experiment, we will consider a very simple job
37 | board application. The application relies on a single entity type: a job
38 | description, and either does per-key access or retrieves the whole set
39 | of keys.
40 |
41 | Our application will perform every read from the materialized view in
42 | redis, while all mutation operations will be logged to kafka.
43 |
44 | ![log compaction architecture](/media/kafka-materialized-views/log-compaction-architecture.png)
45 |
46 | In this scenario all components may be horizontally scaled. Additionally
47 | the materialized view can be fully recreated at any time, since log
48 | compaction ensures that at least the last state of each live key is
49 | present in the log. This means that by starting a read from the head of
50 | the log, a consistent state can be recreated.
51 |
52 | ### Exposed API
53 |
54 | A mere four REST routes are necessary to implement this service:
55 |
56 | - `GET /api/job`: retrieve all jobs and their description.
57 | - `POST /api/job`: insert a new job description.
58 | - `PUT /api/job/:id`: modify an existing job description.
59 | - `DELETE /api/job/:id`: remove a job description.
60 |
61 | We can map this REST functionality to a clojure protocol - the rough
62 | equivalent of an interface in OOP languages - with a mere four signatures:
63 |
64 | ```clojure
65 | (defprotocol JobDB
66 |   "Our persistence protocol."
67 |   (add! [this payload] [this id payload] "Upsert entry, optionally creating a key")
68 |   (del! [this id] "Remove entry.")
69 |   (all [this] "Retrieve all entries."))
70 | ```
71 |
72 | Assuming this protocol is implemented, writing the HTTP API is
73 | relatively straightforward when leveraging tools such as
74 | [compojure](https://github.com/weavejester/compojure) in clojure:
75 |
76 | ```clojure
77 | (defn api-routes
78 |   "Secure, Type-safe, User-input-validating, Versioned and Multi-format API.
79 |    (just kidding)"
80 |   [db]
81 |   (->
82 |    (routes
83 |     (GET    "/api/job"     []           (response (all db)))
84 |     (POST   "/api/job"     req          (response (add! db (:body req))))
85 |     (PUT    "/api/job/:id" [id :as req] (response (add! db id (:body req))))
86 |     (DELETE "/api/job/:id" [id]         (response (del! db id)))
87 |     (GET    "/"            []           (redirect "/index.html"))
88 |
89 |     (resources "/")
90 |     (not-found "404"))
91 |
92 |    (json/wrap-json-body {:keywords? true})
93 |    (json/wrap-json-response)))
94 | ```
95 |
96 | I will not describe the client-side javascript code used to interact
97 | with the API in this article; it is a very basic AngularJS application.
98 |
99 | ### Persistence layer
100 |
101 | Were we to use redis exclusively, the operation would be quite
102 | straightforward: we would rely on a redis set to hold the set of all
103 | known keys. Each corresponding key would contain a serialized job
104 | description.
105 |
106 | In terms of operations, this would mean:
107 |
108 | - Retrieval would involve an `SMEMBERS` of the `jobs` key, then
109 |   mapping over the result to issue a `GET`.
110 | - Insertions and updates could be merged into a single "Upsert"
111 |   operation which would `SET` a key and would then add the key to the
112 |   known set through an `SADD` command.
113 | - Deletions would remove the key from the known set through an `SREM`
114 |   command and would then `DEL` the corresponding key.
115 |
116 | Let's look at an example sequence of events:
117 |
118 | ![log compaction events](/media/kafka-materialized-views/log-compaction-events.png)
119 |
120 | As it turns out, it is not much more work when going through Apache
121 | Kafka.
122 |
123 | 1.  Persistence interaction in the API
124 |
125 |     In the client, retrieval happens as described above. This example
126 |     code is in the context of the implementation - or as clojure would
127 |     have it **reification** - of the above protocol.
128 |
129 |     ```clojure
130 |     (all [this]
131 |       ;; step 1. Fetch all keys from set
132 |       (let [members (redis/smembers "jobs")]
133 |         ;; step 4. Merge into a map
134 |         (reduce merge {}
135 |           ;; step 2. Iterate on all keys
136 |           (for [key members]
137 |             ;; step 3. Create a tuple [key, (deserialized payload)]
138 |             [key (-> key redis/get edn/read-string)]))))
139 |     ```
140 |
141 |     The rest of the operations emit records on kafka:
142 |
143 |     ```clojure
144 |     (add! [this id payload]
145 |       (.send producer (record "job" id payload)))
146 |     (add! [this payload]
147 |       (add! this (random-id!) payload))
148 |     (del! [this id]
149 |       (.send producer (record "job" id nil)))
150 |     ```
151 |
152 |     Note how deletions just produce a record for the given key with a
153 |     nil payload. This approach produces what is called a tombstone in
154 |     distributed storage systems. It will tell kafka that prior entries
155 |     for the key can be discarded, but kafka will keep the tombstone itself
156 |     for a configurable amount of time to ensure coordination across consumers.
157 |
158 | 2.  Consuming persistence events
159 |
160 |     On the consumer side, the approach is as described above:
161 |
162 |     ```clojure
163 |
164 |     (defmulti materialize! :op)
165 |
166 |     (defmethod materialize! :del
167 |       [payload]
168 |       (r/srem "jobs" (:key payload))
169 |       (r/del (:key payload)))
170 |
171 |     (defmethod materialize! :set
172 |       [payload]
173 |       (r/set (:key payload) (pr-str (:msg payload)))
174 |       (r/sadd "jobs" (:key payload)))
175 |
176 |     (doseq [payload (messages-in-stream {:topic "jobs"})]
177 |       (let [op (if (nil? (:msg payload)) :del :set)]
178 |         (materialize! (assoc payload :op op))))
179 |     ```
180 |
181 | ### Scaling strategy and view updates
182 |
183 | Where things start to get interesting is that with this approach, the
184 | following becomes possible:
185 |
186 | - The API component is fully stateless and can be scaled horizontally.
187 |   This is not much of a breakthrough and is usually the case.
188 | - The redis layer can use a consistent hash to shard across several
189 |   instances and better use memory. While this is feasible in a more
190 |   typical scenario, re-sharding induces a lot of complex manual
191 |   handling. With the log approach, re-sharding only involves
192 |   re-reading the log.
193 | - The consumer layer may be horizontally scaled as well
194 |
195 | Additionally, since a consistent history of events is available in the
196 | log, adding views which generate new entities or ways to look up data
197 | now only involves adapting the consumer and re-reading from the head of
198 | the log.
199 |
200 | ### Going beyond
201 |
202 | I hope this gives a good overview of the compaction mechanism. I used
203 | redis in this example, but of course, materialized views may be created
204 | on any storage backend. But in some cases even this is unneeded! Since
205 | consumers register themselves in zookeeper, they could expose a
206 | query interface themselves and let clients contact them directly.
207 |
--------------------------------------------------------------------------------
/content/post/023-poor-man-s-dependency-injection-in-clojure.md:
--------------------------------------------------------------------------------
1 | +++
2 | title = "Poor man's dependency injection in Clojure"
3 | date = "2014-01-25"
4 | slug = "poor-mans-dependency-injection-in-clojure"
5 | +++
6 |
7 | When writing daemons in clojure which need configuration, you often find
8 | yourself in a situation where you want to provide users with a way of
9 | overriding or extending some parts of the application.
10 |
11 | All popular daemons provide this flexibility, usually through
12 | **modules** or **plugins**. The extension mechanism is a varying beast
13 | though; let's look at how popular daemons handle it.
14 |
15 | - [nginx](http://nginx.org), written in C, uses a function
16 |   pointer structure that needs to be included at compile time. Modules
17 |   are expected to bring in their own parser extensions for the
18 |   configuration file.
19 |
20 | - [collectd](http://collectd.org), written in C, uses function pointer
21 |   structures as well but allows dynamic loading through **ld.so**.
22 |   Additionally, the exposed functions are expected to work with a
23 |   pre-parsed structure to configure their behavior.
24 |
25 | - [puppet](http://puppetlabs.com), written in Ruby, lets additional
26 |   modules reopen the puppet module to add functionality.
27 |
28 | - [cassandra](http://cassandra.apache.org), written in Java, parses a
29 |   YAML configuration file which specifies classes which will be loaded
30 |   to provide specific functionality.
31 |
32 | While all these approaches are valid, Cassandra's approach most closely
33 | resembles what you'd expect a clojure program to provide since it runs
34 | on the JVM. That particular type of behavior management - usually
35 | defined in XML files in the Java community, where it is so
36 | pervasive - is called **Dependency Injection**.
37 |
38 | ### Dependency injection on the JVM
39 |
40 | The JVM brings two things which simplify creating a daemon with
41 | configurable behavior:
42 |
43 | - **Interfaces** let you define a contract an object must satisfy
44 | - **Classpaths** let you add code to a project at run-time (not
45 |   build-time)
46 |
47 | Cassandra's YAML configuration takes advantage of these two properties
48 | to let you swap implementations for different types of authenticators,
49 | snitches or partitioners.
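
On the JVM, resolving and instantiating a class from a
configuration-supplied name is a couple of lines. Here is a sketch of
what Cassandra effectively does under the hood - the partitioner class
name below is a made-up placeholder:

```clojure
(defn load-instance
  "Instantiate a class - e.g. one implementing a known interface -
   from its fully qualified name, found on the classpath at run-time."
  [^String class-name]
  (-> (Class/forName class-name)
      (.getDeclaredConstructor (into-array Class []))
      (.newInstance (object-array 0))))

;; (load-instance "org.example.MyPartitioner") ; hypothetical class name
```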
50 |
51 | ### A lightweight approach in clojure
52 |
53 | So let's mimic cassandra and write a simple configuration file which
54 | allows modifying behavior.
55 |
56 | Let's pretend we have a daemon which listens for data through
57 | transports, and needs to store it using a storage mechanism. A good
58 | example would be a log storage daemon, listening for incoming log lines,
59 | and storing them somewhere.
60 |
61 | For such a daemon, the following "contracts" emerge:
62 |
63 | - **transports**: which listen for incoming log lines
64 | - **codecs**: which determine how data should be de-serialized
65 | - **stores**: which provide a way of storing data
66 |
67 | This gives us the following clojure protocols:
68 |
69 | ```clojure
70 | (defprotocol Store
71 |   (store! [this payload]))
72 |
73 | (defprotocol Transport
74 |   (listen! [this sink]))
75 |
76 | (defprotocol Codec
77 |   (decode [this payload]))
78 |
79 | (defprotocol Service
80 |   (start! [this]))
81 | ```
82 |
83 | This gives you the ability to build an engine which has no knowledge of
84 | the underlying implementations and can be very easily tested and inspected:
85 |
86 |
87 | ```clojure
88 | ;; assumes (:require [clojure.core.async :refer [chan go-loop <! >!!]]
89 | ;;                   [clojure.edn :as edn])
90 | (defn reactor
91 |   [transports codec store]
92 |   (let [ch (chan 10)]
93 |     (reify
94 |       Service
95 |       (start! [this]
96 |         (go-loop []
97 |           (when-let [msg (<! ch)]
98 |             (store! store (decode codec msg))
99 |             (recur)))
100 |         (doseq [transport (vals transports)]
101 |           (listen! transport ch))))))
102 |
103 | ;; example implementations, each handed its own configuration map
104 | (defn edn-codec
105 |   [config]
106 |   (reify
107 |     Codec
108 |     (decode [this payload]
109 |       (edn/read-string payload))))
110 |
111 | (defn stdout-store
112 |   [config]
113 |   (reify
114 |     Store
115 |     (store! [this payload]
116 |       (println "storing:" (pr-str payload)))))
117 |
118 | ;; the stdin transport reads lines and pushes them onto its sink channel
119 | (defn stdin-transport
120 |   [config]
121 |   (let [sink (atom nil)]
122 |     (reify
123 |       Transport
124 |       (listen! [this ch]
125 |         (reset! sink ch)
126 |         (future
127 |           (loop []
128 |             (when-let [input (read-line)]
129 |               (>!! @sink input)
130 |               (recur))))))))
131 |
132 | ;; a real daemon would also want stop! semantics, left out here to
133 | ;; keep the example focused
134 | ```
135 |
136 | Note that each implementation gets passed a configuration variable -
137 | which will be useful.
138 |
139 | ### A yaml configuration
140 |
141 | Now that we have our protocols in place let's see if we can come up with
142 | a sensible configuration file for our mock daemon:
143 |
144 | ```yaml
145 | codec:
146 |   use: mock-daemon.codec/edn-codec
147 | transports:
148 |   stdin:
149 |     use: mock-daemon.transport.stdin/stdin-transport
150 | store:
151 |   use: mock-daemon.transport.stdin/stdout-store
152 | ```
153 |
154 | Our config contains three keys. `codec` and `store` are maps containing
155 | at least a `use` key which points to a symbol that will yield an
156 | instance of a class implementing the `Codec` or `Store` protocol.
157 |
158 | Now all that remains to be done is having an easy way to load this
159 | configuration and produce a codec, transports and stores from it.
160 |
161 | ### Clojure introspection
162 |
163 | Parsing the above configuration from yaml, with for instance
164 | `clj-yaml.core/parse-string`, will yield a map. If we only look at the
165 | codec part, we would have:
166 |
167 | ```clojure
168 | {:codec {:use "mock-daemon.codec/edn-codec"}}
169 | ```
170 |
171 | Our goal will be to retrieve an instance reifying `Codec` from the
172 | string `mock-daemon.codec/edn-codec`.
173 |
174 | This can be done in two steps:
175 |
176 | - Retrieve the symbol
177 | - Call the function it points to
178 |
179 | To retrieve the symbol, this simple bit will do:
180 |
181 | ```clojure
182 | (defn find-ns-var
183 |   [candidate]
184 |   (try
185 |     (let [var-in-ns (symbol candidate)
186 |           ns        (symbol (namespace var-in-ns))]
187 |       (require ns)
188 |       (find-var var-in-ns))
189 |     (catch Exception _)))
190 | ```
191 |
192 | We first extract the namespace out of the namespace-qualified var and
193 | require it, then get the var. Any errors will result in nil being
194 | returned.
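
To make the behavior concrete, this is what calling it from a REPL
might look like:

```clojure
(find-ns-var "clojure.string/upper-case") ;; => #'clojure.string/upper-case
(find-ns-var "no.such.ns/nothing-here")   ;; => nil
```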
195 |
196 | Now that we have the function, it's straightforward to call it with the
197 | config:
198 |
199 | ```clojure
200 | (defn instantiate
201 |   [candidate config]
202 |   (if-let [reifier (find-ns-var candidate)]
203 |     (reifier config)
204 |     (throw (ex-info (str "no such var: " candidate) {}))))
205 | ```
206 |
207 | We can now tie these two functions together:
208 |
209 | ```clojure
210 | (defn get-instance
211 |   [config]
212 |   (let [candidate  (-> config :use name symbol)
213 |         raw-config (dissoc config :use)]
214 |     (instantiate candidate raw-config)))
215 | ```
216 |
217 | These three snippets are the only bits of introspection you'll need and
218 | are the core of our solution.
219 |
220 | ### Tying it together
221 |
222 | We can now make use of `get-instance` in our configuration loading code:
223 |
224 | ```clojure
225 | (defn load-path
226 |   [path]
227 |   (-> (or path
228 |           (System/getenv "CONFIGURATION_PATH")
229 |           "/etc/default_path.yaml")
230 |       slurp
231 |       parse-string))
232 |
233 | (defn get-transports
234 |   [transports]
235 |   (zipmap (keys transports)
236 |           (mapv get-instance (vals transports))))
237 |
238 | (defn init
239 |   [path]
240 |   (try
241 |     (-> (load-path path)
242 |         (update-in [:codec] get-instance)
243 |         (update-in [:store] get-instance)
244 |         (update-in [:transports] get-transports))))
245 | ```
246 |
247 | ### Using it from your main function
248 |
249 | Now that all elements are there, starting up the daemon amounts to
250 | loading the configuration and wiring the resulting implementations
251 | into our previous `reactor` function.
252 |
253 | ```clojure
254 | (defn main
255 |   [& [config-file]]
256 |   (let [config     (config/init config-file)
257 |         codec      (:codec config)
258 |         store      (:store config)
259 |         transports (:transports config)
260 |         reactor    (reactor transports codec store)]
261 |     (start! reactor)))
262 | ```
263 |
264 | By having `reactor` decoupled from the implementations of transports,
265 | codecs and the like, testing the meat of the daemon becomes dead
266 | simple; a reactor can be started with dummy transports, stores and
267 | codecs to validate its inner workings.
268 |
269 | I hope this gives a good overview of simple techniques for building
270 | daemons in clojure.
271 |
--------------------------------------------------------------------------------
/content/post/021-solving-logging-in-60-lines-of-haskell.md:
--------------------------------------------------------------------------------
1 | #+title: Solving Nginx logging in 60 lines of Haskell
2 | #+date: 2013-11-27
3 |
4 | Nginx is well-known for only logging to files and being unable to log to
5 | syslog out of the box.
6 |
7 | There are a few ways around this; one that is often proposed is creating
8 | named pipes (or FIFOs) before starting up nginx. Pipes have the same
9 | properties as regular files in UNIX (to adhere to the important notion
10 | that everything is a file in UNIX), but they expect data written to them
11 | to be consumed by another process at some point. To compensate for the
12 | fact that consumers might sometimes be slower than producers, they
13 | maintain a buffer of readily available data, with a hard maximum of 64k
14 | in Linux systems for instance.
15 |
16 | ### Small digression: understanding linux pipes max buffer size
17 |
18 | It can be a bit confusing to figure out what the exact size of the FIFO
19 | buffer is in Linux.
Our first reflex will be to look at the output of
20 | `ulimit`:
21 |
22 | ```bash
23 | # ulimit -a
24 | core file size          (blocks, -c) 0
25 | data seg size           (kbytes, -d) unlimited
26 | scheduling priority             (-e) 30
27 | file size               (blocks, -f) unlimited
28 | pending signals                 (-i) 63488
29 | max locked memory       (kbytes, -l) unlimited
30 | max memory size         (kbytes, -m) unlimited
31 | open files                      (-n) 1024
32 | pipe size            (512 bytes, -p) 8
33 | POSIX message queues     (bytes, -q) 819200
34 | real-time priority              (-r) 99
35 | stack size              (kbytes, -s) 8192
36 | cpu time               (seconds, -t) unlimited
37 | max user processes              (-u) 63488
38 | virtual memory          (kbytes, -v) unlimited
39 | file locks                      (-x) unlimited
40 | ```
41 |
42 | Which seems to indicate that the available pipe size in bytes is
43 | `512 * 8`, amounting to 4kb. Turns out, this is the maximum atomic size
44 | of a payload on a pipe, but the kernel reserves several buffers for each
45 | created pipe, with a hard limit set in
46 | .
47 |
48 | The limit turns out to be `4096 * 16`, amounting to 64kb, still not
49 | much.
50 |
51 | ### Pipe consumption strategies
52 |
53 | Pipes are tricky beasts and will bite you if you try to consume them
54 | from syslog-ng or rsyslog without anything in between. First let's see
55 | what happens if you write on a pipe which has no consumer:
56 |
57 | ```bash
58 | $ mkfifo foo
59 | $ echo bar > foo
60 | ```
61 |
62 |
63 |
64 | That's right, having no consumer on a pipe results in blocking writes,
65 | which will not please nginx, or any other process which expects logging
66 | a line to a file to be a fast operation (and in many applications will
67 | result in total lock-up).
68 |
69 | Even though we can expect a syslog daemon to be mostly up all the time,
70 | consuming pipes directly imposes huge availability constraints on a
71 | daemon that should otherwise be able to sustain short availability glitches.
72 |
73 | ### A possible solution
74 |
75 | What if, instead of letting rsyslog do the work, we wrapped the nginx
76 | process with a small wrapper utility, responsible for pushing logs
77 | out to syslog?
The utility would:
78 |
79 | - Clean up old pipes
80 | - Provision pipes
81 | - Set up a connection to syslog
82 | - Start nginx in the foreground, while watching pipes for incoming
83 |   data
84 |
85 | The only requirement with regard to nginx's configuration is to start it
86 | in the foreground, which can be enabled with this single line in
87 | `nginx.conf`:
88 |
89 | ```bash
90 | daemon off;
91 | ```
92 |
93 | ### Wrapper behavior
94 |
95 | We will assume that the wrapper utility receives a list of command-line
96 | arguments corresponding to the pipes it has to open. If for instance we
97 | only log to `/var/log/nginx/access.log` and `/var/log/nginx/error.log`,
98 | we could call our wrapper - let's call it `nginxpipe` - this way:
99 |
100 | ```bash
101 | nginxpipe nginx-access:/var/log/nginx/access.log nginx-error:/var/log/nginx/error.log
102 | ```
103 |
104 | Since the wrapper would stay in the foreground to watch for its child
105 | nginx process, integration in init scripts has to account for it; for
106 | Ubuntu's upstart this translates to the following configuration in
107 | `/etc/init/nginxpipe.conf`:
108 |
109 | ```bash
110 | respawn
111 | exec nginxpipe nginx-access:/var/log/nginx/access.log nginx-error:/var/log/nginx/error.log
112 | ```
113 |
114 | ### Building the wrapper
115 |
116 | For once, the code I'll show won't be in clojure since it does not lend
117 | itself well to such tasks, being hindered by slow startup times and the
118 | inability to easily call OS-specific functions. Instead, this will be
119 | built in haskell which lends itself very well to system programming,
120 | much like go (another more-concise-than-c system programming language).
121 |
122 | First, our main function:
123 |
124 | ```haskell
125 | main = do
126 |   mainlog <- openlog "nginxpipe" [PID] DAEMON NOTICE
127 |   updateGlobalLogger rootLoggerName (setHandlers [mainlog])
128 |   updateGlobalLogger rootLoggerName (setLevel NOTICE)
129 |   noticeM "nginxpipe" "starting up"
130 |   args <- getArgs
131 |   mk_pipes $ map get_logname args
132 |   noticeM "nginxpipe" "starting nginx"
133 |   ph <- runCommand "nginx"
134 |   exit_code <- waitForProcess ph
135 |   noticeM "nginxpipe" $ "nginx stopped with code: " ++ show exit_code
136 | ```
137 |
138 | We start by creating a log handler, then use it as our only log
139 | destination throughout the program. We then call `mk_pipes`, which
140 | iterates over the given arguments, and finally start the nginx process
141 | and wait for it to return.
142 |
143 | The list of arguments given to `mk_pipes` is slightly modified: we
144 | transform the initial list consisting of
145 |
146 | ```haskell
147 | [ "nginx-access:/var/log/nginx/access.log", "nginx-error:/var/log/nginx/error.log"]
148 | ```
149 |
150 | into a list of string-tuples:
151 |
152 | ```haskell
153 | [("nginx-access","/var/log/nginx/access.log"), ("nginx-error","/var/log/nginx/error.log")]
154 | ```
155 |
156 | To create this modified list, we just map our input list with a simple
157 | function:
158 |
159 | ```haskell
160 | is_colon x = x == ':'
161 | get_logname path = (ltype, p) where (ltype, (_:p)) = break is_colon path
162 | ```
163 |
164 | Next up is pipe creation; since Haskell has no loops, we use tail
165 | recursion to iterate over the list of tuples:
166 |
167 | ```haskell
168 | mk_pipes :: [(String,String)] -> IO ()
169 | mk_pipes (pipe:pipes) = do
170 |   mk_pipe pipe
171 |   mk_pipes pipes
172 | mk_pipes [] = return ()
173 | ```
174 |
175 | The bulk of work happens in the `mk_pipe` function:
176 |
177 | ```haskell
178 | mk_pipe :: (String,String) -> IO ()
179 | mk_pipe (ltype,path) = do
180 |   safe_remove path
181 |   createNamedPipe path 0o644
182 |   fd <- openFile path ReadMode
183 |   hSetBuffering fd LineBuffering
184 |   void $ forkIO $ forever $ do
185 |     is_eof <- hIsEOF fd
186 |     if is_eof then threadDelay 1000000 else get_line ltype fd
187 | ```
188 |
189 | The interesting bit in that function is the last 3 lines, where we create
190 | a new "IO thread" with `forkIO`, inside which we loop forever, sleeping
191 | for at most 1 second when no input is available and logging to syslog
192 | when new input comes in.
193 |
194 | The two remaining functions `get_line` and `safe_remove` have very
195 | simple definitions; I intentionally left a small race condition in
196 | `safe_remove` to keep it readable:
197 |
198 | ```haskell
199 | safe_remove path = do
200 |   exists <- doesFileExist path
201 |   when exists $ removeFile path
202 |
203 | get_line ltype fd = do
204 |   line <- hGetLine fd
205 |   noticeM ltype line
206 | ```
207 |
208 | I'm not diving into each line of the code; there is plenty of great
209 | literature on Haskell. I'd recommend "Real World Haskell" as a great
210 | first book on the language.
211 |
212 | I just wanted to showcase the fact that Haskell is a great alternative
213 | for building fast and lightweight system programs.
214 |
### The awesome part: distribution!
215 |
216 | The full source for this program is available at
217 | , and it can be built in one of two ways:
218 |
219 | - Using the cabal dependency management system (which calls GHC)
220 | - With the GHC compiler directly
221 |
222 | With cabal you would just run:
223 |
224 | ```bash
225 | cabal install --prefix=/somewhere
226 | ```
227 |
228 | Let's look at the output:
229 |
230 | ```bash
231 | $ ldd /somewhere/bin/nginxpipe
232 |     linux-vdso.so.1 (0x00007fffe67fe000)
233 |     librt.so.1 => /usr/lib/librt.so.1 (0x00007fb8064d8000)
234 |     libutil.so.1 => /usr/lib/libutil.so.1 (0x00007fb8062d5000)
235 |     libdl.so.2 => /usr/lib/libdl.so.2 (0x00007fb8060d1000)
236 |     libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007fb805eb3000)
237 |     libgmp.so.10 => /usr/lib/libgmp.so.10 (0x00007fb805c3c000)
238 |     libm.so.6 => /usr/lib/libm.so.6 (0x00007fb805939000)
239 |     libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fb805723000)
240 |     libc.so.6 => /usr/lib/libc.so.6 (0x00007fb805378000)
241 |     /lib64/ld-linux-x86-64.so.2 (0x00007fb8066e0000)
242 | $ du -sh /somewhere/bin/nginxpipe
243 | 1.9M    /somewhere/bin/nginxpipe
244 | ```
245 |
246 | That's right, no crazy dependencies (for instance, the build resolves the
247 | correct dependencies across archlinux, ubuntu and debian for me) and a
248 | smallish executable.
249 |
250 | Obviously this is not a complete solution as-is, but quickly adding
251 | support for a real configuration file would not be a huge endeavour;
252 | it could for instance allow providing an alternative command to nginx.
253 |
254 | Hopefully this will help you consider haskell for your system
255 | programming needs in the future!
256 |
--------------------------------------------------------------------------------
/content/post/019-neat-trick-using-puppet-as-your-internal-ca.md:
--------------------------------------------------------------------------------
1 | #+title: Neat Trick: using Puppet as your internal CA
2 | #+slug: neat-trick-using-puppet-as-your-internal-ca
3 | #+date: 2013-05-30
4 |
5 | It's a shame that so many organisations rely on HTTP basic-auth and
6 | self-signed certs to secure access to internal tools. Sure enough, it's
7 | quick and easy to deploy, but you get stuck in a world where:
8 |
9 | - Credentials are scattered and difficult to manage
10 | - The usability of some tools gets broken
11 | - Each person coming in or out of the company means either a sweep of
12 |   your password databases or a new attack surface.
13 |
14 | The only plausible cause for this state of affairs is the perceived
15 | complexity of setting up an internal PKI infrastructure. Unfortunately,
16 | this means passing up on a great UI-respecting authentication and -
17 | with a little plumbing - authorization scheme.
18 |
19 | Once an internal CA is set up, you get the following benefits:
20 |
21 | - Simplified securing of any internal website
22 | - Single-Sign-On (SSO) access to sites
23 | - Easy and error-free site-wide privilege revocation
24 | - Securing of more than just websites, but any SSL-aware service
25 |
26 | Bottom line: CAs are cool.
27 |
28 | ### The overall picture of a PKI
29 |
30 | CAs take part in PKI - Public Key Infrastructure - a big word for the
31 | human and/or automated processes that handle the lifecycle of
32 | digital certificates within an organisation.
33 |
34 | When your browser accesses an SSL-secured site, it will verify the
35 | presented signature against the list of stored CAs it holds.
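
There is nothing magical about that verification step; on the JVM, for
instance, checking a certificate against a CA takes a few lines of
Clojure with the standard `java.security` classes - a sketch, not
production code:

```clojure
(import '(java.security.cert CertificateFactory X509Certificate))
(require '[clojure.java.io :as io])

(defn load-cert
  "Load an X.509 certificate from a PEM or DER file."
  [path]
  (with-open [in (io/input-stream path)]
    (.generateCertificate (CertificateFactory/getInstance "X.509") in)))

(defn signed-by?
  "True when `cert` carries a valid signature from the CA's key."
  [^X509Certificate ca-cert ^X509Certificate cert]
  (try (.verify cert (.getPublicKey ca-cert))
       true
       (catch Exception _ false)))
```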
37 |
38 | Just like any public and private key pair, the public part can be
39 | distributed by any means.
40 |
41 | ### The catch
42 |
43 | So if internal CAs have so many benefits, how come no one uses them?
44 | Here's the thing: tooling plain **sucks**. It's very easy to get lost in
45 | a maze of bad `openssl` command-line options when you first tackle the
46 | task, or get sucked into the horrible `CA.pl` which lives in
47 | `/etc/ssl/ca/CA.pl` on many systems.
48 |
49 | So the usual process is: spend a bit of time crafting a system that
50 | generates certificates, figure out too late that serials must be
51 | factored in from the start to integrate revocation support, start over.
52 |
53 | All this eventually gets hidden behind a bit of shell script and ends up
54 | working but is severely lacking.
55 |
56 | The second reason is that, in addition to tooling issues, it is easy to
57 | get bitten and use them the wrong way: forgot to include a Certificate
58 | Revocation List (CRL) with your certificate? You have no way of letting
59 | your infrastructure know someone left the company! You're not
60 | monitoring the expiry of certificates? Everybody gets locked out
61 | (usually happens over a weekend).
62 |
63 | ### A word on revocation
64 |
65 | No CA is truly useful without a good scheme for revocation. There are
66 | two ways of handling it:
67 |
68 | - Distributing a Certificate Revocation List (or `CRL`), which is a
69 |   plain list of serials that have been revoked.
70 | - Making use of an Online Certificate Status Protocol (or `OCSP`)
71 |   responder, which lives at an address bundled in the certificate and
72 |   which clients can connect to for validation.
73 |
74 | If you manage a small number of services and have a configuration
75 | management framework or build your own packages, relying on a CRL is
76 | valid and will be the mechanism described in this article.
77 |
78 | ### The ideal tool
79 |
80 | Ultimately, what you'd expect from a CA management tool is just a way to
81 | get a list of certs, generate them and revoke them.
82 |
83 | ### Guess what? Chances are you already have an internal CA!
84 |
85 | If you manage your infrastructure with a configuration management
86 | framework - and you should - there's a roughly 50% chance that you are
87 | using [puppet](https://puppetlabs.com).
88 |
89 | If you do, then you already are running an internal CA, since that is
90 | what the puppet master process is using to authenticate nodes contacting
91 | it.
92 |
93 | When you issue your first puppet run against the master, a CSR
94 | (certificate signing request) is generated against the master's CA;
95 | depending on the master's policy it will be either automatically signed
96 | or stored, in which case it will show up in the output of the
97 | `puppet cert list` command. CSRs can then be signed with
98 | `puppet cert sign`.
99 |
100 | But there is nothing special about these certificates; `puppet cert` just
101 | exposes a nice facade to a subset of OpenSSL's functionality.
102 |
103 | ### What if I don't use puppet
104 |
105 | The CA part of puppet's code stands on its own, and by installing puppet
106 | through `apt-get`, `yum`, or `gem` you will get the functionality
107 | without needing to start any additional service on your machine.
108 |
109 | ### Using the CA
110 |
111 | Since your CA isn't a root one, it needs to be registered wherever you
112 | will need to validate certs. Usually this just means installing it in
113 | your browser.
The CA is nothing more than a public key and can be
114 | distributed as is.
115 |
116 | For the purpose of this article, puppet will be run with a different
117 | configuration to avoid interfering with its own certificates. This means
118 | adding a `--confdir` to every command you issue.
119 |
120 | ### A typical set-up
121 |
122 | To illustrate how to set up a complete solution using the puppet
123 | command-line tool, we will assume you have three separate sites to
124 | authenticate:
125 |
126 | - Your internal portal and documentation site: `doc.priv.example.com`
127 | - Graphite: `graph.priv.example.com`
128 | - Kibana: `logs.priv.example.com`
129 |
130 | This set-up will be expected to handle authentication on behalf of
131 | graphite, the internal portal and kibana.
132 |
133 | Although a CA can be published to several servers, in this mock
134 | infrastructure, a single [nginx](http://nginx.org) reverse proxy is used
135 | to redirect traffic to internal sites.
136 |
137 | ![infrastructure](/media/puppet-internal-ca/infra.png)
138 |
139 | #### Setting up your CA
140 |
141 | First things first, let's provide an isolated sandbox for puppet to
142 | handle its certificates in.
143 |
144 | I'll assume you want all certificate data to live in `/etc/ssl-ca`.
145 | Start by creating the directory and pushing the following
146 | configuration into `/etc/ssl-ca/puppet.conf`:
147 |
148 | ```
149 | [main]
150 | logdir=/etc/ssl-ca/log
151 | vardir=/etc/ssl-ca/data
152 | ssldir=/etc/ssl-ca/ssl
153 | rundir=/etc/ssl-ca/run
154 | ```
155 |
156 | You're now ready to generate your initial environment with:
157 |
158 | ```bash
159 | puppet cert --confdir /etc/ssl-ca list
160 | ```
161 |
162 | At this point you have generated a CA, and you're ready to generate
163 | new certificates for your users.
164 |
165 | Although certs can be arbitrarily named, I tend to stick to a naming
166 | scheme that matches the domain the sites run on; in this case,
167 | we could go with `users.priv.example.com`.
168 |
169 | We have three users in the organisation: Alice, Bob and Charlie;
170 | let's give them each a certificate, and one to each service we will
171 | run.
172 |
173 | ```bash
174 | for admin in alice bob charlie; do
175 |   puppet cert --confdir /etc/ssl-ca generate ${admin}.users.priv.example.com
176 | done
177 |
178 | for service in doc graph logs; do
179 |   puppet cert --confdir /etc/ssl-ca generate ${service}.priv.example.com
180 | done
181 |
182 | ```
183 |
184 | Your users now all have a valid certificate. Two steps remain: using
185 | the CA on the HTTP servers, and installing the certificate on the
186 | users' browsers.
187 |
188 | For each of your sites, the following SSL configuration block can be
189 | used in nginx:
190 |
191 | ```
192 | ssl on;
193 | ssl_verify_client on;
194 | ssl_certificate '/etc/ssl-ca/ssl/certs/doc.priv.example.com.pem';
195 | ssl_certificate_key '/etc/ssl-ca/ssl/private_keys/doc.priv.example.com.pem';
196 | ssl_crl '/etc/ssl-ca/ssl/ca/ca_crl.pem';
197 | ssl_client_certificate '/etc/ssl-ca/ssl/ca/ca_crt.pem';
198 | ssl_session_cache 'shared:SSL:128m';
199 |
200 | ```
201 |
202 | A few notes on the above configuration:
203 |
204 | - `ssl_verify_client on` instructs the web server to only allow
205 |   traffic for which a valid client certificate was presented.
206 | - read up on `ssl_session_cache` to decide which strategy works
207 | - do not be fooled by the directive name, `ssl_client_certificate` 208 | points to the certificate used to authenticate client 209 | certificates with. 210 | 211 | ### Installing the certificate on browsers 212 | 213 | Now that servers are ready to authenticate incoming clients, the last 214 | step is to distribute certificates out to clients. The ca~crt~.pem and 215 | client cert and key could be given as-is, but browsers usually expect 216 | the CA and certificate to be bundled in a `PKCS12` file. 217 | 218 | For this, a simple script will do the trick, this one would expect the 219 | name of the generated user's certificate and a password, adapt to your 220 | liking: 221 | 222 | ```bash 223 | #!/bin/sh 224 | 225 | name=$1 226 | password=$2 227 | domain=example.com 228 | ssl_dir=/etc/ssl-ca/ssl 229 | cert_name=`echo $name.$domain` 230 | mkdir -p $ssl_dir/pkcs12 231 | 232 | openssl pkcs12 -export -in $ssl_dir/certs/$full_name.pem -inkey \ 233 | $ssl_dir/private_keys/$full_name.pem -certfile $ssl_dir/ca/ca_crt.pem \ 234 | -out $ssl_dir/pkcs12/$full_name.p12 -passout pass:$password 235 | ``` 236 | 237 | The resulting file can be handed over to your staff who will then 238 | happily access services 239 | 240 | ### Handling Revocation 241 | 242 | Revocation is a simple matter of issuing a `puppet cert revoke` command 243 | and then redistributing the CRL file to web servers. As mentionned 244 | earlier I would advise distributing the CRL as an OS package, which will 245 | let you quickly deploy updates and ensure all your servers honor your 246 | latest revocation list. 247 | -------------------------------------------------------------------------------- /static/files/2014-01-14-twitter-trending.html: -------------------------------------------------------------------------------- 1 | 3 | 4 | 5 | 6 | 2014-01-14-twitter-trending 7 | 8 | 83 | 84 | 85 | 86 |

2014-01-14-twitter-trending-riemann.clj



; -*- mode: clojure; -*-
 90 | ; vim: filetype=clojure
 91 | 
 92 | (logging/init)
 93 | (instrumentation {:enabled? false})
 94 | (udp-server)
 95 | (tcp-server)
 96 | (periodically-expire 1)
 97 | 
 98 | (let [store    (index)
 99 |       trending (top 10 :metric (tag "top" store) store)]
100 |   (streams
101 |     (by :service (moving-time-window 3600 (smap folds/sum trending)))))
102 | 

2014-01-14-twitter-trending-firehose.rb



require 'tweetstream'
109 | require 'riemann/client'
110 | 
111 | TweetStream.configure do |config|
112 |   config.consumer_key       = 'xxx'
113 |   config.consumer_secret    = 'xxx'
114 |   config.oauth_token        = 'xxx'
115 |   config.oauth_token_secret = 'xxx'
116 |   config.auth_method        = :oauth
117 | end
118 | 
119 | riemann = Riemann::Client.new
120 | 
121 | 
122 | TweetStream::Client.new.sample do |status|
123 |   tags = status.text.scan(/\s#([[:alnum:]]+)/).map{|x| x.first.downcase}
124 | 
125 |   tags.each do |tag|
126 |     puts "emitting #{tag}"
127 |     riemann << {
128 |       service: "#{tag}",
129 |       metric: 1.0,
130 |       tags: ["twitter"],
131 |       ttl: 3600
132 |     }
133 |   end
134 | end
135 | 
--------------------------------------------------------------------------------
/content/post/009-the-death-of-the-configuration-file.md:
--------------------------------------------------------------------------------
1 | #+title: The death of the configuration file
2 | #+date: 2011-09-18
3 |
4 | Taking on a new platform design recently, I thought it was interesting to
5 | see how things have evolved in the past years and how we design and think
6 | about platform architecture.
7 |
8 | ### So what do we do?
9 |
10 | As system developers, system administrators and system engineers, what
11 | do we do?
12 |
13 | - We develop software
14 | - We design architectures
15 | - We configure systems
16 |
17 | But that isn't the purpose of our jobs; for most of us, the purpose is to
18 | generate business value. From a non-technical perspective, we generate
19 | business value by creating a system which renders one or many functions
20 | and provides insight into its operation.
21 |
22 | And we do this by developing, logging, configuring and maintaining
23 | software across many machines.
24 |
25 | When I started doing this - back when knowing how to write a sendmail
26 | configuration file could get you a paycheck - it all came down to
27 | setting up a few machines: a database server, a web server, a mail server,
28 | each logging locally and providing its own way of reporting metrics.
29 |
30 | When designing custom software, you would provide reports over a local
31 | **AF_UNIX** socket, and configure your software by writing elegant
32 | parsers with **yacc** (or its GNU equivalent, **bison**).
33 |
34 | When I joined the **OpenBSD** team, I did a lot of work on configuration
35 | files. Ask any member of the team: the configuration files are a big
36 | concern, and careful attention is put into clean, human-readable and
37 | writable syntax; additionally, all configuration files are expected to
38 | look and feel the same, for consistency.
39 |
40 | It seems as though the current state of large applications now demands
41 | another way to interact with operating systems, and some tools are now
42 | leading the way.
43 |
44 | ### So what has changed?
45 |
46 | While our mission is still the same from a non-technical perspective,
47 | the technical landscape has evolved and gone through several phases.
48 |
49 | 1.  The first era of repeatable architecture
50 |
51 |     We first realized that as soon as several machines performed the
52 |     same task, the need for repeatable, coherent environments became
53 |     essential. Typical environments used a combination of **cfengine**,
54 |     **NFS** and mostly **perl** scripts to achieve these goals.
55 |
56 |     Insight and reporting were then provided either by horrible
57 |     proprietary kludges that I shall not name here, or by emergent tools
58 |     such as **netsaint** (now **nagios**), **mrtg** and the like.
59 |
60 | 2.  The XML mistake
61 |
62 |     Around that time, we started hearing more and more about **XML**,
63 |     then touted as the solution to almost every problem. The rationale
64 |     was that **XML** was - **somewhat** - easy to parse, and would allow
65 |     developers to develop configuration interfaces separately from the
66 |     core functionality.
67 |
68 |     While this was a noble goal, it was mostly a huge failure. Above
69 |     all, it was a victory of developers over people using their
70 |     software, since they didn't bother writing syntax parsers and let
71 |     users cope with the complicated syntax.
72 |
73 |     Another example was the difference between Linux's **iptables** and
74 |     OpenBSD's **pf**. While the former was supposed to be the backend
75 |     for a firewall handling tool that never saw the light of day, the
76 |     latter provided a clean syntax.
77 |
78 | 3.  Infrastructure as code
79 |
80 |     Fast forward a couple of years: most users of **cfengine** were fed
81 |     up with its limitations, and architectures, while following the same
82 |     logic as before, became bigger and bigger. The need for repeatable
83 |     and sane environments was as important as it ever was.
84 |
85 |     At that point in time, **PXE** installations were added to the mix
86 |     of big infrastructures and many people started looking at **puppet**
87 |     as a viable alternative to **cfengine**.
88 |
89 |     **puppet** provided a cleaner environment, and allowed easier
90 |     formalization of technology, platform and configuration.
91 |     Philosophically though, **puppet** stays very close to **cfengine**
92 |     by providing a way to configure large amounts of systems through a
93 |     central repository.
94 |
95 |     At that point, large architectures also needed command and control
96 |     interfaces. As noted before, most of these were implemented as
97 |     **perl** or shell scripts in **SSH** loops.
98 |
99 |     On the monitoring and graphing front, not much was happening:
100 |     **nagios** and **cacti** were almost ubiquitous, while some tools
101 |     such as **ganglia** and **collectd** were making a bit of progress.
102 |
103 | ### Where are we now?
104 |
105 | At some point recently, our applications started doing more. While for a
106 | long time the canonical dynamic web application was a busy forum, more
107 | complex sites started appearing everywhere. We were not building and
108 | operating sites anymore but applications. And while with the help of
109 | **haproxy**, **varnish** and the like, the frontend was mostly a
110 | settled affair, complex backends demanded more work.
111 |
112 | At the same time the advent of social-enabled applications demanded much
113 | more insight into the habits of users in applications and thorough
114 | analytics.
115 |
116 | New tools emerged to help us along the way:
117 |
118 | - In-memory key-value caches such as **memcached** and **redis**
119 | - Fast elastic key-value stores such as **cassandra**
120 | - Distributed computing frameworks such as **hadoop**
121 | - And of course on-demand virtualized instances, aka: **The Cloud**
122 |
123 | 1.  Some daemons only provide small functionality
124 |
125 |     The main difference in the new stack found in backend systems is
126 |     that the software stacks that run are not useful on their own
127 |     anymore.
128 |
129 |     Software such as **zookeeper**, **kafka**, **rabbitmq** serve no
130 |     other purpose than to provide supporting services in applications,
131 |     and their functionality is almost only available as libraries to be
132 |     used in distributed application code.
133 |
134 | 2.  Infrastructure as code is not infrastructure in code!
135 |
136 |     What we missed along the way, it seems, is that even though our
137 |     applications now span multiple machines and daemons provide a subset
138 |     of functionality, most tools still reason with the machine as the
139 |     top-level abstraction.
140 |
141 |     **puppet** for instance is meant to configure nodes, not clusters,
142 |     and makes dependencies very hard to manage. A perfect example is the
143 |     complexity involved in setting up configurations dependent on
144 |     other machines.
145 |
146 |     Monitoring and graphing, except for **ganglia**, have long suffered
147 |     from the same problem.
148 |
149 | ### The new tools we need
150 |
151 | We need to kill local configurations, plain and simple. With a simple
152 | enough library to interact with distant nodes, starting and stopping
153 | services, configuration can happen in a single place; instead of
154 | relying on a repository-based configuration manager, configuration
155 | should happen from inside applications and not be an external process.
156 |
157 | If this happens in a library, command & control must also be added to
158 | the mix, with centralized and tagged logging, reporting and metrics.
159 |
160 | This is going to take some time, because it is a huge shift in the way
161 | we write software and design applications. Today, configuration
162 | management is a very complex stack of workarounds for non-standardized
163 | interactions with local package management, service control and software
164 | configuration.
165 |
166 | Today dynamically configuring **bind**, **haproxy** and **nginx**,
167 | installing a package on a **Debian** or **OpenBSD** system, restarting a
168 | service, all these very simple tasks which we automate and operate from
169 | a central repository force us to build complex abstractions. When using
170 | **puppet**, **chef** or **pallet**, we write complex templates because
171 | software was meant to be configured by humans.
172 |
173 | The same goes for checking the output of running arbitrary scripts on
174 | machines.
175 |
176 | 1.  Where we'll be tomorrow
177 |
178 |     With the ease PaaS solutions bring to developers, and offers such as
179 |     the ones from VMWare and open initiatives such as OpenStack, it
180 |     seems as though virtualized environments will very soon be found
181 |     everywhere, even in private companies which will deploy such
182 |     environments on their own hardware.
183 |
184 |     I would not bet on it happening, but a terse input and output format
185 |     for system tools and daemons would go a long way in ensuring easy
186 |     and fast interaction with configuration management and command and
187 |     control software.
188 |
189 |     While it was a mistake to try to push **XML** as a terse format
190 |     replacing configuration files to interact with single machines, a
191 |     terse format is needed to interact with many machines providing the
192 |     same service, or to run many tasks in parallel - even though,
193 |     admittedly, tools such as **capistrano** or **mcollective** do a
194 |     good job at running things and providing sensible output.
195 |
196 | 2.  The future is now!
197 |
198 |     Some projects are leading the way in this new orientation; 2011, as
199 |     I've seen it called, will be the year of the time-series boom. For
200 |     package management and logging, Jordan Sissel released such great
201 |     tools as **logstash** and **fpm**. For easy graphing and deployment,
202 |     **etsy** released great tools, amongst which **statsd**.
203 |
204 |     As for bridging the gap between provisioning, configuration
205 |     management, command and control and deploys, I think two tools, both
206 |     based on jclouds[^1], are going in the right direction:
207 |
208 |     - Whirr[^2]: Which lets you start a cluster through code, providing
209 |       recipes for standard deploys (**zookeeper**, **hadoop**)
210 |
211 |     - pallet[^3]: Which lets you describe your infrastructure as code
212 |       and interact with it in your own code.
**pallet**'s phase approach to
213 |       cluster configuration provides a smooth dependency framework which
214 |       allows easy description of dependencies between configurations across
215 |       different clusters of machines.
216 |
217 | 3.  Who's getting left out?
218 |
219 |     One area where things seem to move much slower is network device
220 |     configuration. For people running open-source-based load balancers
221 |     and firewalls, things are looking a bit nicer, but the switch
222 |     landscape is a mess. As tools mostly geared towards public cloud
223 |     services make their way into private corporate environments,
224 |     hopefully they'll also get some of the programmable goodness.
225 |
226 | [^1]:
227 |
228 | [^2]:
229 |
230 | [^3]:
--------------------------------------------------------------------------------
/static/files/2016-12-17-atomic-database.html:
--------------------------------------------------------------------------------
2016-12-17-atomic-database.clj

2016-12-17-atomic-database.clj



(ns game.score
 90 |   "Utilities to record and look up high scores"
 91 |   (:require [clojure.edn :as edn]))
 92 | 
 93 | (defn make-score-db
 94 |   "Build a database of high scores"
 95 |   []
 96 |   (atom nil))
 97 | 
 98 | (def compare-scores
 99 |   "A function which keeps the highest numerical value.
100 |    Handles nil previous values."
101 |   (fnil max 0))
102 | 
103 | (defn record-score!
104 |   "Record a score for user, store only if higher than
105 |    previous or no previous score exists"
106 |   [scores user score]
107 |   (swap! scores update user compare-scores score))
108 | 
109 | (defn user-high-score
110 |   "Lookup highest score for user, may yield nil"
111 |   [scores user]
112 |   (get @scores user))
113 | 
114 | (defn high-score
115 |   "Lookup absolute highest score, may yield nil
116 |    when no scores have been recorded"
117 |   [scores]
118 |   (last (sort-by val @scores)))
119 | 
120 | (defn dump-to-path
121 |   "Store a value's representation to a given path"
122 |   [path value]
123 |   (spit path (pr-str value)))
124 | 
125 | (defn load-from-path
126 |   "Load a value from its representation stored in a given path.
127 |    When reading fails, yield nil"
128 |   [path]
129 |   (try
130 |     (edn/read-string (slurp path))
131 |     (catch Exception _)))
132 | 
133 | (defn persist-fn
134 |   "Yields an atom watch-fn that dumps new states to a path"
135 |   [path]
136 |   (fn [_ _ _ state]
137 |     (dump-to-path path state)))
138 | 
139 | (defn file-backed-atom
140 |    "An atom that loads its initial state from a file and persists each new state
141 |     to the same path"
142 |    [path]
143 |    (let [init  (load-from-path path)
144 |          state (atom init)]
145 |      (add-watch state :persist-watcher (persist-fn path))
146 |      state))
147 | 
148 | (comment
149 |   (def scores (file-backed-atom "/tmp/scores.db"))
150 |   (high-score scores)         ;; => nil
151 |   (user-high-score scores :a) ;; => nil
152 |   (record-score! scores :a 2) ;; => {:a 2}
153 |   (record-score! scores :b 3) ;; => {:a 2 :b 3}
154 |   (record-score! scores :b 1) ;; => {:a 2 :b 3}
155 |   (record-score! scores :a 4) ;; => {:a 4 :b 3}
156 |   (user-high-score scores :a) ;; => 4
157 |   (high-score scores)         ;; => [:a 4]
158 |   )
159 | 
--------------------------------------------------------------------------------
/content/post/014-weekend-project-ghetto-rpc-with-redis-ruby-and-clojure.md:
--------------------------------------------------------------------------------
1 | #+title: Weekend project: Ghetto RPC with redis, ruby and clojure
2 | #+date: 2012-11-11
3 |
4 | There's a fair amount of things that are pretty much settled in current
5 | architectures. Configuration management is handled by chef, puppet (or
6 | pallet, for the brave). Monitoring and graphing are getting better by the
7 | day thanks to products such as collectd, graphite and riemann. But one
8 | area which - at least to me - still has no obvious go-to solution is
9 | command and control.
10 |
11 | There are a few choices which fall in two categories: ssh for-loops and
12 | pubsub-based solutions. As far as ssh for-loops are concerned,
13 | capistrano (ruby), fabric (python), rundeck (java) and pallet (clojure)
14 | will do the trick, while the obvious candidate in the pubsub-based space
15 | is mcollective.
16 |
17 | [Mcollective](https://github.com/puppetlabs/marionette-collective) has a
18 | single transport system, namely STOMP, preferably set up over RabbitMQ.
19 | It's a great product and I recommend checking it out, but two aspects of
20 | the solution prompted me to write a simple - albeit less featured -
21 | alternative:
22 |
23 | - There's currently no other transport method than STOMP and I was
24 |   reluctant to bring RabbitMQ into the already well-blended technology
25 |   mix in front of me.
26 | - The client implementation is ruby only.
27 |
28 | So let me engage in a bit of NIHilism here and describe a redis-based
29 | approach to command and control.
30 |
31 | The scope of the tool would be rather limited and only handle these
32 | tasks:
33 |
34 | - Node discovery and filtering
35 | - Request / response mechanism
36 | - Asynchronous communication (out-of-order replies)
37 |
38 | ### Enter redis
39 |
40 | To allow out-of-order replies, the protocol will need to broadcast
41 | requests and listen for replies separately. We will thus need both a
42 | pub-sub mechanism for requests and a queue for replies.
43 |
44 | While redis is initially an in-memory key-value store with optional
45 | persistence, it offers a wide range of data structures (see the full
46 | list at ) and pub-sub support. No explicit queue
47 | functions exist, but two operations on lists provide the same
48 | functionality.
49 |
50 | Let's see how this works in practice, with the standard redis client
51 | `redis-cli` and assuming you know how to run and connect to a redis
52 | server:
53 |
54 | 1.
Queue Example 55 | 56 | Here is how to push items on a queue named `my_queue`: 57 | 58 | ```bash 59 | redis 127.0.0.1:6379> LPUSH my_queue first 60 | (integer) 1 61 | redis 127.0.0.1:6379> LPUSH my_queue second 62 | (integer) 2 63 | redis 127.0.0.1:6379> LPUSH my_queue third 64 | (integer) 3 65 | ``` 66 | 67 | You can then issue the following command repeatedly to pop items: 68 | 69 | ```bash 70 | redis 127.0.0.1:6379> BRPOP my_queue 0 71 | 1) "my_queue" 72 | 2) "first" 73 | redis 127.0.0.1:6379> BRPOP my_queue 0 74 | 1) "my_queue" 75 | 2) "second" 76 | redis 127.0.0.1:6379> BRPOP my_queue 0 77 | 1) "my_queue" 78 | 2) "third" 79 | ``` 80 | 81 | LPUSH, as its name implies, pushes items on the left (head) of a list, 82 | while BRPOP pops items from the right (tail) of a list, in a 83 | blocking manner, with a timeout argument which we set to 0, meaning 84 | that the action will block forever if no items are available for 85 | popping. 86 | 87 | This basic queue mechanism is used in several 88 | open source projects such as logstash, resque, sidekiq, and many 89 | others. 90 | 91 | 2. Pub-Sub Example 92 | 93 | Channels can be subscribed to through the `SUBSCRIBE` command. You'll 94 | need to open two clients; start by issuing this in the first: 95 | 96 | ```bash 97 | redis 127.0.0.1:6379> SUBSCRIBE my_exchange 98 | Reading messages... (press Ctrl-C to quit) 99 | 1) "subscribe" 100 | 2) "my_exchange" 101 | 3) (integer) 1 102 | ``` 103 | 104 | You are now listening on the `my_exchange` channel; issue the 105 | following in the second terminal: 106 | 107 | ```bash 108 | redis 127.0.0.1:6379> PUBLISH my_exchange hey 109 | (integer) 1 110 | ``` 111 | 112 | You'll now see this in the first terminal: 113 | 114 | ```bash 115 | 1) "message" 116 | 2) "my_exchange" 117 | 3) "hey" 118 | ``` 119 | 120 | 3. Differences between queues and pub-sub 121 | 122 | The pub-sub mechanism in redis broadcasts to all subscribers and 123 | will not queue up data for disconnected subscribers, whereas queues 124 | will deliver to the first available consumer, but will queue up (in 125 | RAM, so make sure your consumers can keep up). 126 | 127 | ### Designing the protocol 128 | 129 | With these building blocks in place, a simple layered protocol 130 | can be designed around the 131 | following workflow: 132 | 133 | - A control box broadcasts a request with a unique ID (`UUID`), with 134 | a command and node specification 135 | - All nodes matching the specification reply immediately with a 136 | `START` status, indicating that the request has been acknowledged 137 | - All nodes refusing to go ahead reply with a `NOOP` status 138 | - Once execution is finished, nodes reply with a `COMPLETE` status 139 | 140 | Acknowledgments and replies will be implemented over queues, solely to 141 | demonstrate working with queues; using pub-sub for replies would lead to 142 | cleaner code. 
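To make the queue-plus-pub-sub flow concrete before pinning down payloads, here is a minimal, hypothetical controller sketch in clojure, using the `jedis` client (mentioned later in this post) and `cheshire` for JSON; the channel name, the per-UUID reply queue and the single-reply handling are illustrative assumptions, not part of the actual implementation:

```clojure
(import '[redis.clients.jedis Jedis])
(require '[cheshire.core :as json])

(defn broadcast!
  "Publish a request over pub-sub, then block-pop a single reply from a
   queue keyed by the request UUID. Channel and queue names are made up."
  [^Jedis conn command]
  (let [uuid    (str (java.util.UUID/randomUUID))
        request {:reply_to uuid
                 :match    {:all true}
                 :command  command}]
    ;; requests fan out to every listening agent
    (.publish conn "requests" (json/generate-string request))
    ;; replies come back over a queue; block for at most 2 seconds
    (some-> (.brpop conn 2 (into-array String [uuid]))
            second
            (json/parse-string true))))
```

A real controller would keep popping until the acknowledgement window closes, dispatching on each reply's `status` field.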
143 | 144 | If we model this around `JSON`, we can thus work with the following 145 | payloads, starting with requests: 146 | 147 | ```javascript 148 | request = { 149 | reply_to: "51665ac9-bab5-4995-aa80-09bc79cfb2bd", 150 | match: { 151 | all: false, /* setting to true matches all nodes */ 152 | node_facts: { 153 | hostname: "www*" /* allowing simple glob(3) type matches */ 154 | } 155 | }, 156 | command: { 157 | provider: "uptime", 158 | args: { 159 | averages: { 160 | shortterm: true, 161 | midterm: true, 162 | longterm: true 163 | } 164 | } 165 | } 166 | } 167 | ``` 168 | 169 | `START` responses would then use the following format: 170 | 171 | ```javascript 172 | response = { 173 | in_reply_to: "51665ac9-bab5-4995-aa80-09bc79cfb2bd", 174 | uuid: "5b4197bd-a537-4cc7-972f-d08ea5760feb", 175 | hostname: "www01.example.com", 176 | status: "start" 177 | } 178 | ``` 179 | 180 | `NOOP` responses would drop the sequence UUID, which is not needed: 181 | 182 | ```javascript 183 | response = { 184 | in_reply_to: "51665ac9-bab5-4995-aa80-09bc79cfb2bd", 185 | hostname: "www01.example.com", 186 | status: "noop" 187 | } 188 | ``` 189 | 190 | Finally, `COMPLETE` responses would include the result of command 191 | execution: 192 | 193 | ```javascript 194 | response = { 195 | in_reply_to: "51665ac9-bab5-4995-aa80-09bc79cfb2bd", 196 | uuid: "5b4197bd-a537-4cc7-972f-d08ea5760feb", 197 | hostname: "www01.example.com", 198 | status: "complete", 199 | output: { 200 | exit: 0, 201 | time: "23:17:20", 202 | up: "4 days, 1:45", 203 | users: 6, 204 | load_averages: [ 0.06, 0.10, 0.13 ] 205 | } 206 | } 207 | ``` 208 | 209 | We essentially end up with an architecture where each node is a daemon 210 | while the command and control interface acts as a client. 211 | 212 | ### Securing the protocol 213 | 214 | Since this is a proof of concept protocol and we want the implementation 215 | to be as simple as possible, a somewhat acceptable compromise would be to 216 | share an SSH private key specific to command and control messages 217 | amongst nodes and sign requests and responses with it. 218 | 219 | SSL keys would also be appropriate, but using ssh keys allows the use of 220 | the simple `ssh-keygen(1)` command. 221 | 222 | Here is a stock ruby snippet, requiring no gems, which performs signing 223 | with an SSH key, given a passphrase-less key: 224 | 225 | ```ruby 226 | require 'openssl' 227 | 228 | signature = File.open '/path/to/private-key' do |file| 229 | digest = OpenSSL::Digest::SHA1.digest("some text") 230 | OpenSSL::PKey::DSA.new(file.read).syssign(digest) 231 | end 232 | ``` 233 | 234 | To verify a signature, here is the relevant snippet: 235 | 236 | ```ruby 237 | require 'openssl' 238 | 239 | valid = File.open '/path/to/private-key' do |file| 240 | digest = OpenSSL::Digest::SHA1.digest("some text") 241 | OpenSSL::PKey::DSA.new(file.read).sysverify(digest, signature) 242 | end 243 | ``` 244 | 245 | This implements the common scheme of signing a SHA1 digest with a DSA 246 | key (we could just as well sign with an RSA key by using 247 | `OpenSSL::PKey::RSA`). 248 | 249 | A better way of doing this would be to sign every request with the 250 | host's private key, and let the controller look up known host keys to 251 | validate the signature. 252 | 253 | ### The clojure side of things 254 | 255 | My drive for implementing a clojure controller is integration into the 256 | command and control tool I am using to interact with a number of things. 257 | 258 | This means I only did the work to implement the controller side of 259 | things. 
Reading SSH keys meant pulling in the 260 | [bouncycastle](http://www.bouncycastle.org/) libs and the apache 261 | commons-codec lib for base64: 262 | 263 | ```clojure 264 | (import '[java.security Signature Security KeyPair] 265 | '[org.bouncycastle.jce.provider BouncyCastleProvider] 266 | '[org.bouncycastle.openssl PEMReader] 267 | '[org.apache.commons.codec.binary Base64]) 268 | (require '[clojure.java.io :as io]) 269 | 270 | ;; PEMReader needs the bouncycastle security provider registered 271 | (Security/addProvider (BouncyCastleProvider.)) 272 | 273 | (def algorithms {:dss "SHA1withDSA" :rsa "SHA1withRSA"}) 274 | ;; getting a public and private key from a path 275 | (def keypair (let [pem (-> (PEMReader. (io/reader "/path/to/key")) .readObject)] 276 | {:public (.getPublic pem) 277 | :private (.getPrivate pem)})) 278 | 279 | (def keytype :dss) 280 | 281 | (defn sign 282 | [content] 283 | (-> (doto (Signature/getInstance (get algorithms keytype)) 284 | (.initSign (:private keypair)) 285 | (.update (.getBytes content))) 286 | (.sign) 287 | (Base64/encodeBase64String))) 288 | 289 | (defn verify 290 | [content signature] 291 | (-> (doto (Signature/getInstance (get algorithms keytype)) 292 | (.initVerify (:public keypair)) 293 | (.update (.getBytes content))) 294 | (.verify (-> signature Base64/decodeBase64)))) 295 | ``` 296 | 297 | For redis support there are several options; I used the `jedis` Java 298 | library, which supports everything we're interested in. 299 | 300 | ### Wrapping up 301 | 302 | I have early - read: with lots of room for improvement, and a few 303 | corners cut - implementations of the protocol, both the agent and 304 | controller code in ruby, and the controller code in clojure, wrapped in 305 | my IRC bot in clojure, which might warrant another article. 306 | 307 | The code can be found in the `amiral` repository (name 308 | alternatives welcome!) 309 | 310 | If you just want to try it out, you can fetch the `amiral` gem in ruby, and 311 | start an agent like so: 312 | 313 | ```bash 314 | $ amiral.rb -k /path/to/privkey agent 315 | ``` 316 | 317 | You can then test querying the agent through a controller: 318 | 319 | ```bash 320 | $ amiral.rb -k /path/to/privkey controller uptime 321 | accepting acknowledgements for 2 seconds 322 | got 1/1 positive acknowledgements 323 | got 1/1 responses 324 | phoenix.spootnik.org: 09:06:15 up 5 days, 10:48, 10 users, load average: 0.08, 0.06, 0.05 325 | ``` 326 | 327 | If you're feeling adventurous you can now start the clojure controller. 328 | Its configuration is relatively straightforward, but a bit more 329 | involved since it's part of an IRC + HTTP bot framework: 330 | 331 | ```clojure 332 | {:transports {amiral.transport.HTTPTransport {:port 8080} 333 | amiral.transport.irc/create {:host "irc.freenode.net" 334 | :channel "#mychan"}} 335 | :executors {amiral.executor.fleet/create {:keytype :dss 336 | :keypath "/path/to/key"}}} 337 | ``` 338 | 339 | In that config we defined two ways of listening for incoming controller 340 | requests: IRC and HTTP, and we added an "executor", i.e. a way of doing 341 | something. 
342 | 343 | You can now query your hosts through HTTP: 344 | 345 | ```bash 346 | $ curl -XPOST -H 'Content-Type: application/json' -d '{"args":["uptime"]}' http://localhost:8080/amiral/fleet 347 | {"count":1, 348 | "message":"phoenix.spootnik.org: 09:40:57 up 5 days, 11:23, 10 users, load average: 0.15, 0.19, 0.16", 349 | "resps":[{"in_reply_to":"94ab9776-e201-463b-8f16-d33fbb75120f", 350 | "uuid":"23f508da-7c30-432b-b492-f9d77a809a2a", 351 | "status":"complete", 352 | "output":{"exit":0, 353 | "time":"09:40:57", 354 | "since":"5 days, 11:23", 355 | "users":"10", 356 | "averages":["0.15","0.19","0.16"], 357 | "short":"09:40:57 up 5 days, 11:23, 10 users, load average: 0.15, 0.19, 0.16"}, 358 | "hostname":"phoenix.spootnik.org"}]} 359 | ``` 360 | 361 | Or on IRC: 362 | 363 | ``` 364 | 09:42 < pyr> amiral: fleet uptime 365 | 09:42 < amiral> pyr: waiting 2 seconds for acks 366 | 09:43 < amiral> pyr: got 1/1 positive acknowledgement 367 | 09:43 < amiral> pyr: got 1 responses 368 | 09:43 < amiral> pyr: phoenix.spootnik.org: 09:42:57 up 5 days, 11:25, 10 users, load average: 0.16, 0.20, 0.17 369 | ``` 370 | 371 | ### Next Steps 372 | 373 | This was a fun experiment, but there are a few outstanding problems which 374 | will need to be addressed quickly: 375 | 376 | - Tests, tests, tests. This was a PoC project to start with, but I should 377 | have known better and written tests along the way. 378 | - The queue based reply handling makes controller logic complex and 379 | timeout handling approximate; it should be switched to pub-sub. 380 | - The signing should be done based on known hosts' public keys instead 381 | of the shared key used now. 382 | - The agent should expose more common actions: service interaction, 383 | puppet runs, etc. 384 | 385 | -------------------------------------------------------------------------------- /content/post/026-diving-into-the-python-pickle-format.md: -------------------------------------------------------------------------------- 1 | #+title: Diving into the Python Pickle format 2 | #+date: 2014-04-05 3 | 4 | **pickle** is python's serialization format, able to freeze data, as 5 | long as all leaves in class hierarchies are storable. pickle falls into 6 | the category of formats that I'm not a huge fan of. Like all 7 | serialization formats heavily tied to a language, it makes interop 8 | harder and pushes platform and language concerns all the way to the 9 | storage layer. 10 | 11 | You could do worse than look into Protobuf and Avro when looking for a 12 | fast serialization format for your network protocol needs. When speed 13 | isn't so much of an issue, EDN and JSON are also good candidates. 14 | 15 | I had no choice but to look into pickle since I am writing a 16 | compatibility layer for the carbon binary protocol in clojure within the 17 | [cyanite](https://github.com/pyr/cyanite) project. 18 | 19 | The work described in this article is available in 20 | [pickler](https://github.com/pyr/pickler). 21 | 22 | ### Format definition 23 | 24 | One of the first hurdles when investigating pickle is finding its 25 | reference. I didn't find a formal format definition, but it ended up not 26 | being that hard to piece things together. 
27 | 28 | I started off looking at how graphite serializes metrics. The structure 29 | is rather simple and ends up looking like this: 30 | 31 | ``` python 32 | metrics = [ 33 | [ "web1.cpu0.user", [ 1332444075, 10.5 ] ], 34 | [ "web1.cpu1.user", [ 1332444076, 90.3 ] ] 35 | ] 36 | ``` 37 | 38 | We can now add in some code to write the metrics out: 39 | 40 | ``` python 41 | import pickle 42 | 43 | pickle.dump(metrics, open("frozen.p", "wb")) 44 | ``` 45 | 46 | In addition to this, the best source is the 47 | documentation and source code of the `pickletools` python module. 48 | 49 | It's also going to be useful to work with the hexdump of 50 | the output data from above: 51 | 52 | ``` 53 | 00000000: 8003 5d71 0028 5d71 0128 580e 0000 0077  ..]q.(]q.(X....w 54 | 00000010: 6562 312e 6370 7530 2e75 7365 7271 025d  eb1.cpu0.userq.] 55 | 00000020: 7103 284a ab7b 6b4f 4740 2500 0000 0000  q.(J.{kOG@%..... 56 | 00000030: 0065 655d 7104 2858 0e00 0000 7765 6231  .ee]q.(X....web1 57 | 00000040: 2e63 7075 312e 7573 6572 7105 5d71 0628  .cpu1.userq.]q.( 58 | 00000050: 4aac 7b6b 4f47 4056 9333 3333 3333 6565  J.{kOG@V.33333ee 59 | 00000060: 652e                                     e. 60 | ``` 61 | 62 | ### A stack-based virtual machine 63 | 64 | To get a sense of how pickle works, a good approach is to use the 65 | disassembly facility provided by `python -m pickletools <file>`; 66 | for the above file this generates: 67 | 68 | ``` 69 | 0: \x80 PROTO 3 70 | 2: ] EMPTY_LIST 71 | 3: q BINPUT 0 72 | 5: ( MARK 73 | 6: ] EMPTY_LIST 74 | 7: q BINPUT 1 75 | 9: ( MARK 76 | 10: X BINUNICODE 'web1.cpu0.user' 77 | 29: q BINPUT 2 78 | 31: ] EMPTY_LIST 79 | 32: q BINPUT 3 80 | 34: ( MARK 81 | 35: J BININT 1332444075 82 | 40: G BINFLOAT 10.5 83 | 49: e APPENDS (MARK at 34) 84 | 50: e APPENDS (MARK at 9) 85 | 51: ] EMPTY_LIST 86 | 52: q BINPUT 4 87 | 54: ( MARK 88 | 55: X BINUNICODE 'web1.cpu1.user' 89 | 74: q BINPUT 5 90 | 76: ] EMPTY_LIST 91 | 77: q BINPUT 6 92 | 79: ( MARK 93 | 80: J BININT 1332444076 94 | 85: G BINFLOAT 90.3 95 | 94: e APPENDS (MARK at 79) 96 | 95: e APPENDS (MARK at 54) 97 | 96: e APPENDS (MARK at 5) 98 | 97: . STOP 99 | highest protocol among opcodes = 2 100 | ``` 101 | 102 | Looking at the code and documentation it becomes evident that we are 103 | dealing with a stack-based virtual machine which keeps track of objects. 104 | The file is just a list of serialized opcodes, the first one being 105 | expected to be the protocol version and the last one a stop opcode. When 106 | the stop opcode is met, the current object on the stack is popped. 107 | 108 | In the case of graphite data the objects built are simple collections; 109 | the relevant operations can be trimmed down to: 110 | 111 | ``` 112 | 2: ] EMPTY_LIST 113 | 6: ] EMPTY_LIST 114 | 10: X BINUNICODE 'web1.cpu0.user' 115 | 31: ] EMPTY_LIST 116 | 35: J BININT 1332444075 117 | 40: G BINFLOAT 10.5 118 | 49: e APPENDS 119 | 50: e APPENDS 120 | 51: ] EMPTY_LIST 121 | 55: X BINUNICODE 'web1.cpu1.user' 122 | 76: ] EMPTY_LIST 123 | 80: J BININT 1332444076 124 | 85: G BINFLOAT 90.3 125 | 94: e APPENDS 126 | 95: e APPENDS 127 | 96: e APPENDS 128 | ``` 129 | 130 | ### Parsing opcodes 131 | 132 | 136 | 137 | In terms of layout on disk, opcodes are either fixed size or contain a 138 | fixed-size field indicating the size of the variable part. 139 | 140 | #### Protocol opcode 141 | 142 | The protocol opcode has a code of `0x80` and is followed by a single byte for the version. 143 | 144 | 
145 |     00000000: 8003 5d71 0028 5d71 0128 580e 0000 0077  ..]q.(]q.(X....w
146 |     00000010: 6562 312e 6370 7530 2e75 7365 7271 025d  eb1.cpu0.userq.]
147 |     00000020: 7103 284a ab7b 6b4f 4740 2500 0000 0000  q.(J.{kOG@%.....
148 |     00000030: 0065 655d 7104 2858 0e00 0000 7765 6231  .ee]q.(X....web1
149 |     00000040: 2e63 7075 312e 7573 6572 7105 5d71 0628  .cpu1.userq.]q.(
150 |     00000050: 4aac 7b6b 4f47 4056 9333 3333 3333 6565  J.{kOG@V.33333ee
151 |     00000060: 652e                                     e.
152 | 
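Expressed with the multimethod approach used in the AST section at the end of this post, a handler for this opcode could look like the following sketch, assuming `bb` is a `java.nio.ByteBuffer` positioned just past the opcode byte:

``` clojure
;; dispatch on the opcode byte, as in the AST section below
(defmulti opcode (fn [b _] (bit-or b 0x00)))

(defmethod opcode 0x80
  [_ bb]
  ;; a single unsigned byte carrying the protocol version follows
  {:type :protocol :version (bit-and 0xff (.get bb))})
```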
153 | 154 | #### Stop opcode 155 | 156 | The stop opcode (`0x2e` or `.` in ASCII) denotes the end of the pickle data. 157 | 158 |
159 |     00000000: 8003 5d71 0028 5d71 0128 580e 0000 0077  ..]q.(]q.(X....w
160 |     00000010: 6562 312e 6370 7530 2e75 7365 7271 025d  eb1.cpu0.userq.]
161 |     00000020: 7103 284a ab7b 6b4f 4740 2500 0000 0000  q.(J.{kOG@%.....
162 |     00000030: 0065 655d 7104 2858 0e00 0000 7765 6231  .ee]q.(X....web1
163 |     00000040: 2e63 7075 312e 7573 6572 7105 5d71 0628  .cpu1.userq.]q.(
164 |     00000050: 4aac 7b6b 4f47 4056 9333 3333 3333 6565  J.{kOG@V.33333ee
165 |     00000060: 652e                                     e.
166 | 
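The matching handler is trivial since there is no payload to consume; reusing the hypothetical `opcode` multimethod from the sketch above:

``` clojure
(defmethod opcode 0x2e
  [_ _]
  ;; no payload: parsing ends when this element is met
  {:type :stop})
```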
167 | 168 | #### Empty list opcode 169 | 170 | The empty list opcode has code `0x5d` (the ASCII equivalent of `]`); this opcode has no additional data. 171 | 172 | 
173 |     00000000: 8003 5d71 0028 5d71 0128 580e 0000 0077  ..]q.(]q.(X....w
174 |     00000010: 6562 312e 6370 7530 2e75 7365 7271 025d  eb1.cpu0.userq.]
175 |     00000020: 7103 284a ab7b 6b4f 4740 2500 0000 0000  q.(J.{kOG@%.....
176 |     00000030: 0065 655d 7104 2858 0e00 0000 7765 6231  .ee]q.(X....web1
177 |     00000040: 2e63 7075 312e 7573 6572 7105 5d71 0628  .cpu1.userq.]q.(
178 |     00000050: 4aac 7b6b 4f47 4056 9333 3333 3333 6565  J.{kOG@V.33333ee
179 |     00000060: 652e                                     e.
180 | 
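Again reusing the hypothetical `opcode` multimethod, the handler simply emits a marker; `:startlist` matches the node names visible in the AST dump at the end of this post:

``` clojure
(defmethod opcode 0x5d
  [_ _]
  ;; no payload: marks the start of a fresh list on the machine's stack
  {:type :startlist})
```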
181 | 182 | #### Append opcode 183 | 184 | The append opcode denotes the end of a list: the currently open 185 | collection is closed and pushed back onto the stack. 186 | 187 | 
188 |     00000000: 8003 5d71 0028 5d71 0128 580e 0000 0077  ..]q.(]q.(X....w
189 |     00000010: 6562 312e 6370 7530 2e75 7365 7271 025d  eb1.cpu0.userq.]
190 |     00000020: 7103 284a ab7b 6b4f 4740 2500 0000 0000  q.(J.{kOG@%.....
191 |     00000030: 0065 655d 7104 2858 0e00 0000 7765 6231  .ee]q.(X....web1
192 |     00000040: 2e63 7075 312e 7573 6572 7105 5d71 0628  .cpu1.userq.]q.(
193 |     00000050: 4aac 7b6b 4f47 4056 9333 3333 3333 6565  J.{kOG@V.33333ee
194 |     00000060: 652e                                     e.
195 | 
196 | 197 | #### Unicode opcode 198 | 199 | The unicode opcode has code `0x58` (or ASCII `X`) and follows a simple structure: 200 | 201 | ``` c 202 | struct bin_unicode { 203 | char code; /* 0x58 */ 204 | u_int32_t length; /* payload size, little-endian */ 205 | char *payload; /* variable size */ 206 | }; 207 | ``` 208 | 209 | Here are the three fields highlighted from our example payloads each time they appear: 210 | 211 | 
212 |     00000000: 8003 5d71 0028 5d71 0128 580e 0000 0077  ..]q.(]q.(X....w
213 |     00000010: 6562 312e 6370 7530 2e75 7365 7271 025d  eb1.cpu0.userq.]
214 |     00000020: 7103 284a ab7b 6b4f 4740 2500 0000 0000  q.(J.{kOG@%.....
215 |     00000030: 0065 655d 7104 2858 0e00 0000 7765 6231  .ee]q.(X....web1
216 |     00000040: 2e63 7075 312e 7573 6572 7105 5d71 0628  .cpu1.userq.]q.(
217 |     00000050: 4aac 7b6b 4f47 4056 9333 3333 3333 6565  J.{kOG@V.33333ee
218 |     00000060: 652e                                     e.
219 | 
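A handler for this opcode must consume the length field before the payload. A sketch, again against the hypothetical `opcode` multimethod, assuming the buffer's byte order was set to little-endian to match the length encoding:

``` clojure
(defmethod opcode 0x58
  [_ bb]
  ;; a 4-byte little-endian length, then that many bytes of UTF-8 data
  (let [len (.getInt bb)
        buf (byte-array len)]
    (.get bb buf)
    {:type :unicode :size len :val (String. buf "UTF-8")}))
```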
220 | 221 | #### Integer opcode 222 | 223 | Integers are stored with the `0x4a` opcode at a fixed length of 4 224 | bytes, in little-endian byte order (reading `ab7b 6b4f` below as little-endian yields 1332444075, our first timestamp). 225 | 226 | 
227 |     00000000: 8003 5d71 0028 5d71 0128 580e 0000 0077  ..]q.(]q.(X....w
228 |     00000010: 6562 312e 6370 7530 2e75 7365 7271 025d  eb1.cpu0.userq.]
229 |     00000020: 7103 284a ab7b 6b4f 4740 2500 0000 0000  q.(J.{kOG@%.....
230 |     00000030: 0065 655d 7104 2858 0e00 0000 7765 6231  .ee]q.(X....web1
231 |     00000040: 2e63 7075 312e 7573 6572 7105 5d71 0628  .cpu1.userq.]q.(
232 |     00000050: 4aac 7b6b 4f47 4056 9333 3333 3333 6565  J.{kOG@V.33333ee
233 |     00000060: 652e                                     e.
234 | 
235 | 236 | #### The infamous double opcode 237 | 238 | The way doubles are serialized is a bit startling: it comes down to 239 | just writing out the 8 bytes of the IEEE-754 double representation, in big-endian order. In C, deserializing on a big-endian host would come down to: 240 | 241 | ``` c 242 | double deserialize(const char *input) { 243 | double output; 244 | /* assumes a big-endian host; on little-endian, swap bytes first */ 245 | memcpy(&output, input, sizeof(output)); 246 | return (output); 247 | } 248 | ``` 249 | 250 | 
251 |     00000000: 8003 5d71 0028 5d71 0128 580e 0000 0077  ..]q.(]q.(X....w
252 |     00000010: 6562 312e 6370 7530 2e75 7365 7271 025d  eb1.cpu0.userq.]
253 |     00000020: 7103 284a ab7b 6b4f 4740 2500 0000 0000  q.(J.{kOG@%.....
254 |     00000030: 0065 655d 7104 2858 0e00 0000 7765 6231  .ee]q.(X....web1
255 |     00000040: 2e63 7075 312e 7573 6572 7105 5d71 0628  .cpu1.userq.]q.(
256 |     00000050: 4aac 7b6b 4f47 4056 9333 3333 3333 6565  J.{kOG@V.33333ee
257 |     00000060: 652e                                     e.
258 | 
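Since the integer and length fields are little-endian but doubles are big-endian, a handler has to flip the buffer's byte order around the read. A sketch, under the same assumptions as the previous ones:

``` clojure
(import '[java.nio ByteOrder])

(defmethod opcode 0x47
  [_ bb]
  ;; an 8-byte big-endian IEEE-754 double follows the opcode byte
  (let [val (.getDouble (.order bb ByteOrder/BIG_ENDIAN))]
    (.order bb ByteOrder/LITTLE_ENDIAN) ; restore order for later opcodes
    {:type :double :val val}))
```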
259 | 260 | ### AST Generation 261 | 262 | The Abstract Syntax Tree for such a format is nothing more than a list 263 | of opcodes. Parsing just requires making sure the first opcode is a 264 | protocol one, and stops when a stop opcode is met. 265 | 266 | Generating an AST for the above syntax in clojure turned out to be very 267 | simple, provided we worked with the java 268 | [ByteBuffer](http://docs.oracle.com/javase/7/docs/api/java/nio/ByteBuffer.html) 269 | class. 270 | 271 | Here's the bulk of the work: 272 | 273 | ``` clojure 274 | (defn raw->ast 275 | "Convert binary data into a list of pickle opcodes and data" 276 | [bb] 277 | (lazy-seq 278 | (when (pos? (.remaining bb)) 279 | (let [b (bit-and 0xff (.get bb)) 280 | elem (opcode b bb)] 281 | (cons elem (raw->ast bb)))))) 282 | ``` 283 | 284 | A seq is built by fetching a byte and sending it to an `opcode` 285 | function, along with the rest of the buffer. 286 | 287 | The `opcode` function is best built as a `multimethod` which dispatches 288 | on the opcode: `(defmulti opcode (fn [b _] (bit-or b 0x00)))`. Methods 289 | can then be implemented simply; for instance, here are append and int 290 | parsing (note that the buffer must be set to little-endian with `.order` for `.getInt` to match pickle's integer encoding): 291 | 292 | ``` clojure 293 | (defmethod opcode 0x4a 294 | [_ bb] 295 | {:type :int :val (.getInt bb)}) 296 | 297 | (defmethod opcode 0x65 298 | [_ bb] 299 | {:type :append}) 300 | ``` 301 | 302 | With this, we end up with an AST of the following form. It's now much 303 | easier to write functions that walk this AST and extract the data we're after: 304 | 305 | ``` clojure 306 | ({:type :protocol, :version 3} 307 | {:type :startlist} 308 | {:type :binput, :index 0} 309 | {:type :mark} 310 | {:type :startlist} 311 | {:type :binput, :index 1} 312 | {:type :mark} 313 | {:type :unicode, :size 14, :val "web1.cpu0.user"} 314 | {:type :binput, :index 2} 315 | {:type :startlist} 316 | {:type :binput, :index 3} 317 | {:type :mark} 318 | {:type :int, :val 1332444075} 319 | {:type :double, :val 10.5} 320 | {:type :append} 321 | {:type :append} 322 | {:type :startlist} 323 | {:type :binput, :index 4} 324 | {:type :mark} 325 | {:type :unicode, :size 14, :val "web1.cpu1.user"} 326 | {:type :binput, :index 5} 327 | {:type :startlist} 328 | {:type :binput, :index 6} 329 | {:type :mark} 330 | {:type :int, :val 1332444076} 331 | {:type :double, :val 90.3} 332 | {:type :append} 333 | {:type :append} 334 | {:type :append} 335 | {:type :stop}) 336 | ``` 337 | 338 | ### Some final thoughts on pickle 339 | 340 | I still think pickle should be avoided in general, but I found myself in 341 | one of the rare cases where it's necessary to interact with it from 342 | outside python. If you're a python developer and following along, please 343 | consider other serialization formats. 344 | 345 | Hopefully this gives you enough to start playing around with 346 | pickle; here are a few resources for doing so in other languages: 347 | 348 | - In haskell 349 | - Pyrolite in Java 350 | 351 | --------------------------------------------------------------------------------