/limits
90 | ```
91 |
92 | `Max open files` should reflect the value above.
93 |
94 | ## App Changes
95 |
96 | Be sure to disable prepared statements, as they will not work with PgBouncer in transaction mode.
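If your app is Rails, for example, this is a one-line change in `config/database.yml` (a sketch; adjust for your setup):

```yml
production:
  adapter: postgresql
  prepared_statements: false
```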
97 |
98 | ## Statement Timeouts
99 |
100 | To use a [statement timeout](https://www.postgresql.org/docs/current/static/runtime-config-client.html#GUC-STATEMENT-TIMEOUT), run:
101 |
102 | ```sql
103 | ALTER ROLE USERNAME1 SET statement_timeout = 5000;
104 | ```
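The value is in milliseconds, and `ALTER ROLE ... SET` only applies to new sessions. After reconnecting, you can confirm it took effect:

```sql
SHOW statement_timeout;
```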
105 |
106 | ## Congrats
107 |
108 | You’ve successfully set up PgBouncer.
109 |
--------------------------------------------------------------------------------
/archive/csp-rails.md:
--------------------------------------------------------------------------------
1 | # Adding CSP to Rails
2 |
3 | Content Security Policy can be an effective way to prevent XSS attacks. If you aren’t familiar, here’s a [great intro](https://www.html5rocks.com/en/tutorials/security/content-security-policy/).
4 |
5 | To get started with Rails, first add the header to all requests in your `ApplicationController`. We want to start by blocking content in development so we notice it, but only report it in production so nothing breaks.
6 |
7 | ```ruby
8 | before_action :set_csp
9 |
10 | # use constants and freeze for performance
11 | CSP_HEADER_NAME = (Rails.env.production? ? "Content-Security-Policy-Report-Only" : "Content-Security-Policy").freeze
12 | CSP_HEADER_VALUE = "default-src *; report-uri /csp_reports?report_only=#{CSP_HEADER_NAME.include?("Report-Only")}".freeze
13 |
14 | def set_csp
15 | response.headers[CSP_HEADER_NAME] = CSP_HEADER_VALUE
16 | end
17 | ```
18 |
19 | ## Reports
20 |
21 | Create a model to track reports.
22 |
23 | ```sh
24 | rails g model CspReport
25 | ```
26 |
27 | And in the migration, do:
28 |
29 | ```ruby
30 | class CreateCspReports < ActiveRecord::Migration
31 | def change
32 | create_table :csp_reports do |t|
33 | t.text :document_uri
34 | t.text :referrer
35 | t.text :violated_directive
36 | t.text :effective_directive
37 | t.text :original_policy
38 | t.text :blocked_uri
39 | t.integer :status_code
40 | t.text :user_agent
41 | t.boolean :report_only
42 | t.timestamp :created_at
43 | end
44 | end
45 | end
46 | ```
47 |
48 | Add a controller to create the reports.
49 |
50 | ```ruby
51 | class CspReportsController < ApplicationController
52 | skip_before_action :verify_authenticity_token
53 |
54 | def create
55 | report = JSON.parse(request.body.read)["csp-report"]
56 | CspReport.create!(
57 | document_uri: report["document-uri"],
58 | referrer: report["referrer"],
59 | violated_directive: report["violated-directive"],
60 | effective_directive: report["effective-directive"],
61 | original_policy: report["original-policy"],
62 | blocked_uri: report["blocked-uri"],
63 | status_code: report["status-code"],
64 | user_agent: request.user_agent,
65 | report_only: params[:report_only] == "true"
66 | )
67 | head :ok
68 | end
69 | end
70 | ```
71 |
72 | Don’t forget the route.
73 |
74 | ```ruby
75 | resources :csp_reports, only: [:create]
76 | ```
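Once reports start coming in, a quick way to see the most common violations (a sketch, assuming the `csp_reports` table from the migration above):

```sql
SELECT violated_directive, blocked_uri, COUNT(*) AS reports
FROM csp_reports
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY 1, 2
ORDER BY reports DESC
LIMIT 25;
```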
77 |
78 | ## Enforcing the Policy
79 |
80 | Once the reports stop, you’ll want to enforce the policy in production.
81 |
82 | ```ruby
83 | CSP_HEADER_NAME = "Content-Security-Policy".freeze
84 | ```
85 |
86 | ## Testing New Policies
87 |
88 | You can have both an enforced policy and a report-only policy, so use this to your advantage when changing policies. Make the new policy report-only for a bit before enforcing it.
89 |
90 | ```ruby
91 | before_action :set_csp_report_only
92 |
93 | # use constants and freeze for performance
94 | CSP_REPORT_ONLY_HEADER_NAME = "Content-Security-Policy-Report-Only".freeze
95 | CSP_REPORT_ONLY_HEADER_VALUE = "default-src https:; report-uri /csp_reports?report_only=true".freeze
96 |
97 | def set_csp_report_only
98 | response.headers[CSP_REPORT_ONLY_HEADER_NAME] = CSP_REPORT_ONLY_HEADER_VALUE
99 | end
100 | ```
101 |
--------------------------------------------------------------------------------
/archive/emotion-recognition-ruby.md:
--------------------------------------------------------------------------------
1 | # Emotion Recognition in Ruby
2 |
3 | Welcome to another installment of deep learning in Ruby. Today, we’ll look at [FER+](https://github.com/Microsoft/FERPlus), a deep convolutional neural network for emotion recognition developed at Microsoft. The project is open source, and there’s a pretrained model in the [ONNX Model Zoo](https://github.com/onnx/models/tree/master/vision/body_analysis/emotion_ferplus) that we can get running quickly in Ruby.
4 |
5 | First, download [the model](https://onnxzoo.blob.core.windows.net/models/opset_8/emotion_ferplus/emotion_ferplus.tar.gz) and this photo of a park ranger.
6 |
7 | 
8 |
9 |
10 | Photo from Yellowstone National Park
11 |
12 |
13 | We’ll use MiniMagick to prepare the image and the ONNX Runtime gem to run the model.
14 |
15 | ```ruby
16 | gem "mini_magick"
17 | gem "onnxruntime"
18 | ```
19 |
20 | For the image, we need to zoom in on her face, resize it to 64x64, and convert it to grayscale. Typically, we’d use a face detection model to find the bounding box and use that information to crop the image, but for simplicity, we’ll just do it manually.
21 |
22 | ```ruby
23 | img = MiniMagick::Image.open("ranger.jpg")
24 | img.crop "100x100+60+20", "-gravity", "center" # manual crop
25 | img.resize "64x64^", "-gravity", "center", "-extent", "64x64"
26 | img.colorspace "Gray"
27 | img.write "resized.jpg"
28 | ```
29 |
30 | Here’s a blown up version:
31 |
32 | 
33 |
34 | Finally, create a 64x64 matrix of the grayscale intensities.
35 |
36 | ```ruby
37 | # all channel values are the same for grayscale, so just take the first
38 | pixels = img.get_pixels.flat_map { |r| r.map(&:first) }
39 | input = OnnxRuntime::Utils.reshape(pixels, [1, 1, 64, 64])
40 | ```
41 |
42 | Now that the input is prepared, we can load and run the model.
43 |
44 | ```ruby
45 | model = OnnxRuntime::Model.new("model.onnx")
46 | output = model.predict("Input3" => input)
47 | ```
48 |
49 | We use [softmax](https://victorzhou.com/blog/softmax/) to convert the model output into probabilities.
50 |
51 | ```ruby
52 | def softmax(x)
53 | exp = x.map { |v| Math.exp(v - x.max) }
54 | exp.map { |v| v / exp.sum }
55 | end
56 |
57 | probabilities = softmax(output["Plus692_Output_0"].first)
58 | ```
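To see what softmax does, here’s a tiny standalone example with made-up inputs — the outputs are positive, sum to 1, and keep the ordering of the inputs:

```ruby
def softmax(x)
  exp = x.map { |v| Math.exp(v - x.max) }
  exp.map { |v| v / exp.sum }
end

probs = softmax([1.0, 2.0, 3.0])
# probs sums to 1.0 (within floating point error),
# and the largest input gets the largest probability
```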
59 |
60 | Then map the labels and sort by highest probability.
61 |
62 | ```ruby
63 | emotion_labels = [
64 | "neutral", "happiness", "surprise", "sadness",
65 | "anger", "disgust", "fear", "contempt"
66 | ]
67 | pp emotion_labels.zip(probabilities).sort_by { |_, v| -v }.to_h
68 | ```
69 |
70 | And the results are in:
71 |
72 | ```
73 | {
74 | "happiness" => 0.9999839207138284,
75 | "surprise" => 1.0569785479062501e-05,
76 | "neutral" => 4.826811128840592e-06,
77 | "anger" => 4.63037778140089e-07,
78 | "sadness" => 9.574742925740587e-08,
79 | "contempt" => 7.941520916580971e-08,
80 | "fear" => 2.8803367665891773e-08,
81 | "disgust" => 1.568577943664937e-08
82 | }
83 | ```
84 |
85 | There’s a 99.9% probability she looks happy in the photo. Not bad!
86 |
87 | Here’s the [complete code](https://gist.github.com/ankane/3bb4ddbf84edd7f05a24cd3697ccd9a7). Now go out and try it with your own images!
88 |
--------------------------------------------------------------------------------
/archive/pghero-2-0.md:
--------------------------------------------------------------------------------
1 | # PgHero 2.0 Has Arrived
2 |
3 | It’s been over 2 years since PgHero 1.0 was released as a performance dashboard for Postgres. Since then, a number of new features have been added.
4 |
5 | - checks for serious issues like transaction ID wraparound and integer overflow
6 | - the ability to capture and view query stats over time
7 | - suggested indexes to give you a better idea of how to optimize queries (check out [Dexter](https://ankane.org/introducing-dexter) for automatic indexing)
8 |
9 | PgHero 2.0 provides even more insight into your database performance with two additional features: query details and space stats.
10 |
11 | ## Query Details
12 |
13 | PgHero makes it easy to see the most time-consuming queries during a given time period, but it’s hard to follow an individual query’s performance over time. When you run into issues, it’s not always easy to uncover what happened. Are the top queries during an incident consistently the most time-consuming, or are they new? Did the number of calls increase or was it the average time?
14 |
15 | The new [Query Details page](https://pghero.dokkuapp.com/datakick/queries/588635171) helps solve this.
16 |
17 | 
18 |
19 | This page allows you to deep dive into an individual query. View charts of total time, average time, and calls over the past 24 hours to see how they’ve moved.
20 |
21 | For those who [annotate queries](https://ankane.org/the-origin-of-sql-queries), you’ve likely realized the comment in PgHero only tells you one of the places a query is coming from since similar queries are grouped together. Now, you can get a better idea of all the places it’s called.
22 |
23 | If you don’t annotate queries, you should!!
24 |
25 | This page also lists tables in the query and their indexes so you can quickly see if an index is missing, and an “Explain” button is usually available to help you debug (but may be missing if PgHero hasn’t captured an unnormalized version of the query recently).
26 |
27 | ## Space Stats
28 |
29 | PgHero 2.0 also helps you manage storage space. You can track the growth of tables and indexes over time and view this data on the [Space page](https://pghero.dokkuapp.com/datakick/space). To see the fastest growing relations, click on the “7d Growth” header.
30 |
31 | 
32 |
33 | In addition, this page now reports unused indexes to help reclaim space. If you use read replicas, be sure to check that indexes aren’t used on any of them before dropping.
34 |
35 | You can also view the growth for an individual table or index over the past 30 days.
36 |
37 | 
38 |
39 | Lastly, there’s syntax highlighting for all SQL for improved readability.
40 |
41 | 
42 |
43 | Much better :)
44 |
45 | So what are you waiting for? Get the [latest version](https://github.com/ankane/pghero) of PgHero today.
46 |
47 |
48 |
49 | Note: If you use PgHero outside the dashboard, there are some [breaking changes](https://github.com/ankane/pghero/blob/master/guides/Rails.md#200) from 1.x to be aware of.
50 |
--------------------------------------------------------------------------------
/archive/hybrid-cryptography-rails.md:
--------------------------------------------------------------------------------
1 | # Hybrid Cryptography on Rails
2 |
3 | 
4 |
5 | [Hybrid cryptography](https://en.wikipedia.org/wiki/Hybrid_cryptosystem) allows certain servers to encrypt data without the ability to decrypt it. This can greatly limit damage in the event of a breach.
6 |
7 | Suppose we have a service that sends text messages to customers. Customers enter their phone number through the website or mobile app.
8 |
9 | With hybrid cryptography, we can set up web servers to only encrypt phone numbers. Text messages can be sent through background jobs which run on a different set of servers - ones that can decrypt and don’t allow inbound traffic. If internal employees need to view phone numbers, they can use a separate set of web servers that are only accessible through the company VPN.
10 |
11 |  | Encrypt | Decrypt | Notes
12 | --- | --- | --- | ---
13 | Customer web servers | ✓ | |
14 | Background workers | ✓ | ✓ | No inbound traffic
15 | Internal web servers | ✓ | ✓ | Requires VPN
16 |
17 | ## Setup
18 |
19 | Install [Libsodium](https://github.com/crypto-rb/rbnacl/wiki/Installing-libsodium) and add [Lockbox](https://github.com/ankane/lockbox) and [RbNaCl](https://github.com/crypto-rb/rbnacl) to your Gemfile:
20 |
21 | ```ruby
22 | gem 'lockbox'
23 | gem 'rbnacl'
24 | ```
25 |
26 | Generate keys in the Rails console with:
27 |
28 | ```ruby
29 | Lockbox.generate_key_pair
30 | ```
31 |
32 | Store the keys with your other secrets. This is typically Rails credentials or an environment variable ([dotenv](https://github.com/bkeepers/dotenv) is great for this). Be sure to use different keys in development and production.
33 |
34 | ```sh
35 | PHONE_ENCRYPTION_KEY=...
36 | PHONE_DECRYPTION_KEY=...
37 | ```
38 |
39 | Only set the decryption key on servers that should be able to decrypt.
40 |
41 | ## Database Fields
42 |
43 | We’ll store phone numbers in an encrypted database field. Create a migration to add a new column for the encrypted data.
44 |
45 | ```ruby
46 | class AddEncryptedPhoneToUsers < ActiveRecord::Migration[5.2]
47 | def change
48 | add_column :users, :phone_ciphertext, :string
49 | end
50 | end
51 | ```
52 |
53 | In the model, add:
54 |
55 | ```ruby
56 | class User < ApplicationRecord
57 | encrypts :phone, algorithm: "hybrid", encryption_key: ENV["PHONE_ENCRYPTION_KEY"], decryption_key: ENV["PHONE_DECRYPTION_KEY"]
58 | end
59 | ```
60 |
61 | Set a user’s phone number to ensure it works.
62 |
63 | ## Files
64 |
65 | Suppose we also need to accept sensitive documents. We can take a similar approach with file uploads.
66 |
67 | For Active Storage, use:
68 |
69 | ```ruby
70 | class User < ApplicationRecord
71 | encrypts_attached :document, algorithm: "hybrid", encryption_key: ENV["PHONE_ENCRYPTION_KEY"], decryption_key: ENV["PHONE_DECRYPTION_KEY"]
72 | end
73 | ```
74 |
75 | For CarrierWave, use:
76 |
77 | ```ruby
78 | class DocumentUploader < CarrierWave::Uploader::Base
79 | encrypt algorithm: "hybrid", encryption_key: ENV["PHONE_ENCRYPTION_KEY"], decryption_key: ENV["PHONE_DECRYPTION_KEY"]
80 | end
81 | ```
82 |
83 | You can also encrypt an IO stream directly.
84 |
85 | ```ruby
86 | box = Lockbox.new(algorithm: "hybrid", encryption_key: ENV["PHONE_ENCRYPTION_KEY"], decryption_key: ENV["PHONE_DECRYPTION_KEY"])
87 | box.encrypt(params[:file])
88 | ```
89 |
90 | ## Conclusion
91 |
92 | You’ve now seen an approach for keeping your data safe in the event a server is compromised. For more on data protection, check out [Securing Sensitive Data in Rails](https://ankane.org/sensitive-data-rails).
93 |
--------------------------------------------------------------------------------
/archive/postgres-sslmode-explained.md:
--------------------------------------------------------------------------------
1 | # Postgres SSLMODE Explained
2 |
3 | When you connect to a database, Postgres uses the `sslmode` parameter to determine the security of the connection. There are many options, so here’s an analogy to web security:
4 |
5 | - `disable` is HTTP
6 | - `verify-full` is HTTPS
7 |
8 | All the other options fall somewhere in between, and by design, make fewer security guarantees than HTTPS in your browser does.
9 |
10 | 
11 |
12 | This includes the default `prefer`. The [Postgres docs](https://www.postgresql.org/docs/current/libpq-ssl.html) have a great table explaining this:
13 |
14 | 
15 |
16 | Other modes like `require` are still useful in protecting against passive attacks (sniffing), but are vulnerable to active attacks that can compromise your credentials. Tarjei Husøy created [postgres-mitm](https://thusoy.com/2016/mitming-postgres) to demonstrate this.
17 |
18 | ## Defense
19 |
20 | The best way to protect a database is to limit inbound traffic. Require a VPN or SSH tunneling through a [bastion host](https://medium.com/@bill_73959/understanding-bastions-hosts-6ccd457e41ac) to connect. This ensures connections are always secure, and even if database credentials are compromised, an attacker won’t be able to access the database.
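For example, SSH local port forwarding through a bastion (hostnames here are placeholders) lets clients connect to `localhost:5432` while the database itself accepts no direct traffic:

```sh
ssh -N -L 5432:db.internal:5432 user@bastion.example.com
```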
21 |
22 | If this is not feasible, always use `verify-full`. This includes from code, psql, SQL clients, and other tools like [pgsync](https://github.com/ankane/pgsync) and [pgslice](https://github.com/ankane/pgslice).
23 |
24 | You can specify `sslmode` in the connection URI:
25 |
26 | ```text
27 | postgresql://user:pass@host/dbname?sslmode=verify-full&sslrootcert=ca.pem
28 | ```
29 |
30 | Or use environment variables.
31 |
32 | ```sh
33 | PGSSLMODE=verify-full PGSSLROOTCERT=ca.pem
34 | ```
35 |
36 | Libraries for most programming languages have options as well.
37 |
38 | ```ruby
39 | PG.connect(sslmode: "verify-full", sslrootcert: "ca.pem")
40 | ```
41 |
42 | ## Certificates
43 |
44 | To verify an SSL/TLS certificate, the client checks it against a root certificate. Your browser ships with root certificates to verify HTTPS websites. Postgres doesn’t come with any root certificates, so to use `verify-full`, you must specify one.
45 |
46 | Here are root certificates for a number of providers:
47 |
48 | Provider | Certificate | Docs
49 | --- | --- | ---
50 | Amazon RDS | [Download](https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem) | [View](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html#PostgreSQL.Concepts.General.SSL)
51 | Google Cloud SQL | In Account | [View](https://cloud.google.com/sql/docs/postgres/connect-admin-ip)
52 | Digital Ocean | In Account | [View](https://www.digitalocean.com/docs/databases/how-to/clusters/secure-clusters/)
53 | Citus Data | [Download](https://console.citusdata.com/citus.crt) | [View](https://docs.citusdata.com/en/v8.0/cloud/security.html)
54 |
55 | There’s no way to use `verify-full` with Heroku Postgres, so use caution when connecting from networks you don’t fully trust. Instead of `heroku pg:psql`, use:
56 |
57 | ```sh
58 | heroku run psql \$DATABASE_URL
59 | ```
60 |
61 | This securely connects to a dyno before connecting to the database.
62 |
63 | If you use PgBouncer, [set up secure connections](https://ankane.org/securing-pgbouncer-amazon-rds) for it as well.
64 |
65 | ## Conclusion
66 |
67 | Hopefully this helps you understand connection security a bit better.
68 |
69 |
70 |
71 | Updates
72 |
73 | - August 2019: Added Digital Ocean
74 |
--------------------------------------------------------------------------------
/archive/ruby-ml-for-python-coders.md:
--------------------------------------------------------------------------------
1 | # Ruby ML for Python Coders
2 |
3 |
4 |
5 |
6 |
7 | Curious to try machine learning in Ruby? Here’s a short cheatsheet for Python coders.
8 |
9 | ## Data structure basics
10 |
11 | - [Numo: NumPy for Ruby](/numo)
12 | - [Daru: Pandas for Ruby](/daru)
13 |
14 | ## Libraries
15 |
16 | Category | Python | Ruby
17 | --- | --- | ---
18 | Multi-dimensional arrays | [NumPy](https://github.com/numpy/numpy) | [Numo](https://github.com/ruby-numo/numo-narray)
19 | Data frames | [Pandas](https://github.com/pandas-dev/pandas) | [Daru](https://github.com/SciRuby/daru)
20 | Visualization | [Matplotlib](https://github.com/matplotlib/matplotlib) | [Nyaplot](https://github.com/domitry/nyaplot)
21 | Predictive modeling | [Scikit-learn](https://github.com/scikit-learn/scikit-learn) | [Rumale](https://github.com/yoshoku/rumale)
22 | Gradient boosting | [XGBoost](https://github.com/dmlc/xgboost), [LightGBM](https://github.com/Microsoft/LightGBM) | [XGBoost](https://github.com/ankane/xgb), [LightGBM](https://github.com/ankane/lightgbm)
23 | Deep learning | [PyTorch](https://github.com/pytorch/pytorch), [TensorFlow](https://github.com/tensorflow/tensorflow) | [Torch-rb](https://github.com/ankane/torch-rb), [TensorFlow](https://github.com/ankane/tensorflow) (TensorFlow :construction:)
24 | Recommendations | [Surprise](https://github.com/NicolasHug/Surprise), [Implicit](https://github.com/benfred/implicit/) | [Disco](https://github.com/ankane/disco)
25 | Approximate nearest neighbors | [NGT](https://github.com/yahoojapan/NGT), [Annoy](https://github.com/spotify/annoy) | [NGT](https://github.com/ankane/ngt), [Hanny](https://github.com/yoshoku/hanny)
26 | Factorization machines | [xLearn](https://github.com/aksnzhy/xlearn) | [xLearn](https://github.com/ankane/xlearn)
27 | Natural language processing | [spaCy](https://github.com/explosion/spaCy), [NLTK](https://github.com/nltk/nltk) | [Many gems](https://github.com/arbox/nlp-with-ruby) (nothing comprehensive :cry:)
28 | Text classification | [fastText](https://github.com/facebookresearch/fastText) | [fastText](https://github.com/ankane/fasttext)
29 | Forecasting | [Prophet](https://github.com/facebook/prophet) | :cry:
30 | Optimization | [CVXPY](https://github.com/cvxgrp/cvxpy), [PuLP](https://github.com/coin-or/pulp), [SCS](https://github.com/cvxgrp/scs), [OSQP](https://github.com/oxfordcontrol/osqp) | [CBC](https://github.com/gverger/ruby-cbc), [SCS](https://github.com/ankane/scs), [OSQP](https://github.com/ankane/osqp)
31 | Reinforcement learning | [Vowpal Wabbit](https://github.com/VowpalWabbit/vowpal_wabbit) | [Vowpal Wabbit](https://github.com/ankane/vowpalwabbit)
32 | Scoring engine | [ONNX Runtime](https://github.com/Microsoft/onnxruntime) | [ONNX Runtime](https://github.com/ankane/onnxruntime), [Menoh](https://github.com/pfnet-research/menoh-ruby)
33 |
34 | This list is by no means comprehensive. Some Ruby libraries are ones I created, as mentioned [here](/new-ml-gems).
35 |
36 | If you’re planning to add Ruby support to your ML library:
37 |
38 | Category | Python | Ruby
39 | --- | --- | ---
40 | FFI (native) | [ctypes](https://docs.python.org/3/library/ctypes.html) | [Fiddle](https://ruby-doc.org/stdlib-2.7.0/libdoc/fiddle/rdoc/Fiddle.html)
41 | FFI (library) | [cffi](https://cffi.readthedocs.io/en/latest/) | [FFI](https://github.com/ffi/ffi)
42 | C++ extensions | [pybind11](https://github.com/pybind/pybind11) | [Rice](https://github.com/jasonroelofs/rice)
43 | Compile to C | [Cython](https://github.com/cython/cython) | [Rubex](https://github.com/SciRuby/rubex)
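As a taste of the Ruby side, here’s a minimal Fiddle sketch that calls `floor` from the C math library already loaded into the process:

```ruby
require "fiddle"

# look up floor() among symbols already loaded into the process
floor = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT["floor"],
  [Fiddle::TYPE_DOUBLE],
  Fiddle::TYPE_DOUBLE
)

floor.call(2.7) # => 2.0
```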
44 |
45 | Give Ruby a shot for your next machine learning project!
46 |
--------------------------------------------------------------------------------
/archive/scaling-reads.md:
--------------------------------------------------------------------------------
1 | # Scaling Reads
2 |
3 | **Note:** This approach is now packaged into [a gem](https://github.com/ankane/distribute_reads) :gem:
4 |
5 | ---
6 |
7 | One of the easier ways to scale your database is to distribute reads to replicas.
8 |
9 | ## Desire
10 |
11 | Here’s the desired behavior:
12 |
13 | ```ruby
14 | User.find(1) # primary
15 |
16 | distribute_reads do
17 | # use replica for reads
18 | User.maximum(:visits_count) # replica
19 | User.find(2) # replica
20 |
21 | # until a write
22 | # then switch to primary
23 | User.create! # primary
24 | User.last # primary
25 | end
26 | ```
27 |
28 | ## Contenders
29 |
30 | We looked at a number of libraries, including [Octopus](https://github.com/tchandy/octopus), [Octoshark](https://github.com/dalibor/octoshark), and [Replica Pools](https://github.com/kickstarter/replica_pools).
31 |
32 | The winner was [Makara](https://github.com/taskrabbit/makara) - it handles failover well and has a simple configuration.
33 |
34 | ## Getting Started
35 |
36 | First, install Makara.
37 |
38 | ```ruby
39 | gem 'makara'
40 | ```
41 |
42 | There are 3 important `ENV` variables in our setup.
43 |
44 | - `DATABASE_URL` - primary database
45 | - `REPLICA_DATABASE_URL` - replica database (can use the primary database in development)
46 | - `MAKARA` - feature flag for a smooth rollout
47 |
48 | Here are sample values:
49 |
50 | ```sh
51 | DATABASE_URL=postgres://nerd:secret@localhost:5432/db_development
52 | REPLICA_DATABASE_URL=postgres://nerd:secret@localhost:5432/db_development
53 | MAKARA=true
54 | ```
55 |
56 | Next, update `config/database.yml`.
57 |
58 | ```yml
59 | development: &default
60 | <% if ENV["MAKARA"] %>
61 | url: postgresql-makara:///
62 | makara:
63 | sticky: true
64 | connections:
65 | - role: master
66 | name: primary
67 | url: <%= ENV["DATABASE_URL"] %>
68 | - name: replica
69 | url: <%= ENV["REPLICA_DATABASE_URL"] %>
70 | <% else %>
71 | adapter: postgresql
72 | url: <%= ENV["DATABASE_URL"] %>
73 | <% end %>
74 |
75 | production:
76 | <<: *default
77 | ```
78 |
79 | We don’t use the middleware, so we remove it by adding to `config/application.rb`:
80 |
81 | ```ruby
82 | config.middleware.delete Makara::Middleware
83 | ```
84 |
85 | Also, we want to read from primary by default so have to patch Makara. Create an initializer `config/initializers/makara.rb` with:
86 |
87 | ```ruby
88 | Makara::Cache.store = :noop
89 |
90 | module DefaultToPrimary
91 | def _appropriate_pool(*args)
92 | return @master_pool unless Thread.current[:distribute_reads]
93 | super
94 | end
95 | end
96 |
97 | Makara::Proxy.send :prepend, DefaultToPrimary
98 |
99 | module DistributeReads
100 | def distribute_reads
101 | previous_value = Thread.current[:distribute_reads]
102 | begin
103 | Thread.current[:distribute_reads] = true
104 | Makara::Context.set_current(Makara::Context.generate)
105 | yield
106 | ensure
107 | Thread.current[:distribute_reads] = previous_value
108 | end
109 | end
110 | end
111 |
112 | Object.send :include, DistributeReads
113 | ```
114 |
115 | To distribute reads, use:
116 |
117 | ```ruby
118 | total_users = distribute_reads { User.count }
119 | ```
120 |
121 | You can also put multiple lines in a block.
122 |
123 | ```ruby
124 | distribute_reads do
125 | User.maximum(:visits_count)
126 | Order.sum(:revenue_cents)
127 | Visit.average(:duration)
128 | end
129 | ```
130 |
131 | ## Test Drive
132 |
133 | In the Rails console, run:
134 |
135 | ```ruby
136 | User.first # primary
137 | distribute_reads { User.last } # replica
138 | ```
139 |
140 | :heart: Happy scaling
141 |
--------------------------------------------------------------------------------
/archive/encryption-keys.md:
--------------------------------------------------------------------------------
1 | # Strong Encryption Keys for Rails
2 |
3 | 
4 |
5 | Encryption is a common way to protect sensitive data. Generating a secure key is an important part of the process.
6 |
7 | [attr_encrypted](https://github.com/attr-encrypted/attr_encrypted), the popular encryption library for Rails, uses AES-256-GCM by default, which takes a 256-bit key. So how can we generate a secure one?
8 |
9 | *If you’re in a hurry, feel free to skip to [the answer](#a-better-way).*
10 |
11 | ## Take 1
12 |
13 | One way to generate a key is:
14 |
15 | ```ruby
16 | SecureRandom.base64(32).first(32)
17 | ```
18 |
19 | This generates a 32 character string that looks pretty secure. Each character has 64 possible values (letters, numbers, / and +). However, a single byte can represent 256 possible values. We’ve eliminated 75% of possible values per byte, which compounds across all 32 bytes. Here’s the math:
20 |
21 | Method | Possible Keys | Equivalent
22 | --- | --- | ---
23 | Random | 256<sup>32</sup> | 2<sup>256</sup>
24 | Take 1 | 64<sup>32</sup> | 2<sup>192</sup>
25 |
26 | This reduces the number of possible keys by 99.999999999999999994%. Luckily, computers have not (yet) been able to brute force 128-bit keys, which have 2<sup>128</sup> possible values.
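You can check this arithmetic in Ruby — taking base-2 logs shows the effective strength of each method in bits:

```ruby
# 32 random bytes: 256 possible values per byte
Math.log2(256**32) # => 256.0

# 32 base64 characters: only 64 possible values per character
Math.log2(64**32)  # => 192.0
```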
27 |
28 | ## Why 256?
29 |
30 | So why do we use 256-bit keys to begin with? Security researcher Graham Sutherland [puts it well](https://security.stackexchange.com/questions/14068/why-most-people-use-256-bit-encryption-instead-of-128-bit):
31 |
32 | “Essentially it’s about security margin. The longer the key, the higher the effective security. If there is ever a break in AES that reduces the effective number of operations required to crack it, a bigger key gives you a better chance of staying secure.”
33 |
34 | Also, quantum computers are expected to brute force in [square root time](https://blog.agilebits.com/2013/03/09/guess-why-were-moving-to-256-bit-aes-keys/). This means a 256-bit key could be brute forced in the same time as traditional computers can brute force a 128-bit key.
35 |
36 | ## A Better Way
37 |
38 | The right way to generate a random 32-byte key is:
39 |
40 | ```ruby
41 | SecureRandom.random_bytes(32)
42 | ```
43 |
44 | However, we can’t store this directly in Rails credentials or as an environment variable. We need to encode it first. Hex is a popular encoding. Rails uses this for its master key in Rails 5.2.
45 |
46 | ```ruby
47 | SecureRandom.random_bytes(32).unpack("H*").first
48 | ```
49 |
50 | Ruby provides a helper to do this:
51 |
52 | ```ruby
53 | SecureRandom.hex(32)
54 | ```
55 |
56 | To decode the key, use:
57 |
58 | ```ruby
59 | [hex_key].pack("H*")
60 | ```
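A quick round trip shows the encode/decode pair at work — 64 hex characters become 32 raw bytes, with binary encoding:

```ruby
require "securerandom"

hex_key = SecureRandom.hex(32)
raw_key = [hex_key].pack("H*")

hex_key.length                        # => 64
raw_key.bytesize                      # => 32
raw_key.encoding                      # => #<Encoding:ASCII-8BIT>
raw_key.unpack("H*").first == hex_key # => true
```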
61 |
62 | We now have a much stronger key. If you store the key as an environment variable, your model should look something like:
63 |
64 | ```ruby
65 | class User < ApplicationRecord
66 | attr_encrypted :email, key: [ENV["EMAIL_ENCRYPTION_KEY"]].pack("H*")
67 | end
68 | ```
69 |
70 | ## Libraries
71 |
72 | Libraries should educate users on how to generate sufficiently random keys. The [rbnacl](https://github.com/crypto-rb/rbnacl) gem has a neat way of enforcing this - it checks if a string is binary before allowing it as a key.
73 |
74 | ```ruby
75 | if key.encoding != Encoding::BINARY
76 | raise ArgumentError, "Insecure key - key must use binary encoding"
77 | end
78 | ```
79 |
80 | This prevents our initial (flawed) method from working. I’ve incorporated this approach into the [blind_index](https://github.com/ankane/blind_index) gem and [opened an issue](https://github.com/attr-encrypted/attr_encrypted/issues/311) with attr_encrypted to get the author’s thoughts.
81 |
82 | ## Conclusion
83 |
84 | While secure key generation provides better protection against brute force attacks, it won’t help at all if the key is compromised. Limit who has access to encryption keys as well. For more security, consider a [key management service](https://github.com/ankane/kms_encrypted) to manage your keys.
85 |
86 | Happy encrypting!
87 |
--------------------------------------------------------------------------------
/archive/tensorflow-ruby.md:
--------------------------------------------------------------------------------
1 | # TensorFlow Object Detection in Ruby
2 |
3 | The [ONNX Runtime](https://github.com/ankane/onnxruntime) gem makes it easy to run TensorFlow models in Ruby. This short tutorial will show you how. It’s based on [this tutorial](https://github.com/onnx/tensorflow-onnx/blob/master/tutorials/ConvertingSSDMobilenetToONNX.ipynb) from tf2onnx.
4 |
5 | We’ll use SSD Mobilenet, which can detect multiple objects in an image.
6 |
7 | First, download the [pretrained model](https://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2018_01_28.tar.gz) from the official TensorFlow Models project and this awesome shot of polar bears.
8 |
9 | 
10 |
11 |
12 | Photo from the U.S. Fish and Wildlife Service
13 |
14 |
15 | Install [tf2onnx](https://github.com/onnx/tensorflow-onnx)
16 |
17 | ```sh
18 | pip install tf2onnx
19 | ```
20 |
21 | And convert the model to ONNX
22 |
23 | ```sh
24 | python -m tf2onnx.convert --opset 10 --fold_const \
25 | --saved-model ssd_mobilenet_v1_coco_2018_01_28/saved_model \
26 | --output model.onnx
27 | ```
28 |
29 | Next, install the ONNX Runtime and MiniMagick gems
30 |
31 | ```ruby
32 | gem "onnxruntime"
33 | gem "mini_magick"
34 | ```
35 |
36 | Load the image
37 |
38 | ```ruby
39 | img = MiniMagick::Image.open("bears.jpg")
40 | pixels = img.get_pixels
41 | ```
42 |
43 | And the model
44 |
45 | ```ruby
46 | model = OnnxRuntime::Model.new("model.onnx")
47 | ```
48 |
49 | Check the model inputs
50 |
51 | ```ruby
52 | p model.inputs
53 | ```
54 |
55 | The shape is `[-1, -1, -1, 3]`. `-1` indicates any size. `pixels` has the shape `[img.height, img.width, 3]`. The model is designed to process multiple images at once, which is where the extra leading dimension comes from.
56 |
57 | Let’s run the model:
58 |
59 | ```ruby
60 | result = model.predict("image_tensor:0" => [pixels])
61 | ```
62 |
63 | The model gives us a number of different outputs, like the number of detections, labels, scores, and boxes. Let’s print the results:
64 |
65 | ```ruby
66 | p result["num_detections:0"]
67 | # [3.0]
68 | p result["detection_classes:0"]
69 | # [[23.0, 23.0, 88.0, 1.0, ...]]
70 | ```
71 |
72 | We can see there were three detections, and if we look at the first three elements in the detection classes array, they are the numbers 23, 23, and 88. These correspond to [COCO](http://cocodataset.org/) labels. We can [look these up](https://github.com/amikelive/coco-labels/blob/master/coco-labels-paper.txt) and see that 23 is bear and 88 is teddy bear. Mostly right!
73 |
74 | With a bit more code, we can apply boxes and labels to the image.
75 |
76 | ```ruby
77 | coco_labels = {
78 | 23 => "bear",
79 | 88 => "teddy bear"
80 | }
81 |
82 | def draw_box(img, label, box)
83 | width, height = img.dimensions
84 |
85 | # calculate box
86 | thickness = 2
87 | top = (box[0] * height).round - thickness
88 | left = (box[1] * width).round - thickness
89 | bottom = (box[2] * height).round + thickness
90 | right = (box[3] * width).round + thickness
91 |
92 | # draw box
93 | img.combine_options do |c|
94 | c.draw "rectangle #{left},#{top} #{right},#{bottom}"
95 | c.fill "none"
96 | c.stroke "red"
97 | c.strokewidth thickness
98 | end
99 |
100 | # draw text
101 | img.combine_options do |c|
102 | c.draw "text #{left},#{top - 5} \"#{label}\""
103 | c.fill "red"
104 | c.pointsize 18
105 | end
106 | end
107 |
108 | result["num_detections:0"].each_with_index do |n, idx|
109 | n.to_i.times do |i|
110 | label = result["detection_classes:0"][idx][i].to_i
111 | label = coco_labels[label] || label
112 | box = result["detection_boxes:0"][idx][i]
113 | draw_box(img, label, box)
114 | end
115 | end
116 |
117 | # save image
118 | img.write("labeled.jpg")
119 | ```
120 |
121 | And the result:
122 |
123 | 
124 |
125 | Here’s the [complete code](https://gist.github.com/ankane/4a9681c8d9b9e814debe9e3ea836529d). Now go out and try it with your own images!
126 |
--------------------------------------------------------------------------------
/archive/numo.md:
--------------------------------------------------------------------------------
1 | # Numo: NumPy for Ruby
2 |
3 | 
4 |
5 |
6 | Photo by Jonas Svidras
7 |
8 |
9 | NumPy is an extremely popular library for machine learning in Python. It provides an efficient way to work with large, multi-dimensional arrays. What you may not know is Ruby has a library with similar functionality. It’s called Numo, and in this post, we’ll look at what you can do with it.
10 |
11 | ## Basic Operations
12 |
13 | Numo’s core data structure is the multi-dimensional array, which has methods for mathematical operations. These operations are written in C, so they’re much faster than performing the same operations in Ruby.
14 |
15 | Let’s start by creating a Numo array from a Ruby array.
16 |
17 | ```ruby
18 | x = Numo::DFloat.cast([[1, 2, 3], [4, 5, 6]])
19 | ```
20 |
21 | Each array has a shape. We created a 2x3 2D array, but arrays can be 1D, 3D, or more.
22 |
23 | ```ruby
24 | x.shape # [2, 3]
25 | ```
26 |
27 | Read a row or column with:
28 |
29 | ```ruby
30 | x[0, true] # 1st row - [1, 2, 3]
31 | x[true, 2] # 3rd column - [3, 6]
32 | ```
33 |
34 | We can add a constant value:
35 |
36 | ```ruby
37 | x + 2 # [[3, 4, 5], [6, 7, 8]]
38 | ```
39 |
40 | Or add arrays:
41 |
42 | ```ruby
43 | x + x # [[2, 4, 6], [8, 10, 12]]
44 | ```
45 |
46 | Some operations like mean and sum can be run over a specific axis.
47 |
48 | ```ruby
49 | x.sum(0) # sum of each column - [5, 7, 9]
50 | x.mean(1) # mean of each row - [2, 5]
51 | ```
52 |
53 | We can also change an array’s shape - useful for preparing data for models.
54 |
55 | ```ruby
56 | x.reshape(3, 2) # [[1, 2], [3, 4], [5, 6]]
57 | ```
58 |
59 | If you’re familiar with NumPy operations, there are [side-by-side examples](https://github.com/ruby-numo/numo-narray/wiki/100-narray-exercises) and a table showing how the [functions map](https://github.com/ruby-numo/numo-narray/wiki/Numo-vs-numpy).
60 |
61 | ## Building Models
62 |
63 | [Rumale](https://github.com/yoshoku/rumale) is a machine learning library similar to Python’s Scikit-learn. It uses Numo for inputs and outputs. Here’s a basic example of linear regression.
64 |
65 | ```ruby
66 | # generate data: y = 1 + 2(x0) + 3(x1)
67 | x = Numo::DFloat.asarray([[0, 1], [1, 0], [1, 2]])
68 | y = 1 + 2 * x[true, 0] + 3 * x[true, 1]
69 |
70 | # train
71 | model = Rumale::LinearModel::LinearRegression.new(
72 | fit_bias: true, max_iter: 10000)
73 | model.fit(x, y)
74 |
75 | # predict
76 | model.predict(x)
77 | ```
78 |
79 | Rumale has many, many models and other useful tools for:
80 |
81 | - Regression: linear, ridge, lasso, support vector machines
82 | - Classification: logistic regression, naive Bayes, K-nearest neighbors, support vector machines
83 | - Clustering: K-means, Gaussian mixture model
84 | - Dimensionality reduction: principal component analysis
85 |
86 | Scikit-learn has a great cheat sheet to help you decide what to use:
87 |
88 |
89 |
90 |
91 |
92 | Image from Scikit-learn (BSD License)
93 |
94 |
95 | ## Storing Data
96 |
97 | Numo arrays can be marshaled just like other Ruby objects. This allows you to save your work and resume it at a later time.
98 |
99 | ```ruby
100 | # save
101 | File.binwrite("x.dump", Marshal.dump(x))
102 |
103 | # load
104 | x = Marshal.load(File.binread("x.dump"))
105 | ```
106 |
107 | [Npy](https://github.com/ankane/npy) allows you to save and load arrays in the same format as NumPy. This is more performant than marshaling.
108 |
109 | ```ruby
110 | # save
111 | Npy.save("x.npy", x)
112 |
113 | # load
114 | x = Npy.load("x.npy")
115 | ```
116 |
117 | It also makes it easy to load datasets like [MNIST](https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz).
118 |
119 | ```ruby
120 | mnist = Npy.load_npz("mnist.npz")
121 | ```
122 |
123 | ## Summary
124 |
125 | You now have a basic introduction to Numo and know how to:
126 |
127 | - perform basic operations
128 | - build a model
129 | - store data
130 |
131 | Consider [Numo](https://github.com/ruby-numo/numo-narray) for your next machine learning project.
132 |
--------------------------------------------------------------------------------
/archive/securing-pgbouncer-amazon-rds.md:
--------------------------------------------------------------------------------
1 | # Securing Database Traffic with PgBouncer and Amazon RDS
2 |
3 | Securing database traffic inside your network can be a great step for defense in depth. It’s also a necessity for [Zero Trust Networks](https://www.amazon.com/Zero-Trust-Networks-Building-Untrusted/dp/1491962194).
4 |
5 | Both Amazon RDS and PgBouncer have built-in support for TLS, but it’s a little bit of work to get it set up. This tutorial will show you how.
6 |
7 | ## Direct Connections
8 |
9 | The first step is to make sure all direct connections are secure. Luckily, Amazon RDS has a parameter named `rds.force_ssl` for this. Once it’s applied, you’ll see an error if you try to connect without TLS. You can test this out with:
10 |
11 | ```sh
12 | psql "postgresql://user:secret@dbhost:5432/ssltest?sslmode=disable"
13 | ```
14 |
15 | You’ll see an error like `FATAL: no pg_hba.conf entry ... SSL off` if everything is configured correctly.
16 |
17 | There are a number of possible values for `sslmode`, which you can [read about here](https://www.postgresql.org/docs/current/static/libpq-ssl.html). The most secure (and one we want) is `verify-full`, as it provides protection against both eavesdropping and man-in-the-middle attacks. This mode requires you to provide a root certificate to verify against. AWS makes this certificate available on [their website](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html#PostgreSQL.Concepts.General.SSL).
18 |
19 | ```sh
20 | wget https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem
21 | ```
22 |
23 | To use it with `psql`, run:
24 |
25 | ```sh
26 | psql "postgresql://user:secret@dbhost:5432/ssltest?sslmode=verify-full&sslrootcert=rds-combined-ca-bundle.pem"
27 | ```
28 |
29 | Once connected, you should see an `SSL connection` line before the first prompt.
30 |
31 | There’s also an extension you can use (useful for non-`psql` connections).
32 |
33 | ```sql
34 | CREATE EXTENSION IF NOT EXISTS sslinfo;
35 | SELECT ssl_is_used();
36 | ```
37 |
38 | Now direct connections are good, so let’s secure connections from PgBouncer to the database.
39 |
40 | ## PgBouncer to the Database
41 |
42 | Follow [this guide](pgbouncer-setup) to set up PgBouncer. Once that’s completed, there are two settings to add to `/etc/pgbouncer/pgbouncer.ini`:
43 |
44 | ```ini
45 | server_tls_sslmode = verify-full
46 | server_tls_ca_file = /path/to/rds-combined-ca-bundle.pem
47 | ```
48 |
49 | Restart the service
50 |
51 | ```sh
52 | sudo service pgbouncer restart
53 | ```
54 |
55 | And test it
56 |
57 | ```sh
58 | psql "postgresql://user:secret@bouncerhost:6432/ssltest"
59 | ```
60 |
61 | The connection should succeed and the server should report SSL is used.
62 |
63 | ```sql
64 | SELECT ssl_is_used();
65 | ```
66 |
67 | We’ve now successfully encrypted traffic between the bouncer and the database!
68 |
69 | However, you’ll notice the `psql` prompt does not have an `SSL connection` line as it did before. You can also use `sslmode=disable` to successfully connect, and programs like `tcpdump` or [tshark](https://www.wireshark.org/docs/man-pages/tshark.html) will show unencrypted traffic between the client and the bouncer. You can test this out with:
70 |
71 | ```sh
72 | sudo tcpdump -i lo -X -s 0 'port 6432'
73 | ```
74 |
75 | Run commands in `psql` and you’ll see plaintext statements printed.
76 |
77 | ## Clients to PgBouncer
78 |
79 | This last flow is the trickiest. PgBouncer 1.7+ supports TLS, but we need to create keys and certificates for it. For this, we’ll create a private [PKI](https://en.wikipedia.org/wiki/Public_key_infrastructure). [Minica](https://github.com/jsha/minica) and [Vault](https://www.vaultproject.io/) are two ways to do this.
80 |
81 | We’ll use Minica (here are [instructions for Vault](vault-pki)). Install the latest version:
82 |
83 | ```sh
84 | sudo apt-get install minica
85 | ```
86 |
87 | And run:
88 |
89 | ```sh
90 | minica --domains bouncerhost
91 | ```
92 |
93 | We now have the files we need to connect. Add the key and certificate to `/etc/pgbouncer/pgbouncer.ini`:
94 |
95 | ```ini
96 | client_tls_sslmode = require # not verify-full
97 | client_tls_key_file = /path/to/bouncerhost/key.pem
98 | client_tls_cert_file = /path/to/bouncerhost/cert.pem
99 | ```
100 |
101 | And restart the service. To connect, we once again use `verify-full` but this time with the root certificate we generated above:
102 |
103 | ```sh
104 | psql "postgresql://user:secret@bouncerhost:6432/ssltest?sslmode=verify-full&sslrootcert=minica.pem"
105 | ```
106 |
107 | Confirm the `SSL connection` line is printed and `sslmode=disable` no longer works.
108 |
109 | We’ve now successfully encrypted traffic end-to-end!
110 |
--------------------------------------------------------------------------------
/archive/host-your-own-postgres.md:
--------------------------------------------------------------------------------
1 | # Host Your Own Postgres
2 |
3 | :elephant: Get running with the latest version of Postgres in minutes
4 |
5 | ## Set Up Server
6 |
7 | Spin up a new server with Ubuntu 16.04.
8 |
9 | Firewall
10 |
11 | ```sh
12 | sudo ufw allow ssh
13 | sudo ufw enable
14 | ```
15 |
16 | [Automatic updates](https://help.ubuntu.com/16.04/serverguide/automatic-updates.html)
17 |
18 | ```sh
19 | sudo apt-get -y install unattended-upgrades
20 | echo 'APT::Periodic::Unattended-Upgrade "1";' | sudo tee -a /etc/apt/apt.conf.d/10periodic
21 | ```
22 |
23 | Time zone
24 |
25 | ```sh
26 | sudo dpkg-reconfigure tzdata
27 | ```
28 |
29 | and select `None of the above`, then `UTC`.
30 |
31 | ## Install Postgres
32 |
33 | Install PostgreSQL 10
34 |
35 | ```sh
36 | echo "deb https://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list
37 | wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
38 | sudo apt-get update
39 | sudo apt-get install -qq -y postgresql-10 postgresql-contrib
40 | ```
41 |
42 | ## Configure
43 |
44 | Edit `/etc/postgresql/10/main/postgresql.conf`.
45 |
46 | ```sh
47 | # general
48 | max_connections = 100
49 |
50 | # logging
51 | log_min_duration_statement = 100 # log queries over 100ms
52 | log_temp_files = 0 # log all temp files
53 |
54 | # stats
55 | shared_preload_libraries = 'pg_stat_statements'
56 | pg_stat_statements.max = 1000
57 | ```
58 |
59 | ## Remote Connections
60 |
61 | Enable remote connections if needed
62 |
63 | ```sh
64 | echo "host all all 0.0.0.0/0 md5" | sudo tee -a /etc/postgresql/10/main/pg_hba.conf
65 | echo "listen_addresses = '*'" | sudo tee -a /etc/postgresql/10/main/postgresql.conf
66 | sudo service postgresql restart
67 | ```
68 |
69 | And update the firewall
70 |
71 | ```sh
72 | sudo ufw allow 5432/tcp # for all ips
73 | sudo ufw allow from 127.0.0.1 to any port 5432 proto tcp # specific ip
74 | sudo ufw enable
75 | ```
76 |
77 | ## Provisioning
78 |
79 | Create a new user and database for each of your apps
80 |
81 | ```sh
82 | sudo su - postgres
83 | psql
84 | ```
85 |
86 | And run:
87 |
88 | ```sql
89 | CREATE USER myapp WITH PASSWORD 'mypassword';
90 | ALTER USER myapp WITH CONNECTION LIMIT 20;
91 | CREATE DATABASE myapp_production OWNER myapp;
92 | ```
93 |
94 | Generate a random password with:
95 |
96 | ```sh
97 | cat /dev/urandom | LC_CTYPE=C tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1
98 | ```
99 |
100 | ## Backups
101 |
102 | ### Daily
103 |
104 | Store backups on S3
105 |
106 | - [Amazon S3 Backup Scripts](https://github.com/collegeplus/s3-shell-backups/blob/master/s3-postgresql-backup.sh)
107 | - [Automatic Backups to Amazon S3 are Easy](https://rossta.net/blog/automatic-backups-to-amazon-s3-are-easy.html)
108 |
109 | *TODO: better instructions*
110 |
111 | ### Continuous
112 |
113 | Roll back to a specific point in time with [WAL-E](https://github.com/wal-e/wal-e).
114 |
115 | Opbeat has a [great tutorial](https://opbeat.com/blog/posts/postgresql-backup-to-s3-part-one/).
116 |
117 | ## Logging
118 |
119 | [Papertrail](https://papertrailapp.com) is great and has a free plan.
120 |
121 | Install remote syslog
122 |
123 | ```sh
124 | cd /tmp
125 | wget https://github.com/papertrail/remote_syslog2/releases/download/v0.13/remote_syslog_linux_amd64.tar.gz
126 | tar xzf ./remote_syslog*.tar.gz
127 | cd remote_syslog
128 | sudo cp ./remote_syslog /usr/local/bin
129 | ```
130 |
131 | Create `/etc/log_files.yml` with:
132 |
133 | ```yml
134 | files:
135 | - /var/log/postgresql/*.log
136 | destination:
137 | host: logs.papertrailapp.com
138 | port: 12345
139 | protocol: tls
140 | ```
141 |
142 | ### Archive
143 |
144 | Archive logs to S3
145 |
146 | ```sh
147 | sudo apt-get install logrotate s3cmd
148 | s3cmd --configure
149 | ```
150 |
151 | Add to `/etc/logrotate.d/postgresql-common`:
152 |
153 | ```conf
154 | sharedscripts
155 | postrotate
156 | s3cmd sync /var/log/postgresql/*.gz s3://mybucket/logs/
157 | endscript
158 | ```
159 |
160 | Test with:
161 |
162 | ```sh
163 | logrotate -fv /etc/logrotate.d/postgresql-common
164 | ```
165 |
166 | ## TODO
167 |
168 | - scripts
169 |
170 | ```sh
171 | pghost bootstrap
172 | pghost allow all
173 | pghost allow 127.0.0.1
174 | pghost backup:all
175 | pghost backup myapp
176 | pghost restore myapp
177 | pghost provision myapp
178 | pghost logs:syslog logs.papertrailapp.com 12345
179 | pghost logs:archive mybucket/logs
180 | ```
181 |
182 | - monitoring (Graphite, CloudWatch, etc)
183 |
184 | ## Resources
185 |
186 | - [Copy your server logs to Amazon S3 using Logrotate and s3cmd](https://www.shanestillwell.com/2013/04/04/copy-your-server-logs-to-amazon-s3-using-logrotate-and-s3cmd/)
187 |
--------------------------------------------------------------------------------
/archive/rails-on-heroku.md:
--------------------------------------------------------------------------------
1 | # Rails on Heroku
2 |
3 | [The official guide](https://devcenter.heroku.com/articles/getting-started-with-rails4) is a great place to start, but there’s more you can do to make life easier.
4 |
5 | :tangerine: Based on lessons learned in the early days of [Instacart](https://www.instacart.com/)
6 |
7 | ## Deploys
8 |
9 | For zero downtime deploys, enable [preboot](https://devcenter.heroku.com/articles/preboot). This will cause deploys to take a few minutes longer to go live, but it’s better than impacting your users.
10 |
11 | ```sh
12 | heroku features:enable -a appname preboot
13 | ```
14 |
15 | Add a preload check to make sure your app boots. Create `lib/tasks/preload.rake` with:
16 |
17 | ```ruby
18 | task preload: :environment do
19 | Rails.application.eager_load!
20 | ::Rails::Engine.subclasses.map(&:instance).each { |engine| engine.eager_load! }
21 | ActiveRecord::Base.descendants
22 | end
23 | ```
24 |
25 | And add a [release phase](https://devcenter.heroku.com/articles/release-phase) task to your `Procfile` to run the preload script and (optionally) migrations.
26 |
27 | ```sh
28 | release: bundle exec rails preload db:migrate
29 | ```
30 |
31 | Create a deployment script in `bin/deploy`. Here’s an example:
32 |
33 | ```sh
34 | #!/usr/bin/env bash
35 |
36 | function notify() {
37 | # add your chat service
38 | echo $1
39 | }
40 |
41 | notify "Deploying"
42 |
43 | git checkout master -q && git pull origin master -q && \
44 | git push origin master -q && git push heroku master
45 |
46 | if [ $? -eq 0 ]; then
47 | notify "Deploy complete"
48 | else
49 | notify "Deploy failed"
50 | fi
51 | ```
52 |
53 | Be sure to `chmod +x bin/deploy`. Replace the `echo` command with a call to your chat service ([Hipchat instructions](https://github.com/hipchat/hipchat-cli)).
54 |
55 | Deploy with:
56 |
57 | ```sh
58 | bin/deploy
59 | ```
60 |
61 | ## Migrations
62 |
63 | Follow best practices for [zero downtime migrations](https://github.com/ankane/strong_migrations).
64 |
65 | If you start to see errors about prepared statements after running migrations, disable them.
66 |
67 | ```yml
68 | production:
69 | prepared_statements: false
70 | ```
71 |
72 | Don’t worry! Your app will still be fast (and you’ll probably do this anyways at scale since PgBouncer requires it).
73 |
74 | ## Rollbacks
75 |
76 | Create a rollback script in `bin/rollback`.
77 |
78 | ```sh
79 | #!/usr/bin/env bash
80 |
81 | function notify() {
82 | # add your chat service
83 | echo $1
84 | }
85 |
86 | notify "Rolling back"
87 |
88 | heroku rollback
89 |
90 | if [ $? -eq 0 ]; then
91 | notify "Rollback complete"
92 | else
93 | notify "Rollback failed"
94 | fi
95 | ```
96 |
97 | Don’t forget to `chmod +x bin/rollback`. Rollback with:
98 |
99 | ```sh
100 | bin/rollback
101 | ```
102 |
103 | ## Logs
104 |
105 | Add [Papertrail](https://papertrailapp.com/) to make your logs easily searchable.
106 |
107 | ```sh
108 | heroku addons:create papertrail
109 | ```
110 |
111 | Set it up to [archive logs to S3](https://help.papertrailapp.com/kb/how-it-works/permanent-log-archives/).
112 |
113 | ## Performance
114 |
115 | Add a performance monitoring service like New Relic.
116 |
117 | ```sh
118 | heroku addons:create newrelic
119 | ```
120 |
121 | And follow the [installation instructions](https://devcenter.heroku.com/articles/newrelic).
122 |
123 | Use a [CDN](https://en.wikipedia.org/wiki/Content_delivery_network) like [Amazon CloudFront](https://devcenter.heroku.com/articles/using-amazon-cloudfront-cdn) to serve assets.
124 |
125 | ## Autoscaling
126 |
127 | Check out [HireFire](https://www.hirefire.io/).
128 |
129 | ## Productivity
130 |
131 | Use [Archer](https://github.com/ankane/archer) to enable console history.
132 |
133 | Use [aliases](https://www.digitalocean.com/community/tutorials/an-introduction-to-useful-bash-aliases-and-functions) for less typing.
134 |
135 | ```sh
136 | alias hc="heroku run rails console"
137 | ```
138 |
139 | ## Staging
140 |
141 | Create a separate app for staging.
142 |
143 | ```sh
144 | heroku create staging-appname -r staging
145 | heroku config:set RAILS_ENV=staging RACK_ENV=staging -r staging
146 | ```
147 |
148 | Deploy with:
149 |
150 | ```sh
151 | git push staging branch:master
152 | ```
153 |
154 | You may also want to password protect your staging environment.
155 |
156 | ```ruby
157 | class ApplicationController < ActionController::Base
158 | http_basic_authenticate_with name: "happy", password: "carrots" if Rails.env.staging?
159 | end
160 | ```
161 |
162 | ## Lastly...
163 |
164 | Have suggestions? [Please share](https://github.com/ankane/shorts/issues/new). For more tips, check out [Production Rails](https://github.com/ankane/production_rails).
165 |
166 | :hatched_chick: Happy coding!
167 |
--------------------------------------------------------------------------------
/archive/daru.md:
--------------------------------------------------------------------------------
1 | # Daru: Pandas for Ruby
2 |
3 | 
4 |
5 |
6 | Photo by Bruce Hong
7 |
8 |
9 | NumPy and Pandas are two extremely popular libraries for machine learning in Python. Last post, we looked at [Numo](https://ankane.org/numo), a Ruby library similar to NumPy. As luck would have it, there’s a library similar to Pandas as well. It’s called Daru, and it’s the focus of this post.
10 |
11 | ## Overview
12 |
13 | Daru is a data analysis library. Its core data structure is the data frame, which is similar to an in-memory database table. Data frames have rows and columns, and each column has a specific data type. Let’s create a data frame with the most populous countries:
14 |
15 | ```ruby
16 | df = Daru::DataFrame.new(
17 | country: ["China", "India", "USA"],
18 | population: [1433, 1366, 329] # in millions
19 | )
20 | ```
21 |
22 |
23 | Population data from the United Nations, 2019
24 |
25 |
26 | Here’s what it looks like:
27 |
28 | ```text
29 | country population
30 | 0 China 1433
31 | 1 India 1366
32 | 2 USA 329
33 | ```
34 |
35 | You can get specific columns with:
36 |
37 | ```ruby
38 | df[:country]
39 | df[:country, :population]
40 | ```
41 |
42 | Or specific rows with:
43 |
44 | ```ruby
45 | df.first(2) # first 2 rows
46 | df.last(2) # last 2 rows
47 | df.row[1] # 2nd row
48 | df.row[1..2] # 2nd and 3rd row
49 | ```
50 |
51 | ## Filtering, Sorting, and Grouping
52 |
53 | Select countries with over 1 billion people.
54 |
55 | ```ruby
56 | df.where(df[:population] > 1000)
57 | ```
58 |
59 | For equality, use `eq` or `in`.
60 |
61 | ```ruby
62 | df.where(df[:country].eq("China"))
63 | df.where(df[:country].in(["USA", "India"]))
64 | ```
65 |
66 | Negate a condition with `!`.
67 |
68 | ```ruby
69 | df.where(!df[:country].eq("India"))
70 | ```
71 |
72 | Combine operators with `&` (and) and `|` (or).
73 |
74 | ```ruby
75 | df.where(df[:country].eq("USA") | (df[:population] < 1400))
76 | ```
77 |
78 | Sort the data frame by a column with:
79 |
80 | ```ruby
81 | df.sort([:population])
82 | df.sort([:country], ascending: [false])
83 | ```
84 |
85 | You can also group data and perform aggregations.
86 |
87 | ```ruby
88 | cities = Daru::DataFrame.new(
89 | country: ["China", "China", "India"],
90 | city: ["Shanghai", "Beijing", "Mumbai"]
91 | )
92 | cities.group_by([:country]).count
93 | ```
94 |
95 | ## Combining Data Frames
96 |
97 | There are a number of ways to combine data frames. You can add rows:
98 |
99 | ```ruby
100 | countries = Daru::DataFrame.new(
101 | country: ["Indonesia", "Pakistan"],
102 | population: [271, 217] # in millions
103 | )
104 | df.concat(countries)
105 | ```
106 |
107 | Or add columns:
108 |
109 | ```ruby
110 | locations = Daru::DataFrame.new(
111 | continent: ["Asia", "Asia", "North America"],
112 | planet: ["Earth", "Earth", "Earth"]
113 | )
114 | df.merge(locations)
115 | ```
116 |
117 | You can also perform joins like in SQL.
118 |
119 | ```ruby
120 | cities = Daru::DataFrame.new(
121 | country: ["China", "China", "India"],
122 | city: ["Shanghai", "Beijing", "Mumbai"]
123 | )
124 | df.join(cities, how: :inner, on: [:country])
125 | ```
126 |
127 | ## Reading and Writing Data
128 |
129 | Daru makes it easy to load data from a CSV file.
130 |
131 | ```ruby
132 | Daru::DataFrame.from_csv("countries.csv")
133 | ```
134 |
135 | After manipulating the data, you can save it back to a CSV file.
136 |
137 | ```ruby
138 | df.write_csv("countries_v2.csv")
139 | ```
140 |
141 | You can also load data directly from Active Record.
142 |
143 | ```ruby
144 | relation = Country.where("population > 100")
145 | Daru::DataFrame.from_activerecord(relation)
146 | ```
147 |
148 | ## Plotting
149 |
150 | For plotting, use a Jupyter notebook with [IRuby](https://github.com/sciruby/iruby). Create a plot with:
151 |
152 | ```ruby
153 | df.plot type: :bar, x: :country, y: :population do |plot, diagram|
154 | plot.x_label "Country"
155 | plot.y_label "Population (millions)"
156 | diagram.color(Nyaplot::Colors.Pastel1)
157 | end
158 | ```
159 |
160 | 
161 |
162 | You can also create line charts, scatter plots, box plots, and histograms.
163 |
164 | ## Summary
165 |
166 | You’ve now seen how to use Daru to:
167 |
168 | - create data frames
169 | - filter, sort, and group data
170 | - combine data frames
171 | - create plots
172 |
173 | Try out [Daru](https://github.com/SciRuby/daru) for your next analysis.
174 |
--------------------------------------------------------------------------------
/archive/introducing-dexter.md:
--------------------------------------------------------------------------------
1 | # Introducing Dexter, the Automatic Indexer for Postgres
2 |
3 | 
4 |
5 | Your database knows which queries are running. It also has a pretty good idea of which indexes are best for a given query. And since indexes don’t change the results of a query, they’re really just a performance optimization. So why do we always need a human to choose them?
6 |
7 | Introducing [Dexter](https://github.com/ankane/dexter). Dexter indexes your database for you. You can still do it yourself, but Dexter will do a pretty good job.
8 |
9 | Dexter works in two phases:
10 |
11 | 1. Collect queries
12 | 2. Generate indexes
13 |
14 | We’ll walk through each of them.
15 |
16 | ### Phase 1: Collect
17 |
18 | You can stream Postgres log files directly to Dexter. Dexter finds lines like:
19 |
20 | ```txt
21 | LOG: duration: 14.077 ms statement: SELECT * FROM ratings WHERE user_id = 3;
22 | ```
23 |
24 | And parses out the query and duration. It uses fingerprinting to group queries. Queries with the same parse tree but different values are grouped together. For instance, both of the following queries have the same fingerprint.
25 |
26 | ```sql
27 | SELECT * FROM ratings WHERE user_id = 2;
28 | SELECT * FROM ratings WHERE user_id = 3;
29 | ```
30 |
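Dexter uses the pg_query gem for real fingerprinting. As a toy illustration of the idea (a regex-based sketch, not pg_query's actual algorithm), we can strip out literal values so queries that differ only in parameters hash to the same fingerprint:

```ruby
require "digest"

# Toy fingerprint: replace string and numeric literals with "?"
# so equivalent queries group together, then hash the result
def fingerprint(sql)
  normalized = sql.gsub(/'[^']*'/, "?").gsub(/\b\d+\b/, "?")
  Digest::SHA1.hexdigest(normalized)[0, 10]
end

a = fingerprint("SELECT * FROM ratings WHERE user_id = 2;")
b = fingerprint("SELECT * FROM ratings WHERE user_id = 3;")
p a == b # true
```
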
31 | The data is aggregated to get the total execution time by fingerprint. You can get similar information from the [pg_stat_statements view](https://www.postgresql.org/docs/current/static/pgstatstatements.html), except queries in the view are normalized. This means you get:
32 |
33 | ```sql
34 | SELECT * FROM ratings WHERE user_id = ?;
35 | ```
36 |
37 | instead of
38 |
39 | ```sql
40 | SELECT * FROM ratings WHERE user_id = 3;
41 | ```
42 |
43 | However, we need the actual values to determine costs in the next step. To prevent over-indexing, you can set a threshold for the total execution time before a query is considered for indexing.
44 |
45 | ### Phase 2: Generate
46 |
47 | To generate indexes, Dexter creates hypothetical indexes to try to speed up the slow queries we’ve just collected. Hypothetical indexes show how a query’s execution plan would change if an actual index existed. They take virtually no time to create, don’t require any disk space, and are only visible to the current session. You can read more about [hypothetical indexes here](https://rjuju.github.io/postgresql/2015/07/02/how-about-hypothetical-indexes.html).
48 |
49 | The main steps Dexter takes are:
50 |
51 | 1. Filter out queries on system tables and other databases
52 | 2. Analyze tables for up-to-date planner statistics if they haven’t been analyzed recently
53 | 3. Get the initial cost of queries
54 | 4. Create hypothetical indexes on columns that aren’t already indexed
55 | 5. Get costs again and see if any hypothetical indexes were used
56 |
57 | While fairly straightforward, this approach is extremely powerful, as it uses the Postgres query planner to figure out the best index(es) for a query. Hypothetical indexes that were used AND significantly reduced cost are selected to be indexes.
58 |
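The hypothetical-index step looks roughly like this in raw SQL (a sketch using HypoPG directly, with the `ratings` table from the earlier examples):

```sql
-- requires the HypoPG extension
CREATE EXTENSION IF NOT EXISTS hypopg;

-- create a hypothetical index (no disk space used, visible only to this session)
SELECT * FROM hypopg_create_index('CREATE INDEX ON ratings (user_id)');

-- the planner can now consider it; compare the cost to the original plan
EXPLAIN SELECT * FROM ratings WHERE user_id = 3;
```
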
59 | To be safe, indexes are only logged by default. This allows you to use Dexter for index suggestions if you want to manually verify them first. When you let Dexter create indexes, they’re created concurrently to limit the impact on database performance.
60 |
61 | ```txt
62 | 2017-06-25T17:52:22+00:00 Index found: ratings (user_id)
63 | 2017-06-25T17:52:22+00:00 Creating index: CREATE INDEX CONCURRENTLY ON ratings (user_id)
64 | 2017-06-25T17:52:37+00:00 Index created: 15243 ms
65 | ```
66 |
67 | ### Trade-offs and Limitations
68 |
69 | The big advantage of indexes is faster data retrieval. On the flip side, indexes add overhead to write operations, like INSERT, UPDATE, and DELETE, as indexes must be updated as well. Indexes also take up disk space.
70 |
71 | Because of this, you may not want to index write-heavy tables. Dexter does not currently try to identify these tables automatically, but you can pass them in by hand.
72 |
73 | As for other limitations, Dexter does not try to create multicolumn indexes (edit: this is no longer the case). Dexter also assumes the search_path for queries is the same as the user running Dexter. You’ll still need to create unique constraints on your own. Dexter also requires the [HypoPG](https://github.com/HypoPG/hypopg) extension, which isn’t available on some hosted providers like Heroku and Amazon RDS.
74 |
75 | * * *
76 |
77 | It’s time to make forgotten indexes a problem of the past.
78 |
79 | [Add Dexter to your team](https://github.com/ankane/dexter) today.
80 |
81 | ### Thanks
82 |
83 | This software wouldn’t be possible without [HypoPG](https://github.com/HypoPG/hypopg), which allows you to create hypothetical indexes, and [pg_query](https://github.com/lfittl/pg_query), which allows you to parse and fingerprint queries. A big thanks to Dalibo and [Lukas Fittl](https://medium.com/@LukasFittl) respectively.
84 |
--------------------------------------------------------------------------------
/archive/postgres-users.md:
--------------------------------------------------------------------------------
1 | # Bootstrapping Postgres Users
2 |
3 | Setting up database users for an app can be challenging if you don’t do it often. Good permissions add a layer of security and can minimize the chances of developer mistakes.
4 |
5 | The three types of users we’ll cover are:
6 |
7 | Type | Description | Read | Write | Modify
8 | --- | --- | --- | --- | ---
9 | migrations | Schema changes | ✓ | ✓ | ✓
10 | app | Reading and writing data | ✓ | ✓ |
11 | analytics | Data analysis and reporting | ✓ | |
12 |
13 | Before we jump into it, there’s something you should know about new databases.
14 |
15 | ## New Databases
16 |
17 | After creating a new database, all users can access it and create tables in the `public` schema. This isn’t what we want. To fix this, run:
18 |
19 | ```sql
20 | REVOKE ALL ON DATABASE mydb FROM PUBLIC;
21 |
22 | REVOKE ALL ON SCHEMA public FROM PUBLIC;
23 | ```
24 |
25 | Be sure to replace `mydb` with your database name.
26 |
27 | ## Roles
28 |
29 | PostgreSQL uses the concept of roles to manage privileges. Roles can be used to define groups and users. A user is simply a role with a password and permission to log in.
30 |
31 | The approach we’ll take is to create a group and add users to it. This makes it easy to rotate credentials in the future: just add a second user to the group, set your app’s configuration to the new user, and remove the original one.
32 |
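For example, rotating the credentials for an `app` group later might look like this (the role names are illustrative):

```sql
-- add a replacement user to the existing group
CREATE ROLE myapp2 WITH LOGIN ENCRYPTED PASSWORD 'newsecret' IN ROLE app;

-- after switching the app's configuration to myapp2, remove the old user
DROP ROLE myapp;
```
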
33 | ## Migrations
34 |
35 | First, we need a group to manage the schema. We could use a superuser, but this isn’t a great idea, as superusers can access all databases, change permissions, and create new roles. Instead, let’s create a new group.
36 |
37 | ```sql
38 | CREATE ROLE migrations;
39 |
40 | GRANT CONNECT ON DATABASE mydb TO migrations;
41 |
42 | GRANT ALL ON SCHEMA public TO migrations;
43 |
44 | ALTER ROLE migrations SET lock_timeout TO '5s';
45 | ```
46 |
47 | We set a lock timeout so migrations don’t disrupt normal database activity while attempting to acquire a lock.
48 |
49 | Now, we can create a user who’s a member of the group.
50 |
51 | ```sql
52 | CREATE ROLE migrator WITH LOGIN ENCRYPTED PASSWORD 'secret' IN ROLE migrations;
53 |
54 | ALTER ROLE migrator SET role TO 'migrations';
55 | ```
56 |
57 | The last statement ensures tables created by the user are owned by the group.
58 |
59 | You can generate a nice password from the command line with:
60 |
61 | ```sh
62 | cat /dev/urandom | LC_CTYPE=C tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1
63 | ```
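
If you have Ruby handy, `SecureRandom` gives an equivalent result:

```ruby
require "securerandom"

# 32 random alphanumeric characters, like the shell one-liner above
puts SecureRandom.alphanumeric(32)
```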
64 |
65 | ## App
66 |
67 | Next, let’s create a group for our app. It’ll need to read and write data but shouldn’t need to modify the schema or truncate tables. We also want to set a statement timeout to prevent long running queries from degrading database performance.
68 |
69 | ```sql
70 | CREATE ROLE app;
71 |
72 | GRANT CONNECT ON DATABASE mydb TO app;
73 |
74 | GRANT USAGE ON SCHEMA public TO app;
75 |
76 | GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app;
77 |
78 | GRANT SELECT, USAGE ON ALL SEQUENCES IN SCHEMA public TO app;
79 |
80 | ALTER DEFAULT PRIVILEGES FOR ROLE migrations IN SCHEMA public
81 | GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app;
82 |
83 | ALTER DEFAULT PRIVILEGES FOR ROLE migrations IN SCHEMA public
84 | GRANT SELECT, USAGE ON SEQUENCES TO app;
85 |
86 | ALTER ROLE app SET statement_timeout TO '30s';
87 | ```
88 |
89 | > **Note:** The default privileges statements reference the group used for migrations. If you use Amazon RDS, you must run these statements as the migrator user we created above (since you don’t have access to a true superuser).
90 |
91 | Then, create a user with:
92 |
93 | ```sql
94 | CREATE ROLE myapp WITH LOGIN ENCRYPTED PASSWORD 'secret' IN ROLE app;
95 | ```
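
When it’s time to rotate credentials (as described earlier), a sketch might look like this — `myapp2` and the password are placeholder values:

```sql
-- add a second user to the app group
CREATE ROLE myapp2 WITH LOGIN ENCRYPTED PASSWORD 'newsecret' IN ROLE app;

-- deploy your app with the new credentials, then remove the old user
DROP ROLE myapp;
```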
96 |
97 | ## Analytics
98 |
99 | Finally, let’s create a group to be used for data analysis, reporting, and business intelligence tools (like [Blazer](https://github.com/ankane/blazer), our open-source one). These users are often referred to as *read-only users*. We don’t want them to be able to mistakenly update data.
100 |
101 | ```sql
102 | CREATE ROLE analytics;
103 |
104 | GRANT CONNECT ON DATABASE mydb TO analytics;
105 |
106 | GRANT USAGE ON SCHEMA public TO analytics;
107 |
108 | GRANT SELECT ON ALL TABLES IN SCHEMA public TO analytics;
109 |
110 | ALTER DEFAULT PRIVILEGES FOR ROLE migrations IN SCHEMA public
111 | GRANT SELECT ON TABLES TO analytics;
112 |
113 | ALTER ROLE analytics SET statement_timeout TO '3min';
114 | ```
115 |
116 | Once again, creating a user is relatively straightforward.
117 |
118 | ```sql
119 | CREATE ROLE bi WITH LOGIN ENCRYPTED PASSWORD 'secret' IN ROLE analytics;
120 | ```
121 |
122 | ## Summary
123 |
124 | You now know how to create different types of Postgres users. Spending a bit of time upfront to configure your users can make them easier to manage in the long run. This should give you a nice foundation.
125 |
--------------------------------------------------------------------------------
/archive/dokku-digital-ocean.md:
--------------------------------------------------------------------------------
1 | # Dokku on DigitalOcean
2 |
3 | :droplet: Your very own PaaS
4 |
5 | ## Create Droplet
6 |
7 | Create new droplet with Ubuntu 16.04. Be sure to use an SSH key.
8 |
9 | ## Install Dokku
10 |
11 | ```sh
12 | wget https://raw.githubusercontent.com/dokku/dokku/v0.12.5/bootstrap.sh
13 | sudo DOKKU_TAG=v0.12.5 bash bootstrap.sh
14 | ```
15 |
16 | And visit your server’s IP address in your browser to complete installation.
17 |
18 | If you have a domain, use virtualhost naming. Otherwise, Dokku will use a different port for each deploy of your app. You can easily add a domain later.
19 |
20 | ## Add a Firewall
21 |
22 | Create a [firewall](https://cloud.digitalocean.com/networking/firewalls)
23 |
24 | Inbound Rules
25 |
26 | - SSH from your [external IP](https://www.google.com/search?q=external+ip)
27 | - HTTP and HTTPS from all IPv4 and all IPv6
28 |
29 | Outbound Rules
30 |
31 | - ICMP, all TCP, and all UDP from all IPv4 and all IPv6
32 |
33 | ## Set Up Server
34 |
35 | Turn on [automatic updates](https://help.ubuntu.com/16.04/serverguide/automatic-updates.html)
36 |
37 | ```sh
38 | sudo apt-get -y install unattended-upgrades
39 | echo 'APT::Periodic::Unattended-Upgrade "1";' | sudo tee -a /etc/apt/apt.conf.d/10periodic
40 | ```
41 |
42 | Enable swap
43 |
44 | ```sh
45 | sudo fallocate -l 4G /swapfile
46 | sudo chmod 600 /swapfile
47 | sudo mkswap /swapfile
48 | sudo swapon /swapfile
49 | sudo sh -c 'echo "/swapfile none swap sw 0 0" >> /etc/fstab'
50 | ```
51 |
52 | Configure time zone
53 |
54 | ```sh
55 | sudo dpkg-reconfigure tzdata
56 | ```
57 |
58 | and select `None of the above`, then `UTC`.
59 |
60 | ## Deploy
61 |
62 | Get the official Dokku client locally
63 |
64 | ```sh
65 | git clone https://github.com/dokku/dokku.git ~/.dokku
66 |
67 | # add the following to either your
68 | # .bashrc, .bash_profile, or .profile file
69 | alias dokku='$HOME/.dokku/contrib/dokku_client.sh'
70 | ```
71 |
72 | Create app
73 |
74 | ```sh
75 | dokku apps:create myapp
76 | ```
77 |
78 | Add a `CHECKS` file
79 |
80 | ```txt
81 | WAIT=2
82 | ATTEMPTS=15
83 | /
84 | ```
85 |
86 | Deploy
87 |
88 | ```sh
89 | git remote add dokku dokku@dokkuhost:myapp
90 | git push dokku master
91 | ```
92 |
93 | ## Workers
94 |
95 | Dokku only runs web processes by default. If you have workers or other process types, use:
96 |
97 | ```sh
98 | dokku ps:scale worker=1
99 | ```
100 |
101 | ## One-Off Jobs
102 |
103 | ```sh
104 | dokku run rails db:migrate
105 | dokku run rails console
106 | ```
107 |
108 | ## Scheduled Jobs
109 |
110 | Two options
111 |
112 | 1. Add a [custom clock process](https://devcenter.heroku.com/articles/scheduled-jobs-custom-clock-processes) to your Procfile
113 |
114 | 2. Or create `/etc/cron.d/myapp` with:
115 |
116 | ```
117 | PATH=/usr/local/bin:/usr/bin:/bin
118 | SHELL=/bin/bash
119 | * * * * * dokku dokku --rm run myapp rake task1
120 | 0 0 * * * dokku dokku --rm run myapp rake task2
121 | ```
122 |
123 | ## Custom Domains
124 |
125 | ```sh
126 | dokku domains:add www.datakick.org
127 | ```
128 |
129 | ## SSL
130 |
131 | Get free SSL certificates thanks to [Let’s Encrypt](https://letsencrypt.org/). On the server, run:
132 |
133 | ```sh
134 | dokku plugin:install https://github.com/dokku/dokku-letsencrypt.git
135 | dokku letsencrypt:cron-job --add
136 | ```
137 |
138 | And locally, run:
139 |
140 | ```sh
141 | dokku config:set --no-restart DOKKU_LETSENCRYPT_EMAIL=your@email.tld
142 | dokku letsencrypt
143 | ```
144 |
145 | ## Logging
146 |
147 | Use syslog to ship your logs to a service. [Papertrail](https://papertrailapp.com) is great and has a free plan.
148 |
149 | For apps, use:
150 |
151 | ```sh
152 | dokku plugin:install https://github.com/michaelshobbs/dokku-logspout.git
153 | dokku plugin:install https://github.com/michaelshobbs/dokku-hostname.git
154 | dokku logspout:server syslog+tls://logs.papertrailapp.com:12345
155 | dokku logspout:start
156 | ```
157 |
158 | For nginx and other logs, install [remote_syslog2](https://github.com/papertrail/remote_syslog2)
159 |
160 | ```sh
161 | cd /tmp
162 | wget https://github.com/papertrail/remote_syslog2/releases/download/v0.18/remote_syslog_linux_amd64.tar.gz
163 | tar xzf ./remote_syslog*.tar.gz
164 | cd remote_syslog
165 | sudo cp ./remote_syslog /usr/local/bin
166 | ```
167 |
168 | Create `/etc/log_files.yml` with:
169 |
170 | ```yml
171 | files:
172 | - /var/log/nginx/*.log
173 | - /var/log/unattended-upgrades/*.log
174 | destination:
175 | host: logs.papertrailapp.com
176 | port: 12345
177 | protocol: tls
178 | ```
179 |
180 | And run:
181 |
182 | ```sh
183 | remote_syslog
184 | ```
185 |
186 | ## Database
187 |
188 | Check out [Host Your Own Postgres](host-your-own-postgres).
189 |
190 | ## Memcached
191 |
192 | ```sh
193 | dokku plugin:install https://github.com/dokku/dokku-memcached.git
194 | dokku memcached:create lolipop
195 | dokku memcached:link lolipop myapp
196 | ```
197 |
198 | ## Redis
199 |
200 | ```sh
201 | dokku plugin:install https://github.com/dokku/dokku-redis.git
202 | dokku redis:create lolipop
203 | dokku redis:link lolipop myapp
204 | ```
205 |
206 | ## TODO
207 |
208 | - [Monitoring](https://www.brianchristner.io/how-to-setup-docker-monitoring/)
209 |
210 | ## Bonus
211 |
212 | Find great Docker projects at [Awesome Docker](https://github.com/veggiemonk/awesome-docker).
213 |
214 | ## Resources
215 |
216 | - [Additional Recommended Steps for New Ubuntu 14.04 Servers](https://www.digitalocean.com/community/tutorials/additional-recommended-steps-for-new-ubuntu-14-04-servers)
217 |
--------------------------------------------------------------------------------
/archive/securing-user-emails-lockbox.md:
--------------------------------------------------------------------------------
1 | # Securing User Emails in Rails with Lockbox
2 |
3 | 
4 |
5 | ---
6 |
7 | *This is an update to [Securing User Emails in Rails](https://ankane.org/securing-user-emails-in-rails) with a number of improvements:*
8 |
9 | - *Works with Devise’s email changed notifications*
10 | - *Works with Devise’s reconfirmable option*
11 | - *Stores encrypted data in a single field*
12 | - *You only need to manage a single key*
13 |
14 | ---
15 |
16 | Email addresses are a common form of personal data, and they’re often stored unencrypted. If an attacker gains access to the database or backups, emails will be compromised.
17 |
18 | This post will walk you through a practical approach to protecting emails. It works with [Devise](https://github.com/plataformatec/devise), the most popular authentication framework for Rails, and is general enough to work with others.
19 |
20 | ## Strategy
21 |
22 | We’ll use two concepts to make this happen: encryption and blind indexing. Encryption gives us a way to securely store the data, and blind indexing provides a way to look it up.
23 |
24 | Blind indexing works by computing a hash of the data. You’re probably familiar with hash functions like MD5 and SHA1. Rather than one of these, we use a hash function that takes a secret key and uses [key stretching](https://en.wikipedia.org/wiki/Key_stretching) to slow down brute force attempts. You can read more about [blind indexing here](https://www.sitepoint.com/how-to-search-on-securely-encrypted-database-fields/).
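
Conceptually, a blind index is just a keyed, stretched hash. Here’s a sketch using PBKDF2 from Ruby’s standard library — an illustration only, not the exact scheme the Blind Index gem uses (the key and iteration count below are made up):

```ruby
require "openssl"

# Placeholder secret — store a real one with your other secrets
BIDX_KEY = "a-secret-key-for-the-blind-index"

# Same input -> same index, so equality lookups work, but brute
# forcing requires the key and is slowed by key stretching
def blind_index(value)
  digest = OpenSSL::KDF.pbkdf2_hmac(
    value.downcase,      # normalize so lookups match regardless of case
    salt: BIDX_KEY,
    iterations: 10_000,  # key stretching
    length: 32,
    hash: "sha256"
  )
  [digest].pack("m0")    # Base64-encode for a text column
end

puts blind_index("Test@example.org") == blind_index("test@example.org") # true
```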
25 |
26 | We’ll use the [Lockbox](https://github.com/ankane/lockbox) gem for encryption and the [Blind Index](https://github.com/ankane/blind_index) gem for blind indexing.
27 |
28 | ## Instructions
29 |
30 | Let’s assume you have a `User` model with an email field.
31 |
32 | Add to your Gemfile:
33 |
34 | ```ruby
35 | gem 'lockbox'
36 | gem 'blind_index'
37 | ```
38 |
39 | And run:
40 |
41 | ```sh
42 | bundle install
43 | ```
44 |
45 | Generate a key
46 |
47 | ```ruby
48 | Lockbox.generate_key
49 | ```
50 |
51 | Store the key with your other secrets. This is typically Rails credentials or an environment variable ([dotenv](https://github.com/bkeepers/dotenv) is great for this). Be sure to use different keys in development and production.
52 |
53 | Set the following environment variable with your key (you can use this one in development)
54 |
55 | ```sh
56 | LOCKBOX_MASTER_KEY=0000000000000000000000000000000000000000000000000000000000000000
57 | ```
58 |
59 | or create `config/initializers/lockbox.rb` with something like
60 |
61 | ```ruby
62 | Lockbox.master_key = Rails.application.credentials.lockbox_master_key
63 | ```
64 |
65 | Next, let’s replace the email field with an encrypted version. Create a migration:
66 |
67 | ```sh
68 | rails generate migration add_email_ciphertext_to_users
69 | ```
70 |
71 | And add:
72 |
73 | ```ruby
74 | class AddEmailCiphertextToUsers < ActiveRecord::Migration[5.2]
75 | def change
76 | # encrypted data
77 | add_column :users, :email_ciphertext, :string
78 |
79 | # blind index
80 | add_column :users, :email_bidx, :string
81 | add_index :users, :email_bidx, unique: true
82 |
83 | # drop original here unless we have existing users
84 | remove_column :users, :email
85 | end
86 | end
87 | ```
88 |
89 | Then migrate:
90 |
91 | ```sh
92 | rails db:migrate
93 | ```
94 |
95 | Add to your user model:
96 |
97 | ```ruby
98 | class User < ApplicationRecord
99 | encrypts :email
100 | blind_index :email
101 | end
102 | ```
103 |
104 | Create a new user and confirm it works.
105 |
106 | ## Existing Users
107 |
108 | If you have existing users, we need to backfill the data before dropping the email column.
109 |
110 | ```ruby
111 | class User < ApplicationRecord
112 | encrypts :email, migrating: true
113 | blind_index :email, migrating: true
114 | end
115 | ```
116 |
117 | Backfill the data in the Rails console:
118 |
119 | ```ruby
120 | Lockbox.migrate(User)
121 | ```
122 |
123 | Then update the model to the desired state:
124 |
125 | ```ruby
126 | class User < ApplicationRecord
127 | encrypts :email
128 | blind_index :email
129 |
130 | # remove this line after dropping email column
131 | self.ignored_columns = ["email"]
132 | end
133 | ```
134 |
135 | Finally, drop the email column.
136 |
137 | ## Reconfirmable
138 |
139 | If you use the confirmable module with `reconfirmable`, you should also encrypt the `unconfirmed_email` field.
140 |
141 | ```ruby
142 | class AddUnconfirmedEmailToUsers < ActiveRecord::Migration[5.2]
143 | def change
144 | add_column :users, :unconfirmed_email_ciphertext, :text
145 | end
146 | end
147 | ```
148 |
149 | And add `unconfirmed_email` to the list of encrypted fields:
150 |
151 | ```ruby
152 | class User < ApplicationRecord
153 | encrypts :email, :unconfirmed_email
154 | end
155 | ```
156 |
157 | ## Logging
158 |
159 | We also need to make sure email addresses aren’t logged. Add to `config/initializers/filter_parameter_logging.rb`:
160 |
161 | ```ruby
162 | Rails.application.config.filter_parameters += [:email]
163 | ```
164 |
165 | Use [Logstop](https://github.com/ankane/logstop) to filter anything that looks like an email address as an extra line of defense. Add to your Gemfile:
166 |
167 | ```ruby
168 | gem 'logstop'
169 | ```
170 |
171 | And create `config/initializers/logstop.rb` with:
172 |
173 | ```ruby
174 | Logstop.guard(Rails.logger)
175 | ```
176 |
177 | ## Summary
178 |
179 | We now have a way to encrypt emails and query for exact matches. You can apply this same approach to other fields as well. For more security, consider a [key management service](https://github.com/ankane/kms_encrypted) to manage your keys.
180 |
--------------------------------------------------------------------------------
/archive/modern-encryption-rails.md:
--------------------------------------------------------------------------------
1 | # Modern Encryption for Rails
2 |
3 | 
4 |
5 | Encrypting sensitive data at the application-level is crucial for data security. Since writing [Securing Sensitive Data in Rails](https://ankane.org/sensitive-data-rails), I haven’t been able to shake the feeling that encryption in Rails could be easier and cleaner.
6 |
7 | To address this, I created a library called [Lockbox](https://github.com/ankane/lockbox). Here are some of the principles behind it.
8 |
9 | ## Easy to Use, Hard to Misuse
10 |
11 | Many cryptography mistakes happen during implementation. Lockbox provides good defaults and is designed to be hard to misuse. You don’t need to deal with initialization vectors and it only supports secure algorithms.
12 |
13 | ## Popular Integrations
14 |
15 | Sensitive data can appear in many places, like database fields, file uploads, and strings. You shouldn’t need different libraries for each of these.
16 |
17 | Lockbox can encrypt your data in all of these forms. It has built-in integrations with Active Record, Active Storage, and CarrierWave.
18 |
19 | ## Zero Downtime Migrations
20 |
21 | At some point, you may want to encrypt existing data. This should be easy to do, and most importantly, not require any downtime. Lockbox provides a single method you can use for this once your model is configured:
22 |
23 | ```ruby
24 | Lockbox.migrate(User)
25 | ```
26 |
27 | No need to write one-off backfill scripts.
28 |
29 | ## Maximum Compatibility
30 |
31 | Encrypting attributes shouldn’t break existing code or libraries. To make this possible, methods like `attribute_changed?` and `attribute_was` should behave similarly regardless of whether or not an attribute is encrypted. Lockbox includes these methods in its test suite for maximum compatibility.
32 |
33 | This allows features like Devise’s ability to send email change notifications to work when the email attribute is encrypted, which is an important measure to prevent account hijacking.
34 |
35 | ```ruby
36 | Devise.setup do |config|
37 | config.send_email_changed_notification = true
38 | end
39 | ```
40 |
41 | You can even query encrypted attributes thanks to the [blind_index](https://github.com/ankane/blind_index) gem.
42 |
43 | ## Modern Algorithms
44 |
45 | Lockbox uses AES-GCM for [authenticated encryption](https://tonyarcieri.com/all-the-crypto-code-youve-ever-written-is-probably-broken). It also supports XSalsa20 (thanks to Libsodium), which is recommended by [some cryptographers](https://latacora.micro.blog/2018/04/03/cryptographic-right-answers.html).
46 |
47 | ## Fewer Keys to Manage
48 |
49 | It’s a good practice to use a different encryption key for each field to make it more difficult for attackers and to reduce the likelihood of a [nonce collision](https://www.cryptologie.net/article/402/is-symmetric-security-solved/). However, this can be burdensome for developers.
50 |
51 | Instead, we can use a single master key and derive separate keys for each field from it. This approach is taken from [CipherSweet](https://ciphersweet.paragonie.com/internals/key-hierarchy), an encryption library for PHP and Node.js. Now developers can safely add encrypted fields without having to worry about generating and storing additional secrets.
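
To make the idea concrete, per-field keys can be derived with HKDF from Ruby’s OpenSSL — a simplified sketch, not the exact derivation CipherSweet or Lockbox performs:

```ruby
require "openssl"

master_key = OpenSSL::Random.random_bytes(32) # the one secret you manage

# Derive an independent key per table and attribute
def derive_key(master_key, table, attribute)
  OpenSSL::KDF.hkdf(
    master_key,
    salt: table,      # bind the derived key to the table...
    info: attribute,  # ...and to the attribute name
    length: 32,
    hash: "sha384"
  )
end

email_key = derive_key(master_key, "users", "email_ciphertext")
phone_key = derive_key(master_key, "users", "phone_ciphertext")
puts email_key != phone_key # each field gets its own key
```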
52 |
53 | You can still specify keys for certain fields if you prefer, but it’s no longer required. Lockbox also works with [KMS Encrypted](https://github.com/ankane/kms_encrypted) if you want to use a key management service to manage your keys.
54 |
55 | ## Built-In Key Rotation
56 |
57 | It’s good security hygiene to rotate your encryption keys from time to time. Lockbox makes this easy by allowing you to specify previous versions of keys and algorithms:
58 |
59 | ```ruby
60 | class User < ApplicationRecord
61 | encrypts :email, previous_versions: [{key: previous_key}]
62 | end
63 | ```
64 |
65 | New data is encrypted with the new key and algorithm, while older data can still be decrypted.
66 |
67 | ## Cleaner Schema
68 |
69 | [attr_encrypted](https://github.com/attr-encrypted/attr_encrypted), the de facto encryption library for database fields, uses two fields for each encrypted attribute: one for the ciphertext and another for the initialization vector.
70 |
71 | ```ruby
72 | encrypted_email
73 | encrypted_email_iv
74 | ```
75 |
76 | However, it’s possible to store both in a single field for a cleaner schema.
77 |
78 | ```ruby
79 | email_ciphertext
80 | ```
81 |
82 | ## Hybrid Cryptography
83 |
84 | Hybrid cryptography allows certain servers to encrypt data without the ability to decrypt it. This can do a better job [protecting data](https://ankane.org/decryption-keys) than symmetric cryptography when you can use it. Lockbox makes it just as easy to use hybrid cryptography.
85 |
86 | ```ruby
87 | class User < ApplicationRecord
88 | encrypts :email, algorithm: "hybrid", encryption_key: encryption_key, decryption_key: decryption_key
89 | end
90 | ```
91 |
92 | ## Updates
93 |
94 | Since this post was originally published:
95 |
96 | - Lockbox also supports [types](https://ankane.org/lockbox-types)
97 | - Here’s how to [encrypt user email addresses](https://ankane.org/securing-user-emails-lockbox)
98 | - Lockbox supports [Mongoid](https://ankane.org/modern-encryption-mongoid)
99 |
100 | ## Summary
101 |
102 | You’ve now seen what Lockbox brings to encryption for Rails. To summarize, it:
103 |
104 | - Is hard to misuse
105 | - Works with database fields, files, and strings
106 | - Makes it easy to migrate existing data without downtime
107 | - Maximizes compatibility with existing code and libraries
108 | - Uses modern algorithms
109 | - Requires you to only manage a single encryption key
110 | - Makes key rotation easy
111 | - Stores encrypted data in a single field
112 | - Supports hybrid cryptography
113 |
114 | Try out [Lockbox](https://github.com/ankane/lockbox) today.
115 |
116 | *Already use a library for encryption? No worries, it’s [easy to migrate](https://github.com/ankane/lockbox#migrating-from-another-library).*
117 |
--------------------------------------------------------------------------------
/archive/decryption-keys.md:
--------------------------------------------------------------------------------
1 | # Why and How to Keep Your Decryption Keys Off Web Servers
2 |
3 | 
4 |
5 | Suppose a worst-case scenario happens: an attacker finds a remote code execution vulnerability and creates a [reverse shell](https://hackernoon.com/reverse-shell-cf154dfee6bd) on one of your web servers. They then find the database credentials, connect to your database, and steal the data.
6 |
7 | For unencrypted data and data encrypted at the storage level, it’s game over. The attacker has it all. If data is encrypted at the application level with symmetric encryption but the encryption key is accessible from the server, it’s exactly the same. The attacker has all they need to decrypt the data offline.
8 |
9 | This is the case whether you store the encryption key in configuration management or an environment variable, or load it dynamically from an outside source. If your app can access the key, it’s vulnerable to compromise.
10 |
11 | The best way to defend against this attack is to make sure the compromised server isn’t able to decrypt data. Web servers are typically the most exposed to attacks. If your web servers accept sensitive data but don’t need to show it in its entirety back to users, they should be able to encrypt the data and write it to the database, but not decrypt it. The data can be decrypted and processed by background workers that don’t allow inbound traffic.
12 |
13 | You likely can’t do this for all of your data, but you should do it for all of the data you can. Sometimes it’s possible to show just partial information back to users, as is universally done for saved credit cards.
14 |
15 | 
16 |
17 | In these cases, you can store the partial data in a separate field which web servers can decrypt, while not allowing them to decrypt the full data.
18 |
19 | ## Practical Example
20 |
21 | Suppose we have a service that sends text messages to customers. Customers enter their phone number through the website or mobile app.
22 |
23 | We can set up web servers so they can only encrypt phone numbers. Text messages can be sent through background jobs which run on a different set of servers - ones that can decrypt and don’t allow inbound traffic. If internal employees need to view full phone numbers, they can use a separate set of web servers that are only accessible through the company VPN.
24 |
25 | Servers | Encrypt | Decrypt | Notes
26 | --- | --- | --- | ---
27 | Customer web servers | ✓ | |
28 | Background workers | ✓ | ✓ | No inbound traffic
29 | Internal web servers | ✓ | ✓ | Requires VPN
30 |
31 | If customers need to see their saved phone numbers, you can show them the last 4 digits, which are stored in a separate field.
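
For instance, the displayable part can be split into its own field when the number is saved — the field name here is hypothetical:

```ruby
phone = "+15551234567"

# Stored alongside the encrypted full number; safe to show back to users
phone_last4 = phone[-4..-1]

puts phone_last4 # "4567"
```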
32 |
33 | ## Approaches
34 |
35 | Two approaches you can take to accomplish this are:
36 |
37 | 1. Hybrid cryptography
38 | 2. Cryptography as a service
39 |
40 | ## Hybrid Cryptography
41 |
42 | Public key cryptography, or asymmetric cryptography, uses different keys to perform encryption and decryption. Servers that need to encrypt have the encryption key and servers that need to decrypt have the decryption key.
43 |
44 | However, public key cryptography is much less efficient than symmetric cryptography, so most implementations combine the two. They use public key cryptography to exchange a symmetric key, and symmetric cryptography to encrypt the data. This is called hybrid cryptography, and it’s how TLS and GPG work.
45 |
46 | X25519 is a modern key exchange algorithm that’s [widely deployed](https://ianix.com/pub/curve25519-deployment.html) and [currently recommended](https://paragonie.com/blog/2019/03/definitive-2019-guide-cryptographic-key-sizes-and-algorithm-recommendations#after-fold).
47 |
48 | [Libsodium](https://libsodium.gitbook.io/doc/), which uses X25519, is a great option for hybrid cryptography in applications. It has [libraries](https://libsodium.gitbook.io/doc/bindings_for_other_languages) for most languages.
49 |
50 | ## Cryptography as a Service
51 |
52 | Another approach is to use a service to perform encryption and decryption. This service can allow some sets of servers to encrypt and others to decrypt. You could write your own (micro)service, but there are a number of existing solutions, often called key management services (KMS).
53 |
54 | - [Vault](https://www.vaultproject.io/)
55 | - [AWS KMS](https://aws.amazon.com/kms/)
56 | - [Google Cloud KMS](https://cloud.google.com/kms/)
57 |
58 | These services don’t store the encrypted data - they just encrypt and decrypt on-demand. You can either encrypt data directly with the KMS or use envelope encryption.
59 |
60 | ### Direct Encryption
61 |
62 | With direct encryption, you don’t need to set up encryption in your app. Whenever you need to encrypt or decrypt data, simply send the data to the KMS.
63 |
64 | However, this has a few downsides. It exposes the unencrypted data to the KMS, which is disastrous if the KMS alone is breached. It’s also less efficient for large files and hosted services have a fairly low limit on the size of data you can encrypt.
65 |
66 | ### Envelope Encryption
67 |
68 | Another approach is envelope encryption, which addresses the issues above but requires encryption in your app.
69 |
70 | To encrypt, generate a random encryption key, known as a data encryption key (DEK), and use it to encrypt the data. Then encrypt the DEK with the KMS and store the encrypted version.
71 |
72 | To decrypt, decrypt the DEK with the KMS and then use it to decrypt the data. This way, the KMS only ever sees the DEK.
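
Here’s a sketch of envelope encryption in Ruby — the `kms_encrypt`/`kms_decrypt` functions are local stand-ins for real KMS API calls (in practice, the KMS key never leaves the service):

```ruby
require "openssl"
require "securerandom"

def aes_gcm_encrypt(key, plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  iv = cipher.random_iv                  # 12 bytes for GCM
  ciphertext = cipher.update(plaintext) + cipher.final
  iv + cipher.auth_tag + ciphertext      # 12-byte IV, 16-byte tag, data
end

def aes_gcm_decrypt(key, blob)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = key
  cipher.iv = blob[0, 12]
  cipher.auth_tag = blob[12, 16]
  cipher.update(blob[28..-1]) + cipher.final
end

# Stand-in for a KMS: only the service would hold KMS_KEY
KMS_KEY = SecureRandom.random_bytes(32)

def kms_encrypt(dek)
  aes_gcm_encrypt(KMS_KEY, dek)
end

def kms_decrypt(blob)
  aes_gcm_decrypt(KMS_KEY, blob)
end

# Encrypt: a random DEK encrypts the data; the KMS wraps only the DEK
dek = SecureRandom.random_bytes(32)
record = {
  blob: aes_gcm_encrypt(dek, "sensitive data"),
  encrypted_dek: kms_encrypt(dek)
}

# Decrypt: unwrap the DEK via the KMS, then decrypt the data locally
plaintext = aes_gcm_decrypt(kms_decrypt(record[:encrypted_dek]), record[:blob])
puts plaintext # "sensitive data"
```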
73 |
74 | ### Auditing
75 |
76 | Another benefit of cryptography as a service is auditing. You can see exactly when data or DEKs are decrypted, and there’s no way to get around the auditing without compromising the KMS. This makes it easy to tell which information was accessed during a breach.
77 |
78 | ## Conclusion
79 |
80 | We don’t encrypt data for a sunny day. You’ve now seen two approaches to limit damage in the event of a web server breach.
81 |
82 | If you use Ruby on Rails, I’ve written a companion piece on [hybrid cryptography](/hybrid-cryptography-rails) with code for how to do this.
83 |
--------------------------------------------------------------------------------
/archive/gem-patterns.md:
--------------------------------------------------------------------------------
1 | # Gem Patterns
2 |
3 | I’ve created [a few](https://ankane.org/opensource?language=Ruby) Ruby gems over the years, and there are a number of patterns I’ve found myself repeating that I wanted to share. I didn’t invent them, but have long forgotten where I first saw them. They are:
4 |
5 | - [Rails Migrations](#rails-migrations)
6 | - [Rails Dependencies](#rails-dependencies)
7 | - [Testing Against Multiple Dependency Versions](#testing-against-multiple-dependency-versions)
8 | - [Testing Against Rails](#testing-against-rails)
9 | - [Coding Your Gemspec](#coding-your-gemspec)
10 |
11 | Let’s dig into each of them. In the examples, the gem is called `hello`.
12 |
13 | ## Rails Migrations
14 |
15 | Create a template in `lib/generators/hello/templates/migration.rb.tt`:
16 |
17 | ```ruby
18 | class <%= migration_class_name %> < ActiveRecord::Migration<%= migration_version %>
19 | def change
20 | # your migration
21 | end
22 | end
23 | ```
24 |
25 | The `.tt` extension denotes Thor template. [Thor](https://github.com/erikhuda/thor) is what Rails uses under the hood.
26 |
27 | Add `lib/generators/hello/install_generator.rb`
28 |
29 | ```ruby
30 | require "rails/generators/active_record"
31 |
32 | module Hello
33 | module Generators
34 | class InstallGenerator < Rails::Generators::Base
35 | include ActiveRecord::Generators::Migration
36 | source_root File.join(__dir__, "templates")
37 |
38 | def copy_migration
39 | migration_template "migration.rb", "db/migrate/install_hello.rb", migration_version: migration_version
40 | end
41 |
42 | def migration_version
43 | "[#{ActiveRecord::VERSION::MAJOR}.#{ActiveRecord::VERSION::MINOR}]"
44 | end
45 | end
46 | end
47 | end
48 | ```
49 |
50 | This lets you run:
51 |
52 | ```sh
53 | rails generate hello:install
54 | ```
55 |
56 | Change the generator path and class name to match your gem. They must exactly match what Rails expects, or the generator won’t be found.
57 |
58 | [Example](https://github.com/ankane/archer/blob/master/lib/generators/archer/install_generator.rb)
59 |
60 | ## Rails Dependencies
61 |
62 | If your gem depends on Rails, add `railties` and any other Rails libraries it needs.
63 |
64 | ```ruby
65 | spec.add_dependency "railties", ">= 5"
66 | spec.add_dependency "activerecord", ">= 5"
67 | ```
68 |
69 | I typically require a [supported version](https://rubyonrails.org/security/) of Rails.
70 |
71 | In code, don’t require Rails gems directly, as this can cause them to load early and introduce issues.
72 |
73 | ```ruby
74 | require "active_record" # bad!!
75 |
76 | ActiveRecord::Base.include(Hello::Model)
77 | ```
78 |
79 | Instead, do:
80 |
81 | ```ruby
82 | require "active_support"
83 |
84 | ActiveSupport.on_load(:active_record) do
85 | include Hello::Model
86 | end
87 | ```
88 |
89 | [Example](https://github.com/ankane/hightop/blob/master/lib/hightop.rb)
90 |
91 | ## Testing Against Multiple Dependency Versions
92 |
93 | If your gem has dependencies, you may want to test against multiple versions of a dependency. For instance, you may want to test against multiple versions of Active Record.
94 |
95 | To do this, create a `test/gemfiles` directory (or `spec/gemfiles` if you use RSpec).
96 |
97 | Create `test/gemfiles/activerecord50.gemfile` with:
98 |
99 | ```ruby
100 | source "https://rubygems.org"
101 |
102 | gemspec path: "../../"
103 |
104 | gem "activerecord", "~> 5.0.0"
105 | ```
106 |
107 | Install with:
108 |
109 | ```sh
110 | BUNDLE_GEMFILE=test/gemfiles/activerecord50.gemfile bundle install
111 | ```
112 |
113 | And run with:
114 |
115 | ```sh
116 | BUNDLE_GEMFILE=test/gemfiles/activerecord50.gemfile bundle exec rake
117 | ```
118 |
119 | [Example](https://github.com/ankane/groupdate/tree/master/test/gemfiles)
120 |
121 | On Travis CI, you can add to `.travis.yml`:
122 |
123 | ```yml
124 | gemfile:
125 | - Gemfile
126 | - test/gemfiles/activerecord50.gemfile
127 | ```
128 |
129 | You can also use a library like [Appraisal](https://github.com/thoughtbot/appraisal) to help generate and run these files.
130 |
131 | ## Testing Against Rails
132 |
133 | To test against Rails, use a library like [Combustion](https://github.com/pat/combustion). It’s designed to be used with RSpec, but I haven’t had any issues with Minitest. Combustion generates some files that aren’t needed, so I just delete them.
134 |
135 | ```ruby
136 | Combustion.initialize! :all
137 | ```
138 |
139 | [Example](https://github.com/ankane/field_test/tree/master/test)
140 |
141 | ## Coding Your Gemspec
142 |
143 | There are a variety of ways to code your gemspec. Here’s the one I like to use:
144 |
145 | ```ruby
146 | require_relative "lib/hello/version"
147 |
148 | Gem::Specification.new do |spec|
149 | spec.name = "hello"
150 | spec.version = Hello::VERSION
151 | spec.summary = "Hello world"
152 | spec.homepage = "https://github.com/you/hello"
153 | spec.license = "MIT"
154 |
155 | spec.author = "Your Name"
156 | spec.email = "you@example.com"
157 |
158 | spec.files = Dir["*.{md,txt}", "{lib}/**/*"]
159 | spec.require_path = "lib"
160 |
161 | spec.required_ruby_version = ">= 2.4"
162 |
163 | spec.add_dependency "activesupport", ">= 5"
164 |
165 | spec.add_development_dependency "bundler"
166 | spec.add_development_dependency "rake"
167 | end
168 | ```
169 |
170 | Change `files` if your gem has `app`, `config`, or `vendor` directories. I typically set the minimum Ruby version to the [oldest supported version](https://www.ruby-lang.org/en/downloads/branches/).
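For instance, a gem with Rails-style directories might use something like (the directory list here is illustrative):

```ruby
spec.files = Dir["*.{md,txt}", "{app,config,lib,vendor}/**/*"]
```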
171 |
172 | If your gem has an executable file, add:
173 |
174 | ```ruby
175 | spec.bindir = "exe"
176 | spec.executables = ["hello"]
177 | ```
178 |
179 | [Don’t check in](https://yehudakatz.com/2010/12/16/clarifying-the-roles-of-the-gemspec-and-gemfile/) `Gemfile.lock`.
180 |
181 | Some gems have moved development dependencies entirely out of the gemspec and into the Gemfile, which is another option.
182 |
183 | ## Summary
184 |
185 | You’ve now seen five patterns that can be useful for Ruby gems. Now go build something awesome!
186 |
--------------------------------------------------------------------------------
/archive/securing-user-emails-in-rails.md:
--------------------------------------------------------------------------------
1 | # Securing User Emails in Rails
2 |
3 | ---
4 |
5 | *There is an [updated version](https://ankane.org/securing-user-emails-lockbox) of this post.*
6 |
7 | ---
8 |
9 | The GDPR goes into effect next Friday. Whether or not you serve European residents, it’s a great reminder that we have the responsibility to build systems in a way that protects user privacy.
10 |
11 | Email addresses are a common form of personal data, and they’re often stored unencrypted. If an attacker gains access to the database or backups, emails will be compromised.
12 |
13 | This post will walk you through a practical approach to protecting emails. It works with [Devise](https://github.com/plataformatec/devise), the most popular authentication framework for Rails, and is general enough to work with others.
14 |
15 | ## Strategy
16 |
17 | We’ll use two concepts to make this happen: encryption and blind indexing. Encryption gives us a way to securely store the data, and blind indexing provides a way to look it up.
18 |
19 | Blind indexing works by computing a hash of the data. You’re probably familiar with hash functions like MD5 and SHA1. Rather than one of these, we use a hash function that takes a secret key and uses [key stretching](https://en.wikipedia.org/wiki/Key_stretching) to slow down brute force attempts. You can read more about [blind indexing here](https://www.sitepoint.com/how-to-search-on-securely-encrypted-database-fields/).
20 |
21 | We’ll use the [attr_encrypted gem](https://github.com/attr-encrypted/attr_encrypted) for encryption and the [blind_index gem](https://github.com/ankane/blind_index) for blind indexing.
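To make blind indexing concrete, here's a minimal sketch using PBKDF2 from Ruby's standard library. It illustrates the idea only; it's not the exact scheme the blind_index gem uses, and the key and iteration count are placeholders.

```ruby
require "openssl"

# A blind index is a slow, keyed hash of the plaintext value.
def blind_index(value, key)
  OpenSSL::PKCS5.pbkdf2_hmac(
    value,                      # plaintext to index
    key,                        # secret key (a real scheme may use a separate salt)
    20_000,                     # iterations to slow down brute force
    32,                         # output length in bytes
    OpenSSL::Digest.new("SHA256")
  ).unpack1("H*")
end

bidx = blind_index("test@example.org", "secret-key")
```

Because the same value always hashes to the same index, an equality lookup on the indexed column finds the row without ever decrypting.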
22 |
23 | ## Instructions
24 |
25 | Let’s assume you have a `User` model with an email field.
26 |
27 | Add to your Gemfile:
28 |
29 | ```ruby
30 | gem 'attr_encrypted'
31 | gem 'blind_index'
32 | ```
33 |
34 | And run:
35 |
36 | ```sh
37 | bundle install
38 | ```
39 |
40 | Next, let’s replace the email field with an encrypted version. Create a migration:
41 |
42 | ```sh
43 | rails g migration add_encrypted_email_to_users
44 | ```
45 |
46 | And add:
47 |
48 | ```ruby
49 | class AddEncryptedEmailToUsers < ActiveRecord::Migration[5.2]
50 | def change
51 | # encrypted data
52 | add_column :users, :encrypted_email, :string
53 | add_column :users, :encrypted_email_iv, :string
54 | add_index :users, :encrypted_email_iv, unique: true
55 |
56 | # blind index
57 | add_column :users, :encrypted_email_bidx, :string
58 | add_index :users, :encrypted_email_bidx, unique: true
59 |
60 | # drop original here unless we have existing users
61 | remove_column :users, :email
62 | end
63 | end
64 | ```
65 |
66 | We use one column to store the encrypted data, one to store [the IV](http://www.cryptofails.com/post/70059609995/crypto-noobs-1-initialization-vectors), and another to store the blind index.
67 |
68 | We add a unique index on the IV since reusing an IV with the same key in AES-GCM (the default algorithm for attr_encrypted) will [leak the key](https://csrc.nist.gov/csrc/media/projects/block-cipher-techniques/documents/bcm/joux_comments.pdf).
69 |
70 | Then migrate:
71 |
72 | ```sh
73 | rails db:migrate
74 | ```
75 |
76 | Next, generate two keys: one for encryption and one for the blind index. Store them as hex-encoded strings in environment variables ([dotenv](https://github.com/bkeepers/dotenv) is great for this). *Do not commit them to source control.* [Here's an explanation](https://ankane.org/encryption-keys) of why `pack` is used when reading them. You can generate keys in the Rails console with:
77 |
78 | ```ruby
79 | SecureRandom.hex(32)
80 | ```
81 |
82 | For development, you can use these:
83 |
84 | ```sh
85 | EMAIL_ENCRYPTION_KEY=0000000000000000000000000000000000000000000000000000000000000000
86 | EMAIL_BLIND_INDEX_KEY=ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
87 | ```
88 |
89 | Add to your user model:
90 |
91 | ```ruby
92 | class User < ApplicationRecord
93 | attr_encrypted :email, key: [ENV["EMAIL_ENCRYPTION_KEY"]].pack("H*")
94 | blind_index :email, key: [ENV["EMAIL_BLIND_INDEX_KEY"]].pack("H*")
95 | end
96 | ```
97 |
98 | > `pack` is used to decode the hex value
99 |
100 | Create a new user and confirm it works.
101 |
102 | ## Existing Users
103 |
104 | If you have existing users, we need to backfill the data before dropping the email column. We temporarily use a virtual attribute - `protected_email` - so we can backfill without downtime.
105 |
106 | ```ruby
107 | class User < ApplicationRecord
108 | attr_encrypted :protected_email, key: [ENV["EMAIL_ENCRYPTION_KEY"]].pack("H*"), attribute: "encrypted_email"
109 | blind_index :protected_email, key: [ENV["EMAIL_BLIND_INDEX_KEY"]].pack("H*"), attribute: "email", bidx_attribute: "encrypted_email_bidx"
110 |
111 | before_validation :protect_email, if: -> { email_changed? }
112 |
113 | def protect_email
114 | self.protected_email = email
115 | compute_protected_email_bidx
116 | end
117 | end
118 | ```
119 |
120 | Backfill the data in the Rails console:
121 |
122 | ```ruby
123 | User.where(encrypted_email: nil).find_each do |user|
124 | user.protect_email
125 | user.save!
126 | end
127 | ```
128 |
129 | Then update the model to the desired state:
130 |
131 | ```ruby
132 | class User < ApplicationRecord
133 | attr_encrypted :email, key: [ENV["EMAIL_ENCRYPTION_KEY"]].pack("H*")
134 | blind_index :email, key: [ENV["EMAIL_BLIND_INDEX_KEY"]].pack("H*")
135 |
136 | # remove this line after dropping email column
137 | self.ignored_columns = ["email"]
138 | end
139 | ```
140 |
141 | Finally, drop the email column.
142 |
143 | ## Logging
144 |
145 | We also need to make sure email addresses aren’t logged. Add to `config/initializers/filter_parameter_logging.rb`:
146 |
147 | ```ruby
148 | Rails.application.config.filter_parameters += [:email]
149 | ```
150 |
151 | Use [Logstop](https://github.com/ankane/logstop) to filter anything that looks like an email address as an extra line of defense. Add to your Gemfile:
152 |
153 | ```ruby
154 | gem 'logstop'
155 | ```
156 |
157 | And create `config/initializers/logstop.rb` with:
158 |
159 | ```ruby
160 | Logstop.guard(Rails.logger)
161 | ```
162 |
163 | ## Summary
164 |
165 | We now have a way to encrypt data and query for exact matches. You can apply this same approach to other fields as well. For more security, consider a [key management service](https://github.com/ankane/kms_encrypted) to manage your keys.
166 |
--------------------------------------------------------------------------------
/archive/new-ml-gems.md:
--------------------------------------------------------------------------------
1 | # 16 New ML Gems for Ruby
2 |
3 |
4 |
5 |
6 |
7 | In August, I set out to improve the machine learning ecosystem for Ruby. I wasn’t sure where it would go. Over the next 5 months, I ended up releasing 16 libraries and learned a lot along the way. I wanted to share some of that knowledge and introduce some of the libraries you can now use in Ruby.
8 |
9 | ## The Theme
10 |
11 | There are many great machine learning libraries for Python, so a natural place to start was to see what it’d take to bring them to Ruby. Thanks to a common theme, it turned out to be a lot less work than expected.
12 |
13 | ML libraries want to be fast. This means less time waiting and more time iterating. However, interpreted languages like Python and Ruby aren’t particularly fast. How do these libraries overcome this?
14 |
15 | The key is they do most of the work in a compiled language - typically C++ - and have wrappers for other languages like Python.
16 |
17 | This was really great news. The same approach and code could be used for Ruby.
18 |
19 | ## The Patterns
20 |
21 | Ruby has a number of ways to call C and C++ code.
22 |
23 | Native extensions are one method. They’re written in C or C++ and use [Ruby’s C API](https://silverhammermba.github.io/emberb/c/). You may have noticed gems with native extensions taking longer to install, as they need to compile.
24 |
25 | ```c
26 | void Init_stats()
27 | {
28 | VALUE mStats = rb_define_module("Stats");
29 | rb_define_module_function(mStats, "mean", mean, 2);
30 | }
31 | ```
32 |
33 | A more general way for one language to call another is a foreign function interface, or FFI. It requires a C API (due to C++ name mangling), which many machine learning libraries had. An advantage of FFI is you can define the interface in the host language - in our case, Ruby.
34 |
35 | Ruby supports FFI with Fiddle. It was added in Ruby 1.9, but appears to be [“the Ruby standard library’s best kept secret.”](https://www.honeybadger.io/blog/use-any-c-library-from-ruby-via-fiddle-the-ruby-standard-librarys-best-kept-secret/)
36 |
37 | ```ruby
38 | module Stats
39 | extend Fiddle::Importer
40 | dlload "libstats.so"
41 | extern "double mean(int a, int b)"
42 | end
43 | ```
44 |
45 | There’s also the [FFI](https://github.com/ffi/ffi) gem, which provides higher-level functionality and overcomes some limitations of Fiddle (like the ability to pass structs by value).
46 |
47 | ```ruby
48 | module Stats
49 | extend FFI::Library
50 | ffi_lib "stats"
51 | attach_function :mean, [:int, :int], :double
52 | end
53 | ```
54 |
55 | For libraries without a C API, [Rice](https://github.com/jasonroelofs/rice) provides a really nice way to bind C++ code (similar to Python’s pybind11).
56 |
57 | ```cpp
58 | void Init_stats()
59 | {
60 | Module mStats = define_module("Stats");
61 | mStats.define_singleton_method("mean", &mean);
62 | }
63 | ```
64 |
65 | Another approach is SWIG (Simplified Wrapper and Interface Generator). You create an interface file and then run SWIG to generate the bindings. Gusto has a [good tutorial](https://engineering.gusto.com/simple-ruby-c-extensions-with-swig/) on this.
66 |
67 | ```swig
68 | %module stats
69 |
70 | double mean(int, int);
71 | ```
72 |
73 | There’s also [Rubex](https://github.com/SciRuby/rubex), which lets you write Ruby-like code that compiles to C (similar to Python’s Cython). It also provides the ability to interface with C libraries.
74 |
75 | ```ruby
76 | lib ""
77 | double mean(int, int)
78 | end
79 | ```
80 |
81 | None of the approaches above are specific to machine learning, so you can use them with any C or C++ library.
82 |
83 | ## The Libraries
84 |
85 | Libraries were chosen based on popularity and performance. Many have a similar interface to their Python counterpart to make it easy to follow existing tutorials. Libraries are broken down into categories below with brief descriptions.
86 |
87 | ### Gradient Boosting
88 |
89 | [XGBoost](https://github.com/ankane/xgb) and [LightGBM](https://github.com/ankane/lightgbm) are gradient boosting libraries. Gradient boosting is a powerful technique for building predictive models that fits many small decision trees that together make robust predictions, even with outliers and missing values. Gradient boosting performs well on tabular data.
90 |
91 | ### Deep Learning
92 |
93 | [Torch-rb](https://github.com/ankane/torch-rb) and [TensorFlow](https://github.com/ankane/tensorflow) are deep learning libraries. Torch-rb is built on LibTorch, the library that powers PyTorch. Deep learning has been very successful in areas like image recognition and natural language processing.
94 |
95 | ### Recommendations
96 |
97 | [Disco](https://github.com/ankane/disco) is a recommendation library. It looks at ratings or actions from users to predict other items they might like, known as collaborative filtering. Matrix factorization is a common way to accomplish this.
98 |
99 | [LIBMF](https://github.com/ankane/libmf) is a high-performance matrix factorization library.
100 |
101 | Collaborative filtering can also find similar users and items. If you have a large number of users or items, an approximate nearest neighbor algorithm can speed up the search. Spotify [does this](https://github.com/spotify/annoy#background) for music recommendations.
102 |
103 | [NGT](https://github.com/ankane/ngt) is an approximate nearest neighbor library that performs extremely well on benchmarks (in Python/C++).
104 |
105 | Image from ANN Benchmarks, MIT license
112 |
113 | Another promising technique for recommendations is factorization machines. The traditional approach to collaborative filtering builds a model exclusively from past ratings or actions. However, you may have additional *side information* about users or items. Factorization machines can incorporate this data. They can also perform classification and regression.
114 |
115 | [xLearn](https://github.com/ankane/xlearn) is a high-performance library for factorization machines.
116 |
117 | ### Optimization
118 |
119 | Optimization finds the best solution to a problem out of many possible solutions. Scheduling and vehicle routing are two common tasks. Optimization problems have an objective function to minimize (or maximize) and a set of constraints.
120 |
121 | Linear programming is an approach you can use when the objective function and constraints are linear. Here’s a really good [introductory series](https://www.youtube.com/watch?v=0TD9EQcheZM) if you want to learn more.
122 |
123 | [SCS](https://github.com/ankane/scs) is a library that can solve [many types](https://www.cvxpy.org/tutorial/advanced/index.html#choosing-a-solver) of optimization problems.
124 |
125 | [OSQP](https://github.com/ankane/osqp) is another that’s specifically designed for quadratic problems.
126 |
127 | ### Text Classification
128 |
129 | [fastText](https://github.com/ankane/fasttext) is a text classification and word representation library. It can label documents with one or more categories, which is useful for content tagging, spam filtering, and language detection. It can also compute word vectors, which can be compared to find similar words and analogies.
130 |
131 | ### Interoperability
132 |
133 | It’s nice when languages play nicely together.
134 |
135 | [ONNX Runtime](https://github.com/ankane/onnxruntime) is a scoring engine for ML models. You can build a model in one language, save it in the ONNX format, and run it in another. Here’s [an example](/tensorflow-ruby).
136 |
137 | [Npy](https://github.com/ankane/npy) is a library for saving and loading NumPy `npy` and `npz` files. It uses [Numo](/numo) for multi-dimensional arrays.
138 |
139 | ### Others
140 |
141 | [Vowpal Wabbit](https://github.com/ankane/vowpalwabbit) specializes in online learning. It’s great for reinforcement learning as well as supervised learning where you want to train a model incrementally instead of all at once. This is nice when you have a lot of data.
142 |
143 | [ThunderSVM](https://github.com/ankane/thundersvm) is an SVM library that runs in parallel on either CPUs or GPUs.
144 |
145 | [GSLR](https://github.com/ankane/gslr) is a linear regression library powered by GSL that supports both ordinary least squares and ridge regression. It can be used alone or to improve the performance of [Eps](https://github.com/ankane/eps).
146 |
147 | ## Shout-out
148 |
149 | I wanted to also give a shout-out to another library that entered the scene in 2019.
150 |
151 | [Rumale](https://github.com/yoshoku/rumale) is a machine learning library that supports many, many algorithms, similar to Python’s Scikit-learn. Thanks [@yoshoku](https://github.com/yoshoku) for the amazing work!
152 |
153 | ## Final Word
154 |
155 | There are now many state-of-the-art machine learning libraries available for Ruby. If you’re a Ruby engineer who’s interested in machine learning, now’s a good time to try it. Also, if you come across a C or C++ library you want to use in Ruby, you’ve seen a few ways to do it. Let’s make Ruby a great language for machine learning.
156 |
--------------------------------------------------------------------------------
/archive/rails-meet-data-science.md:
--------------------------------------------------------------------------------
1 | # Rails, Meet Data Science
2 |
3 | 
4 |
5 | Organizations today have more data than ever. Predictive modeling is a powerful way to use this data to solve problems and create better experiences for customers. For instance, do a better job keeping items in stock by predicting demand or lower costs by predicting fraud. If you use Ruby on Rails, it can be tough to know how to incorporate this into your app.
6 |
7 | We’ll go over four patterns you can use for prediction with Rails. We used all four successfully during my time at [Instacart](https://www.instacart.com). They can work when you have no data scientists (when I started) as well as when you have a strong data science team.
8 |
9 | ## Patterns
10 |
11 | With predictive modeling, you first train a model and then use it to predict. The patterns can be grouped by the language used for each task:
12 |
13 | Pattern | Train | Predict
14 | --- | --- | ---
15 | 1 | 3rd Party | 3rd Party
16 | 2 | Ruby | Ruby
17 | 3 | Another Language | Ruby
18 | 4 | Another Language | Another Language
19 |
20 | Two popular languages for data science are Python and R.
21 |
22 | You can decide which pattern to use for each model you build. We’ll walk through the approaches and discuss the pros and cons of each.
23 |
24 | ## Pattern 1: Use a 3rd Party
25 |
26 | Before building a model in-house, it’s good to see what already exists. There are a number of external services you can use for specific problems. Here are a few:
27 |
28 | - Fraud - [Sift Science](https://siftscience.com/)
29 | - Recommendations - [Tamber](https://tamber.com/)
30 | - Anomaly Detection & Forecasting - [Trend](https://trendapi.org/)
31 | - NLP - [Amazon Comprehend](https://aws.amazon.com/comprehend/) and [Google Cloud Natural Language](https://cloud.google.com/natural-language/)
32 | - Vision - [AWS Rekognition](https://aws.amazon.com/rekognition/) and [Google Cloud Vision](https://cloud.google.com/vision/)
33 |
34 | Pros
35 |
36 | - Get domain knowledge from the company
37 | - Fast to implement and easy to maintain
38 |
39 | Cons
40 |
41 | - Not easy to iterate if it doesn’t fit your needs
42 | - Vendor lock-in
43 |
44 | ## Pattern 2: Train and Predict in Ruby
45 |
46 | Ruby has a number of libraries for building simple models. Simple models can perform very well since a large part of model building is [feature engineering](https://en.wikipedia.org/wiki/Feature_engineering). This is a great option if there are no data scientists in your company or on your team. A developer can own the model end-to-end, which is great for speed and iteration.
47 |
48 | Here are a few libraries for building models in Ruby:
49 |
50 | - [Eps](https://github.com/ankane/eps) - good for beginners
51 | - [Rumale](https://github.com/yoshoku/rumale) - good for advanced users
52 | - [Xgb](https://github.com/ankane/xgb) - XGBoost
53 | - [LightGBM](https://github.com/ankane/lightgbm) - LightGBM
54 | - And [many more](https://github.com/arbox/machine-learning-with-ruby)
55 |
56 | Once a model is trained, you’ll need to store it. You can use methods provided by the library, or marshal if none exist. You can store the models as files or in the database.
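As a sketch of the Marshal approach (the model class here is a stand-in, not a real library model):

```ruby
# Stand-in for a trained model; real libraries provide their own classes.
LinearModel = Struct.new(:coefficients) do
  def predict(features)
    features.zip(coefficients).sum { |x, w| x * w }
  end
end

model = LinearModel.new([0.5, 2.0])

# Store the model as a file (a binary database column works too)
File.binwrite("model.bin", Marshal.dump(model))

# Load it later to predict
loaded = Marshal.load(File.binread("model.bin"))
loaded.predict([1.0, 3.0]) # => 6.5
```

Keep in mind that Marshal output is tied to the Ruby and class versions used to dump it, so re-dump models when those change.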
57 |
58 | Be sure to commit the code used to train models so you can update them with newer data in the future. The Rails console is a decent place to create them, or use a [Jupyter notebook](https://jupyter.org/) running [IRuby](https://github.com/SciRuby/iruby) for better visualizations (see [setup instructions for Rails](https://ankane.org/jupyter-rails)).
59 |
60 | Pros
61 |
62 | - Simple models can perform well
63 | - No need to introduce a new language
64 |
65 | Cons
66 |
67 | - Limited tools for building models
68 | - Limited model selection
69 | - Many people who have experience building models don’t know Ruby
70 |
71 | ## Pattern 3: Train in Another Language, Predict in Ruby
72 |
73 | Ruby is getting better for data science thanks to [SciRuby](https://github.com/SciRuby/sciruby). However, languages like R and Python currently have much better tools. Also, many people who have experience building models don’t know Ruby.
74 |
75 | Luckily, you can build models in another language and predict in Ruby. This way, you can use more advanced tools for visualization, validation, and tuning without adding complexity to your production stack. If you don’t have data scientists, you can use this pattern to contract with one.
76 |
77 | Here are models that can currently predict in Ruby:
78 |
79 | - [Eps](https://github.com/ankane/eps) - Linear Regression, Naive Bayes
80 | - [Scoruby](https://github.com/asafschers/scoruby) - Random Forest, GBM, Decision Tree, Naive Bayes
81 | - [Xgb](https://github.com/ankane/xgb) - XGBoost
82 | - [LightGBM](https://github.com/ankane/lightgbm) - LightGBM
83 |
84 | For this to work, models need to be stored in a shared format that both languages understand. PMML and PFA are two interchange formats. PFA is newer but has less adoption than PMML. Andrey Melentyev has a [great post](https://www.andrey-melentyev.com/model-interoperability.html) on the topic.
85 |
86 | Once again, it’s important that models are reproducible. This allows you to update them with newer data in the future. Be sure to follow software engineering best practices like:
87 |
88 | - Use source control (create a new repo or add to your existing repo)
89 | - Use a package manager for a reproducible environment
90 | - Keep credentials out of source control (use `.env` or `.Renviron`)
91 |
92 | Here are some tools you can use:
93 |
94 | Function | Python | R
95 | --- | --- | ---
96 | Package management | [Pipenv](https://pipenv.readthedocs.io/en/latest/) | [Jetpack](https://github.com/ankane/jetpack)
97 | Database access | [SQLAlchemy](https://www.sqlalchemy.org/) | [dbx](https://github.com/ankane/dbx)
98 | PMML export | [sklearn2pmml](https://github.com/jpmml/sklearn2pmml) | [pmml](https://cran.r-project.org/package=pmml)
99 |
100 | One place to be careful is implementing the features in Ruby. It must be consistent with how they were implemented in training. To ensure this is correct, verify it programmatically. Create a CSV file with ids and predictions from the original model and confirm the Ruby predictions match. Here’s some [example code](https://github.com/ankane/eps#verifying).
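A sketch of that verification step, with a hypothetical predict function and inline CSV standing in for the exported file:

```ruby
require "csv"

# Hypothetical Ruby port of a model trained in another language
predict = ->(features) { features.sum >= 1.0 ? 1 : 0 }

# In practice, read the file exported from the original model
csv_data = <<~CSV
  id,f1,f2,prediction
  1,0.2,0.9,1
  2,0.1,0.3,0
CSV

mismatches = CSV.parse(csv_data, headers: true).reject do |row|
  predict.call([row["f1"].to_f, row["f2"].to_f]) == row["prediction"].to_i
end

puts mismatches.empty? ? "Predictions match" : "#{mismatches.size} mismatches"
```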
101 |
102 | Pros
103 |
104 | - Better tools for model building
105 | - No need to operate a new language in production
106 |
107 | Cons
108 |
109 | - Need to introduce a new language in development
110 | - Limited model selection
111 | - Need to create features in two languages
112 |
113 | ## Pattern 4: Train and Predict in Another Language
114 |
115 | The last option we’ll cover is doing both training and prediction outside Ruby. This is great if you have a team of data scientists who specialize in another language. This pattern allows data scientists to own models end-to-end.
116 |
117 | It also gives you access to models that are not available in Ruby. For instance, there are forecasting libraries like [Prophet](https://facebook.github.io/prophet/) and deep learning libraries like [TensorFlow](https://www.tensorflow.org/).
118 |
119 | The implementation depends on how predictions are generated. Two common ways are batch and real-time.
120 |
121 | ---
122 |
123 | ### Batch Predictions
124 |
125 | Batch predictions are generated asynchronously and are typically run on a regular interval. This can be every minute or once a week. An example is a daily job that updates demand forecasts for the following weeks. Predictions can be stored and later used by the Rails app as needed.
126 |
127 | Don’t be afraid to read and write directly to the database. While microservice design patterns caution against using the database as an API, we didn’t have many issues with it. When updating records, it’s also a good idea to write audits to see how predictions change over time.
128 |
129 | Jobs can be scheduled with cron, or ideally a distributed scheduler like [Mani](https://github.com/sherinkurian/mani) for high availability. If you need to let the Rails app know a job has completed, you can do this through your messaging system. HTTP works great if you don’t have one.
130 |
131 | ---
132 |
133 | ### Real-Time Predictions
134 |
135 | Real-time predictions are generated synchronously and are triggered by calls from the Rails app. An example is recommending items to a user at checkout based on what’s in their cart.
136 |
137 | HTTP is a common choice for retrieving predictions, but you can use a messaging system or even pipes. Great tools for HTTP are [Django](https://www.djangoproject.com/) and [Flask](http://flask.pocoo.org/) for Python and [Plumber](https://www.rplumber.io/) for R.
138 |
139 | ---
140 |
141 | As with the other patterns, follow best engineering practices. In addition to ones previously mentioned:
142 |
143 | - Use a framework, or at the very least a consistent project structure
144 | - Keep code [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself)
145 |
146 | Don’t be afraid to use Rails to manage the database schema. It’s easy enough for data scientists to learn to create and run migrations. Otherwise, you need to support another system for schema changes.
147 |
148 | To store models, you most likely won’t use an interchange format, since libraries can’t load them. Instead, use serialization specific to the language, like pickle in Python and serialize in R.
149 |
150 | If you’re deciding between Python and R, Python has more general-purpose libraries, so it’s easier to run in production.
151 |
152 | Pros
153 |
154 | - Larger selection of models available
155 | - Data scientists can own models end-to-end
156 |
157 | Cons
158 |
159 | - Need to run multiple languages in production
160 |
161 | ## Conclusion
162 |
163 | You’ve now seen four great patterns for bringing predictive models to Rails. Each has different trade-offs, so we recommend taking the simplest approach that works for you. No matter which you choose, make sure your models are reproducible.
164 |
165 | Happy modeling!
166 |
167 |
168 |
169 | Updates
170 |
171 | - May 2019: Added Rumale
172 | - August 2019: Added Xgb and LightGBM
173 |
--------------------------------------------------------------------------------
/archive/scaling-the-monolith.md:
--------------------------------------------------------------------------------
1 | # Scaling the Monolith
2 |
3 | Many companies start out with a single web application. As the team and codebase grow, things feel less organized and common tasks like booting the app and running the test suite take longer and longer. It can be tempting to turn to microservices to alleviate some of this pain. However, distributed systems add a significant amount of complexity and mental overhead.
4 |
5 | Before you decide to split apart your app, there are a number of tactics you can use to scale it [majestically](https://m.signalvnoise.com/the-majestic-monolith-29166d022228#.bst5vwy6r). Spend a significant amount of time trying to solve your existing problems before making big changes.
6 |
7 | The topics we’ll cover are:
8 |
9 | - [Code](#code)
10 | - [Errors](#errors)
11 | - [Boot Times & Memory](#boot-times-memory)
12 | - [Testing](#testing)
13 | - [Databases](#databases)
14 | - [Stability](#stability)
15 |
16 | The examples are geared towards Rails apps, but the principles apply to any codebase.
17 |
18 | ## Code
19 |
20 | Rails models and controllers tend to get larger and larger. Rails introduced [concerns](https://signalvnoise.com/posts/3372-put-chubby-models-on-a-diet-with-concerns) as one way to address this. Concerns allow you to pull out related logic into a separate file.
21 |
22 | Service objects are another nice pattern for this. Here’s [an example](https://hackernoon.com/service-objects-in-ruby-on-rails-and-you-79ca8a1c946e) of a service object. There’s not a standard way to create service objects, but it’s a good idea to decide on a convention for your app. You can use gems like [Interactor](https://github.com/collectiveidea/interactor) to establish one.
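For example, one common convention (the class and method names here are just illustrative) is a single `call` entry point per service:

```ruby
# One verb-named class per operation, with a single public entry point
class CalculateDiscount
  def self.call(subtotal)
    new(subtotal).call
  end

  def initialize(subtotal)
    @subtotal = subtotal
  end

  def call
    (@subtotal * 0.1).round(2)
  end
end

CalculateDiscount.call(50.0) # => 5.0
```

Whatever shape you pick, the win is consistency: every service is invoked and tested the same way.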
23 |
24 | Use namespaces to organize code.
25 |
26 | ```ruby
27 | class Admin::UsersController < Admin::BaseController
28 | end
29 | ```
30 |
31 | Some teams also prefer to use Rails engines, although I’m not a fan of this approach. Here’s a [good comparison](https://stackoverflow.com/a/29641532/1177228) of the pros and cons of each.
32 |
33 | ## Errors
34 |
35 | As the team grows, it’s important that errors get routed to the right place. You can use the [ownership](https://github.com/ankane/ownership) gem to help with this. Add it to controllers, jobs, and rake tasks.
36 |
37 | ```ruby
38 | class WelcomeJob < ApplicationJob
39 | owner :growth
40 | end
41 | ```
42 |
43 | `git blame` can help with assigning initial owners.
44 |
45 | ## Boot Times & Memory
46 |
47 | As your app accumulates more gems and files, its boot time and memory usage grow. There have been a number of projects over the years to speed up boot time. [Spring](https://github.com/rails/spring) was introduced in Rails 4.1 and keeps your app running in the background so it doesn’t have to boot every time you run a new command.
48 |
49 | Last year, Shopify released [Bootsnap](https://github.com/Shopify/bootsnap), which caches expensive loading computations. It’s now part of Rails 5.2 and can be used with earlier versions of Rails as well. With Bootsnap, “the core Shopify platform - a rather large monolithic application - boots about 75% faster, dropping from around 25s to 6.5s.”
50 |
51 | Another tactic is lazy loading files. Instead of incurring a speed and memory penalty at startup to load files, you can incur it the first time a request or job requires it. If it’s never needed, it’s never loaded. You can specify which gems to load in your Gemfile.
52 |
53 | ```rb
54 | gem 'groupdate', require: false
55 | ```
56 |
57 | You can also use different Bundler groups to selectively load gems for different environments.
58 |
59 | ```rb
60 | group :web do
61 | gem 'rack-attack'
62 | end
63 |
64 | group :admin_web do
65 | gem 'activeadmin'
66 | end
67 |
68 | group :worker do
69 | gem 'premailer-rails'
70 | end
71 | ```
72 |
73 | Read how to [set it up here](https://engineering.harrys.com/2014/07/29/hacking-bundler-groups.html).
74 |
75 | Use [Bumbler](https://github.com/nevir/Bumbler) to see how long each gem takes to load and [Derailed Benchmarks](https://github.com/schneems/derailed_benchmarks) to see memory usage. Focus on the top ones and leave the rest.
76 |
77 | If a gem is slow, there’s a chance it may be doing a lot of work upfront. You can try to debug the gem and fix it. Here’s an [example](https://github.com/ankane/area/commit/2c8cc47d151828ebdcce0e7060b7ac77a4c2f9ce) of speeding up initial load time by only reading a CSV file when it’s needed.
78 |
79 | ## Testing
80 |
81 | As the number of tests grows, the test suite can become slow. [TestProf](https://test-prof.evilmartians.io) provides a number of tools to profile and optimize your tests. You can also use a library like [Database Cleaner](https://github.com/DatabaseCleaner/database_cleaner) to quickly clean the database after tests.
82 |
83 | In development, you can use Guard for [Minitest](https://github.com/guard/guard-minitest) or [RSpec](https://github.com/guard/guard-rspec) to automatically run tests when relevant files are modified. Also make sure it’s easy to manually run common subsets of tests. You can use tags in RSpec for this.
84 |
85 | ```sh
86 | rspec --tag growth
87 | ```
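
Tags are just metadata on example groups, so adding one is a single keyword (hypothetical spec names):

```ruby
RSpec.describe "signup funnel", growth: true do
  it "tracks referrals" do
    # ...
  end
end
```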
88 |
89 | The key to speeding up the entire test suite is parallelization. Stripe has a [great post](https://stripe.com/blog/distributed-ruby-testing) about how they were able to get three hours of tests to run in three minutes. With continuous integration, split tests across multiple machines. Both [Travis](https://docs.travis-ci.com/user/speeding-up-the-build/#parallelizing-your-builds-across-virtual-machines) and [Circle](https://circleci.com/docs/2.0/parallelism-faster-jobs/) support this. You can use [ParallelTests](https://github.com/grosser/parallel_tests) in development to use all the cores on your machine. Rails 6 will run tests in parallel by default.
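
For ParallelTests, the flow is roughly (rake tasks from its README):

```sh
# one test database per core
rake parallel:create parallel:prepare

# run the suite across all cores
rake parallel:test   # or parallel:spec for RSpec
```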
90 |
91 | Another way to speed up tests is to change your schema dump format to SQL.
92 |
93 | ```ruby
94 | config.active_record.schema_format = :sql # in config/application.rb
95 | ```
96 |
97 | This allows you to load the database schema for tests without booting the Rails app. With Postgres, you can use:
98 |
99 | ```sh
100 | psql < db/structure.sql
101 | ```
102 |
103 | To prevent slow tests from being added, automatically fail tests that take too long. With RSpec, you can do:
104 |
105 | ```ruby
106 | RSpec.configure do |config|
107 | config.around(:each) do |example|
108 | duration = Benchmark.realtime(&example)
109 | raise "Test took over 2 seconds to run" if duration > 2
110 | end
111 | end
112 | ```
113 |
114 | Start with a higher value and ratchet it down as you fix tests that are slow. You can see the slowest tests with:
115 |
116 | ```sh
117 | rspec --profile
118 | ```
119 |
120 | As the number of tests grows, there’s a higher chance of a random network issue causing an individual test to fail. Automatically retry failing tests to cut down on noise. With RSpec, you can use [RSpec::Retry](https://github.com/NoRedInk/rspec-retry) for this.
121 |
122 | ```ruby
123 | require "rspec/retry"
124 |
125 | RSpec.configure do |config|
126 | config.around(:each) do |example|
127 | example.run_with_retry retry: 2 # must be 2 to retry once (shrug)
128 | end
129 | end
130 | ```
131 |
132 | For test failures, make sure they get routed to the committer. You can use webhooks from your CI platform to do this.
133 |
134 | ## Databases
135 |
136 | Modern relational databases can scale extremely well if you follow best practices.
137 |
138 | One of the most important things you can do is set a [statement timeout](https://github.com/ankane/the-ultimate-guide-to-ruby-timeouts#statement-timeouts-1) to prevent bad queries from taking too many resources.
139 |
140 | ```yml
141 | production:
142 | variables:
143 | statement_timeout: 250 # ms
144 | ```
145 |
146 | It’s also good to track which queries consume the most CPU time. With Postgres, you can use [PgHero](https://github.com/ankane/pghero) for this.
147 |
148 | 
149 |
150 | Use [Marginalia](https://github.com/basecamp/marginalia) to make it easy to identify the origin of queries. This adds a comment to the end of queries like `/*application:Datakick,controller:items,action:edit*/` so you can see where they’re coming from.
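
The fields in the comment are configurable. For example, to include job names as well (per Marginalia’s README):

```ruby
# config/initializers/marginalia.rb
Marginalia::Comment.components = [:application, :controller, :action, :job]
```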
151 |
152 | Add defensive measures as well. For instance, pause low priority job queues automatically when the database CPU gets too high.
153 |
154 | ```ruby
155 | Sidekiq::Queue.new("low").pause! # queue pausing requires Sidekiq Pro
156 | ```
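
The check itself can run on a schedule. A sketch of the pausing logic — the queue objects and CPU reading are stand-ins for `Sidekiq::Queue.new("low")` and whatever your metrics provider reports:

```ruby
CPU_THRESHOLD = 80 # percent; tune for your workload

# Pause each queue when database CPU crosses the threshold.
# Returns true if the queues were paused.
def pause_low_priority_queues(queues, cpu_percent, threshold: CPU_THRESHOLD)
  return false unless cpu_percent > threshold
  queues.each(&:pause!) # pause! is a Sidekiq Pro API
  true
end
```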
157 |
158 | As the team grows, so does the chance of someone accidentally running a migration that takes down the site. [Strong Migrations](https://github.com/ankane/strong_migrations) can help prevent downtime due to database migrations. It raises an error if you try to run an unsafe operation and gives instructions for a better way to do it.
159 |
160 | 
161 |
162 | Some tables can accumulate a lot of columns. You can split them into multiple tables by concern, each with a one-to-one relationship to the original.
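
For example, rarely-read columns like a bio and avatar could move to their own table (hypothetical models):

```ruby
class User < ApplicationRecord
  has_one :profile
end

class Profile < ApplicationRecord
  belongs_to :user # profiles.user_id, with a unique index
end
```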
163 |
164 | Scale reads by fixing N+1 queries and caching frequent queries. [Bullet](https://github.com/flyerhzm/bullet) can help you identify N+1 queries. If you still have high load after spending a good amount of time on these, use [Distribute Reads](https://github.com/ankane/distribute_reads) for replicas.
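
With Distribute Reads, queries go to a replica only inside an explicit block (per the gem’s README), so the default stays safe (`User` is a stand-in model):

```ruby
distribute_reads do
  User.count # runs against a replica
end
```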
165 |
166 | Scale writes and space with additional databases. Use [Multiverse](https://github.com/ankane/multiverse) to manage them. This can also be good if you have business domains with different workloads. It adds complexity and removes the ability to join certain tables, but can increase stability.
167 |
168 | Partitioning is another strategy to manage space for tables where only recent data is needed, since old partitions can be detached or dropped. You can use [pgslice](https://github.com/ankane/pgslice) for Postgres.
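
pgslice works in explicit steps so each one can be verified before moving on (commands from its README; `visits` and `created_at` are its example names):

```sh
pgslice prep visits created_at month
pgslice add_partitions visits --intermediate --future 3
pgslice fill visits
pgslice swap visits
```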
169 |
170 | While Rails has built-in connection pooling, connections can become an issue when you have a lot of servers. With Postgres, use a connection pooler like [PgBouncer](https://ankane.org/pgbouncer-setup) when you start to hit 500 connections.
171 |
172 | Be hesitant to introduce new data stores. Most of the time you can [just table data](https://ankane.org/just-table-it). It’s often not worth having another technology to manage if your current stack can do the job.
173 |
174 | ## Stability
175 |
176 | Your monolith is one codebase, but you can increase stability by isolating different parts of the app in production. Have separate load balancers and web servers for your customer site and admin site so customers aren’t impacted if the admin site goes down. Use separate workers for different groups of queues so a backed up queue or bad job won’t affect the whole system.
177 |
178 | You can separate by business domain, which will be aligned with teams if you have vertical teams. This also allows you to scale different parts of your app independently as if they were different services.
179 |
180 | ## Conclusion
181 |
182 | As you’ve seen, there are a number of things you can do to scale your monolith. Focus on developer happiness and productivity as well as system stability. Keep track of metrics over time that impact developers, like boot time, test suite time, and deploy time. It’s also good to invest in projects that make it comfortable to ship code fast, like quick rollbacks. Overall, spend a decent amount of time trying to solve your exact pain points before breaking your app apart to solve them.
183 |
--------------------------------------------------------------------------------