├── Rollup.md ├── TPC-DS.md ├── TPC-H.md ├── irbrc.md ├── Backsolve.md ├── CSP-Rails.md ├── Vault-PKI.md ├── Dokku-Rails.md ├── Presto-Mac.md ├── Safely.md ├── Two-Metrics.md ├── Jupyter-Rails.md ├── Just-Table-It.md ├── Postgres-Users.md ├── Scaling-Reads.md ├── Bulk-Upsert-Ruby-Rails.md ├── Data-Science-SQL.md ├── Error-Reporting-R.md ├── Favorite-Quotes.md ├── Hardening-Devise.md ├── Leadership-Reads.md ├── Management-Reads.md ├── Open-Source-Projects.md ├── PgBouncer-Setup.md ├── Programming-Reads.md ├── README.md ├── Rails-on-Heroku.md ├── Scaling-Postgres.md ├── Security-Checks.md ├── Startup-Security.md ├── Strong-Parameters.md ├── Trying-Out-Vault.md ├── Dokku-Digital-Ocean.md ├── Encryption-Keys-Rails.md ├── Learn-Data-Science.md ├── Text-Indexes-Postgres.md ├── Host-Your-Own-Postgres.md ├── R-Postgres-and-Database-URLs.md ├── Short-Guide-to-Metrics.md ├── Verify-Slack-Requests.md ├── Google-OAuth-with-Devise.md ├── New-Rails-App-Checklist.md ├── AWS-Client-Side-Encryption.md ├── Navigator-Send-Beacon-Rails.md ├── Securing-Database-Traffic.md ├── Securing-Emails-Rails.md ├── The-Origin-of-SQL-Queries.md ├── Distributed-Architecture-Talks.md ├── Anonymizing-IPs.md ├── Development-Rails.md └── archive ├── dokku-rails.md ├── irbrc.md ├── distributed-architecture-talks.md ├── modern-encryption-mongoid.md ├── data-science-sql.md ├── startup-security.md ├── ruby-openssl-1-1.md ├── leadership-reads.md ├── programming-reads.md ├── short-guide-to-metrics.md ├── verify-slack-requests.md ├── navigator-send-beacon-rails.md ├── security-checks.md ├── error-reporting-r.md ├── introducing-archer.md ├── jupyter-rails.md ├── bulk-upsert.md ├── introducing-pdscan.md ├── management-reads.md ├── safely-pattern.md ├── presto-mac.md ├── two-metrics.md ├── tpc-ds.md ├── tpc-h.md ├── just-table-it.md ├── strong-parameters.md ├── xgboost-lightgbm-come-to-ruby.md ├── vault-pki.md ├── r-database-urls.md ├── hardening-devise.md ├── anonymizing-ips.md ├── backsolve.md ├── 
aws-client-side-encryption.md ├── onnx-runtime-ruby.md ├── trying-out-vault.md ├── large-text-indexes.md ├── learn-data-science.md ├── devise-argon2.md ├── lockbox-types.md ├── the-origin-of-sql-queries.md ├── blind-index-1-0.md ├── rollup.md ├── new-rails-app-checklist.md ├── google-oauth-with-devise.md ├── artistic-style-transfer-ruby.md ├── activestorage-s3-encryption.md ├── pgbouncer-setup.md ├── csp-rails.md ├── emotion-recognition-ruby.md ├── pghero-2-0.md ├── hybrid-cryptography-rails.md ├── postgres-sslmode-explained.md ├── ruby-ml-for-python-coders.md ├── scaling-reads.md ├── encryption-keys.md ├── tensorflow-ruby.md ├── numo.md ├── securing-pgbouncer-amazon-rds.md ├── host-your-own-postgres.md ├── rails-on-heroku.md ├── daru.md ├── introducing-dexter.md ├── postgres-users.md ├── dokku-digital-ocean.md ├── securing-user-emails-lockbox.md ├── modern-encryption-rails.md ├── decryption-keys.md ├── gem-patterns.md ├── securing-user-emails-in-rails.md ├── new-ml-gems.md ├── rails-meet-data-science.md └── scaling-the-monolith.md /Rollup.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/rollup) 2 | -------------------------------------------------------------------------------- /TPC-DS.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/tpc-ds) 2 | -------------------------------------------------------------------------------- /TPC-H.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/tpc-h) 2 | -------------------------------------------------------------------------------- /irbrc.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/irbrc) 2 | -------------------------------------------------------------------------------- /Backsolve.md: 
-------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/backsolve) 2 | -------------------------------------------------------------------------------- /CSP-Rails.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/csp-rails) 2 | -------------------------------------------------------------------------------- /Vault-PKI.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/vault-pki) 2 | -------------------------------------------------------------------------------- /Dokku-Rails.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/dokku-rails) 2 | -------------------------------------------------------------------------------- /Presto-Mac.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/presto-mac) 2 | -------------------------------------------------------------------------------- /Safely.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/safely-pattern) 2 | -------------------------------------------------------------------------------- /Two-Metrics.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/two-metrics) 2 | -------------------------------------------------------------------------------- /Jupyter-Rails.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/jupyter-rails) 2 | -------------------------------------------------------------------------------- /Just-Table-It.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/just-table-it) 2 | 
-------------------------------------------------------------------------------- /Postgres-Users.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/postgres-users) 2 | -------------------------------------------------------------------------------- /Scaling-Reads.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/scaling-reads) 2 | -------------------------------------------------------------------------------- /Bulk-Upsert-Ruby-Rails.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/bulk-upsert) 2 | -------------------------------------------------------------------------------- /Data-Science-SQL.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/data-science-sql) 2 | -------------------------------------------------------------------------------- /Error-Reporting-R.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/error-reporting-r) 2 | -------------------------------------------------------------------------------- /Favorite-Quotes.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/favorite-quotes) 2 | -------------------------------------------------------------------------------- /Hardening-Devise.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/hardening-devise) 2 | -------------------------------------------------------------------------------- /Leadership-Reads.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/leadership-reads) 2 | -------------------------------------------------------------------------------- 
/Management-Reads.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/management-reads) 2 | -------------------------------------------------------------------------------- /Open-Source-Projects.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/opensource) 2 | -------------------------------------------------------------------------------- /PgBouncer-Setup.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/pgbouncer-setup) 2 | -------------------------------------------------------------------------------- /Programming-Reads.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/programming-reads) 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org) 2 | 3 | [Archive](archive/) 4 | -------------------------------------------------------------------------------- /Rails-on-Heroku.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/rails-on-heroku) 2 | -------------------------------------------------------------------------------- /Scaling-Postgres.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/scaling-postgres) 2 | -------------------------------------------------------------------------------- /Security-Checks.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/security-checks) 2 | -------------------------------------------------------------------------------- /Startup-Security.md: 
-------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/startup-security) 2 | -------------------------------------------------------------------------------- /Strong-Parameters.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/strong-parameters) 2 | -------------------------------------------------------------------------------- /Trying-Out-Vault.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/trying-out-vault) 2 | -------------------------------------------------------------------------------- /Dokku-Digital-Ocean.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/dokku-digital-ocean) 2 | -------------------------------------------------------------------------------- /Encryption-Keys-Rails.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/encryption-keys) 2 | -------------------------------------------------------------------------------- /Learn-Data-Science.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/learn-data-science) 2 | -------------------------------------------------------------------------------- /Text-Indexes-Postgres.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/large-text-indexes) 2 | -------------------------------------------------------------------------------- /Host-Your-Own-Postgres.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/host-your-own-postgres) 2 | -------------------------------------------------------------------------------- /R-Postgres-and-Database-URLs.md: 
-------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/r-database-urls) 2 | -------------------------------------------------------------------------------- /Short-Guide-to-Metrics.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/short-guide-to-metrics) 2 | -------------------------------------------------------------------------------- /Verify-Slack-Requests.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/verify-slack-requests) 2 | -------------------------------------------------------------------------------- /Google-OAuth-with-Devise.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/google-oauth-with-devise) 2 | -------------------------------------------------------------------------------- /New-Rails-App-Checklist.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/new-rails-app-checklist) 2 | -------------------------------------------------------------------------------- /AWS-Client-Side-Encryption.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/aws-client-side-encryption) 2 | -------------------------------------------------------------------------------- /Navigator-Send-Beacon-Rails.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/navigator-send-beacon-rails) 2 | -------------------------------------------------------------------------------- /Securing-Database-Traffic.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/securing-pgbouncer-amazon-rds) 2 | 
-------------------------------------------------------------------------------- /Securing-Emails-Rails.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/securing-user-emails-in-rails) 2 | -------------------------------------------------------------------------------- /The-Origin-of-SQL-Queries.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/the-origin-of-sql-queries) 2 | -------------------------------------------------------------------------------- /Distributed-Architecture-Talks.md: -------------------------------------------------------------------------------- 1 | [New home](https://ankane.org/distributed-architecture-talks) 2 | -------------------------------------------------------------------------------- /Anonymizing-IPs.md: -------------------------------------------------------------------------------- 1 | # Anonymizing IPs in Ruby 2 | 3 | [New home](https://ankane.org/anonymizing-ips) 4 | -------------------------------------------------------------------------------- /Development-Rails.md: -------------------------------------------------------------------------------- 1 | [New home](https://github.com/ankane/rails-best-practices/blob/master/Development.md) 2 | -------------------------------------------------------------------------------- /archive/dokku-rails.md: -------------------------------------------------------------------------------- 1 | # Rails on Dokku 2 | 3 | ## Console 4 | 5 | To open a Rails console, run: 6 | 7 | ```sh 8 | dokku run rails console 9 | ``` 10 | 11 | ## Migrations 12 | 13 | ```sh 14 | dokku run rails db:migrate 15 | ``` 16 | -------------------------------------------------------------------------------- /archive/irbrc.md: -------------------------------------------------------------------------------- 1 | # irbrc 2 | 3 | My simple `~/.irbrc` 4 | 5 | ```ruby 6 | require 
"irb/completion" 7 | require "irb/ext/save-history" 8 | IRB.conf[:SAVE_HISTORY] = 10000 9 | require "awesome_print" 10 | AwesomePrint.irb! 11 | ``` 12 | -------------------------------------------------------------------------------- /archive/distributed-architecture-talks.md: -------------------------------------------------------------------------------- 1 | # Distributed Architecture Talks 2 | 3 | Great talks 4 | 5 | - [Microservices](https://www.youtube.com/watch?v=2yko4TbC8cI) - Martin Fowler 6 | - [The State of the Art in Microservices](https://www.youtube.com/watch?v=pwpxq9-uw_0) - Adrian Cockcroft of Netflix 7 | - [Services and Rails: The Shit They Don’t Tell You](https://www.youtube.com/watch?v=GuJ49PNBsn8) - Brian Morton of Yammer 8 | - [Can Time-Travel Keep You From Blowing Up The Enterprise?](https://www.youtube.com/watch?v=23NhP4x3AAE) - David Copeland of Stitch Fix 9 | -------------------------------------------------------------------------------- /archive/modern-encryption-mongoid.md: -------------------------------------------------------------------------------- 1 | # Modern Encryption for Mongoid 2 | 3 | I’m happy to announce that Lockbox now supports Mongoid. This makes it easy to add application-level encryption to your MongoDB documents. 4 | 5 |

[Image: Lockbox]

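For reference, declaring an encrypted field on a Mongoid document looks roughly like this (a sketch, not from the original post — the `User` model and field names are assumptions, and the macro has been renamed across Lockbox versions, so consult the current readme):

```ruby
class User
  include Mongoid::Document

  # Lockbox stores the ciphertext in a *_ciphertext field
  field :email_ciphertext, type: String

  # declare the virtual attribute to encrypt
  # (`encrypts` in early versions, `has_encrypted` in later ones)
  has_encrypted :email
end
```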
6 | 7 | Blind Index also now supports Mongoid for cases where you need to query for exact matches. 8 | 9 | Get the latest versions of [Lockbox](https://github.com/ankane/lockbox) and [Blind Index](https://github.com/ankane/blind_index) today! 10 | -------------------------------------------------------------------------------- /archive/data-science-sql.md: -------------------------------------------------------------------------------- 1 | # Data Science SQL 2 | 3 | [Root mean squared error](https://www.kaggle.com/wiki/RootMeanSquaredError) 4 | 5 | ```sql 6 | SELECT SQRT(AVG(POWER(y - y_pred, 2))) AS rmse FROM ... 7 | ``` 8 | 9 | [Mean absolute error](https://www.kaggle.com/wiki/MeanAbsoluteError) 10 | 11 | ```sql 12 | SELECT AVG(ABS(y - y_pred)) AS mae FROM ... 13 | ``` 14 | 15 | Mean error 16 | 17 | ```sql 18 | SELECT AVG(y_pred - y) AS me FROM ... 19 | ``` 20 | 21 | Median - [get it here](https://github.com/ankane/median.sql) 22 | 23 | ```sql 24 | SELECT MEDIAN(y) FROM ... 25 | -------------------------------------------------------------------------------- /archive/startup-security.md: -------------------------------------------------------------------------------- 1 | # Startup Security 2 | 3 | A few simple steps to keep you secure. 4 | 5 | 1. Require 2-factor authentication for important accounts, like [Gmail](https://www.google.com/landing/2step/) and [GitHub](https://help.github.com/articles/about-two-factor-authentication/). 6 | 7 | 2. Require hard drives to be encrypted. [FileVault makes this easy](https://support.apple.com/en-us/HT204837) on Macs. 8 | 9 | 3. Use [DMARC](https://dmarc.org/overview/) to verify emails sent from your domain. [dmarcian](https://dmarcian.com/) is one provider. 10 | 11 | 4. Use a team password manager like [1Password](https://1password.com/) to share passwords. 
12 | -------------------------------------------------------------------------------- /archive/ruby-openssl-1-1.md: -------------------------------------------------------------------------------- 1 | # Ruby with OpenSSL 1.1 2 | 3 | Some Ruby features like `scrypt` and `hkdf` require OpenSSL 1.1. Here’s how to make it work on Mac: 4 | 5 | Install rbenv and OpenSSL 1.1 6 | 7 | ```sh 8 | brew install rbenv ruby-build openssl@1.1 9 | ``` 10 | 11 | Install Ruby 12 | 13 | ```sh 14 | RUBY_CONFIGURE_OPTS="--with-openssl-dir=/usr/local/opt/openssl@1.1" \ 15 | rbenv install 2.6.3 16 | ``` 17 | 18 | Open an interactive shell to confirm it worked 19 | 20 | ```sh 21 | rbenv shell 2.6.3 22 | irb 23 | ``` 24 | 25 | And run 26 | 27 | ```sh 28 | require "openssl" 29 | OpenSSL::OPENSSL_VERSION 30 | OpenSSL::KDF.methods - Object.methods 31 | ``` 32 | -------------------------------------------------------------------------------- /archive/leadership-reads.md: -------------------------------------------------------------------------------- 1 | # Great Leadership Reads 2 | 3 | A few books and articles I’ve read on leadership that changed my everyday relationships 4 | 5 | ## Books 6 | 7 | - [Leadership and Self-Deception](https://www.amazon.com/gp/product/B00GUPYRUS) 8 | - [How to Have Confidence and Power in Dealing with People](https://www.amazon.com/Have-Confidence-Power-Dealing-People-ebook/dp/B01CXHESLE) 9 | - [The 21 Irrefutable Laws of Leadership](https://www.amazon.com/gp/product/B001ECQK9S) 10 | 11 | ## Articles 12 | 13 | - [Power Up Your Team with Nonviolent Communication Principles](https://firstround.com/review/power-up-your-team-with-nonviolent-communication-principles/) 14 | -------------------------------------------------------------------------------- /archive/programming-reads.md: -------------------------------------------------------------------------------- 1 | # Great Programming Reads 2 | 3 | - [Prolific Engineers Take Small Bites - Patterns in Developer 
Impact](https://blog.gitprime.com/check-in-frequency-and-codebase-impact-the-surprising-correlation/) 4 | - [The Twelve-Factor App](https://12factor.net/) 5 | - [Open Source (Almost) Everything](https://tom.preston-werner.com/2011/11/22/open-source-everything.html) 6 | - [A Baseline for Front-End Developers, 2015](https://rmurphey.com/blog/2015/03/23/a-baseline-for-front-end-developers-2015) 7 | - [Software Engineering at Google](https://arxiv.org/pdf/1702.01715.pdf) 8 | - [The Majestic Monolith](https://m.signalvnoise.com/the-majestic-monolith-29166d022228) 9 | -------------------------------------------------------------------------------- /archive/short-guide-to-metrics.md: -------------------------------------------------------------------------------- 1 | # A Short Guide to Metrics 2 | 3 | Simple rules to follow when creating metrics 4 | 5 | 1. **Over time:** You must see how metrics change over time. Ideally you can view them by day, week, and month. No pie charts! 6 | 2. **In groups:** Metrics require balance. Think of the extremes. If you over-optimize for one metric, what problem will it create? 7 | 3. **Weighted appropriately:** If different times of the week or geographic areas are more important, your metrics should reflect this. 8 | 4. **Robust:** Chances are your data is not pristine. Be careful of outliers. Median or percentile can be better than the mean. 9 | 5. **Keep it simple:** People should be able to understand the metric. 10 | 11 | Nice to have 12 | 13 | 1. **Responsive:** Prefer metrics that change quickly as a result of your actions. 14 | -------------------------------------------------------------------------------- /archive/verify-slack-requests.md: -------------------------------------------------------------------------------- 1 | # Verify Slack Requests in Rails 2 | 3 | Slack [signs its requests](https://api.slack.com/docs/verifying-requests-from-slack) so you can verify they’re authentic. 
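The scheme is simple to reproduce outside Rails: concatenate `v0`, the request timestamp, and the raw body, then compute an HMAC-SHA256 of that string with your signing secret. A standalone sketch with made-up values (the secret, timestamp, and body below are hypothetical):

```ruby
require "openssl"

signing_secret = "my-signing-secret" # hypothetical
timestamp = "1531420618"
body = "token=abc123&team_id=T0001"

# Slack signs "v0:<timestamp>:<raw body>" with HMAC-SHA256
sign = lambda do |secret, ts, payload|
  "v0=" + OpenSSL::HMAC.hexdigest("SHA256", secret, "v0:#{ts}:#{payload}")
end

signature = sign.call(signing_secret, timestamp, body)

# recomputing with the same inputs matches; a tampered body does not
puts sign.call(signing_secret, timestamp, body) == signature
puts sign.call(signing_secret, timestamp, body + "&evil=1") == signature
```

The controller method below wraps this same computation with a timestamp check and a constant-time comparison.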
4 | 5 | Here’s a method you can use in your Rails controllers for it. 6 | 7 | ```ruby 8 | def request_verified? 9 | timestamp = request.headers["X-Slack-Request-Timestamp"] 10 | signature = request.headers["X-Slack-Signature"] 11 | signing_secret = ENV.fetch("SLACK_SIGNING_SECRET") 12 | 13 | if Time.at(timestamp.to_i) < 5.minutes.ago 14 | return false # expired 15 | end 16 | 17 | basestring = "v0:#{timestamp}:#{request.body.read}" 18 | my_signature = "v0=#{OpenSSL::HMAC.hexdigest("SHA256", signing_secret, basestring)}" 19 | 20 | ActiveSupport::SecurityUtils.secure_compare(my_signature, signature) 21 | end 22 | ``` 23 | 24 | :lock: 25 | -------------------------------------------------------------------------------- /archive/navigator-send-beacon-rails.md: -------------------------------------------------------------------------------- 1 | # navigator.sendBeacon and Rails 2 | 3 | [navigator.sendBeacon](https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon) is a neat new API. It allows you to send an asynchronous `POST` request without delaying the page unload. 4 | 5 | To prevent `Can't verify CSRF token authenticity` with Rails, use the method below: 6 | 7 | ```javascript 8 | var data = new FormData(); 9 | data.append("hello", "beacon"); 10 | 11 | // add CSRF 12 | var param = document.querySelector("meta[name=csrf-param]").getAttribute("content"); 13 | var token = document.querySelector("meta[name=csrf-token]").getAttribute("content"); 14 | data.append(param, token); 15 | 16 | navigator.sendBeacon("/beacon", data); 17 | ``` 18 | 19 | For a real-world use case, check out [Ahoy.js](https://github.com/ankane/ahoy.js). 
20 | 21 | :anchor: 22 | -------------------------------------------------------------------------------- /archive/security-checks.md: -------------------------------------------------------------------------------- 1 | # Security Checks 2 | 3 | ### Verify SSL certificate chain 4 | 5 | ```sh 6 | openssl s_client -connect www.yahoo.com:443 -CAfile /usr/local/etc/openssl/cert.pem 7 | ``` 8 | 9 | You should see `verify return:1` for each certificate in the chain. 10 | 11 | ### Host header injection 12 | 13 | [Read about it here](http://carlos.bueno.org/2008/06/host-header-injection.html). 14 | 15 | ```sh 16 | curl -i --header "Host: evilsite.com" https://www.yahoo.com 17 | ``` 18 | 19 | Your site is vulnerable if `evilsite.com` appears in the results. 20 | 21 | ### SPF 22 | 23 | Check if your SPF record is valid. [Enter your domain here](https://www.kitterman.com/spf/validate.html). 24 | 25 | ### DNSSEC 26 | 27 | Very few sites have this right now. 28 | 29 | ```sh 30 | dig pir.org +dnssec 31 | ``` 32 | 33 | [See how to interpret the results](https://docs.menandmice.com/display/MM/How+to+test+DNSSEC+validation). 34 | -------------------------------------------------------------------------------- /archive/error-reporting-r.md: -------------------------------------------------------------------------------- 1 | # Error Reporting in R 2 | 3 | R supports global error handling, making it easy to report all errors without individual `tryCatch` statements. 4 | 5 | Create a file to source at the start of all your scripts. 6 | 7 | ```R 8 | if (!interactive()) { 9 | options(error = function() { 10 | message <- geterrmessage() 11 | 12 | ### your error reporting goes here 13 | rollbar.error(message) 14 | ### 15 | 16 | write("Execution halted", stderr()) 17 | q("no", status = 1, runLast = FALSE) 18 | }) 19 | } 20 | ``` 21 | 22 | Unfortunately, there’s no way to get filenames and line numbers (if you manage to do this, let me know!). 
Thankfully, the last line of the message includes the chain of calls. 23 | 24 | ```txt 25 | Error in func3(b) : unused argument (b) 26 | Calls: func1 -> func2 -> func3 27 | ``` 28 | 29 | Happy production debugging! :dolphin: 30 | 31 | #### If you use Rollbar... 32 | 33 | Check out the [Rollbar](https://github.com/ankane/rollbar) package. 34 | -------------------------------------------------------------------------------- /archive/introducing-archer.md: -------------------------------------------------------------------------------- 1 | # Introducing Archer: Rails Console History for Heroku, Docker, and More 2 | 3 |

[Image: Archer]

4 | 5 | Many companies today run infrastructure where machines or containers can be replaced at any time, so you can’t depend on them for permanent storage. One place this is especially painful is the Rails console. Console history can save a lot of typing. 6 | 7 | This is where [Archer](https://github.com/ankane/archer) comes in. Add it to your project, and it’ll begin to use the database to store history. 8 | 9 | Archer supports multiple users so everyone on the team can have their own history. On Heroku, you can specify a user when starting the console with: 10 | 11 | ```sh 12 | heroku run USER=andrew rails console 13 | ``` 14 | 15 | Set up an [alias](https://shapeshed.com/unix-alias/) to save some typing. 16 | 17 | ```sh 18 | alias hc="heroku run USER=andrew rails console" 19 | ``` 20 | 21 | Add [Archer](https://github.com/ankane/archer) to your team today. 22 | -------------------------------------------------------------------------------- /archive/jupyter-rails.md: -------------------------------------------------------------------------------- 1 | # Jupyter + Rails 2 | 3 | Jupyter notebooks are a great alternative to the Rails console for doing exploratory data analysis and building predictive models. Here’s how to get set up: 4 | 5 | First, install [Jupyter](https://jupyter.org). 
With Homebrew, use: 6 | 7 | ```sh 8 | brew install jupyterlab 9 | ``` 10 | 11 | Add to your Gemfile 12 | 13 | ```ruby 14 | gem 'iruby', group: :development 15 | ``` 16 | 17 | Run 18 | 19 | ```sh 20 | bundle install 21 | bundle exec iruby register --force 22 | ``` 23 | 24 | Start Jupyter 25 | 26 | ```sh 27 | jupyter notebook 28 | ``` 29 | 30 | Create a notebook and add to the top 31 | 32 | ```ruby 33 | require "./config/environment" 34 | ``` 35 | 36 | > If not at Rails root, use `Dir.chdir("path/to/root") { require "./config/environment" }` 37 | 38 | And science away 39 | 40 | ```ruby 41 | User.last 42 | ``` 43 | 44 | If you use Git, add to `.gitignore` 45 | 46 | ```txt 47 | .ipynb_checkpoints 48 | ``` 49 | 50 | If you use Nyaplot, use the `master` branch to fix [an issue](https://github.com/domitry/nyaplot/issues/52) with empty charts. 51 | 52 | ```ruby 53 | gem 'nyaplot', github: 'domitry/nyaplot' 54 | ``` 55 | -------------------------------------------------------------------------------- /archive/bulk-upsert.md: -------------------------------------------------------------------------------- 1 | # Bulk Upsert in Ruby/Rails 2 | 3 | The [upsert](https://github.com/seamusabshere/upsert) gem is great for individual upserts, but for performant bulk upserts, use the [activerecord-import](https://github.com/zdennis/activerecord-import) gem. 
4 | 5 | Add a unique index on the columns to upsert on (if it’s not your primary key) 6 | 7 | ```rb 8 | class AddUpsertIndexOnForecasts < ActiveRecord::Migration[5.2] 9 | def change 10 | add_index :forecasts, [:date], unique: true 11 | end 12 | end 13 | ``` 14 | 15 | Prep your records 16 | 17 | ```ruby 18 | records = [ 19 | {date: "2018-01-01", value: 10}, 20 | {date: "2018-02-01", value: 15}, 21 | {date: "2018-03-01", value: 23} 22 | ] 23 | ``` 24 | 25 | For PostgreSQL 9.5+ and SQLite 3.24+, do: 26 | 27 | ```ruby 28 | Forecast.import(records, 29 | validate: false, 30 | on_duplicate_key_update: { 31 | conflict_target: [:date], 32 | columns: [:value] 33 | } 34 | ) 35 | ``` 36 | 37 | For MySQL, do: 38 | 39 | ```ruby 40 | Forecast.import(records, 41 | validate: false, 42 | on_duplicate_key_update: [:value] 43 | ) 44 | ``` 45 | 46 | [Official docs](https://github.com/zdennis/activerecord-import/wiki/On-Duplicate-Key-Update) 47 | -------------------------------------------------------------------------------- /archive/introducing-pdscan.md: -------------------------------------------------------------------------------- 1 | # Introducing pdscan: Scan Your Data Stores for Unencrypted Personal Data 2 | 3 | It's important to understand where personal data is stored in your applications. Personal data that’s not encrypted at the application level is especially vulnerable in the event of a breach. 4 | 5 | [pdscan](https://github.com/ankane/pdscan) is a command line tool to help you identify this data. 6 | 7 |

[Image: pdscan]

8 | 9 | It uses data sampling and column naming to find data and produces minimal database load. It scans for: 10 | 11 | - Last names 12 | - Email addresses 13 | - IP addresses 14 | - Street addresses (US) 15 | - Phone numbers (US) 16 | - Credit card numbers 17 | - Social security numbers 18 | - Dates of birth 19 | - Location data 20 | 21 | It also scans for other unencrypted sensitive data, like OAuth tokens, which could be used to access personal data. It currently supports Postgres, MySQL, MariaDB, and SQLite, but it shouldn’t be too difficult to add other data stores like MongoDB and Elasticsearch. It’s written in Go, so it’s fast and has no runtime dependencies. 22 | 23 | Give [pdscan](https://github.com/ankane/pdscan) a try today. 24 | -------------------------------------------------------------------------------- /archive/management-reads.md: -------------------------------------------------------------------------------- 1 | # Great Management Reads 2 | 3 | ## Posts 4 | 5 | - [Radical Candor](https://firstround.com/review/radical-candor-the-surprising-secret-to-being-a-good-boss/) 6 | - [101 Questions to Ask in One on Ones](https://jasonevanish.com/2014/05/29/101-questions-to-ask-in-1-on-1s/) 7 | - [My Best Manager Did This](https://ask.metafilter.com/300002/My-best-manager-did-this) 8 | 9 | ## Books 10 | 11 | - [High Output Management](https://www.amazon.com/High-Output-Management-Andrew-Grove/dp/0679762884) 12 | 13 | ## Engineering-Specific Posts 14 | 15 | - [This Is What Impactful Engineering Leadership Looks Like](https://firstround.com/review/this-is-what-impactful-engineering-leadership-looks-like/) 16 | - [Engineering Management](http://algeri-wong.com/yishan/engineering-management.html) 17 | - [44 Engineering Management Lessons](https://www.defmacro.org/2014/10/03/engman.html) 18 | 19 | ## Advice 20 | 21 | > You have to be your team’s best ally and biggest challenger. You can’t be a great leader by care-taking alone. Push for their best work. 
[@marcprecipice](https://twitter.com/i/moments/791738696978403328) 22 | 23 | ## Culture 24 | 25 | - [Netflix Culture Deck](https://www.slideshare.net/reed2001/culture-1798664) 26 | -------------------------------------------------------------------------------- /archive/safely-pattern.md: -------------------------------------------------------------------------------- 1 | # The Safely Pattern 2 | 3 | The Safely Pattern is a simple one. It allows you to tag non-critical code by wrapping it in a function. It’s built on top of exception handling and follows these rules: 4 | 5 | 1. Raise exceptions in development and test environments 6 | 2. Catch and report exceptions in other environments 7 | 8 | Here’s a basic implementation in JavaScript: 9 | 10 | ```javascript 11 | function safely(nonCriticalCode) { 12 | try { 13 | nonCriticalCode(); 14 | } catch (e) { 15 | if (env === "development" || env === "test") { 16 | throw(e); 17 | } 18 | report(e); 19 | } 20 | } 21 | ``` 22 | 23 | Its advantages over typical exception handling are: 24 | 25 | 1. It’s easier to write and debug code when errors aren’t caught in development and test environments 26 | 2. It allows you to keep reporting [DRY](https://en.wikipedia.org/wiki/Don't_repeat_yourself) 27 | 28 | It’s recommended to mark exceptions when reporting so it’s clear they were handled with this pattern. One way of doing this is to prefix the exception message with `[safely]`. 29 | 30 | There are currently implementations in [Ruby](https://github.com/ankane/safely) and [JavaScript](https://github.com/ankane/safely.js). 31 | -------------------------------------------------------------------------------- /archive/presto-mac.md: -------------------------------------------------------------------------------- 1 | # Installing Presto for Mac 2 | 3 | [Presto](https://prestodb.io/) is a “Distributed SQL Query Engine for Big Data” that gives you the ability to join across data stores! 
:tada: 4 | 5 | ## Server 6 | 7 | The easiest way to install Presto is with [Homebrew](https://brew.sh). 8 | 9 | ```sh 10 | brew install presto 11 | ``` 12 | 13 | Next, add a connector. Here’s the list of [available ones](https://prestodb.io/docs/current/connector.html). 14 | 15 | For PostgreSQL, create `/usr/local/opt/presto/libexec/etc/catalog/mydb.properties` with: 16 | 17 | ```ini 18 | connector.name=postgresql 19 | connection-url=jdbc:postgresql://localhost:5432/mydbname 20 | connection-user=myuser 21 | connection-password=mysecret 22 | ``` 23 | 24 | And start the server with: 25 | 26 | ```sh 27 | presto-server run 28 | ``` 29 | 30 | ## Client 31 | 32 | Presto comes with a CLI 33 | 34 | ```sh 35 | presto --catalog mydb --schema public 36 | ``` 37 | 38 | And run: 39 | 40 | ```sql 41 | SHOW TABLES; 42 | ``` 43 | 44 | Try one of your tables with: 45 | 46 | ```sql 47 | SELECT * FROM mytable; 48 | ``` 49 | 50 | There are also clients in [many different languages](https://prestodb.io/resources.html#libraries) you can use. 51 | 52 | :rabbit: :tophat: :sparkles: 53 | -------------------------------------------------------------------------------- /archive/two-metrics.md: -------------------------------------------------------------------------------- 1 | # The Two Metrics You Need 2 | 3 | When interviewing candidates for Instacart’s first site reliability engineer, I volunteered to cover monitoring as one of my topics. I’d start by asking “What metrics should we be monitoring?” 4 | 5 | One candidate gave an answer that astounded me. He said, 6 | 7 | > There are only two things I care about: errors and latency 8 | 9 | More specifically: 10 | 11 | - the sum of 5xx status codes 12 | - latency across all requests - average or 95th percentile 13 | 14 | Both must be measured at the load balancer. Errors include those generated by the application and by the load balancer. 
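The two-metric check described above can be sketched in a few lines of Ruby. This is a conceptual sketch only; the threshold values are illustrative assumptions, not recommendations.

```ruby
# Sketch of the two-metric health check: alert when either the
# load balancer 5xx count or the p95 latency crosses a threshold.
# Threshold defaults are illustrative assumptions.
def health_alerts(error_count_5xx, p95_latency_ms, max_errors: 50, max_latency_ms: 1000)
  alerts = []
  alerts << "errors" if error_count_5xx >= max_errors
  alerts << "latency" if p95_latency_ms >= max_latency_ms
  alerts
end

health_alerts(3, 250)    # => []
health_alerts(120, 1800) # => ["errors", "latency"]
```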
15 | 16 | **Place alerts on these metrics to detect problems with the health of your site.** It is significantly more effective than relying on services which monitor a few endpoints (you should do this as well). 17 | 18 | Here’s how to get them on a few services. 19 | 20 | ## Amazon ELB 21 | 22 | CloudWatch gives them for free 23 | 24 | - Errors = Sum ELB 5XXs + Sum HTTP 5XXs 25 | - Latency = Average Latency 26 | 27 | ## Heroku 28 | 29 | Add Librato. 30 | 31 | ```sh 32 | heroku addons:create librato:development 33 | ``` 34 | 35 | - Errors = Sum of `router.status.5xx` 36 | - Latency = Sum of `router.service.perc95` and `router.connect.perc95` 37 | -------------------------------------------------------------------------------- /archive/tpc-ds.md: -------------------------------------------------------------------------------- 1 | # TPC-DS with Postgres 2 | 3 | [TPC-DS](http://www.tpc.org/tpcds/) is a database benchmark. 4 | 5 | ```sh 6 | git clone https://github.com/gregrahn/tpcds-kit.git 7 | cd tpcds-kit/tools 8 | make OS=MACOS 9 | ``` 10 | 11 | Create the database and load the schema 12 | 13 | ```sh 14 | createdb tpcds 15 | psql tpcds -f tpcds.sql 16 | ``` 17 | 18 | Generate data 19 | 20 | ```sh 21 | ./dsdgen -FORCE -VERBOSE 22 | ``` 23 | 24 | Load the data 25 | 26 | ```sh 27 | for i in `ls *.dat`; do 28 | table=${i/.dat/} 29 | echo "Loading $table..." 
30 | sed 's/|$//' $i > /tmp/$i 31 | psql tpcds -q -c "TRUNCATE $table" 32 | psql tpcds -c "\\copy $table FROM '/tmp/$i' CSV DELIMITER '|'" 33 | done 34 | ``` 35 | 36 | Generate queries 37 | 38 | ```sh 39 | ./dsqgen -DIRECTORY ../query_templates -INPUT ../query_templates/templates.lst \ 40 | -VERBOSE Y -QUALIFY Y -DIALECT netezza 41 | ``` 42 | 43 | Run queries 44 | 45 | ```sh 46 | psql tpcds -c "ANALYZE VERBOSE" 47 | psql tpcds < query_0.sql 48 | ``` 49 | 50 | ## Bonus: Add Indexes with Dexter 51 | 52 | Install [Dexter](https://github.com/ankane/dexter) 53 | 54 | ```sh 55 | gem install pgdexter 56 | ``` 57 | 58 | And run 59 | 60 | ```sh 61 | for i in `seq 1 10`; do 62 | dexter tpcds query_0.sql --input-format sql --create 63 | done 64 | ``` 65 | -------------------------------------------------------------------------------- /archive/tpc-h.md: -------------------------------------------------------------------------------- 1 | # TPC-H with Postgres 2 | 3 | [TPC-H](http://www.tpc.org/tpch/) is a database benchmark. 4 | 5 | ```sh 6 | git clone https://github.com/gregrahn/tpch-kit.git 7 | cd tpch-kit/dbgen 8 | make -f Makefile.osx 9 | ``` 10 | 11 | Create the database and load the schema 12 | 13 | ```sh 14 | createdb tpch 15 | psql tpch -f dss.ddl 16 | ``` 17 | 18 | Generate data 19 | 20 | ```sh 21 | ./dbgen -vf -s 1 22 | ``` 23 | 24 | Load the data 25 | 26 | ```sh 27 | for i in `ls *.tbl`; do 28 | table=${i/.tbl/} 29 | echo "Loading $table..." 
30 | sed 's/|$//' $i > /tmp/$i 31 | psql tpch -q -c "TRUNCATE $table" 32 | psql tpch -c "\\copy $table FROM '/tmp/$i' CSV DELIMITER '|'" 33 | done 34 | ``` 35 | 36 | Generate queries 37 | 38 | ```sh 39 | mkdir /tmp/queries 40 | for i in `ls queries/*.sql`; do 41 | tail -r $i | sed '2s/;//' | tail -r > /tmp/$i 42 | done 43 | 44 | DSS_QUERY=/tmp/queries ./qgen | sed 's/limit -1//' | sed 's/day (3)/day/' > queries.sql 45 | ``` 46 | 47 | Run queries 48 | 49 | ```sh 50 | psql tpch -c "ANALYZE VERBOSE" 51 | psql tpch < queries.sql 52 | ``` 53 | 54 | ## Bonus: Add Indexes with Dexter 55 | 56 | Install [Dexter](https://github.com/ankane/dexter) 57 | 58 | ```sh 59 | gem install pgdexter 60 | ``` 61 | 62 | And run 63 | 64 | ```sh 65 | for i in `seq 1 5`; do 66 | dexter tpch queries.sql --input-format sql --create 67 | done 68 | ``` 69 | -------------------------------------------------------------------------------- /archive/just-table-it.md: -------------------------------------------------------------------------------- 1 | # Just Table It 2 | 3 | When it comes to data, you can mistakenly optimize by trying to choose the “right” technology for the job. Often, the best choice is right in front of you: your database. Relational databases scale pretty well, despite what you’ve been told in recent years. Don’t introduce a new data store into your stack if you don’t need to, and don’t store interesting data in logs unless you can easily query them. Generally: 4 | 5 | **If you need to query data, throw it in a table.** 6 | 7 | At [Instacart](https://www.instacart.com), we’ve stored: 8 | 9 | - customer analytics, like visits and page views (yes, customer analytics!!) 10 | - emails sent 11 | - errors our customers see 12 | - slow requests 13 | - location updates from shoppers 14 | - audits for models 15 | 16 | Even today, we store most of these in PostgreSQL. Your time is better spent adding value to customers than trying to anticipate how to handle this data at scale.
You can figure it out when you get there. 17 | 18 | :mount_fuji: 19 | 20 | --- 21 | 22 | We’ve open sourced much of the technology we use to do the above. 23 | 24 | - [Ahoy](https://github.com/ankane/ahoy) for analytics 25 | - [Ahoy Email](https://github.com/ankane/ahoy_email) for emails 26 | - [Notable](https://github.com/ankane/notable) for errors and slow requests 27 | 28 | And 29 | 30 | - [Blazer](https://github.com/ankane/blazer) to analyze the data 31 | -------------------------------------------------------------------------------- /archive/strong-parameters.md: -------------------------------------------------------------------------------- 1 | # attr_accessible to Strong Parameters 2 | 3 | Running Rails 4 with `attr_accessible`? Upgrade in three **safe and easy** steps 4 | 5 | ## 1 6 | 7 | First, log all instances of forbidden attributes. Add to `config/application.rb`: 8 | 9 | ```ruby 10 | config.action_controller.permit_all_parameters = false 11 | ``` 12 | 13 | And create an initializer `config/initializers/forbidden_attributes.rb` with: 14 | 15 | ```ruby 16 | class ActiveRecord::Base 17 | protected 18 | def sanitize_for_mass_assignment_with_forbidden_attributes(*args) 19 | attributes = args[0] 20 | if attributes.respond_to?(:permitted?) && !attributes.permitted? 21 | if Rails.env.development? || Rails.env.test? || ENV["RAISE_FORBIDDEN_ATTRIBUTES"] 22 | raise ActiveModel::ForbiddenAttributesError 23 | end 24 | Rails.logger.warn "Forbidden attributes: #{self.class.name}" 25 | end 26 | sanitize_for_mass_assignment_without_forbidden_attributes(*args) 27 | end 28 | alias_method_chain :sanitize_for_mass_assignment, :forbidden_attributes 29 | end 30 | ``` 31 | 32 | ## 2 33 | 34 | Fix all instances. 
35 | 36 | ```ruby 37 | User.create(params[:user]) 38 | ``` 39 | 40 | to 41 | 42 | ```ruby 43 | User.create(params.require(:user).permit(:name)) 44 | ``` 45 | 46 | ## 3 47 | 48 | Remove: 49 | 50 | - all instances of `attr_accessible` 51 | - `config/initializers/forbidden_attributes.rb` 52 | - `protected_attributes` from your Gemfile 53 | -------------------------------------------------------------------------------- /archive/xgboost-lightgbm-come-to-ruby.md: -------------------------------------------------------------------------------- 1 | # XGBoost and LightGBM Come to Ruby 2 | 3 |

4 | Ruby and XGBoost 5 |

6 | 7 | I’m happy to announce that XGBoost - and its cousin LightGBM from Microsoft - are now available for Ruby! 8 | 9 | XGBoost and LightGBM are powerful machine learning libraries that use a technique called gradient boosting. Gradient boosting performs well on a [large range of datasets](https://machinelearningmastery.com/start-with-gradient-boosting/) and is common among winning solutions in ML competitions. 10 | 11 | XGBoost and LightGBM are already available for popular ML languages like Python and R. The Ruby gems follow similar interfaces and use the same C APIs under the hood. 12 | 13 | Make predictions with as little code as: 14 | 15 | ```ruby 16 | model = Xgb::Regressor.new 17 | model.fit(x_train, y_train) 18 | model.predict(x_test) 19 | ``` 20 | 21 | While Ruby still lags behind other languages for machine learning, the ecosystem is getting better. [Rumale](https://github.com/yoshoku/rumale) is under active development and supports a large number of algorithms with an interface similar to Scikit-Learn. There’s also [Daru](https://github.com/SciRuby/daru), which is similar to Pandas. The addition of gradient boosting covers another key category. 22 | 23 | Check out the [Xgb](https://github.com/ankane/xgb) and [LightGBM](https://github.com/ankane/lightgbm) gems today! 24 | -------------------------------------------------------------------------------- /archive/vault-pki.md: -------------------------------------------------------------------------------- 1 | # Vault for PKI 2 | 3 |
4 | 5 | --- 6 | 7 | **Update:** Vault now has a [great article](https://learn.hashicorp.com/vault/secrets-management/sm-pki-engine) on this 8 | 9 | --- 10 | 11 | Install the [latest version of Vault](https://www.vaultproject.io/downloads.html) and jq 12 | 13 | ```sh 14 | sudo apt-get install unzip jq 15 | wget https://releases.hashicorp.com/vault/0.9.0/vault_0.9.0_linux_amd64.zip 16 | unzip vault_0.9.0_linux_amd64.zip 17 | sudo mv vault /usr/local/bin 18 | ``` 19 | 20 | Start Vault (we use development mode for this tutorial) 21 | 22 | ```sh 23 | vault server -dev 24 | ``` 25 | 26 | Create a PKI secret backend 27 | 28 | ```sh 29 | export VAULT_ADDR='http://127.0.0.1:8200' 30 | 31 | vault mount pki 32 | vault mount-tune -max-lease-ttl=87600h pki 33 | 34 | vault write pki/root/generate/internal common_name=myvault.com ttl=87600h 35 | 36 | vault write pki/config/urls issuing_certificates="http://127.0.0.1:8200/v1/pki/ca" \ 37 | crl_distribution_points="http://127.0.0.1:8200/v1/pki/crl" 38 | 39 | vault write pki/roles/yourrole \ 40 | allowed_domains="yourhost" \ 41 | allow_subdomains="false" max_ttl="72h" 42 | ``` 43 | 44 | And issue certificates 45 | 46 | ```sh 47 | data=`vault write -format=json pki/issue/yourrole common_name=yourhost` 48 | 49 | jq -r '.data.certificate' <<< $data > cert.pem 50 | jq -r '.data.private_key' <<< $data > key.pem 51 | jq -r '.data.issuing_ca' <<< $data > ca.pem 52 | ``` 53 | -------------------------------------------------------------------------------- /archive/r-database-urls.md: -------------------------------------------------------------------------------- 1 | # R and Database URLs 2 | 3 | **Note:** This approach is now built into the [dbx package](https://github.com/ankane/dbx) 4 | 5 | --- 6 | 7 | To use a `DATABASE_URL` with R, do: 8 | 9 | ## Postgres 10 | 11 | ```R 12 | library(RPostgreSQL) 13 | library(httr) 14 | 15 | establishConnection <- function(url=Sys.getenv("DATABASE_URL")) 16 | { 17 | cred <- parse_url(url) 18 | if 
(!identical(cred$scheme, "postgres")) stop("Invalid database url") 19 | if (is.null(cred$username)) cred$username <- "" 20 | if (is.null(cred$password)) cred$password <- "" 21 | if (is.null(cred$port)) cred$port <- 5432 22 | dbConnect(PostgreSQL(), host=cred$hostname, port=cred$port, 23 | user=cred$username, password=cred$password, dbname=cred$path) 24 | } 25 | 26 | con <- establishConnection() 27 | dbGetQuery(con, "SELECT true AS success") 28 | ``` 29 | 30 | ## MySQL 31 | 32 | ```R 33 | library(RMySQL) 34 | library(httr) 35 | 36 | establishConnection <- function(url=Sys.getenv("DATABASE_URL")) 37 | { 38 | cred <- parse_url(url) 39 | if (!identical(cred$scheme, "mysql")) stop("Invalid database url") 40 | if (is.null(cred$username)) cred$username <- "root" 41 | if (is.null(cred$password)) cred$password <- "" 42 | if (is.null(cred$port)) cred$port <- 3306 43 | dbConnect(MySQL(), host=cred$hostname, port=cred$port, 44 | user=cred$username, password=cred$password, dbname=cred$path) 45 | } 46 | 47 | con <- establishConnection() 48 | dbGetQuery(con, "SELECT true AS success") 49 | ``` 50 | 51 | :cake: 52 | -------------------------------------------------------------------------------- /archive/hardening-devise.md: -------------------------------------------------------------------------------- 1 | # Hardening Devise 2 | 3 | A few basic steps to make your [Devise](https://github.com/plataformatec/devise) setup more secure :lock: 4 | 5 | ### Send notifications for important events 6 | 7 | Like a user changing his or her email or password. For email changes, notify the old email address to prevent quiet account takeovers (changing the email then resetting the password). 8 | 9 | In `config/initializers/devise.rb`, set: 10 | 11 | ```ruby 12 | config.send_email_changed_notification = true 13 | config.send_password_change_notification = true 14 | ``` 15 | 16 | ### Rate limit login attempts 17 | 18 | Use Devise’s `Lockable` module to protect individual accounts. 
This will lock an account after too many attempts. 19 | 20 | Use a library like [Rack::Attack](https://github.com/kickstarter/rack-attack) to slow down [credential stuffing](https://en.wikipedia.org/wiki/Credential_stuffing). This will prevent an IP address from trying to sign into many different accounts using credentials from data breaches. 21 | 22 | Create `config/initializers/rack_attack.rb` with: 23 | 24 | ```ruby 25 | Rack::Attack.throttle("logins/ip", limit: 20, period: 1.hour) do |req| 26 | req.ip if req.post? && req.path.start_with?("/users/sign_in") 27 | end 28 | 29 | ActiveSupport::Notifications.subscribe("rack.attack") do |name, start, finish, request_id, req| 30 | puts "Throttled #{req.env["rack.attack.match_discriminator"]}" 31 | end 32 | ``` 33 | 34 | ### Record and monitor login attempts 35 | 36 | Use [AuthTrail](https://github.com/ankane/authtrail) to record login attempts. 37 | 38 | --- 39 | 40 | Remember, [defense in depth](https://en.wikipedia.org/wiki/Defense_in_depth_%28computing%29)! 41 | 42 | For more, check out [Secure Rails](https://github.com/ankane/secure_rails). 43 | -------------------------------------------------------------------------------- /archive/anonymizing-ips.md: -------------------------------------------------------------------------------- 1 | # Anonymizing IPs in Ruby 2 | 3 | With the [GDPR](https://en.wikipedia.org/wiki/General_Data_Protection_Regulation) just around the corner, here are two useful ways to protect your users’ IP addresses. 4 | 5 | Both support IPv4 and IPv6, and are included in the [ip_anonymizer](https://github.com/ankane/ip_anonymizer) gem. 6 | 7 | ## Masking 8 | 9 | This is the approach [Google Analytics uses for IP anonymization](https://support.google.com/analytics/answer/2763052): 10 | 11 | - For IPv4, the last octet is set to 0 12 | - For IPv6, the last 80 bits are set to zeros 13 | 14 | ```ruby 15 | require "ipaddr" 16 | 17 | def mask_ip(ip) 18 | addr = IPAddr.new(ip) 19 | if addr.ipv4? 
20 | # set last octet to 0 21 | addr.mask(24).to_s 22 | else 23 | # set last 80 bits to zeros 24 | addr.mask(48).to_s 25 | end 26 | end 27 | ``` 28 | 29 | Examples 30 | 31 | ```ruby 32 | mask_ip("8.8.4.4") 33 | # => "8.8.4.0" 34 | 35 | mask_ip("2001:4860:4860:0:0:0:0:8844") 36 | # => "2001:4860:4860::" 37 | ``` 38 | 39 | ## Hashing 40 | 41 | This transforms IP addresses with a keyed hash function (PBKDF2-HMAC-SHA256). If an unkeyed function is used (like SHA1), it’s trivial to build a rainbow table. 42 | 43 | ```ruby 44 | require "ipaddr" 45 | require "openssl" 46 | 47 | def hash_ip(ip, key:) 48 | addr = IPAddr.new(ip) 49 | key_len = addr.ipv4? ? 4 : 16 50 | family = addr.ipv4? ? Socket::AF_INET : Socket::AF_INET6 51 | 52 | keyed_hash = OpenSSL::PKCS5.pbkdf2_hmac(addr.to_s, key, 20000, key_len, "sha256") 53 | IPAddr.new(keyed_hash.bytes.inject {|a, b| (a << 8) + b }, family).to_s 54 | end 55 | ``` 56 | 57 | Examples 58 | 59 | ```ruby 60 | hash_ip("8.8.4.4", key: "secret") 61 | # => "114.124.40.57" 62 | 63 | hash_ip("2001:4860:4860:0:0:0:0:8844", key: "secret") 64 | # => "49a2:718:9704:cf11:2068:4c15:587c:1e15" 65 | ``` 66 | -------------------------------------------------------------------------------- /archive/backsolve.md: -------------------------------------------------------------------------------- 1 | # Backsolving in Ruby 2 | 3 | QR decomposition is a [stable way to solve linear regression](https://statsmaths.github.io/stat612/lectures/lec13/lecture13.pdf). 4 | 5 | ```ruby 6 | require "matrix" 7 | 8 | x = Matrix.columns([[1, 1, 1, 1, 1], [1, 2, 3, 4, 5], [4, 2, 5, 6, 1]]) 9 | y = Matrix.column_vector([145, 225, 355, 465, 515]) 10 | ``` 11 | 12 | You can use the [extendmatrix](https://github.com/clbustos/extendmatrix) gem to do decomposition in pure Ruby. Givens rotations [are faster](https://en.wikipedia.org/wiki/QR_decomposition#Using_Givens_rotations), but the implementation appears to have a bug. 
13 | 14 | ```ruby 15 | require "extendmatrix" 16 | 17 | r = x.houseR 18 | q = x.houseQ 19 | ``` 20 | 21 | Next, split R into a p-by-p matrix and Q into a n-by-p matrix (see [page 32](https://statsmaths.github.io/stat612/lectures/lec13/lecture13.pdf) for reasoning). 22 | 23 | ```ruby 24 | r1 = r.minor(0...x.column_count, 0...x.column_count) 25 | q1 = q.minor(0...x.row_count, 0...x.column_count) 26 | ``` 27 | 28 | Finally, backsolve. Code ported from [Stack Overflow](https://codereview.stackexchange.com/questions/110039/back-substitution-method-for-solving-linear-system). 29 | 30 | ```ruby 31 | def backsolve(a, b) 32 | x = Matrix.zero(b.row_count, b.column_count) 33 | 34 | # use an array since matrices 35 | # are immutable in Ruby 36 | x = x.to_a 37 | 38 | c = 1.0 / a[-1, -1] 39 | x[-1] = b.row(-1).map { |v| c * v } 40 | 41 | (b.row_count - 2).downto(0) do |i| 42 | c = 1.0 / a[i, i] 43 | s = (a.minor(i..i, (i+1)..-1) * Matrix.rows(x[(i + 1)..-1])).to_a[0] 44 | x[i] = b.row(i).zip(s).map { |v, w| v - w }.map { |v| c * v } 45 | end 46 | 47 | Matrix.rows(x) 48 | end 49 | ``` 50 | 51 | Run it to get the coefficients 52 | 53 | ```ruby 54 | backsolve(r1, q1.transpose * y) 55 | ``` 56 | 57 | Regression solved! 58 | -------------------------------------------------------------------------------- /archive/aws-client-side-encryption.md: -------------------------------------------------------------------------------- 1 | # Client-Side Encryption with AWS and Ruby 2 | 3 | AWS makes it easy to enable server-side encryption on many of its services, but it also provides ways to do client-side encryption well. Here are a few ways in Ruby. 
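Before the examples, it helps to know what these clients do under the hood: envelope encryption. Each object is encrypted with a fresh data key, and only the data key is encrypted with your master key (or KMS) and stored alongside the ciphertext. Here's a conceptual sketch of that idea, not the actual aws-sdk internals:

```ruby
require "openssl"
require "securerandom"

# Conceptual sketch of envelope encryption (not the aws-sdk implementation).
# A random data key encrypts the payload with AES-256-GCM; in the real
# client, the data key itself is then encrypted by KMS (or your own
# master key) and stored in the object's metadata.
def envelope_encrypt(plaintext)
  data_key = SecureRandom.random_bytes(32)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = data_key
  iv = cipher.random_iv
  ciphertext = cipher.update(plaintext) + cipher.final
  { data_key: data_key, iv: iv, auth_tag: cipher.auth_tag, ciphertext: ciphertext }
end

def envelope_decrypt(env)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = env[:data_key]
  cipher.iv = env[:iv]
  cipher.auth_tag = env[:auth_tag]
  cipher.update(env[:ciphertext]) + cipher.final
end
```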
4 | 5 | ## S3 6 | 7 | Gem: [aws-sdk-s3](https://github.com/aws/aws-sdk-ruby) 8 | 9 | ```ruby 10 | client = Aws::S3::Encryption::Client.new( 11 | kms_key_id: "alias/my-key" 12 | ) 13 | 14 | client.put_object( 15 | body: File.read("test.txt"), 16 | bucket: "my-bucket", 17 | key: "test.txt" 18 | ) 19 | 20 | resp = client.get_object( 21 | bucket: "my-bucket", 22 | key: "test.txt" 23 | ) 24 | 25 | puts resp.body.read 26 | ``` 27 | 28 | Use `encryption_key` instead of `kms_key_id` to manage keys yourself. 29 | 30 | ## S3 + CarrierWave 31 | 32 | Gem: [carrierwave-aws](https://github.com/sorentwo/carrierwave-aws) 33 | 34 | ```ruby 35 | CarrierWave.configure do |config| 36 | config.storage = :aws 37 | config.aws_bucket = "my-bucket" 38 | config.aws_acl = "private" 39 | 40 | config.aws_credentials = { 41 | client: Aws::S3::Encryption::Client.new(kms_key_id: "alias/my-key") 42 | } 43 | end 44 | ``` 45 | 46 | Use `encryption_key` instead of `kms_key_id` to manage keys yourself. 47 | 48 | This can also be set on individual uploaders. 49 | 50 | ```ruby 51 | class AvatarUploader < CarrierWave::Uploader::Base 52 | def aws_credentials 53 | { 54 | client: Aws::S3::Encryption::Client.new(kms_key_id: "alias/my-key") 55 | } 56 | end 57 | end 58 | ``` 59 | 60 | ## Active Record Attributes 61 | 62 | Gem: [kms_encrypted](https://github.com/ankane/kms_encrypted) 63 | 64 | ```ruby 65 | class User < ApplicationRecord 66 | has_kms_key key_id: "alias/my-key" 67 | 68 | attr_encrypted :email, key: :kms_key 69 | end 70 | ``` 71 | 72 | ## Active Storage 73 | 74 | Check out [this post](https://ankane.org/activestorage-s3-encryption). 75 | -------------------------------------------------------------------------------- /archive/onnx-runtime-ruby.md: -------------------------------------------------------------------------------- 1 | # Score Almost Any Machine Learning Model in Ruby 2 | 3 | Ruby isn’t a common choice for machine learning, but companies running Ruby can get tremendous value from it. 
4 | 5 | I’m happy to announce it’s now possible to build advanced models in TensorFlow, Scikit-learn, PyTorch, and a number of other tools, and score them in Ruby with minimal friction. 6 | 7 | Previously, you’d need to do one of the following: 8 | 9 | 1. Shell out 10 | 2. Create a microservice 11 | 3. Use a bridge like [PyCall](https://github.com/mrkn/pycall.rb) or [RSRuby](https://github.com/alexgutteridge/rsruby) 12 | 13 | All of these approaches require managing another language in production. Luckily, there’s now a better way. 14 | 15 | ONNX (pronounced “On-ix”), created by Facebook and Microsoft, is a serialization format for storing models. It was designed for neural networks but now supports traditional ML models as well. Based on its current adoption, I won’t be surprised if it replaces PMML as the de facto interchange format for ML models. 16 | 17 | To run ONNX models, Microsoft created [ONNX Runtime](https://github.com/microsoft/onnxruntime), a “cross-platform, high performance scoring engine for ML models.” ONNX Runtime has a C API, which Ruby is happy to use. Thanks to FFI, it even works on JRuby! 18 | 19 |

ONNX Runtime Ruby

20 | 21 | ONNX Runtime is designed to be fast, and Microsoft saw significant increases in [performance](https://cloudblogs.microsoft.com/opensource/2019/05/22/onnx-runtime-machine-learning-inferencing-0-4-release/) for a number of models after deploying it. 22 | 23 | This is another step forward for machine learning in Ruby. Earlier this month, XGBoost and LightGBM also [came to Ruby](https://ankane.org/xgboost-lightgbm-come-to-ruby). 24 | 25 | Check out the [ONNX Runtime](https://github.com/ankane/onnxruntime) gem today! 26 | -------------------------------------------------------------------------------- /archive/trying-out-vault.md: -------------------------------------------------------------------------------- 1 | # Trying Out Vault for Postgres Credentials 2 | 3 | Install [Vault](https://www.vaultproject.io/), as well as JQ for JSON parsing 4 | 5 | ```sh 6 | brew install vault jq 7 | ``` 8 | 9 | Start the dev server 10 | 11 | ```sh 12 | vault server -dev 13 | ``` 14 | 15 | Then open another window. For this demo, we’ll create a new Postgres database. 16 | 17 | ```sh 18 | createdb myapp 19 | ``` 20 | 21 | Create a Postgres user for Vault to manage other users 22 | 23 | ```sh 24 | psql -c "CREATE USER vault WITH CREATEROLE ENCRYPTED PASSWORD 'secret';" myapp 25 | ``` 26 | 27 | And create a role to grant to temporary users. This is where you should configure privileges (omitted). 28 | 29 | ```sh 30 | psql -c "CREATE ROLE app;" myapp 31 | ``` 32 | 33 | Configure Vault. We set a default TTL of 10 seconds for users to test. 
34 | 35 | ```sh 36 | export VAULT_ADDR='http://127.0.0.1:8200' 37 | 38 | vault mount database 39 | 40 | vault write database/config/postgresql \ 41 | plugin_name=postgresql-database-plugin \ 42 | allowed_roles="app" \ 43 | connection_url="postgresql://vault:secret@localhost:5432/myapp?sslmode=disable" 44 | 45 | vault write database/roles/app \ 46 | db_name=postgresql \ 47 | creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';" \ 48 | default_ttl="10s" \ 49 | max_ttl="24h" 50 | ``` 51 | 52 | Fetch temporary credentials 53 | 54 | ```sh 55 | vault read -format=json database/creds/app 56 | ``` 57 | 58 | Save the result as environment variables (with JQ) 59 | 60 | ```sh 61 | data=`vault read -format=json database/creds/app` 62 | export PGUSER=`echo $data | jq -r '.data.username'` 63 | export PGPASSWORD=`echo $data | jq -r '.data.password'` 64 | ``` 65 | 66 | Test the new user 67 | 68 | ```sh 69 | psql -c "SELECT current_user;" myapp 70 | ``` 71 | 72 | Wait 10 seconds and re-run the command to confirm the user no longer exists 73 | -------------------------------------------------------------------------------- /archive/large-text-indexes.md: -------------------------------------------------------------------------------- 1 | # Large Text Indexes in Postgres 2 | 3 | --- 4 | 5 | **Note:** This article was written for Postgres 9.6 and below. For Postgres 10+, use hash indexes instead. 6 | 7 | --- 8 | 9 | An index on a sufficiently large `text` column can take up more space than the table itself. If you only need to check for equality, you can significantly reduce the size of the index. 10 | 11 | At first glance, hash indexes seem perfect for this. However, you shouldn’t use them since [they are not currently WAL-logged](https://www.postgresql.org/docs/9.6/indexes-types.html).
Instead, use an expression index: 12 | 13 | ```sql 14 | CREATE INDEX CONCURRENTLY ON table_name (CAST(md5(column_name) AS uuid)); 15 | ``` 16 | 17 | Cast to a `uuid` since it’s 16 bytes - the [perfect size](https://dba.stackexchange.com/questions/115271/what-is-the-optimal-data-type-for-an-md5-field) to store an md5 hash. 18 | 19 | 20 | Add an extra condition to your queries so the index is used. 21 | 22 | ```sql 23 | SELECT * FROM table_name WHERE column_name = 'some_value' 24 | AND md5(column_name)::uuid = md5('some_value')::uuid; -- add this 25 | ``` 26 | 27 | Keep the original equality comparison in the unlikely chance of a hash collision. 28 | 29 | Finally, confirm it worked: 30 | 31 | ```sql 32 | EXPLAIN ANALYZE 33 | SELECT * FROM table_name WHERE column_name = 'some_value' 34 | AND md5(column_name)::uuid = md5('some_value')::uuid; 35 | ``` 36 | 37 | This should show the new index being used. 38 | 39 | ``` 40 | Index Scan using table_name_md5_idx on table_name (cost=0.58..8.60 rows=1 width=172) (actual time=0.012..0.012 rows=0 loops=1) 41 | Index Cond: ((md5(column_name)::uuid = '9619030c-750b-4300-95b1-2d365196de91'::uuid) 42 | Filter: (column_name = 'some_value'::text) 43 | Planning time: 0.062 ms 44 | Execution time: 0.027 ms 45 | ``` 46 | 47 | If it’s not, run `ANALYZE VERBOSE table_name;` and try again. 48 | 49 | This reduced the size of one of our indexes at [Instacart](https://www.instacart.com) by **7x!** :slot_machine: 50 | -------------------------------------------------------------------------------- /archive/learn-data-science.md: -------------------------------------------------------------------------------- 1 | # Learn Data Science 2 | 3 | R and Python are two popular languages for data science. We use both at [Instacart](https://www.instacart.com). 4 | 5 | **This is a short guide for R.** 6 | 7 | It’s quick and everything is completely free. Follow in order for a smoother ride. 
8 | 9 | ### Setup 10 | 11 | Download [R](https://cloud.r-project.org/) and [RStudio Desktop](https://www.rstudio.com/products/rstudio/download/). 12 | 13 | ### Getting Started 14 | 15 | Complete [Try R](https://tryr.codeschool.com). Then read about [data structures](http://adv-r.had.co.nz/Data-structures.html) and [subsetting](http://adv-r.had.co.nz/Subsetting.html). The `str()` command will be a favorite as you learn. 16 | 17 | ### Your First Model 18 | 19 | Kaggle is a platform for data science competitions. Complete the [“getting started” competition](https://www.kaggle.com/c/titanic) by following [this great tutorial](https://trevorstephens.com/post/72916401642/titanic-getting-started-with-r) (5 parts). 20 | 21 | ### Keep Learning 22 | 23 | Read and do the labs in [An Introduction to Statistical Learning](https://www-bcf.usc.edu/~gareth/ISL/) (available as a free PDF). Understand the bias-variance tradeoff. 24 | 25 | Check out [R for Data Science](http://r4ds.had.co.nz/) and these courses: 26 | 27 | - [Practical Machine Learning](https://www.coursera.org/learn/practical-machine-learning) 28 | - [Exploratory Data Analysis](https://www.coursera.org/learn/exploratory-data-analysis) 29 | - [Regression Models](https://www.coursera.org/learn/regression-models) 30 | - [Data Cleaning](https://www.coursera.org/learn/data-cleaning) 31 | 32 | Get a quick intro to [time-series analysis](https://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/src/timeseries.html). 33 | 34 | Also check out [Google’s R Style Guide](https://google.github.io/styleguide/Rguide.xml). 
35 | 36 | ### Good to Know 37 | 38 | - variables and function names in R can (and often do) contain dots - `lm.fit` is a variable name, not a call to the `lm` object 39 | -------------------------------------------------------------------------------- /archive/devise-argon2.md: -------------------------------------------------------------------------------- 1 | # Argon2 with Devise 2 | 3 | bcrypt has been a great choice for safely storing passwords. However, as time has passed, a better alternative has emerged: [Argon2](https://password-hashing.net/). OWASP [now recommends](https://github.com/OWASP/CheatSheetSeries/blob/master/cheatsheets/Password_Storage_Cheat_Sheet.md) Argon2 for new applications. With a little bit of code, you can use Argon2 with Devise. 4 | 5 | Devise supports [custom encryptors](https://github.com/plataformatec/devise-encryptable). However, it requires a separate column to store a salt, which isn’t needed as Argon2 stores the salt in the password hash (like bcrypt). 6 | 7 | Instead, add [argon2](https://github.com/technion/ruby-argon2) to your Gemfile: 8 | 9 | ```ruby 10 | gem 'argon2', '>= 2' 11 | ``` 12 | 13 | And create `config/initializers/devise_argon2.rb` with: 14 | 15 | ```ruby 16 | module Argon2Encryptor 17 | def digest(klass, password) 18 | if klass.pepper.present? 19 | password = "#{password}#{klass.pepper}" 20 | end 21 | ::Argon2::Password.create(password) 22 | end 23 | 24 | def compare(klass, hashed_password, password) 25 | return false if hashed_password.blank? 26 | 27 | if hashed_password.start_with?("$argon2") 28 | if klass.pepper.present? 29 | password = "#{password}#{klass.pepper}" 30 | end 31 | ::Argon2::Password.verify_password(password, hashed_password) 32 | else 33 | super 34 | end 35 | end 36 | end 37 | 38 | Devise::Encryptor.singleton_class.prepend(Argon2Encryptor) 39 | ``` 40 | 41 | All new passwords will be hashed with Argon2. For existing passwords, rotate to Argon2 when a user signs in. 
Add to your model: 42 | 43 | ```ruby 44 | class User < ApplicationRecord 45 | def valid_password?(password) 46 | valid = super 47 | if valid && !encrypted_password.start_with?("$argon2") 48 | self.password = password 49 | save(validate: false) 50 | end 51 | valid 52 | end 53 | end 54 | ``` 55 | 56 | You can also [rehash all passwords at once](https://www.michalspacek.com/upgrading-existing-password-hashes), but it’s a bit more complicated. 57 | 58 | Congrats, your password storage is even stronger! 59 | -------------------------------------------------------------------------------- /archive/lockbox-types.md: -------------------------------------------------------------------------------- 1 | # Lockbox: Now with Types 2 | 3 | A new version of [Lockbox](https://ankane.org/modern-encryption-rails) was just released with support for types, making it easier to encrypt non-string fields. 4 | 5 | ```ruby 6 | class User < ApplicationRecord 7 | encrypts :born_on, type: :date 8 | encrypts :salary, type: :integer 9 | end 10 | ``` 11 | 12 | Previously, you’d need to perform typecasting yourself, making it harder to work with encrypted fields. All of these types are supported: 13 | 14 | - date 15 | - datetime 16 | - boolean 17 | - integer 18 | - float 19 | - binary 20 | - json 21 | - hash 22 | 23 | Types are automatically detected for serialized fields for maximum compatibility with existing code and libraries. 24 | 25 | ```ruby 26 | class User < ApplicationRecord 27 | serialize :properties, JSON 28 | encrypts :properties # detects JSON type 29 | end 30 | ``` 31 | 32 | It even works with custom serializers. 33 | 34 | ## Padding 35 | 36 | This release also adds support for padding. Padding can help conceal the exact length of messages. As the [Libsodium docs](https://libsodium.gitbook.io/doc/padding) explain: 37 | 38 | > Most modern cryptographic constructions disclose message lengths. 
The ciphertext for a given message will always have the same length, or add a constant number of bytes to it. For most applications, this is not an issue. But in some specific situations, [...] hiding the length may be desirable. 39 | 40 | Suppose a person’s health is categorized as either: 41 | 42 | - excellent 43 | - average 44 | - poor 45 | 46 | Even if this value is encrypted, it’s easy to know the status of a person since each category has a different length, which carries over to the ciphertext. Padding addresses this by adding data to the end of each message before encryption. You can enable padding for a field with: 47 | 48 | ```ruby 49 | class Person < ApplicationRecord 50 | encrypts :health_status, padding: true 51 | end 52 | ``` 53 | 54 | This expands all messages to a multiple of 16 bytes. You can configure the multiple as needed based on your data. 55 | 56 | Get started with types and padding by grabbing the [latest version](https://github.com/ankane/lockbox) of Lockbox today! 57 | -------------------------------------------------------------------------------- /archive/the-origin-of-sql-queries.md: -------------------------------------------------------------------------------- 1 | # The Origin of SQL Queries 2 | 3 | Do you know what part of your application is generating that time-consuming database query? There’s a much simpler way than `grep`. 4 | 5 | **Add comments to your queries!!** 6 | 7 | Turn: 8 | 9 | ```sql 10 | SELECT * FROM pandas WHERE mood = 'sad' 11 | ``` 12 | 13 | into 14 | 15 | ```sql 16 | SELECT * FROM pandas WHERE mood = 'sad' 17 | /*application:Nature,job:EatBambooJob*/ 18 | ``` 19 | 20 | Whether you use [PgHero](https://github.com/ankane/pghero), [pg_stat_statements](https://www.postgresql.org/docs/current/static/pgstatstatements.html) on its own, or `log_min_duration_statement` to log slow queries, comments can help! 21 | 22 | ## Ruby on Rails 23 | 24 | [Marginalia](https://github.com/basecamp/marginalia) is great.
We prefer to customize slightly in `config/initializers/marginalia.rb`. 25 | 26 | ```ruby 27 | module Marginalia 28 | module Comment 29 | # add namespace to controller 30 | def self.controller 31 | if marginalia_controller.respond_to?(:controller_path) 32 | marginalia_controller.controller_path 33 | end 34 | end 35 | end 36 | end 37 | 38 | # add job 39 | Marginalia::Comment.components << :job 40 | ``` 41 | 42 | ## Python 43 | 44 | With SQLAlchemy and Flask: 45 | 46 | ```python 47 | from flask import current_app, request 48 | from sqlalchemy.engine import Engine 49 | from sqlalchemy import event 50 | 51 | @event.listens_for(Engine, "before_cursor_execute", retval=True) 52 | def annotate_queries(conn, cursor, statement, parameters, context, executemany): 53 | comment = "" 54 | try: 55 | comment = " /*application:{},endpoint:{}*/".format(current_app.name, 56 | request.endpoint) 57 | except RuntimeError: # running in the CLI 58 | try: 59 | comment = " /*application:{}*/".format(current_app.name) 60 | except RuntimeError: # running in a REPL 61 | pass 62 | return statement + comment, parameters 63 | ``` 64 | 65 | ## R 66 | 67 | With [dbx](https://github.com/ankane/dbx), use: 68 | 69 | ```r 70 | options(dbx_comment=TRUE) 71 | ``` 72 | 73 | ## Other Languages and Frameworks 74 | 75 | Please [submit a PR](https://github.com/ankane/shorts/pulls)! 76 | -------------------------------------------------------------------------------- /archive/blind-index-1-0.md: -------------------------------------------------------------------------------- 1 | # Blind Index 1.0 2 | 3 |


4 | 5 | Blind indexing is an approach to [securely search encrypted data](https://paragonie.com/blog/2017/05/building-searchable-encrypted-databases-with-php-and-sql) with minimal information leakage. 6 | 7 | I’m happy to announce that [Blind Index 1.0](https://github.com/ankane/blind_index) was just released! Here are the key improvements. 8 | 9 | ## Stronger Algorithm 10 | 11 | This release adds support for Argon2id and makes it the default algorithm. 12 | 13 | Argon2 is a memory-hard function. You specify the amount of memory required to compute a hash, and if an attacker tries to compute the hash with less memory, it takes significantly more time to compute. This allows it to better resist attacks on specialized hardware like ASICs. 14 | 15 | Argon2 is significantly better than PBKDF2 (the previous default), so we recommend upgrading for better security. 16 | 17 | ## Fewer Keys to Manage 18 | 19 | It’s a good practice to use a separate key for each blind index. However, generating, storing, and deploying new keys can be burdensome. Thanks to [this key separation method](https://ciphersweet.paragonie.com/internals/key-hierarchy) by CipherSweet, this is no longer needed. Instead, you can use a single master key and the library will derive separate keys for each blind index automatically. You no longer have to worry about managing additional secrets. 20 | 21 | ## Better Naming 22 | 23 | In earlier versions, blind index columns took the format `encrypted_#{name}_bidx`. This was done to match the encrypted columns of the attr_encrypted gem. However, this column is a hash rather than encrypted data, so the `encrypted_` prefix doesn’t really make sense. It was removed in this release. 24 | 25 | ## Support for Lockbox 26 | 27 | This release also adds support for [Lockbox](https://ankane.org/modern-encryption-rails), a modern encryption library for Rails.
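The key separation idea can be sketched in plain Ruby. Note this is a purely hypothetical HMAC-based derivation for illustration, not the library's actual scheme (which follows CipherSweet's key hierarchy): one master secret yields a distinct, deterministic key per table/column.

```ruby
require "openssl"

# Hypothetical sketch of key separation: derive a distinct key per blind
# index from a single master key (not Blind Index's exact derivation)
def index_key(master_key, table, column)
  OpenSSL::HMAC.digest("SHA256", master_key, "blind_index:#{table}:#{column}")
end

master_key = OpenSSL::Random.random_bytes(32)

email_key = index_key(master_key, "users", "email")
phone_key = index_key(master_key, "users", "phone")

puts email_key == index_key(master_key, "users", "email") # => true (deterministic)
puts email_key == phone_key                               # => false (distinct per column)
```

Because the derivation is deterministic, only the master key needs to be stored and deployed.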
28 | 29 | ## Summary 30 | 31 | Blind Index 1.0 brings a number of improvements, and there’s a [smooth path to upgrading](https://github.com/ankane/blind_index#upgrading) with zero downtime. 32 | 33 | If you’re not encrypting data today because it makes it impossible to query, check out [Blind Index](https://github.com/ankane/blind_index). 34 | -------------------------------------------------------------------------------- /archive/rollup.md: -------------------------------------------------------------------------------- 1 | # Package Your JavaScript Libraries With Rollup 2 | 3 | [Rollup](https://rollupjs.org/guide/en) is a great tool for building libraries. 4 | 5 | > [“Webpack for apps, and Rollup for libraries”](https://medium.com/webpack/webpack-and-rollup-the-same-but-different-a41ad427058c) 6 | 7 | Run: 8 | 9 | ```sh 10 | yarn add rollup rollup-plugin-buble rollup-plugin-commonjs rollup-plugin-node-resolve rollup-plugin-uglify --dev 11 | ``` 12 | 13 | Add to `package.json`: 14 | 15 | ```json 16 | { 17 | "main": "dist/my-project.js", 18 | "module": "dist/my-project.esm.js", 19 | "scripts": { 20 | "build": "rollup -c" 21 | } 22 | } 23 | ``` 24 | 25 | Add `dist/` to your `.gitignore`. 
26 | 27 | Create `rollup.config.js` with: 28 | 29 | ```es6 30 | import buble from "rollup-plugin-buble"; 31 | import commonjs from "rollup-plugin-commonjs"; 32 | import pkg from "./package.json"; 33 | import resolve from "rollup-plugin-node-resolve"; 34 | import uglify from "rollup-plugin-uglify"; 35 | 36 | const input = "src/index.js"; 37 | const outputName = "MyProject"; 38 | const external = Object.keys(pkg.peerDependencies || {}); 39 | const esExternal = external.concat(Object.keys(pkg.dependencies || {})); 40 | const banner = 41 | `/* 42 | * ${pkg.name} 43 | * ${pkg.description} 44 | * ${pkg.repository.url} 45 | * v${pkg.version} 46 | * ${pkg.license} License 47 | */ 48 | `; 49 | 50 | export default [ 51 | { 52 | input: input, 53 | output: { 54 | name: outputName, 55 | file: pkg.main, 56 | format: "umd", 57 | banner: banner 58 | }, 59 | external: external, 60 | plugins: [ 61 | resolve(), 62 | commonjs(), 63 | buble() 64 | ] 65 | }, 66 | { 67 | input: input, 68 | output: { 69 | name: outputName, 70 | file: pkg.main.replace(/\.js$/, ".min.js"), 71 | format: "umd" 72 | }, 73 | external: external, 74 | plugins: [ 75 | resolve(), 76 | commonjs(), 77 | buble(), 78 | uglify() 79 | ] 80 | }, 81 | { 82 | input: input, 83 | output: { 84 | file: pkg.module, 85 | format: "es", 86 | banner: banner 87 | }, 88 | external: esExternal, 89 | plugins: [ 90 | buble() 91 | ] 92 | } 93 | ]; 94 | ``` 95 | 96 | And run: 97 | 98 | ```sh 99 | yarn build 100 | ``` 101 | 102 | :fire: 103 | -------------------------------------------------------------------------------- /archive/new-rails-app-checklist.md: -------------------------------------------------------------------------------- 1 | # New Rails App Checklist 2 | 3 | How I personally start new apps 4 | 5 | ## Create Project 6 | 7 | Get the latest version of Rails 8 | 9 | ```sh 10 | gem install rails 11 | ``` 12 | 13 | Create a new app 14 | 15 | ```sh 16 | rails new myapp -d postgresql --skip-turbolinks 17 | ``` 18 | 19 | Don’t fret too
much over the name - you can easily update it later 20 | 21 | ## Version Control 22 | 23 | Add Git 24 | 25 | ```sh 26 | git add . 27 | git commit -m "Hello app" 28 | ``` 29 | 30 | ## App Config 31 | 32 | Make a few updates to `config/application.rb` 33 | 34 | Disable unwanted generators 35 | 36 | ```ruby 37 | config.generators do |g| 38 | g.assets false 39 | g.helper false 40 | g.test_framework nil 41 | end 42 | ``` 43 | 44 | Set time zone 45 | 46 | ```ruby 47 | config.time_zone = "Pacific Time (US & Canada)" 48 | ``` 49 | 50 | ## Services 51 | 52 | Create a directory for services 53 | 54 | ```sh 55 | mkdir app/services 56 | ``` 57 | 58 | And create `app/services/application_service.rb` with: 59 | 60 | ```ruby 61 | class ApplicationService 62 | include Rails.application.routes.url_helpers 63 | 64 | def self.perform(*args) 65 | new.perform(*args) 66 | end 67 | 68 | protected 69 | 70 | def default_url_options 71 | Rails.application.config.action_mailer.default_url_options 72 | end 73 | end 74 | ``` 75 | 76 | ## Templates 77 | 78 | Add [Haml](https://github.com/indirect/haml-rails) 79 | 80 | ```ruby 81 | gem 'haml-rails' 82 | ``` 83 | 84 | and run 85 | 86 | ```sh 87 | rake haml:erb2haml 88 | ``` 89 | 90 | ## Console 91 | 92 | Add [Awesome Print](https://github.com/awesome-print/awesome_print) 93 | 94 | ```ruby 95 | gem 'awesome_print', require: false 96 | ``` 97 | 98 | and have it run when the console starts in `config/application.rb` 99 | 100 | ```ruby 101 | console do 102 | require "awesome_print" 103 | AwesomePrint.irb! 
104 | end 105 | ``` 106 | 107 | ## Environment 108 | 109 | Add [dotenv](https://github.com/bkeepers/dotenv) 110 | 111 | ```ruby 112 | gem 'dotenv-rails', groups: [:development, :test] 113 | ``` 114 | 115 | Create an env file and exclude it from version control 116 | 117 | ```sh 118 | touch .env 119 | echo ".env" >> .gitignore 120 | ``` 121 | 122 | ## Lastly 123 | 124 | Run 125 | 126 | ```sh 127 | rails db:create 128 | rails s 129 | ``` 130 | 131 | and create something awesome 132 | -------------------------------------------------------------------------------- /archive/google-oauth-with-devise.md: -------------------------------------------------------------------------------- 1 | # Google OAuth with Devise 2 | 3 | Here’s a quick guide to setting up Google OAuth as your app’s exclusive authentication method 4 | 5 | Add to your Gemfile 6 | 7 | ```ruby 8 | gem 'devise' 9 | gem 'omniauth-google-oauth2' 10 | gem 'dotenv-rails', groups: [:development, :test] 11 | ``` 12 | 13 | And run 14 | 15 | ```sh 16 | rails generate devise:install 17 | ``` 18 | 19 | Create a `User` model 20 | 21 | ```sh 22 | rails g model User 23 | ``` 24 | 25 | In the migration, add 26 | 27 | ```ruby 28 | create_table :users do |t| 29 | t.string :name 30 | t.string :email 31 | t.string :provider 32 | t.string :uid 33 | t.string :remember_token 34 | t.datetime :remember_created_at 35 | t.timestamps null: false 36 | end 37 | 38 | add_index :users, :email, unique: true 39 | add_index :users, [:uid, :provider], unique: true 40 | ``` 41 | 42 | In your `User` model, add 43 | 44 | ```ruby 45 | devise :rememberable, :omniauthable, omniauth_providers: [:google_oauth2] 46 | ``` 47 | 48 | Create a controller 49 | 50 | ```sh 51 | rails g controller OmniauthCallbacks 52 | ``` 53 | 54 | with 55 | 56 | ```ruby 57 | class OmniauthCallbacksController < Devise::OmniauthCallbacksController 58 | # replace with your authenticate method 59 | skip_before_action :authenticate_user!
60 | 61 | def google_oauth2 62 | auth = request.env["omniauth.auth"] 63 | user = User.where(provider: auth["provider"], uid: auth["uid"]) 64 | .first_or_initialize(email: auth["info"]["email"]) 65 | user.name ||= auth["info"]["name"] 66 | user.save! 67 | 68 | user.remember_me = true 69 | sign_in(:user, user) 70 | 71 | redirect_to after_sign_in_path_for(user) 72 | end 73 | end 74 | ``` 75 | 76 | In your routes, add 77 | 78 | ```ruby 79 | devise_for :users, controllers: {omniauth_callbacks: "omniauth_callbacks"} 80 | ``` 81 | 82 | In `config/initializers/devise.rb`, add 83 | 84 | ```ruby 85 | config.omniauth :google_oauth2, ENV["GOOGLE_CLIENT_ID"], ENV["GOOGLE_CLIENT_SECRET"], access_type: "online" 86 | ``` 87 | 88 | Follow the [Google API Setup](https://github.com/zquestz/omniauth-google-oauth2#google-api-setup) instructions and add your credentials in `.env` 89 | 90 | ``` 91 | GOOGLE_CLIENT_ID=0000000 92 | GOOGLE_CLIENT_SECRET=0000000 93 | ``` 94 | 95 | ## Bonus 96 | 97 | To remove the hash from the end of the URL after sign-in, use: 98 | 99 | ```js 100 | var href = window.location.href; 101 | if (href[href.length - 1] === "#") { 102 | if (typeof window.history.replaceState == "function") { 103 | history.replaceState({}, "", href.slice(0, -1)); 104 | } 105 | } 106 | ``` 107 | -------------------------------------------------------------------------------- /archive/artistic-style-transfer-ruby.md: -------------------------------------------------------------------------------- 1 | # Artistic Style Transfer in Ruby 2 | 3 | The [ONNX Model Zoo](https://github.com/onnx/models) has a number of interesting pretrained deep learning models. Thanks to the ONNX Runtime, we can run them in Ruby. 4 | 5 | Today, we’ll look at artistic style transfer. Here’s [the model](https://github.com/onnx/models/tree/master/vision/style_transfer/fast_neural_style) we’ll use.
6 | 7 | First, download the [pretrained model](https://github.com/onnx/models/raw/master/vision/style_transfer/fast_neural_style/models/opset9/rain_princess.onnx) and this awesome shot of a lynx. 8 | 9 |

*[Image: Lynx]*

*Photo from the U.S. Fish and Wildlife Service*

14 | 15 | Install the ONNX Runtime, MiniMagick, and Numo::NArray gems. MiniMagick allows us to manipulate images, and Numo::NArray makes it easier to work with multi-dimensional arrays. 16 | 17 | ```ruby 18 | gem "onnxruntime" 19 | gem "mini_magick" 20 | gem "numo-narray" 21 | ``` 22 | 23 | Next, load the image. We resize it to be the model dimensions. 24 | 25 | ```ruby 26 | img = MiniMagick::Image.open("lynx.jpg") 27 | img.resize "224x224^", "-gravity", "center", "-extent", "224x224" 28 | pixels = img.get_pixels 29 | ``` 30 | 31 | And load the model 32 | 33 | ```ruby 34 | model = OnnxRuntime::Model.new("rain_princess.onnx") 35 | ``` 36 | 37 | Perform the preprocessing steps from the model docs 38 | 39 | ```ruby 40 | pixels = Numo::NArray.cast(img.get_pixels) 41 | pixels = pixels.transpose(2, 0, 1) 42 | pixels = pixels.expand_dims(0) 43 | ``` 44 | 45 | Run the model 46 | 47 | ```ruby 48 | result = model.predict(input1: pixels) 49 | ``` 50 | 51 | Perform the postprocessing steps 52 | 53 | ```ruby 54 | out_pixels = Numo::NArray.cast(result["output1"].first) 55 | out_pixels = out_pixels.clip(0, 255).cast_to(Numo::UInt8) 56 | out_pixels = out_pixels.transpose(1, 2, 0) 57 | ``` 58 | 59 | And save the image 60 | 61 | ```ruby 62 | img = MiniMagick::Image.import_pixels(out_pixels.to_binary, img.width, img.height, 8, "rgb", "jpg") 63 | img.write("output.jpg") 64 | ``` 65 | 66 |
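If the Numo calls in the preprocessing steps feel opaque, here is the same reshaping in plain Ruby (no Numo) for a tiny hypothetical 2x2 image: `transpose(2, 0, 1)` turns height x width x channel pixels into channel x height x width, and `expand_dims(0)` adds a batch dimension.

```ruby
# Plain-Ruby sketch of transpose(2, 0, 1) and expand_dims(0)
pixels = [
  [[1, 2, 3], [4, 5, 6]],     # row 0: two RGB pixels
  [[7, 8, 9], [10, 11, 12]]   # row 1: two RGB pixels
]

height = pixels.size
width = pixels[0].size
channels = pixels[0][0].size

# chw[c][h][w] = pixels[h][w][c]
chw = Array.new(channels) do |c|
  Array.new(height) { |h| Array.new(width) { |w| pixels[h][w][c] } }
end
input = [chw] # batch of one image

p chw[0] # => [[1, 4], [7, 10]] -- the red channel
```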

*[Image: Lynx with Rain Princess style]*

67 | 68 | Four other styles are also available. 69 | 70 |

*[Images: Lynx with Udnie, Candy, Mosaic, and Pointilism styles]*

76 | 77 | Here’s the [complete code](https://gist.github.com/ankane/33ffa59ea0f5add37edee04fa7aacd9e). Now go out and try it with your own images! 78 | -------------------------------------------------------------------------------- /archive/activestorage-s3-encryption.md: -------------------------------------------------------------------------------- 1 | # Active Storage S3 Client-Side Encryption 2 | 3 | Use [client-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html) to encrypt your data before sending it to S3. 4 | 5 | You can provide an encryption key to use directly or a [KMS](https://aws.amazon.com/kms/) key for envelope encryption. With envelope encryption, a data encryption key is retrieved from KMS and used to encrypt the file. An encrypted version of the key is stored in the object metadata. When downloading the file, the encrypted key is sent to KMS to be decrypted and then used to decrypt the file. The AWS SDK handles all this automatically. 6 | 7 | An advantage of this approach is S3 never sees the unencrypted file (unlike with server-side encryption). 8 | 9 | First, we’ll create a service that extends the built-in S3 service. 
Create `lib/active_storage/services/encrypted_s3_service.rb` with: 10 | 11 | ```ruby 12 | require "active_storage/service/s3_service" 13 | 14 | module ActiveStorage 15 | class Service::EncryptedS3Service < Service::S3Service 16 | attr_reader :encryption_client 17 | 18 | def initialize(bucket:, upload: {}, **options) 19 | super_options = options.except(:kms_key_id, :encryption_key) 20 | super(bucket: bucket, upload: upload, **super_options) 21 | @encryption_client = Aws::S3::Encryption::Client.new(options) 22 | end 23 | 24 | def upload(key, io, checksum: nil, **) 25 | instrument :upload, key: key, checksum: checksum do 26 | begin 27 | encryption_client.put_object( 28 | upload_options.merge( 29 | body: io, 30 | content_md5: checksum, 31 | bucket: bucket.name, 32 | key: key 33 | ) 34 | ) 35 | rescue Aws::S3::Errors::BadDigest 36 | raise ActiveStorage::IntegrityError 37 | end 38 | end 39 | end 40 | 41 | def download(key, &block) 42 | if block_given? 43 | raise NotImplementedError, "#get_object with :range not supported yet" 44 | else 45 | instrument :download, key: key do 46 | encryption_client.get_object( 47 | bucket: bucket.name, 48 | key: key 49 | ).body.string.force_encoding(Encoding::BINARY) 50 | end 51 | end 52 | end 53 | 54 | def download_chunk(key, range) 55 | raise NotImplementedError, "#get_object with :range not supported yet" 56 | end 57 | end 58 | end 59 | ``` 60 | 61 | Note that downloading chunks isn’t supported with client-side encryption. 62 | 63 | Then update `config/storage.yml` to use it: 64 | 65 | ```yml 66 | amazon: 67 | service: EncryptedS3 68 | bucket: my-bucket 69 | kms_key_id: alias/my-key 70 | ``` 71 | 72 | Use `encryption_key` instead of `kms_key_id` to manage keys yourself. 73 | 74 | And that’s it! 75 | 76 | For client-side encryption without Active Storage, check out [this post](https://ankane.org/aws-client-side-encryption). 
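As an aside, the envelope encryption flow described above can be sketched in a few lines of plain Ruby with OpenSSL. This only mirrors the flow, with a local master key standing in for KMS; it is not the AWS SDK's actual wire format.

```ruby
require "openssl"

# Envelope encryption sketch: a random data key encrypts the file,
# and a master key (standing in for KMS) encrypts the data key
def encrypt(key, data)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  iv = cipher.random_iv
  ciphertext = cipher.update(data) + cipher.final
  [iv, cipher.auth_tag, ciphertext]
end

def decrypt(key, parts)
  iv, tag, ciphertext = parts
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = key
  cipher.iv = iv
  cipher.auth_tag = tag
  cipher.update(ciphertext) + cipher.final
end

master_key = OpenSSL::Random.random_bytes(32)
data_key = OpenSSL::Random.random_bytes(32)

envelope = encrypt(master_key, data_key)         # stored in object metadata
object_body = encrypt(data_key, "file contents") # stored as the S3 object

recovered_key = decrypt(master_key, envelope)
puts decrypt(recovered_key, object_body) # => file contents
```

S3 only ever sees `envelope` and `object_body`, never the plaintext or the master key.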
77 | -------------------------------------------------------------------------------- /archive/pgbouncer-setup.md: -------------------------------------------------------------------------------- 1 | # PgBouncer Setup 2 | 3 | In under 5 minutes 4 | 5 | ## Get Started 6 | 7 | Here’s the flow: 8 | 9 | ```text 10 | Web app -> PgBouncer -> Postgres 11 | ``` 12 | 13 | You can install PgBouncer on the same server as Postgres or a separate server. For Amazon RDS, you won’t have shell access to the database server, so you’ll need to spin up another EC2 instance to run PgBouncer. 14 | 15 | ```text 16 | Web app -> EC2 running PgBouncer -> RDS instance 17 | ``` 18 | 19 | Start by launching a new instance of the latest LTS version of Ubuntu Server. Once the server is ready, ssh in. For the latest version of PgBouncer, we’ll use the [official Postgres APT repository](https://wiki.postgresql.org/wiki/Apt). 20 | 21 | ```sh 22 | sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' 23 | wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add - 24 | sudo apt-get update 25 | sudo apt-get install pgbouncer 26 | ``` 27 | 28 | ## Configure PgBouncer 29 | 30 | Edit `/etc/pgbouncer/pgbouncer.ini`. The important settings are: 31 | 32 | ```ini 33 | [databases] 34 | YOUR-DBNAME = host=YOUR-HOST port=5432 dbname=YOUR-DBNAME 35 | 36 | [pgbouncer] 37 | listen_addr = * 38 | listen_port = 6432 39 | auth_type = md5 40 | auth_file = /etc/pgbouncer/userlist.txt 41 | pool_mode = transaction 42 | server_reset_query = 43 | ``` 44 | 45 | [View all settings](https://pgbouncer.github.io/config.html) 46 | 47 | Create `/etc/pgbouncer/userlist.txt` with: 48 | 49 | ``` 50 | "USERNAME1" "PASSWORD1" 51 | "USERNAME2" "PASSWORD2" 52 | ``` 53 | 54 | Use the same credentials as your database server. 
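If you'd rather not keep plaintext passwords in the file, `auth_type = md5` also accepts Postgres-style hashed entries of the form `md5` followed by `md5(password + username)`. A quick Ruby sketch to generate one, using the placeholder credentials from above:

```ruby
require "digest"

# Build a hashed userlist.txt entry in the Postgres md5 format:
# "md5" + md5(password + username)
username = "USERNAME1"
password = "PASSWORD1"
hashed = "md5" + Digest::MD5.hexdigest(password + username)

puts "\"#{username}\" \"#{hashed}\""
```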
55 | 56 | ## Start the Service 57 | 58 | ```sh 59 | sudo service pgbouncer start 60 | ``` 61 | 62 | Then reboot the server and confirm the process comes back up. 63 | 64 | ## Test 65 | 66 | ```sh 67 | psql -h 127.0.0.1 -p 6432 -d YOUR-DBNAME -U USERNAME1 68 | ``` 69 | 70 | ## Increase File Limits 71 | 72 | If you need more than 1,000 connections to PgBouncer, you’ll need to increase file limits. 73 | 74 | Append to `/etc/default/pgbouncer`: 75 | 76 | ```sh 77 | ulimit -n 16384 78 | ``` 79 | 80 | Restart the service with: 81 | 82 | ```sh 83 | sudo service pgbouncer restart 84 | ``` 85 | 86 | To confirm it worked, find the process ID and run: 87 | 88 | ```sh 89 | cat /proc/<pid>/limits 90 | ``` 91 | 92 | `Max open files` should reflect the value above. 93 | 94 | ## App Changes 95 | 96 | Be sure to disable prepared statements, as they will not work with PgBouncer in transaction mode. 97 | 98 | ## Statement Timeouts 99 | 100 | To use a [statement timeout](https://www.postgresql.org/docs/current/static/runtime-config-client.html#GUC-STATEMENT-TIMEOUT), run: 101 | 102 | ```sql 103 | ALTER ROLE USERNAME1 SET statement_timeout = 5000; 104 | ``` 105 | 106 | ## Congrats 107 | 108 | You’ve successfully set up PgBouncer. 109 | -------------------------------------------------------------------------------- /archive/csp-rails.md: -------------------------------------------------------------------------------- 1 | # Adding CSP to Rails 2 | 3 | Content Security Policy can be an effective way to prevent XSS attacks. If you aren’t familiar, here’s a [great intro](https://www.html5rocks.com/en/tutorials/security/content-security-policy/). 4 | 5 | To get started with Rails, first add the header to all requests in your `ApplicationController`. We want to start by blocking content in development so we notice it, but only report it in production so nothing breaks.
6 | 7 | ```ruby 8 | before_action :set_csp 9 | 10 | # use constants and freeze for performance 11 | CSP_HEADER_NAME = (Rails.env.production? ? "Content-Security-Policy-Report-Only" : "Content-Security-Policy").freeze 12 | CSP_HEADER_VALUE = "default-src *; report-uri /csp_reports?report_only=#{CSP_HEADER_NAME.include?("Report-Only")}".freeze 13 | 14 | def set_csp 15 | response.headers[CSP_HEADER_NAME] = CSP_HEADER_VALUE 16 | end 17 | ``` 18 | 19 | ## Reports 20 | 21 | Create a model to track reports. 22 | 23 | ```sh 24 | rails g model CspReport 25 | ``` 26 | 27 | And in the migration, do: 28 | 29 | ```ruby 30 | class CreateCspReports < ActiveRecord::Migration 31 | def change 32 | create_table :csp_reports do |t| 33 | t.text :document_uri 34 | t.text :referrer 35 | t.text :violated_directive 36 | t.text :effective_directive 37 | t.text :original_policy 38 | t.text :blocked_uri 39 | t.integer :status_code 40 | t.text :user_agent 41 | t.boolean :report_only 42 | t.timestamp :created_at 43 | end 44 | end 45 | end 46 | ``` 47 | 48 | Add a controller to create the reports. 49 | 50 | ```ruby 51 | class CspReportsController < ApplicationController 52 | skip_before_action :verify_authenticity_token 53 | 54 | def create 55 | report = JSON.parse(request.body.read)["csp-report"] 56 | CspReport.create!( 57 | document_uri: report["document-uri"], 58 | referrer: report["referrer"], 59 | violated_directive: report["violated-directive"], 60 | effective_directive: report["effective-directive"], 61 | original_policy: report["original-policy"], 62 | blocked_uri: report["blocked-uri"], 63 | status_code: report["status-code"], 64 | user_agent: request.user_agent, 65 | report_only: params[:report_only] == "true" 66 | ) 67 | head :ok 68 | end 69 | end 70 | ``` 71 | 72 | Don’t forget the route. 73 | 74 | ```ruby 75 | resources :csp_reports, only: [:create] 76 | ``` 77 | 78 | ## Enforcing the Policy 79 | 80 | Once the reports stop, you’ll want to enforce the policy in production. 
81 | 82 | ```ruby 83 | CSP_HEADER_NAME = "Content-Security-Policy".freeze 84 | ``` 85 | 86 | ## Testing New Policies 87 | 88 | You can have both an enforced policy and a report only policy, so use this to your advantage when changing policies. Make the new policy report only for a bit before enforcing it. 89 | 90 | ```ruby 91 | before_action :set_csp_report_only 92 | 93 | # use constants and freeze for performance 94 | CSP_REPORT_ONLY_HEADER_NAME = "Content-Security-Policy-Report-Only".freeze 95 | CSP_REPORT_ONLY_HEADER_VALUE = "default-src https:; report-uri /csp_reports?report_only=true".freeze 96 | 97 | def set_csp_report_only 98 | response.headers[CSP_REPORT_ONLY_HEADER_NAME] = CSP_REPORT_ONLY_HEADER_VALUE 99 | end 100 | ``` 101 | -------------------------------------------------------------------------------- /archive/emotion-recognition-ruby.md: -------------------------------------------------------------------------------- 1 | # Emotion Recognition in Ruby 2 | 3 | Welcome to another installment of deep learning in Ruby. Today, we’ll look at [FER+](https://github.com/Microsoft/FERPlus), a deep convolutional neural network for emotion recognition developed at Microsoft. The project is open source, and there’s a pretrained model in the [ONNX Model Zoo](https://github.com/onnx/models/tree/master/vision/body_analysis/emotion_ferplus) that we can get running quickly in Ruby. 4 | 5 | First, download [the model](https://onnxzoo.blob.core.windows.net/models/opset_8/emotion_ferplus/emotion_ferplus.tar.gz) and this photo of a park ranger. 6 | 7 |

*[Image: Park Ranger]*

*Photo from Yellowstone National Park*

12 | 13 | We’ll use MiniMagick to prepare the image and the ONNX Runtime gem to run the model. 14 | 15 | ```ruby 16 | gem "mini_magick" 17 | gem "onnxruntime" 18 | ``` 19 | 20 | For the image, we need to zoom in on her face, resize it to 64x64, and convert it to grayscale. Typically, we’d use a face detection model to find the bounding box and use that information to crop the image, but for simplicity, we’ll just do it manually. 21 | 22 | ```ruby 23 | img = MiniMagick::Image.open("ranger.jpg") 24 | img.crop "100x100+60+20", "-gravity", "center" # manual crop 25 | img.resize "64x64^", "-gravity", "center", "-extent", "64x64" 26 | img.colorspace "Gray" 27 | img.write "resized.jpg" 28 | ``` 29 | 30 | Here’s a blown up version: 31 | 32 |

*[Image: Park Ranger, cropped and resized]*

33 | 34 | Finally, create a 64x64 matrix of the grayscale intensities. 35 | 36 | ```ruby 37 | # all pixels are the same for grayscale, so just get one of them 38 | pixels = img.get_pixels.flat_map { |r| r.map(&:first) } 39 | input = OnnxRuntime::Utils.reshape(pixels, [1, 1, 64, 64]) 40 | ``` 41 | 42 | Now that the input is prepared, we can load and run the model. 43 | 44 | ```ruby 45 | model = OnnxRuntime::Model.new("model.onnx") 46 | output = model.predict("Input3" => input) 47 | ``` 48 | 49 | We use [softmax](https://victorzhou.com/blog/softmax/) to convert the model output into probabilities. 50 | 51 | ```ruby 52 | def softmax(x) 53 | exp = x.map { |v| Math.exp(v - x.max) } 54 | exp.map { |v| v / exp.sum } 55 | end 56 | 57 | probabilities = softmax(output["Plus692_Output_0"].first) 58 | ``` 59 | 60 | Then map the labels and sort by highest probability. 61 | 62 | ```ruby 63 | emotion_labels = [ 64 | "neutral", "happiness", "surprise", "sadness", 65 | "anger", "disgust", "fear", "contempt" 66 | ] 67 | pp emotion_labels.zip(probabilities).sort_by { |_, v| -v }.to_h 68 | ``` 69 | 70 | And the results are in: 71 | 72 | ``` 73 | { 74 | "happiness" => 0.9999839207138284, 75 | "surprise" => 1.0569785479062501e-05, 76 | "neutral" => 4.826811128840592e-06, 77 | "anger" => 4.63037778140089e-07, 78 | "sadness" => 9.574742925740587e-08, 79 | "contempt" => 7.941520916580971e-08, 80 | "fear" => 2.8803367665891773e-08, 81 | "disgust" => 1.568577943664937e-08 82 | } 83 | ``` 84 | 85 | There’s a 99.9% probability she looks happy in the photo. Not bad! 86 | 87 | Here’s the [complete code](https://gist.github.com/ankane/3bb4ddbf84edd7f05a24cd3697ccd9a7). Now go out and try it with your own images! 
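A quick sanity check on the softmax function above: subtracting `x.max` only shifts the scores (it guards against overflow), so the result is still a valid probability distribution with the same ranking as the raw scores.

```ruby
# Same softmax as above, checked against hypothetical scores
def softmax(x)
  exp = x.map { |v| Math.exp(v - x.max) }
  exp.map { |v| v / exp.sum }
end

scores = [9.2, 1.5, 0.4, -0.7, -1.9, -2.4, -3.1, -3.5]
probs = softmax(scores)

puts probs.sum.round(6)     # => 1.0
puts probs.index(probs.max) # => 0, same winner as the raw scores
```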
88 | -------------------------------------------------------------------------------- /archive/pghero-2-0.md: -------------------------------------------------------------------------------- 1 | # PgHero 2.0 Has Arrived 2 | 3 | It’s been over 2 years since PgHero 1.0 was released as a performance dashboard for Postgres. Since then, a number of new features have been added. 4 | 5 | - checks for serious issues like transaction ID wraparound and integer overflow 6 | - the ability to capture and view query stats over time 7 | - suggested indexes to give you a better idea of how to optimize queries (check out [Dexter](https://ankane.org/introducing-dexter) for automatic indexing) 8 | 9 | PgHero 2.0 provides even more insight into your database performance with two additional features: query details and space stats. 10 | 11 | ## Query Details 12 | 13 | PgHero makes it easy to see the most time-consuming queries during a given time period, but it’s hard to follow an individual query’s performance over time. When you run into issues, it’s not always easy to uncover what happened. Are the top queries during an incident consistently the most time-consuming, or are they new? Did the number of calls increase or was it the average time? 14 | 15 | The new [Query Details page](https://pghero.dokkuapp.com/datakick/queries/588635171) helps solve this. 16 | 17 |

[Image: PgHero Query Details Page]

18 | 19 | This page allows you to dive deep into an individual query. View charts of total time, average time, and calls over the past 24 hours to see how they’ve moved. 20 | 21 | For those who [annotate queries](https://ankane.org/the-origin-of-sql-queries), you’ve likely realized the comment in PgHero only tells you one of the places a query is coming from since similar queries are grouped together. Now, you can get a better idea of all the places it’s called from. 22 | 23 |

If you don’t annotate queries, you should!!
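The annotation idea is simple enough to sketch in a few lines (the `annotate` helper below is hypothetical — in a real app, a library like [marginalia](https://github.com/basecamp/marginalia) appends this comment for you):

```ruby
# Hypothetical sketch: tag a query with its origin as a SQL comment.
# Comments survive in captured query text, so grouped queries stay traceable.
def annotate(sql, controller:, action:)
  "#{sql} /*controller:#{controller},action:#{action}*/"
end

puts annotate("SELECT * FROM users", controller: "users", action: "index")
# => SELECT * FROM users /*controller:users,action:index*/
```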

24 | 25 | This page also lists tables in the query and their indexes so you can quickly see if an index is missing, and an “Explain” button is usually available to help you debug (but may be missing if PgHero hasn’t captured an unnormalized version of the query recently). 26 | 27 | ## Space Stats 28 | 29 | PgHero 2.0 also helps you manage storage space. You can track the growth of tables and indexes over time and view this data on the [Space page](https://pghero.dokkuapp.com/datakick/space). To see the fastest growing relations, click on the “7d Growth” header. 30 | 31 |

[Image: PgHero Space Stats Page]

32 | 33 | In addition, this page now reports unused indexes to help reclaim space. If you use read replicas, be sure to check that indexes aren’t used on any of them before dropping. 34 | 35 | You can also view the growth for an individual table or index over the past 30 days. 36 | 37 |

[Image: PgHero Space Growth Page]

38 | 39 | Lastly, there’s syntax highlighting for all SQL for improved readability. 40 | 41 |

[Image: PgHero Syntax Highlighting]

42 | 43 |

Much better :)

44 | 45 | So what are you waiting for? Get the [latest version](https://github.com/ankane/pghero) of PgHero today. 46 | 47 |
48 | 49 | Note: If you use PgHero outside the dashboard, there are some [breaking changes](https://github.com/ankane/pghero/blob/master/guides/Rails.md#200) from 1.x to be aware of. 50 | -------------------------------------------------------------------------------- /archive/hybrid-cryptography-rails.md: -------------------------------------------------------------------------------- 1 | # Hybrid Cryptography on Rails 2 | 3 |

Keys

4 | 5 | [Hybrid cryptography](https://en.wikipedia.org/wiki/Hybrid_cryptosystem) allows certain servers to encrypt data without the ability to decrypt it. This can greatly limit damage in the event of a breach. 6 | 7 | Suppose we have a service that sends text messages to customers. Customers enter their phone number through the website or mobile app. 8 | 9 | With hybrid cryptography, we can set up web servers to only encrypt phone numbers. Text messages can be sent through background jobs which run on a different set of servers - ones that can decrypt and don’t allow inbound traffic. If internal employees need to view phone numbers, they can use a separate set of web servers that are only accessible through the company VPN. 10 | 11 |   | Encrypt | Decrypt |   12 | --- | --- | --- | --- 13 | Customer web servers | ✓ | 14 | Background workers | ✓ | ✓ | No inbound traffic 15 | Internal web servers | ✓ | ✓ | Requires VPN 16 | 17 | ## Setup 18 | 19 | Install [Libsodium](https://github.com/crypto-rb/rbnacl/wiki/Installing-libsodium) and add [Lockbox](https://github.com/ankane/lockbox) and [RbNaCl](https://github.com/crypto-rb/rbnacl) to your Gemfile: 20 | 21 | ```ruby 22 | gem 'lockbox' 23 | gem 'rbnacl' 24 | ``` 25 | 26 | Generate keys in the Rails console with: 27 | 28 | ```ruby 29 | Lockbox.generate_key_pair 30 | ``` 31 | 32 | Store the keys with your other secrets. This is typically Rails credentials or an environment variable ([dotenv](https://github.com/bkeepers/dotenv) is great for this). Be sure to use different keys in development and production. 33 | 34 | ```sh 35 | PHONE_ENCRYPTION_KEY=... 36 | PHONE_DECRYPTION_KEY=... 37 | ``` 38 | 39 | Only set the decryption key on servers that should be able to decrypt. 40 | 41 | ## Database Fields 42 | 43 | We’ll store phone numbers in an encrypted database field. Create a migration to add a new column for the encrypted data. 
44 | 45 | ```ruby 46 | class AddEncryptedPhoneToUsers < ActiveRecord::Migration[5.2] 47 | def change 48 | add_column :users, :phone_ciphertext, :string 49 | end 50 | end 51 | ``` 52 | 53 | In the model, add: 54 | 55 | ```ruby 56 | class User < ApplicationRecord 57 | encrypts :phone, algorithm: "hybrid", encryption_key: ENV["PHONE_ENCRYPTION_KEY"], decryption_key: ENV["PHONE_DECRYPTION_KEY"] 58 | end 59 | ``` 60 | 61 | Set a user’s phone number to ensure it works. 62 | 63 | ## Files 64 | 65 | Suppose we also need to accept sensitive documents. We can take a similar approach with file uploads. 66 | 67 | For Active Storage, use: 68 | 69 | ```ruby 70 | class User < ApplicationRecord 71 | encrypts_attached :document, algorithm: "hybrid", encryption_key: ENV["PHONE_ENCRYPTION_KEY"], decryption_key: ENV["PHONE_DECRYPTION_KEY"] 72 | end 73 | ``` 74 | 75 | For CarrierWave, use: 76 | 77 | ```ruby 78 | class DocumentUploader < CarrierWave::Uploader::Base 79 | encrypt algorithm: "hybrid", encryption_key: ENV["PHONE_ENCRYPTION_KEY"], decryption_key: ENV["PHONE_DECRYPTION_KEY"] 80 | end 81 | ``` 82 | 83 | You can also encrypt an IO stream directly. 84 | 85 | ```ruby 86 | box = Lockbox.new(algorithm: "hybrid", encryption_key: ENV["PHONE_ENCRYPTION_KEY"], decryption_key: ENV["PHONE_DECRYPTION_KEY"]) 87 | box.encrypt(params[:file]) 88 | ``` 89 | 90 | ## Conclusion 91 | 92 | You’ve now seen an approach for keeping your data safe in the event a server is compromised. For more on data protection, check out [Securing Sensitive Data in Rails](https://ankane.org/sensitive-data-rails). 93 | -------------------------------------------------------------------------------- /archive/postgres-sslmode-explained.md: -------------------------------------------------------------------------------- 1 | # Postgres SSLMODE Explained 2 | 3 | When you connect to a database, Postgres uses the `sslmode` parameter to determine the security of the connection. 
There are many options, so here’s an analogy to web security: 4 | 5 | - `disable` is HTTP 6 | - `verify-full` is HTTPS 7 | 8 | All the other options fall somewhere in between, and by design, make fewer guarantees of security than HTTPS in your browser does. 9 | 10 |

[Image: Screenshot]

11 | 12 | This includes the default `prefer`. The [Postgres docs](https://www.postgresql.org/docs/current/libpq-ssl.html) have a great table explaining this: 13 | 14 |

[Image: Table]

15 | 16 | Other modes like `require` are still useful in protecting against passive attacks (sniffing), but are vulnerable to active attacks that can compromise your credentials. Tarjei Husøy created [postgres-mitm](https://thusoy.com/2016/mitming-postgres) to demonstrate this. 17 | 18 | ## Defense 19 | 20 | The best way to protect a database is to limit inbound traffic. Require a VPN or SSH tunneling through a [bastion host](https://medium.com/@bill_73959/understanding-bastions-hosts-6ccd457e41ac) to connect. This ensures connections are always secure, and even if database credentials are compromised, an attacker won’t be able to access the database. 21 | 22 | If this is not feasible, always use `verify-full`. This includes connections from code, psql, SQL clients, and other tools like [pgsync](https://github.com/ankane/pgsync) and [pgslice](https://github.com/ankane/pgslice). 23 | 24 | You can specify `sslmode` in the connection URI: 25 | 26 | ```text 27 | postgresql://user:pass@host/dbname?sslmode=verify-full&sslrootcert=ca.pem 28 | ``` 29 | 30 | Or use environment variables. 31 | 32 | ```sh 33 | PGSSLMODE=verify-full PGSSLROOTCERT=ca.pem 34 | ``` 35 | 36 | Libraries for most programming languages have options as well. 37 | 38 | ```ruby 39 | PG.connect(sslmode: "verify-full", sslrootcert: "ca.pem") 40 | ``` 41 | 42 | ## Certificates 43 | 44 | To verify an SSL/TLS certificate, the client checks it against a root certificate. Your browser ships with root certificates to verify HTTPS websites. Postgres doesn’t come with any root certificates, so to use `verify-full`, you must specify one.
45 | 46 | Here are root certificates for a number of providers: 47 | 48 | Provider | Certificate | Docs 49 | --- | --- | --- 50 | Amazon RDS | [Download](https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem) | [View](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html#PostgreSQL.Concepts.General.SSL) 51 | Google Cloud SQL | In Account | [View](https://cloud.google.com/sql/docs/postgres/connect-admin-ip) 52 | Digital Ocean | In Account | [View](https://www.digitalocean.com/docs/databases/how-to/clusters/secure-clusters/) 53 | Citus Data | [Download](https://console.citusdata.com/citus.crt) | [View](https://docs.citusdata.com/en/v8.0/cloud/security.html) 54 | 55 | There’s no way to use `verify-full` with Heroku Postgres, so use caution when connecting from networks you don't fully trust. Instead of `heroku pg:psql`, use: 56 | 57 | ```sh 58 | heroku run psql \$DATABASE_URL 59 | ``` 60 | 61 | This securely connects to a dyno before connecting to the database. 62 | 63 | If you use PgBouncer, [set up secure connections](https://ankane.org/securing-pgbouncer-amazon-rds) for it as well. 64 | 65 | ## Conclusion 66 | 67 | Hopefully this helps you understand connection security a bit better. 68 | 69 |
70 | 71 | Updates 72 | 73 | - August 2019: Added Digital Ocean 74 | -------------------------------------------------------------------------------- /archive/ruby-ml-for-python-coders.md: -------------------------------------------------------------------------------- 1 | # Ruby ML for Python Coders 2 | 3 |

4 | [Image: Python and Ruby] 5 |

6 | 7 | Curious to try machine learning in Ruby? Here’s a short cheatsheet for Python coders. 8 | 9 | Data structure basics 10 | 11 | - [Numo: NumPy for Ruby](/numo) 12 | - [Daru: Pandas for Ruby](/daru) 13 | 14 | Libraries 15 | 16 | Category | Python | Ruby 17 | --- | --- | --- 18 | Multi-dimensional arrays | [NumPy](https://github.com/numpy/numpy) | [Numo](https://github.com/ruby-numo/numo-narray) 19 | Data frames | [Pandas](https://github.com/pandas-dev/pandas) | [Daru](https://github.com/SciRuby/daru) 20 | Visualization | [Matplotlib](https://github.com/matplotlib/matplotlib) | [Nyaplot](https://github.com/domitry/nyaplot) 21 | Predictive modeling | [Scikit-learn](https://github.com/scikit-learn/scikit-learn) | [Rumale](https://github.com/yoshoku/rumale) 22 | Gradient boosting | [XGBoost](https://github.com/dmlc/xgboost), [LightGBM](https://github.com/Microsoft/LightGBM) | [XGBoost](https://github.com/ankane/xgb), [LightGBM](https://github.com/ankane/lightgbm) 23 | Deep learning | [PyTorch](https://github.com/pytorch/pytorch), [TensorFlow](https://github.com/tensorflow/tensorflow) | [Torch-rb](https://github.com/ankane/torch-rb), [TensorFlow](https://github.com/ankane/tensorflow) (TensorFlow :construction:) 24 | Recommendations | [Surprise](https://github.com/NicolasHug/Surprise), [Implicit](https://github.com/benfred/implicit/) | [Disco](https://github.com/ankane/disco) 25 | Approximate nearest neighbors | [NGT](https://github.com/yahoojapan/NGT), [Annoy](https://github.com/spotify/annoy) | [NGT](https://github.com/ankane/ngt), [Hanny](https://github.com/yoshoku/hanny) 26 | Factorization machines | [xLearn](https://github.com/aksnzhy/xlearn) | [xLearn](https://github.com/ankane/xlearn) 27 | Natural language processing | [spaCy](https://github.com/explosion/spaCy), [NLTK](https://github.com/nltk/nltk) | [Many gems](https://github.com/arbox/nlp-with-ruby) (nothing comprehensive :cry:) 28 | Text classification |
[fastText](https://github.com/facebookresearch/fastText) | [fastText](https://github.com/ankane/fasttext) 29 | Forecasting | [Prophet](https://github.com/facebook/prophet) | :cry: 30 | Optimization | [CVXPY](https://github.com/cvxgrp/cvxpy), [PuLP](https://github.com/coin-or/pulp), [SCS](https://github.com/cvxgrp/scs), [OSQP](https://github.com/oxfordcontrol/osqp) | [CBC](https://github.com/gverger/ruby-cbc), [SCS](https://github.com/ankane/scs), [OSQP](https://github.com/ankane/osqp) 31 | Reinforcement learning | [Vowpal Wabbit](https://github.com/VowpalWabbit/vowpal_wabbit) | [Vowpal Wabbit](https://github.com/ankane/vowpalwabbit) 32 | Scoring engine | [ONNX Runtime](https://github.com/Microsoft/onnxruntime) | [ONNX Runtime](https://github.com/ankane/onnxruntime), [Menoh](https://github.com/pfnet-research/menoh-ruby) 33 | 34 | This list is by no means comprehensive. Some Ruby libraries are ones I created, as mentioned [here](/new-ml-gems). 35 | 36 | If you’re planning to add Ruby support to your ML library: 37 | 38 | Category | Python | Ruby 39 | --- | --- | --- 40 | FFI (native) | [ctypes](https://docs.python.org/3/library/ctypes.html) | [Fiddle](https://ruby-doc.org/stdlib-2.7.0/libdoc/fiddle/rdoc/Fiddle.html) 41 | FFI (library) | [cffi](https://cffi.readthedocs.io/en/latest/) | [FFI](https://github.com/ffi/ffi) 42 | C++ extensions | [pybind11](https://github.com/pybind/pybind11) | [Rice](https://github.com/jasonroelofs/rice) 43 | Compile to C | [Cython](https://github.com/cython/cython) | [Rubex](https://github.com/SciRuby/rubex) 44 | 45 | Give Ruby a shot for your next machine learning project!
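As a footnote to the FFI rows above, calling into a C library from Ruby with Fiddle takes only a few lines (this uses `strlen` from the C library, which is already loaded into any Ruby process):

```ruby
require "fiddle"

# Look up strlen among the symbols already loaded into this process
strlen = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT["strlen"],
  [Fiddle::TYPE_VOIDP], # const char *
  Fiddle::TYPE_SIZE_T
)

puts strlen.call("hello") # => 5
```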
46 | -------------------------------------------------------------------------------- /archive/scaling-reads.md: -------------------------------------------------------------------------------- 1 | # Scaling Reads 2 | 3 | **Note:** This approach is now packaged into [a gem](https://github.com/ankane/distribute_reads) :gem: 4 | 5 | --- 6 | 7 | One of the easier ways to scale your database is to distribute reads to replicas. 8 | 9 | ## Desire 10 | 11 | Here’s the desired behavior: 12 | 13 | ```ruby 14 | User.find(1) # primary 15 | 16 | distribute_reads do 17 | # use replica for reads 18 | User.maximum(:visits_count) # replica 19 | User.find(2) # replica 20 | 21 | # until a write 22 | # then switch to primary 23 | User.create! # primary 24 | User.last # primary 25 | end 26 | ``` 27 | 28 | ## Contenders 29 | 30 | We looked at a number of libraries, including [Octopus](https://github.com/tchandy/octopus), [Octoshark](https://github.com/dalibor/octoshark), and [Replica Pools](https://github.com/kickstarter/replica_pools). 31 | 32 | The winner was [Makara](https://github.com/taskrabbit/makara) - it handles failover well and has a simple configuration. 33 | 34 | ## Getting Started 35 | 36 | First, install Makara. 37 | 38 | ```ruby 39 | gem 'makara' 40 | ``` 41 | 42 | There are 3 important `ENV` variables in our setup. 43 | 44 | - `DATABASE_URL` - primary database 45 | - `REPLICA_DATABASE_URL` - replica database (can use the primary database in development) 46 | - `MAKARA` - feature flag for a smooth rollout 47 | 48 | Here are sample values: 49 | 50 | ```sh 51 | DATABASE_URL=postgres://nerd:secret@localhost:5432/db_development 52 | REPLICA_DATABASE_URL=postgres://nerd:secret@localhost:5432/db_development 53 | MAKARA=true 54 | ``` 55 | 56 | Next, update `config/database.yml`. 
57 | 58 | ```yml 59 | development: &default 60 | <% if ENV["MAKARA"] %> 61 | url: postgresql-makara:/// 62 | makara: 63 | sticky: true 64 | connections: 65 | - role: master 66 | name: primary 67 | url: <%= ENV["DATABASE_URL"] %> 68 | - name: replica 69 | url: <%= ENV["REPLICA_DATABASE_URL"] %> 70 | <% else %> 71 | adapter: postgresql 72 | url: <%= ENV["DATABASE_URL"] %> 73 | <% end %> 74 | 75 | production: 76 | <<: *default 77 | ``` 78 | 79 | We don’t use the middleware, so we remove it by adding to `config/application.rb`: 80 | 81 | ```ruby 82 | config.middleware.delete Makara::Middleware 83 | ``` 84 | 85 | Also, we want to read from primary by default so we have to patch Makara. Create an initializer `config/initializers/makara.rb` with: 86 | 87 | ```ruby 88 | Makara::Cache.store = :noop 89 | 90 | module DefaultToPrimary 91 | def _appropriate_pool(*args) 92 | return @master_pool unless Thread.current[:distribute_reads] 93 | super 94 | end 95 | end 96 | 97 | Makara::Proxy.send :prepend, DefaultToPrimary 98 | 99 | module DistributeReads 100 | def distribute_reads 101 | previous_value = Thread.current[:distribute_reads] 102 | begin 103 | Thread.current[:distribute_reads] = true 104 | Makara::Context.set_current(Makara::Context.generate) 105 | yield 106 | ensure 107 | Thread.current[:distribute_reads] = previous_value 108 | end 109 | end 110 | end 111 | 112 | Object.send :include, DistributeReads 113 | ``` 114 | 115 | To distribute reads, use: 116 | 117 | ```ruby 118 | total_users = distribute_reads { User.count } 119 | ``` 120 | 121 | You can also put multiple lines in a block.
122 | 123 | ```ruby 124 | distribute_reads do 125 | User.maximum(:visits_count) 126 | Order.sum(:revenue_cents) 127 | Visit.average(:duration) 128 | end 129 | ``` 130 | 131 | ## Test Drive 132 | 133 | In the Rails console, run: 134 | 135 | ```ruby 136 | User.first # primary 137 | distribute_reads { User.last } # replica 138 | ``` 139 | 140 | :heart: Happy scaling 141 | -------------------------------------------------------------------------------- /archive/encryption-keys.md: -------------------------------------------------------------------------------- 1 | # Strong Encryption Keys for Rails 2 | 3 |

[Image: Encryption Keys]

4 | 5 | Encryption is a common way to protect sensitive data. Generating a secure key is an important part of the process. 6 | 7 | [attr_encrypted](https://github.com/attr-encrypted/attr_encrypted), the popular encryption library for Rails, uses AES-256-GCM by default, which takes a 256-bit key. So how can we generate a secure one? 8 | 9 | *If you’re in a hurry, feel free to skip to [the answer](#a-better-way).* 10 | 11 | ## Take 1 12 | 13 | One way to generate a key is: 14 | 15 | ```ruby 16 | SecureRandom.base64(32).first(32) 17 | ``` 18 | 19 | This generates a 32 character string that looks pretty secure. Each character has 64 possible values (letters, numbers, / and +). However, a single byte can represent 256 possible values. We’ve eliminated 75% of possible values per byte, which compounds across all 32 bytes. Here’s the math: 20 | 21 | Method | Possible Keys | Equivalent 22 | --- | --- | --- 23 | Random | 256^32 | 2^256 24 | Take 1 | 64^32 | 2^192 25 | 26 | This reduces the number of possible keys by 99.999999999999999994%. Luckily, computers have not (yet) been able to brute force 128-bit keys, which have 2^128 possible values. 27 | 28 | ## Why 256? 29 | 30 | So why do we use 256-bit keys to begin with? Security researcher Graham Sutherland [puts it well](https://security.stackexchange.com/questions/14068/why-most-people-use-256-bit-encryption-instead-of-128-bit): 31 | 32 | “Essentially it’s about security margin. The longer the key, the higher the effective security. If there is ever a break in AES that reduces the effective number of operations required to crack it, a bigger key gives you a better chance of staying secure.” 33 | 34 | Also, quantum computers are expected to brute force in [square root time](https://blog.agilebits.com/2013/03/09/guess-why-were-moving-to-256-bit-aes-keys/). This means a 256-bit key could be brute forced in the same time as traditional computers can brute force a 128-bit key.
35 | 36 | ## A Better Way 37 | 38 | The right way to generate a random 32-byte key is: 39 | 40 | ```ruby 41 | SecureRandom.random_bytes(32) 42 | ``` 43 | 44 | However, we can’t store this directly in Rails credentials or as an environment variable. We need to encode it first. Hex is a popular encoding. Rails uses this for its master key in Rails 5.2. 45 | 46 | ```ruby 47 | SecureRandom.random_bytes(32).unpack("H*").first 48 | ``` 49 | 50 | Ruby provides a helper to do this: 51 | 52 | ```ruby 53 | SecureRandom.hex(32) 54 | ``` 55 | 56 | To decode the key, use: 57 | 58 | ```ruby 59 | [hex_key].pack("H*") 60 | ``` 61 | 62 | We now have a much stronger key. If you store the key as an environment variable, your model should look something like: 63 | 64 | ```ruby 65 | class User < ApplicationRecord 66 | attr_encrypted :email, key: [ENV["EMAIL_ENCRYPTION_KEY"]].pack("H*") 67 | end 68 | ``` 69 | 70 | ## Libraries 71 | 72 | Libraries should educate users on how to generate sufficiently random keys. The [rbnacl](https://github.com/crypto-rb/rbnacl) gem has a neat way of enforcing this - it checks if a string is binary before allowing it as a key. 73 | 74 | ```ruby 75 | if key.encoding != Encoding::BINARY 76 | raise ArgumentError, "Insecure key - key must use binary encoding" 77 | end 78 | ``` 79 | 80 | This prevents our initial (flawed) method from working. I’ve incorporated this approach into the [blind_index](https://github.com/ankane/blind_index) gem and [opened an issue](https://github.com/attr-encrypted/attr_encrypted/issues/311) with attr_encrypted to get the author’s thoughts. 81 | 82 | ## Conclusion 83 | 84 | While secure key generation provides better protection against brute force attacks, it won’t help at all if the key is compromised. Limit who has access to encryption keys as well. For more security, consider a [key management service](https://github.com/ankane/kms_encrypted) to manage your keys. 85 | 86 | Happy encrypting! 
87 | -------------------------------------------------------------------------------- /archive/tensorflow-ruby.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Object Detection in Ruby 2 | 3 | The [ONNX Runtime](https://github.com/ankane/onnxruntime) gem makes it easy to run Tensorflow models in Ruby. This short tutorial will show you how. It’s based on [this tutorial](https://github.com/onnx/tensorflow-onnx/blob/master/tutorials/ConvertingSSDMobilenetToONNX.ipynb) from tf2onnx. 4 | 5 | We’ll use SSD Mobilenet, which can detect multiple objects in an image. 6 | 7 | First, download the [pretrained model](https://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2018_01_28.tar.gz) from the official TensorFlow Models project and this awesome shot of polar bears. 8 | 9 |

[Image: Bears]

10 | 11 |

12 | Photo from the U.S. Fish and Wildlife Service 13 |

14 | 15 | Install [tf2onnx](https://github.com/onnx/tensorflow-onnx) 16 | 17 | ```sh 18 | pip install tf2onnx 19 | ``` 20 | 21 | And convert the model to ONNX 22 | 23 | ```sh 24 | python -m tf2onnx.convert --opset 10 --fold_const \ 25 | --saved-model ssd_mobilenet_v1_coco_2018_01_28/saved_model \ 26 | --output model.onnx 27 | ``` 28 | 29 | Next, install the ONNX Runtime and MiniMagick gems 30 | 31 | ```ruby 32 | gem "onnxruntime" 33 | gem "mini_magick" 34 | ``` 35 | 36 | Load the image 37 | 38 | ```ruby 39 | img = MiniMagick::Image.open("bears.jpg") 40 | pixels = img.get_pixels 41 | ``` 42 | 43 | And the model 44 | 45 | ```ruby 46 | model = OnnxRuntime::Model.new("model.onnx") 47 | ``` 48 | 49 | Check the model inputs 50 | 51 | ```ruby 52 | p model.inputs 53 | ``` 54 | 55 | The shape is `[-1, -1, -1, 3]`. `-1` indicates any size. `pixels` has the shape `[img.width, img.height, 3]`. The model is designed to process multiple images at once, which is where the final dimension comes from. 56 | 57 | Let’s run the model: 58 | 59 | ```ruby 60 | result = model.predict("image_tensor:0" => [pixels]) 61 | ``` 62 | 63 | The model gives us a number of different outputs, like the number of detections, labels, scores, and boxes. Let’s print the results: 64 | 65 | ```ruby 66 | p result["num_detections:0"] 67 | # [3.0] 68 | p result["detection_classes:0"] 69 | # [[23.0, 23.0, 88.0, 1.0, ...]] 70 | ``` 71 | 72 | We can see there were three detections, and if we look at the first three elements in the detection classes array, they are the numbers 23, 23, and 88. These correspond to [COCO](http://cocodataset.org/) labels. We can [look these up](https://github.com/amikelive/coco-labels/blob/master/coco-labels-paper.txt) and see that 23 is bear and 88 is teddy bear. Mostly right! 73 | 74 | With a bit more code, we can apply boxes and labels to the image. 
75 | 76 | ```ruby 77 | coco_labels = { 78 | 23 => "bear", 79 | 88 => "teddy bear" 80 | } 81 | 82 | def draw_box(img, label, box) 83 | width, height = img.dimensions 84 | 85 | # calculate box 86 | thickness = 2 87 | top = (box[0] * height).round - thickness 88 | left = (box[1] * width).round - thickness 89 | bottom = (box[2] * height).round + thickness 90 | right = (box[3] * width).round + thickness 91 | 92 | # draw box 93 | img.combine_options do |c| 94 | c.draw "rectangle #{left},#{top} #{right},#{bottom}" 95 | c.fill "none" 96 | c.stroke "red" 97 | c.strokewidth thickness 98 | end 99 | 100 | # draw text 101 | img.combine_options do |c| 102 | c.draw "text #{left},#{top - 5} \"#{label}\"" 103 | c.fill "red" 104 | c.pointsize 18 105 | end 106 | end 107 | 108 | result["num_detections:0"].each_with_index do |n, idx| 109 | n.to_i.times do |i| 110 | label = result["detection_classes:0"][idx][i].to_i 111 | label = coco_labels[label] || label 112 | box = result["detection_boxes:0"][idx][i] 113 | draw_box(img, label, box) 114 | end 115 | end 116 | 117 | # save image 118 | img.write("labeled.jpg") 119 | ``` 120 | 121 | And the result: 122 | 123 |

[Image: Bears Labeled]
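The coordinate math in `draw_box` above can be sanity-checked on its own — the model returns each box as `[ymin, xmin, ymax, xmax]` fractions of the image size:

```ruby
# Convert a normalized box to pixel coordinates
# (ignoring the thickness padding draw_box adds)
def box_to_pixels(box, width, height)
  [
    (box[0] * height).round, # top
    (box[1] * width).round,  # left
    (box[2] * height).round, # bottom
    (box[3] * width).round   # right
  ]
end

p box_to_pixels([0.25, 0.5, 0.75, 1.0], 400, 200)
# => [50, 200, 150, 400]
```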

124 | 125 | Here’s the [complete code](https://gist.github.com/ankane/4a9681c8d9b9e814debe9e3ea836529d). Now go out and try it with your own images! 126 | -------------------------------------------------------------------------------- /archive/numo.md: -------------------------------------------------------------------------------- 1 | # Numo: NumPy for Ruby 2 | 3 |

[Image: Numo]

4 | 5 |

6 | Photo by Jonas Svidras 7 |

8 | 9 | NumPy is an extremely popular library for machine learning in Python. It provides an efficient way to work with large, multi-dimensional arrays. What you may not know is Ruby has a library with similar functionality. It’s called Numo, and in this post, we’ll look at what you can do with it. 10 | 11 | ## Basic Operations 12 | 13 | Numo’s core data structure is the multi-dimensional array, which has methods for mathematical operations. These operations are written in C, so they’re much faster than performing the same operations in Ruby. 14 | 15 | Let’s start by creating a Numo array from a Ruby array. 16 | 17 | ```ruby 18 | x = Numo::DFloat.cast([[1, 2, 3], [4, 5, 6]]) 19 | ``` 20 | 21 | Each array has shape. We created a 2x3 2D array, but arrays can be 1D, 3D, or more. 22 | 23 | ```ruby 24 | x.shape # [2, 3] 25 | ``` 26 | 27 | Read a row or column with: 28 | 29 | ```ruby 30 | x[0, true] # 1st row - [1, 2, 3] 31 | x[true, 2] # 3rd column - [3, 6] 32 | ``` 33 | 34 | We can add a constant value: 35 | 36 | ```ruby 37 | x + 2 # [[3, 4, 5], [6, 7, 8]] 38 | ``` 39 | 40 | Or add arrays: 41 | 42 | ```ruby 43 | x + x # [[2, 4, 6], [8, 10, 12]] 44 | ``` 45 | 46 | Some operations like mean and sum can be run over a specific axis. 47 | 48 | ```ruby 49 | x.sum(0) # sum of each column - [5, 7, 9] 50 | x.mean(1) # mean of each row - [2, 5] 51 | ``` 52 | 53 | We can also change its shape - useful for preparing data for models. 54 | 55 | ```ruby 56 | x.reshape(3, 2) # [[1, 2], [3, 4], [5, 6]] 57 | ``` 58 | 59 | If you’re familiar with NumPy operations, there are [side-by-side examples](https://github.com/ruby-numo/numo-narray/wiki/100-narray-exercises) and a table showing how the [functions map](https://github.com/ruby-numo/numo-narray/wiki/Numo-vs-numpy). 60 | 61 | ## Building Models 62 | 63 | [Rumale](https://github.com/yoshoku/rumale) is a machine learning library similar to Python’s Scikit-learn. It uses Numo for inputs and outputs. 
Here’s a basic example of linear regression. 64 | 65 | ```ruby 66 | # generate data: y = 1 + 2(x0) + 3(x1) 67 | x = Numo::DFloat.asarray([[0, 1], [1, 0], [1, 2]]) 68 | y = 1 + 2 * x[true, 0] + 3 * x[true, 1] 69 | 70 | # train 71 | model = Rumale::LinearModel::LinearRegression.new( 72 | fit_bias: true, max_iter: 10000) 73 | model.fit(x, y) 74 | 75 | # predict 76 | model.predict(x) 77 | ``` 78 | 79 | Rumale has many, many models and other useful tools for: 80 | 81 | - Regression: linear, ridge, lasso, support vector machines 82 | - Classification: logistic regression, naive Bayes, K-nearest neighbors, support vector machines 83 | - Clustering: K-means, Gaussian mixture model 84 | - Dimensionality reduction: principal component analysis 85 | 86 | Scikit-learn has a great cheat-sheet to help you decide what to use: 87 | 88 |

[Image: Scikit-learn Cheat Sheet] 89 |

90 | 91 |

92 | Image from Scikit-learn (BSD License) 93 |

94 | 95 | ## Storing Data 96 | 97 | Numo arrays can be marshaled just like other Ruby objects. This allows you to save your work and resume it at a later time. 98 | 99 | ```ruby 100 | # save 101 | File.binwrite("x.dump", Marshal.dump(x)) 102 | 103 | # load 104 | x = Marshal.load(File.binread("x.dump")) 105 | ``` 106 | 107 | [Npy](https://github.com/ankane/npy) allows you to save and load arrays in the same format as NumPy. This is more performant than marshaling. 108 | 109 | ```ruby 110 | # save 111 | Npy.save("x.npy", x) 112 | 113 | # load 114 | x = Npy.load("x.npy") 115 | ``` 116 | 117 | It also makes it easy to load datasets like [MNIST](https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz). 118 | 119 | ```ruby 120 | mnist = Npy.load_npz("mnist.npz") 121 | ``` 122 | 123 | ## Summary 124 | 125 | You now have a basic introduction to Numo and know how to: 126 | 127 | - perform basic operations 128 | - build a model 129 | - store data 130 | 131 | Consider [Numo](https://github.com/ruby-numo/numo-narray) for your next machine learning project. 132 | -------------------------------------------------------------------------------- /archive/securing-pgbouncer-amazon-rds.md: -------------------------------------------------------------------------------- 1 | # Securing Database Traffic with PgBouncer and Amazon RDS 2 | 3 | Securing database traffic inside your network can be a great step for defense in depth. It’s also a necessity for [Zero Trust Networks](https://www.amazon.com/Zero-Trust-Networks-Building-Untrusted/dp/1491962194). 4 | 5 | Both Amazon RDS and PgBouncer have built-in support for TLS, but it’s a little bit of work to get it set up. This tutorial will show you how. 6 | 7 | ## Direct Connections 8 | 9 | The first step is to make sure all direct connections are secure. Luckily, Amazon RDS has a parameter named `rds.force_ssl` for this. Once it’s applied, you’ll see an error if you try to connect without TLS. 
You can test this out with: 10 | 11 | ```sh 12 | psql "postgresql://user:secret@dbhost:5432/ssltest?sslmode=disable" 13 | ``` 14 | 15 | You’ll see an error like `FATAL: no pg_hba.conf entry ... SSL off` if everything is configured correctly. 16 | 17 | There are a number of possible values for `sslmode`, which you can [read about here](https://www.postgresql.org/docs/current/static/libpq-ssl.html). The most secure (and one we want) is `verify-full`, as it provides protection against both eavesdropping and man-in-the-middle attacks. This mode requires you to provide a root certificate to verify against. AWS makes this certificate available on [their website](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html#PostgreSQL.Concepts.General.SSL). 18 | 19 | ```sh 20 | wget https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem 21 | ``` 22 | 23 | To use it with `psql`, run: 24 | 25 | ```sh 26 | psql "postgresql://user:secret@dbhost:5432/ssltest?sslmode=verify-full&sslrootcert=rds-combined-ca-bundle.pem" 27 | ``` 28 | 29 | Once connected, you should see an `SSL connection` line before the first prompt. 30 | 31 | There’s also an extension you can use (useful for non-`psql` connections). 32 | 33 | ```sql 34 | CREATE EXTENSION IF NOT EXISTS sslinfo; 35 | SELECT ssl_is_used(); 36 | ``` 37 | 38 | Now direct connections are good, so let’s secure connections from PgBouncer to the database. 39 | 40 | ## PgBouncer to the Database 41 | 42 | Follow [this guide](pgbouncer-setup) to set up PgBouncer.
Once that’s completed, there are two settings to add to `/etc/pgbouncer/pgbouncer.ini`: 43 | 44 | ```ini 45 | server_tls_sslmode = verify-full 46 | server_tls_ca_file = /path/to/rds-combined-ca-bundle.pem 47 | ``` 48 | 49 | Restart the service 50 | 51 | ```sh 52 | sudo service pgbouncer restart 53 | ``` 54 | 55 | And test it 56 | 57 | ```sh 58 | psql "postgresql://user:secret@bouncerhost:6432/ssltest" 59 | ``` 60 | 61 | The connection should succeed and the server should report SSL is used. 62 | 63 | ```sql 64 | SELECT ssl_is_used(); 65 | ``` 66 | 67 | We’ve now successfully encrypted traffic between the bouncer and the database! 68 | 69 | However, you’ll notice the `psql` prompt does not have an `SSL connection` line as it did before. You can also use `sslmode=disable` to successfully connect, and programs like `tcpdump` or [tshark](https://www.wireshark.org/docs/man-pages/tshark.html) will show unencrypted traffic between the client and the bouncer. You can test this out with: 70 | 71 | ```sh 72 | sudo tcpdump -i lo -X -s 0 'port 6432' 73 | ``` 74 | 75 | Run commands in `psql` and you’ll see plaintext statements printed. 76 | 77 | ## Clients to PgBouncer 78 | 79 | This last flow is the trickiest. PgBouncer 1.7+ supports TLS, but we need to create keys and certificates for it. For this, we’ll create a private [PKI](https://en.wikipedia.org/wiki/Public_key_infrastructure). [Minica](https://github.com/jsha/minica) and [Vault](https://www.vaultproject.io/) are two ways to do this. 80 | 81 | We’ll use Minica (here are [instructions for Vault](vault-pki)). Install the latest version: 82 | 83 | ```sh 84 | sudo apt-get install minica 85 | ``` 86 | 87 | And run: 88 | 89 | ```sh 90 | minica --domains bouncerhost 91 | ``` 92 | 93 | We now have the files we need to connect. 
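Minica writes a root certificate (`minica.pem`) plus a key and certificate for the domain. Conceptually, it’s doing little more than this Ruby stdlib sketch — a self-signed root that signs a leaf certificate for `bouncerhost` (illustrative only; use minica or Vault for real keys):

```ruby
require "openssl"

# Issue a certificate: a self-signed root if no issuer is given,
# otherwise a leaf signed by the issuer (what minica does per domain)
def issue(subject, issuer: nil, issuer_key: nil, san: nil)
  key = OpenSSL::PKey::RSA.new(2048)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2
  cert.serial = OpenSSL::BN.rand(64).to_i
  cert.subject = OpenSSL::X509::Name.parse(subject)
  cert.issuer = issuer ? issuer.subject : cert.subject
  cert.public_key = key.public_key
  cert.not_before = Time.now
  cert.not_after = Time.now + 2 * 365 * 86_400
  ef = OpenSSL::X509::ExtensionFactory.new
  ef.subject_certificate = cert
  ef.issuer_certificate = issuer || cert
  if issuer
    cert.add_extension(ef.create_extension("subjectAltName", "DNS:#{san}"))
  else
    cert.add_extension(ef.create_extension("basicConstraints", "CA:TRUE", true))
  end
  cert.sign(issuer_key || key, OpenSSL::Digest.new("SHA256"))
  [cert, key]
end

root, root_key = issue("/CN=minica root")
leaf, _leaf_key = issue("/CN=bouncerhost", issuer: root, issuer_key: root_key, san: "bouncerhost")

# Clients trust the root; the root vouches for the leaf
store = OpenSSL::X509::Store.new
store.add_cert(root)
puts store.verify(leaf) # => true
```

The client’s `sslrootcert=minica.pem` plays the role of `store.add_cert` here: verification succeeds only for certificates chained to that root.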
Add the key and certificate to `/etc/pgbouncer/pgbouncer.ini`: 94 | 95 | ```ini 96 | client_tls_sslmode = require # not verify-full 97 | client_tls_key_file = /path/to/bouncerhost/key.pem 98 | client_tls_cert_file = /path/to/bouncerhost/cert.pem 99 | ``` 100 | 101 | And restart the service. To connect, we once again use `verify-full` but this time with the root certificate we generated above: 102 | 103 | ```sh 104 | psql "postgresql://user:secret@bouncerhost:6432/ssltest?sslmode=verify-full&sslrootcert=minica.pem" 105 | ``` 106 | 107 | Confirm the `SSL connection` line is printed and `sslmode=disable` no longer works. 108 | 109 | We’ve now successfully encrypted traffic end-to-end! 110 | -------------------------------------------------------------------------------- /archive/host-your-own-postgres.md: -------------------------------------------------------------------------------- 1 | # Host Your Own Postgres 2 | 3 | :elephant: Get running with the latest version of Postgres in minutes 4 | 5 | ## Set Up Server 6 | 7 | Spin up a new server with Ubuntu 16.04. 8 | 9 | Firewall 10 | 11 | ```sh 12 | sudo ufw allow ssh 13 | sudo ufw enable 14 | ``` 15 | 16 | [Automatic updates](https://help.ubuntu.com/16.04/serverguide/automatic-updates.html) 17 | 18 | ```sh 19 | sudo apt-get -y install unattended-upgrades 20 | echo 'APT::Periodic::Unattended-Upgrade "1";' >> /etc/apt/apt.conf.d/10periodic 21 | ``` 22 | 23 | Time zone 24 | 25 | ```sh 26 | sudo dpkg-reconfigure tzdata 27 | ``` 28 | 29 | and select `None of the above`, then `UTC`.
30 | 31 | ## Install Postgres 32 | 33 | Install PostgreSQL 10 34 | 35 | ```sh 36 | echo "deb https://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main" > /etc/apt/sources.list.d/pgdg.list 37 | wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add - 38 | sudo apt-get update 39 | sudo apt-get install -qq -y postgresql-10 postgresql-contrib 40 | ``` 41 | 42 | ## Configure 43 | 44 | Edit `/etc/postgresql/10/main/postgresql.conf`. 45 | 46 | ```sh 47 | # general 48 | max_connections = 100 49 | 50 | # logging 51 | log_min_duration_statement = 100 # log queries over 100ms 52 | log_temp_files = 0 # log all temp files 53 | 54 | # stats 55 | shared_preload_libraries = 'pg_stat_statements' 56 | pg_stat_statements.max = 1000 57 | ``` 58 | 59 | ## Remote Connections 60 | 61 | Enable remote connections if needed 62 | 63 | ```sh 64 | echo "host all all 0.0.0.0/0 md5" >> /etc/postgresql/10/main/pg_hba.conf 65 | echo "listen_addresses = '*'" >> /etc/postgresql/10/main/postgresql.conf 66 | sudo service postgresql restart 67 | ``` 68 | 69 | And update the firewall 70 | 71 | ```sh 72 | sudo ufw allow 5432/tcp # for all IPs 73 | sudo ufw allow from 127.0.0.1 to any port 5432 proto tcp # specific IP 74 | sudo ufw enable 75 | ``` 76 | 77 | ## Provisioning 78 | 79 | Create a new user and database for each of your apps 80 | 81 | ```sh 82 | sudo su - postgres 83 | psql 84 | ``` 85 | 86 | And run: 87 | 88 | ```sql 89 | CREATE USER myapp WITH PASSWORD 'mypassword'; 90 | ALTER USER myapp WITH CONNECTION LIMIT 20; 91 | CREATE DATABASE myapp_production OWNER myapp; 92 | ``` 93 | 94 | Generate a random password with: 95 | 96 | ```sh 97 | cat /dev/urandom | LC_CTYPE=C tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1 98 | ``` 99 | 100 | ## Backups 101 | 102 | ### Daily 103 | 104 | Store backups on S3 105 | 106 | - [Amazon S3 Backup Scripts](https://github.com/collegeplus/s3-shell-backups/blob/master/s3-postgresql-backup.sh) 107 | - [Automatic Backups to Amazon S3 are
Easy ](https://rossta.net/blog/automatic-backups-to-amazon-s3-are-easy.html) 108 | 109 | *TODO: better instructions* 110 | 111 | ### Continuous 112 | 113 | Rollback to a specific point in time with [WAL-E](https://github.com/wal-e/wal-e). 114 | 115 | Opbeat has a [great tutorial](https://opbeat.com/blog/posts/postgresql-backup-to-s3-part-one/). 116 | 117 | ## Logging 118 | 119 | [Papertrail](https://papertrailapp.com) is great and has a free plan. 120 | 121 | Install remote syslog 122 | 123 | ```sh 124 | cd /tmp 125 | wget https://github.com/papertrail/remote_syslog2/releases/download/v0.13/remote_syslog_linux_amd64.tar.gz 126 | tar xzf ./remote_syslog*.tar.gz 127 | cd remote_syslog 128 | sudo cp ./remote_syslog /usr/local/bin 129 | ``` 130 | 131 | Create `/etc/log_files.yml` with: 132 | 133 | ```sh 134 | files: 135 | - /var/log/postgresql/*.log 136 | destination: 137 | host: logs.papertrailapp.com 138 | port: 12345 139 | protocol: tls 140 | ``` 141 | 142 | ### Archive 143 | 144 | Archive logs to S3 145 | 146 | ```sh 147 | sudo apt-get install logrotate s3cmd 148 | s3cmd --configure 149 | ``` 150 | 151 | Add to `/etc/logrotate.d/postgresql-common`: 152 | 153 | ```conf 154 | sharedscripts 155 | postrotate 156 | s3cmd sync /var/log/postgresql/*.gz s3://mybucket/logs/ 157 | endscript 158 | ``` 159 | 160 | Test with: 161 | 162 | ```sh 163 | logrotate -fv /etc/logrotate.d/postgresql-common 164 | ``` 165 | 166 | ## TODO 167 | 168 | - scripts 169 | 170 | ```sh 171 | pghost bootstrap 172 | pghost allow all 173 | pghost allow 127.0.0.1 174 | pghost backup:all 175 | pghost backup myapp 176 | pghost restore myapp 177 | pghost provision myapp 178 | pghost logs:syslog logs.papertrailapp.com 12345 179 | pghost logs:archive mybucket/logs 180 | ``` 181 | 182 | - monitoring (Graphite, CloudWatch, etc) 183 | 184 | ## Resources 185 | 186 | - [Copy your server logs to Amazon S3 using Logrotate and 
s3cmd](https://www.shanestillwell.com/2013/04/04/copy-your-server-logs-to-amazon-s3-using-logrotate-and-s3cmd/) 187 | -------------------------------------------------------------------------------- /archive/rails-on-heroku.md: -------------------------------------------------------------------------------- 1 | # Rails on Heroku 2 | 3 | [The official guide](https://devcenter.heroku.com/articles/getting-started-with-rails4) is a great place to start, but there’s more you can do to make life easier. 4 | 5 | :tangerine: Based on lessons learned in the early days of [Instacart](https://www.instacart.com/) 6 | 7 | ## Deploys 8 | 9 | For zero downtime deploys, enable [preboot](https://devcenter.heroku.com/articles/preboot). This will cause deploys to take a few minutes longer to go live, but it’s better than impacting your users. 10 | 11 | ```sh 12 | heroku features:enable -a appname preboot 13 | ``` 14 | 15 | Add a preload check to make sure your app boots. Create `lib/tasks/preload.rake` with: 16 | 17 | ```ruby 18 | task preload: :environment do 19 | Rails.application.eager_load! 20 | ::Rails::Engine.subclasses.map(&:instance).each { |engine| engine.eager_load! } 21 | ActiveRecord::Base.descendants 22 | end 23 | ``` 24 | 25 | And add a [release phase](https://devcenter.heroku.com/articles/release-phase) task to your `Procfile` to run the preload script and (optionally) migrations. 26 | 27 | ```sh 28 | release: bundle exec rails preload db:migrate 29 | ``` 30 | 31 | Create a deployment script in `bin/deploy`. Here’s an example: 32 | 33 | ```sh 34 | #!/usr/bin/env bash 35 | 36 | function notify() { 37 | # add your chat service 38 | echo $1 39 | } 40 | 41 | notify "Deploying" 42 | 43 | git checkout master -q && git pull origin master -q && \ 44 | git push origin master -q && git push heroku master 45 | 46 | if [ $? -eq 0 ]; then 47 | notify "Deploy complete" 48 | else 49 | notify "Deploy failed" 50 | fi 51 | ``` 52 | 53 | Be sure to `chmod +x bin/deploy`.
Replace the `echo` command with a call to your chat service ([Hipchat instructions](https://github.com/hipchat/hipchat-cli)). 54 | 55 | Deploy with: 56 | 57 | ```sh 58 | bin/deploy 59 | ``` 60 | 61 | ## Migrations 62 | 63 | Follow best practices for [zero downtime migrations](https://github.com/ankane/strong_migrations). 64 | 65 | If you start to see errors about prepared statements after running migrations, disable them. 66 | 67 | ```yml 68 | production: 69 | prepared_statements: false 70 | ``` 71 | 72 | Don’t worry! Your app will still be fast (and you’ll probably do this anyways at scale since PgBouncer requires it). 73 | 74 | ## Rollbacks 75 | 76 | Create a rollback script in `bin/rollback`. 77 | 78 | ```sh 79 | #!/usr/bin/env bash 80 | 81 | function notify() { 82 | # add your chat service 83 | echo $1 84 | } 85 | 86 | notify "Rolling back" 87 | 88 | heroku rollback 89 | 90 | if [ $? -eq 0 ]; then 91 | notify "Rollback complete" 92 | else 93 | notify "Rollback failed" 94 | fi 95 | ``` 96 | 97 | Don’t forget to `chmod +x bin/rollback`. Rollback with: 98 | 99 | ```sh 100 | bin/rollback 101 | ``` 102 | 103 | ## Logs 104 | 105 | Add [Papertrail](https://papertrailapp.com/) to make your logs easily searchable. 106 | 107 | ```sh 108 | heroku addons:create papertrail 109 | ``` 110 | 111 | Set it up to [archive logs to S3](https://help.papertrailapp.com/kb/how-it-works/permanent-log-archives/). 112 | 113 | ## Performance 114 | 115 | Add a performance monitoring service like New Relic. 116 | 117 | ```sh 118 | heroku addons:create newrelic 119 | ``` 120 | 121 | And follow the [installation instructions](https://devcenter.heroku.com/articles/newrelic). 122 | 123 | Use a [CDN](https://en.wikipedia.org/wiki/Content_delivery_network) like [Amazon CloudFront](https://devcenter.heroku.com/articles/using-amazon-cloudfront-cdn) to serve assets. 124 | 125 | ## Autoscaling 126 | 127 | Check out [HireFire](https://www.hirefire.io/). 
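Both `bin/deploy` and `bin/rollback` above stub out `notify` with `echo`. Rather than pasting the same chat integration into each script, they can share a small helper; here’s a sketch in Ruby using only the standard library (the `bin/notify` path and `CHAT_WEBHOOK_URL` variable are assumptions, and the payload shape matches Slack-style incoming webhooks):

```ruby
#!/usr/bin/env ruby
# bin/notify - shared by bin/deploy and bin/rollback: ./bin/notify "Deploying"
require "json"
require "net/http"

def payload_for(message)
  JSON.generate(text: message)
end

def notify(message, webhook_url: ENV["CHAT_WEBHOOK_URL"])
  # fall back to stderr when no chat service is configured
  return warn(message) unless webhook_url
  Net::HTTP.post(URI(webhook_url), payload_for(message), "Content-Type" => "application/json")
end

notify(ARGV.join(" ")) if $PROGRAM_NAME == __FILE__
```

The bash scripts can then replace their `notify` functions with calls to `bin/notify "$1"`.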
128 | 129 | ## Productivity 130 | 131 | Use [Archer](https://github.com/ankane/archer) to enable console history. 132 | 133 | Use [aliases](https://www.digitalocean.com/community/tutorials/an-introduction-to-useful-bash-aliases-and-functions) for less typing. 134 | 135 | ```sh 136 | alias hc="heroku run rails console" 137 | ``` 138 | 139 | ## Staging 140 | 141 | Create a separate app for staging. 142 | 143 | ```sh 144 | heroku create staging-appname -r staging 145 | heroku config:set RAILS_ENV=staging RACK_ENV=staging -r staging 146 | ``` 147 | 148 | Deploy with: 149 | 150 | ```sh 151 | git push staging branch:master 152 | ``` 153 | 154 | You may also want to password protect your staging environment. 155 | 156 | ```ruby 157 | class ApplicationController < ActionController::Base 158 | http_basic_authenticate_with name: "happy", password: "carrots" if Rails.env.staging? 159 | end 160 | ``` 161 | 162 | ## Lastly... 163 | 164 | Have suggestions? [Please share](https://github.com/ankane/shorts/issues/new). For more tips, check out [Production Rails](https://github.com/ankane/production_rails). 165 | 166 | :hatched_chick: Happy coding! 167 | -------------------------------------------------------------------------------- /archive/daru.md: -------------------------------------------------------------------------------- 1 | # Daru: Pandas for Ruby 2 | 3 |

*[image: panda]*

4 | 5 |

6 | Photo by Bruce Hong 7 |

8 | 9 | NumPy and Pandas are two extremely popular libraries for machine learning in Python. Last post, we looked at [Numo](https://ankane.org/numo), a Ruby library similar to NumPy. As luck would have it, there’s a library similar to Pandas as well. It’s called Daru, and it’s the focus of this post. 10 | 11 | ## Overview 12 | 13 | Daru is a data analysis library. Its core data structure is the data frame, which is similar to an in-memory database table. Data frames have rows and columns, and each column has a specific data type. Let’s create a data frame with the most populous countries: 14 | 15 | ```ruby 16 | df = Daru::DataFrame.new( 17 | country: ["China", "India", "USA"], 18 | population: [1433, 1366, 329] # in millions 19 | ) 20 | ``` 21 | 22 |

23 | Population data from the United Nations, 2019 24 |

25 | 26 | Here’s what it looks like: 27 | 28 | ```text 29 | country population 30 | 0 China 1433 31 | 1 India 1366 32 | 2 USA 329 33 | ``` 34 | 35 | You can get specific columns with: 36 | 37 | ```ruby 38 | df[:country] 39 | df[:country, :population] 40 | ``` 41 | 42 | Or specific rows with: 43 | 44 | ```ruby 45 | df.first(2) # first 2 rows 46 | df.last(2) # last 2 rows 47 | df.row[1] # 2nd row 48 | df.row[1..2] # 2nd and 3rd row 49 | ``` 50 | 51 | ## Filtering, Sorting, and Grouping 52 | 53 | Select countries with over 1 billion people. 54 | 55 | ```ruby 56 | df.where(df[:population] > 1000) 57 | ``` 58 | 59 | For equality, use `eq` or `in`. 60 | 61 | ```ruby 62 | df.where(df[:country].eq("China")) 63 | df.where(df[:country].in(["USA", "India"])) 64 | ``` 65 | 66 | Negate a condition with `!`. 67 | 68 | ```ruby 69 | df.where(!df[:country].eq("India")) 70 | ``` 71 | 72 | Combine operators with `&` (and) and `|` (or). 73 | 74 | ```ruby 75 | df.where(df[:country].eq("USA") | (df[:population] < 1400)) 76 | ``` 77 | 78 | Sort the data frame by a column with: 79 | 80 | ```ruby 81 | df.sort([:population]) 82 | df.sort([:country], ascending: [false]) 83 | ``` 84 | 85 | You can also group data and perform aggregations. 86 | 87 | ```ruby 88 | cities = Daru::DataFrame.new( 89 | country: ["China", "China", "India"], 90 | city: ["Shanghai", "Beijing", "Mumbai"] 91 | ) 92 | cities.group_by([:country]).count 93 | ``` 94 | 95 | ## Combining Data Frames 96 | 97 | There are a number of ways to combine data frames. 
You can add rows: 98 | 99 | ```ruby 100 | countries = Daru::DataFrame.new( 101 | country: ["Indonesia", "Pakistan"], 102 | population: [271, 217] # in millions 103 | ) 104 | df.concat(countries) 105 | ``` 106 | 107 | Or add columns: 108 | 109 | ```ruby 110 | locations = Daru::DataFrame.new( 111 | continent: ["Asia", "Asia", "North America"], 112 | planet: ["Earth", "Earth", "Earth"] 113 | ) 114 | df.merge(locations) 115 | ``` 116 | 117 | You can also perform joins like in SQL. 118 | 119 | ```ruby 120 | cities = Daru::DataFrame.new( 121 | country: ["China", "China", "India"], 122 | city: ["Shanghai", "Beijing", "Mumbai"] 123 | ) 124 | df.join(cities, how: :inner, on: [:country]) 125 | ``` 126 | 127 | ## Reading and Writing Data 128 | 129 | Daru makes it easy to load data from a CSV file. 130 | 131 | ```ruby 132 | Daru::DataFrame.from_csv("countries.csv") 133 | ``` 134 | 135 | After manipulating the data, you can save it back to a CSV file. 136 | 137 | ```ruby 138 | df.write_csv("countries_v2.csv") 139 | ``` 140 | 141 | You can also load data directly from Active Record. 142 | 143 | ```ruby 144 | relation = Country.where("population > 100") 145 | Daru::DataFrame.from_activerecord(relation) 146 | ``` 147 | 148 | ## Plotting 149 | 150 | For plotting, use a Jupyter notebook with [IRuby](https://github.com/sciruby/iruby). Create a plot with: 151 | 152 | ```ruby 153 | df.plot type: :bar, x: :country, y: :population do |plot, diagram| 154 | plot.x_label "Country" 155 | plot.y_label "Population (millions)" 156 | diagram.color(Nyaplot::Colors.Pastel1) 157 | end 158 | ``` 159 | 160 |

*[image: Daru plot]*

161 | 162 | You can also create line charts, scatter plots, box plots, and histograms. 163 | 164 | ## Summary 165 | 166 | You’ve now seen how to use Daru to: 167 | 168 | - create data frames 169 | - filter, sort, and group data 170 | - combine data frames 171 | - create plots 172 | 173 | Try out [Daru](https://github.com/SciRuby/daru) for your next analysis. 174 | -------------------------------------------------------------------------------- /archive/introducing-dexter.md: -------------------------------------------------------------------------------- 1 | # Introducing Dexter, the Automatic Indexer for Postgres 2 | 3 |

*[image: Dexter]*

4 | 5 | Your database knows which queries are running. It also has a pretty good idea of which indexes are best for a given query. And since indexes don’t change the results of a query, they’re really just a performance optimization. So why do we always need a human to choose them? 6 | 7 | Introducing [Dexter](https://github.com/ankane/dexter). Dexter indexes your database for you. You can still do it yourself, but Dexter will do a pretty good job. 8 | 9 | Dexter works in two phases: 10 | 11 | 1. Collect queries 12 | 2. Generate indexes 13 | 14 | We’ll walk through each of them. 15 | 16 | ### Phase 1: Collect 17 | 18 | You can stream Postgres log files directly to Dexter. Dexter finds lines like: 19 | 20 | ```txt 21 | LOG: duration: 14.077 ms statement: SELECT * FROM ratings WHERE user_id = 3; 22 | ``` 23 | 24 | And parses out the query and duration. It uses fingerprinting to group queries. Queries with the same parse tree but different values are grouped together. For instance, both of the following queries have the same fingerprint. 25 | 26 | ```sql 27 | SELECT * FROM ratings WHERE user_id = 2; 28 | SELECT * FROM ratings WHERE user_id = 3; 29 | ``` 30 | 31 | The data is aggregated to get the total execution time by fingerprint. You can get similar information from the [pg_stat_statements view](https://www.postgresql.org/docs/current/static/pgstatstatements.html), except queries in the view are normalized. This means you get: 32 | 33 | ```sql 34 | SELECT * FROM ratings WHERE user_id = ?; 35 | ``` 36 | 37 | instead of 38 | 39 | ```sql 40 | SELECT * FROM ratings WHERE user_id = 3; 41 | ``` 42 | 43 | However, we need the actual values to determine costs in the next step. To prevent over-indexing, you can set a threshold for the total execution time before a query is considered for indexing. 44 | 45 | ### Phase 2: Generate 46 | 47 | To generate indexes, Dexter creates hypothetical indexes to try to speed up the slow queries we’ve just collected.
Hypothetical indexes show how a query’s execution plan would change if an actual index existed. They take virtually no time to create, don’t require any disk space, and are only visible to the current session. You can read more about [hypothetical indexes here](https://rjuju.github.io/postgresql/2015/07/02/how-about-hypothetical-indexes.html). 48 | 49 | The main steps Dexter takes are: 50 | 51 | 1. Filter out queries on system tables and other databases 52 | 2. Analyze tables for up-to-date planner statistics if they haven’t been analyzed recently 53 | 3. Get the initial cost of queries 54 | 4. Create hypothetical indexes on columns that aren’t already indexed 55 | 5. Get costs again and see if any hypothetical indexes were used 56 | 57 | While fairly straightforward, this approach is extremely powerful, as it uses the Postgres query planner to figure out the best index(es) for a query. Hypothetical indexes that were used AND significantly reduced cost are selected to be indexes. 58 | 59 | To be safe, indexes are only logged by default. This allows you to use Dexter for index suggestions if you want to manually verify them first. When you let Dexter create indexes, they’re created concurrently to limit the impact on database performance. 60 | 61 | ```txt 62 | 2017-06-25T17:52:22+00:00 Index found: ratings (user_id) 63 | 2017-06-25T17:52:22+00:00 Creating index: CREATE INDEX CONCURRENTLY ON ratings (user_id) 64 | 2017-06-25T17:52:37+00:00 Index created: 15243 ms 65 | ``` 66 | 67 | ### Trade-offs and Limitations 68 | 69 | The big advantage of indexes is faster data retrieval. On the flip side, indexes add overhead to write operations, like INSERT, UPDATE, and DELETE, as indexes must be updated as well. Indexes also take up disk space. 70 | 71 | Because of this, you may not want to index write-heavy tables. Dexter does not currently try to identify these tables automatically, but you can pass them in by hand.
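One way to find such tables yourself is to compare write and read counts from Postgres’s `pg_stat_user_tables` view. A sketch of the check (the stats hash is illustrative — in practice it would come from querying the view — and the 2× threshold is arbitrary):

```ruby
# Flag tables whose write activity dominates read activity,
# using counters from pg_stat_user_tables
def write_heavy?(stats, ratio: 2.0)
  writes = stats[:n_tup_ins] + stats[:n_tup_upd] + stats[:n_tup_del]
  reads = stats[:seq_tup_read] + stats[:idx_tup_fetch]
  writes > reads * ratio
end

events = { n_tup_ins: 900_000, n_tup_upd: 50_000, n_tup_del: 0,
           seq_tup_read: 100_000, idx_tup_fetch: 20_000 }
puts write_heavy?(events) # => true
```

Tables this flags are candidates to exclude from indexing.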
72 | 73 | As for other limitations, Dexter does not try to create multicolumn indexes (edit: this is no longer the case). Dexter also assumes the search_path for queries is the same as the user running Dexter. You’ll still need to create unique constraints on your own. Dexter also requires the [HypoPG](https://github.com/HypoPG/hypopg) extension, which isn’t available on some hosted providers like Heroku and Amazon RDS. 74 | 75 | * * * 76 | 77 | It’s time to make forgotten indexes a problem of the past. 78 | 79 | [Add Dexter to your team](https://github.com/ankane/dexter) today. 80 | 81 | ### Thanks 82 | 83 | This software wouldn’t be possible without [HypoPG](https://github.com/HypoPG/hypopg), which allows you to create hypothetical indexes, and [pg_query](https://github.com/lfittl/pg_query), which allows you to parse and fingerprint queries. A big thanks to Dalibo and [Lukas Fittl](https://medium.com/@LukasFittl) respectively. 84 | -------------------------------------------------------------------------------- /archive/postgres-users.md: -------------------------------------------------------------------------------- 1 | # Bootstrapping Postgres Users 2 | 3 | Setting up database users for an app can be challenging if you don’t do it often. Good permissions add a layer of security and can minimize the chances of developer mistakes. 4 | 5 | The three types of users we’ll cover are: 6 | 7 | Type | Description | Read | Write | Modify 8 | --- | --- | --- | --- | --- 9 | migrations | Schema changes | ✓ | ✓ | ✓ 10 | app | Reading and writing data | ✓ | ✓ | 11 | analytics | Data analysis and reporting | ✓ | | 12 | 13 | Before we jump into it, there’s something you should know about new databases. 14 | 15 | ## New Databases 16 | 17 | After creating a new database, all users can access it and create tables in the `public` schema. This isn’t what we want. 
To fix this, run: 18 | 19 | ```sql 20 | REVOKE ALL ON DATABASE mydb FROM PUBLIC; 21 | 22 | REVOKE ALL ON SCHEMA public FROM PUBLIC; 23 | ``` 24 | 25 | Be sure to replace `mydb` with your database name. 26 | 27 | ## Roles 28 | 29 | PostgreSQL uses the concept of roles to manage privileges. Roles can be used to define groups and users. A user is simply a role with a password and permission to log in. 30 | 31 | The approach we’ll take is to create a group and add users to it. This makes it easy to rotate credentials in the future: just add a second user to the group, set your app’s configuration to the new user, and remove the original one. 32 | 33 | ## Migrations 34 | 35 | First, we need a group to manage the schema. We could use a superuser, but this isn’t a great idea, as superusers can access all databases, change permissions, and create new roles. Instead, let’s create a new group. 36 | 37 | ```sql 38 | CREATE ROLE migrations; 39 | 40 | GRANT CONNECT ON DATABASE mydb TO migrations; 41 | 42 | GRANT ALL ON SCHEMA public TO migrations; 43 | 44 | ALTER ROLE migrations SET lock_timeout TO '5s'; 45 | ``` 46 | 47 | We set a lock timeout so migrations don’t disrupt normal database activity while attempting to acquire a lock. 48 | 49 | Now, we can create a user who’s a member of the group. 50 | 51 | ```sql 52 | CREATE ROLE migrator WITH LOGIN ENCRYPTED PASSWORD 'secret' IN ROLE migrations; 53 | 54 | ALTER ROLE migrator SET role TO 'migrations'; 55 | ``` 56 | 57 | The last statement ensures tables created by the user are owned by the group. 58 | 59 | You can generate a nice password from the command line with: 60 | 61 | ```sh 62 | cat /dev/urandom | LC_CTYPE=C tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1 63 | ``` 64 | 65 | ## App 66 | 67 | Next, let’s create a group for our app. It’ll need to read and write data but shouldn’t need to modify the schema or truncate tables. 
We also want to set a statement timeout to prevent long-running queries from degrading database performance. 68 | 69 | ```sql 70 | CREATE ROLE app; 71 | 72 | GRANT CONNECT ON DATABASE mydb TO app; 73 | 74 | GRANT USAGE ON SCHEMA public TO app; 75 | 76 | GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app; 77 | 78 | GRANT SELECT, USAGE ON ALL SEQUENCES IN SCHEMA public TO app; 79 | 80 | ALTER DEFAULT PRIVILEGES FOR ROLE migrations IN SCHEMA public 81 | GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app; 82 | 83 | ALTER DEFAULT PRIVILEGES FOR ROLE migrations IN SCHEMA public 84 | GRANT SELECT, USAGE ON SEQUENCES TO app; 85 | 86 | ALTER ROLE app SET statement_timeout TO '30s'; 87 | ``` 88 | 89 | > **Note:** The default privileges statements reference the group used for migrations. If you use Amazon RDS, you must run these statements as the migrator user we created above (since you don’t have access to a true superuser). 90 | 91 | Then, create a user with: 92 | 93 | ```sql 94 | CREATE ROLE myapp WITH LOGIN ENCRYPTED PASSWORD 'secret' IN ROLE app; 95 | ``` 96 | 97 | ## Analytics 98 | 99 | Finally, let’s create a group to be used for data analysis, reporting, and business intelligence tools (like [Blazer](https://github.com/ankane/blazer), our open-source one). These users are often referred to as *read-only users*. We don’t want them to be able to mistakenly update data. 100 | 101 | ```sql 102 | CREATE ROLE analytics; 103 | 104 | GRANT CONNECT ON DATABASE mydb TO analytics; 105 | 106 | GRANT USAGE ON SCHEMA public TO analytics; 107 | 108 | GRANT SELECT ON ALL TABLES IN SCHEMA public TO analytics; 109 | 110 | ALTER DEFAULT PRIVILEGES FOR ROLE migrations IN SCHEMA public 111 | GRANT SELECT ON TABLES TO analytics; 112 | 113 | ALTER ROLE analytics SET statement_timeout TO '3min'; 114 | ``` 115 | 116 | Once again, creating a user is relatively straightforward.
117 | 118 | ```sql 119 | CREATE ROLE bi WITH LOGIN ENCRYPTED PASSWORD 'secret' IN ROLE analytics; 120 | ``` 121 | 122 | ## Summary 123 | 124 | You now know how to create different types of Postgres users. Spending a bit of time upfront to configure your users can make them easier to manage in the long run. This should give you a nice foundation. 125 | -------------------------------------------------------------------------------- /archive/dokku-digital-ocean.md: -------------------------------------------------------------------------------- 1 | # Dokku on DigitalOcean 2 | 3 | :droplet: Your very own PaaS 4 | 5 | ## Create Droplet 6 | 7 | Create a new droplet with Ubuntu 16.04. Be sure to use an SSH key. 8 | 9 | ## Install Dokku 10 | 11 | ```sh 12 | wget https://raw.githubusercontent.com/dokku/dokku/v0.12.5/bootstrap.sh 13 | sudo DOKKU_TAG=v0.12.5 bash bootstrap.sh 14 | ``` 15 | 16 | And visit your server’s IP address in your browser to complete installation. 17 | 18 | If you have a domain, use virtualhost naming. Otherwise, Dokku will use different ports for each deploy of your app. You can easily add a domain later.
19 | 20 | ## Add a Firewall 21 | 22 | Create a [firewall](https://cloud.digitalocean.com/networking/firewalls) 23 | 24 | Inbound Rules 25 | 26 | - SSH from your [external IP](https://www.google.com/search?q=external+ip) 27 | - HTTP and HTTPS from all IPv4 and all IPv6 28 | 29 | Outbound Rules 30 | 31 | - ICMP, all TCP, and all UDP from all IPv4 and all IPv6 32 | 33 | ## Set Up Server 34 | 35 | Turn on [automatic updates](https://help.ubuntu.com/16.04/serverguide/automatic-updates.html) 36 | 37 | ```sh 38 | sudo apt-get -y install unattended-upgrades 39 | echo 'APT::Periodic::Unattended-Upgrade "1";' >> /etc/apt/apt.conf.d/10periodic 40 | ``` 41 | 42 | Enable swap 43 | 44 | ```sh 45 | sudo fallocate -l 4G /swapfile 46 | sudo chmod 600 /swapfile 47 | sudo mkswap /swapfile 48 | sudo swapon /swapfile 49 | sudo sh -c 'echo "/swapfile none swap sw 0 0" >> /etc/fstab' 50 | ``` 51 | 52 | Configure time zone 53 | 54 | ```sh 55 | sudo dpkg-reconfigure tzdata 56 | ``` 57 | 58 | and select `None of the above`, then `UTC`. 59 | 60 | ## Deploy 61 | 62 | Get the official Dokku client locally 63 | 64 | ```sh 65 | git clone git@github.com:progrium/dokku.git ~/.dokku 66 | 67 | # add the following to either your 68 | # .bashrc, .bash_profile, or .profile file 69 | alias dokku='$HOME/.dokku/contrib/dokku_client.sh' 70 | ``` 71 | 72 | Create app 73 | 74 | ```sh 75 | dokku apps:create myapp 76 | ``` 77 | 78 | Add a `CHECKS` file 79 | 80 | ```txt 81 | WAIT=2 82 | ATTEMPTS=15 83 | / 84 | ``` 85 | 86 | Deploy 87 | 88 | ```sh 89 | git remote add dokku dokku@dokkuhost:myapp 90 | git push dokku master 91 | ``` 92 | 93 | ## Workers 94 | 95 | Dokku only runs web processes by default. If you have workers or other process types, use: 96 | 97 | ```sh 98 | dokku ps:scale worker=1 99 | ``` 100 | 101 | ## One-Off Jobs 102 | 103 | ```sh 104 | dokku run rails db:migrate 105 | dokku run rails console 106 | ``` 107 | 108 | ## Scheduled Jobs 109 | 110 | Two options 111 | 112 | 1. 
Add a [custom clock process](https://devcenter.heroku.com/articles/scheduled-jobs-custom-clock-processes) to your Procfile 113 | 114 | 2. Or create `/etc/cron.d/myapp` with: 115 | 116 | ``` 117 | PATH=/usr/local/bin:/usr/bin:/bin 118 | SHELL=/bin/bash 119 | * * * * * dokku dokku --rm run myapp rake task1 120 | 0 0 * * * dokku dokku --rm run myapp rake task2 121 | ``` 122 | 123 | ## Custom Domains 124 | 125 | ```sh 126 | dokku domains:add www.datakick.org 127 | ``` 128 | 129 | ## SSL 130 | 131 | Get free SSL certificates thanks to [Let’s Encrypt](https://letsencrypt.org/). On the server, run: 132 | 133 | ```sh 134 | dokku plugin:install https://github.com/dokku/dokku-letsencrypt.git 135 | dokku letsencrypt:cron-job --add 136 | ``` 137 | 138 | And locally, run: 139 | 140 | ```sh 141 | dokku config:set --no-restart DOKKU_LETSENCRYPT_EMAIL=your@email.tld 142 | dokku letsencrypt 143 | ``` 144 | 145 | ## Logging 146 | 147 | Use syslog to ship your logs to a service. [Papertrail](https://papertrailapp.com) is great and has a free plan. 
148 | 149 | For apps, use: 150 | 151 | ```sh 152 | dokku plugin:install https://github.com/michaelshobbs/dokku-logspout.git 153 | dokku plugin:install https://github.com/michaelshobbs/dokku-hostname.git 154 | dokku logspout:server syslog+tls://logs.papertrailapp.com:12345 155 | dokku logspout:start 156 | ``` 157 | 158 | For nginx and other logs, install [remote_syslog2](https://github.com/papertrail/remote_syslog2) 159 | 160 | ```sh 161 | cd /tmp 162 | wget https://github.com/papertrail/remote_syslog2/releases/download/v0.18/remote_syslog_linux_amd64.tar.gz 163 | tar xzf ./remote_syslog*.tar.gz 164 | cd remote_syslog 165 | sudo cp ./remote_syslog /usr/local/bin 166 | ``` 167 | 168 | Create `/etc/log_files.yml` with: 169 | 170 | ```sh 171 | files: 172 | - /var/log/nginx/*.log 173 | - /var/log/unattended-upgrades/*.log 174 | destination: 175 | host: logs.papertrailapp.com 176 | port: 12345 177 | protocol: tls 178 | ``` 179 | 180 | And run: 181 | 182 | ```sh 183 | remote_syslog 184 | ``` 185 | 186 | ## Database 187 | 188 | Check out [Host Your Own Postgres](host-your-own-postgres). 189 | 190 | ## Memcached 191 | 192 | ```sh 193 | dokku plugin:install https://github.com/dokku/dokku-memcached.git 194 | dokku memcached:create lolipop 195 | dokku memcached:link lolipop myapp 196 | ``` 197 | 198 | ## Redis 199 | 200 | ```sh 201 | dokku plugin:install https://github.com/dokku/dokku-redis.git 202 | dokku redis:create lolipop 203 | dokku redis:link lolipop myapp 204 | ``` 205 | 206 | ## TODO 207 | 208 | - [Monitoring](https://www.brianchristner.io/how-to-setup-docker-monitoring/) 209 | 210 | ## Bonus 211 | 212 | Find great Docker projects at [Awesome Docker](https://github.com/veggiemonk/awesome-docker). 
213 | 214 | ## Resources 215 | 216 | - [Additional Recommended Steps for New Ubuntu 14.04 Servers](https://www.digitalocean.com/community/tutorials/additional-recommended-steps-for-new-ubuntu-14-04-servers) 217 | -------------------------------------------------------------------------------- /archive/securing-user-emails-lockbox.md: -------------------------------------------------------------------------------- 1 | # Securing User Emails in Rails with Lockbox 2 | 3 |

*[Image: Model Code]*

4 | 5 | --- 6 | 7 | *This is an update to [Securing User Emails in Rails](https://ankane.org/securing-user-emails-in-rails) with a number of improvements:* 8 | 9 | - *Works with Devise’s email changed notifications* 10 | - *Works with Devise’s reconfirmable option* 11 | - *Stores encrypted data in a single field* 12 | - *You only need to manage a single key* 13 | 14 | --- 15 | 16 | Email addresses are a common form of personal data, and they’re often stored unencrypted. If an attacker gains access to the database or backups, emails will be compromised. 17 | 18 | This post will walk you through a practical approach to protecting emails. It works with [Devise](https://github.com/plataformatec/devise), the most popular authentication framework for Rails, and is general enough to work with others. 19 | 20 | ## Strategy 21 | 22 | We’ll use two concepts to make this happen: encryption and blind indexing. Encryption gives us a way to securely store the data, and blind indexing provides a way to look it up. 23 | 24 | Blind indexing works by computing a hash of the data. You’re probably familiar with hash functions like MD5 and SHA1. Rather than one of these, we use a hash function that takes a secret key and uses [key stretching](https://en.wikipedia.org/wiki/Key_stretching) to slow down brute force attempts. You can read more about [blind indexing here](https://www.sitepoint.com/how-to-search-on-securely-encrypted-database-fields/). 25 | 26 | We’ll use the [Lockbox](https://github.com/ankane/lockbox) gem for encryption and the [Blind Index](https://github.com/ankane/blind_index) gem for blind indexing. 27 | 28 | ## Instructions 29 | 30 | Let’s assume you have a `User` model with an email field. 
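Before the setup steps, it may help to see the blind indexing idea from the strategy above in isolation. The sketch below uses PBKDF2 from Ruby's standard library; the Blind Index gem's actual algorithm, key handling, and parameters differ, so treat every value here as illustrative only:

```ruby
require "openssl"

# Toy blind index: a keyed, stretched hash of the plaintext. The same
# input always yields the same value, so it can be stored in an indexed
# column and looked up with a simple equality query.
# NOTE: illustrative only - the blind_index gem uses its own algorithm
# and parameters, and the key here is generated on the fly.
BLIND_INDEX_KEY = OpenSSL::Random.random_bytes(32)

def blind_index(value)
  digest = OpenSSL::PKCS5.pbkdf2_hmac(
    value.downcase,             # normalize before hashing
    BLIND_INDEX_KEY,            # secret key (used as the PBKDF2 salt here)
    20_000,                     # iterations slow down brute force
    32,                         # output length in bytes
    OpenSSL::Digest.new("SHA256")
  )
  digest.unpack1("H*")
end

puts blind_index("Test@example.org")
```

A lookup would then be an ordinary equality query on the stored index column, e.g. `User.find_by(email_bidx: blind_index(email))`.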
31 | 32 | Add to your Gemfile: 33 | 34 | ```ruby 35 | gem 'lockbox' 36 | gem 'blind_index' 37 | ``` 38 | 39 | And run: 40 | 41 | ```sh 42 | bundle install 43 | ``` 44 | 45 | Generate a key 46 | 47 | ```ruby 48 | Lockbox.generate_key 49 | ``` 50 | 51 | Store the key with your other secrets. This is typically Rails credentials or an environment variable ([dotenv](https://github.com/bkeepers/dotenv) is great for this). Be sure to use different keys in development and production. 52 | 53 | Set the following environment variables with your key (you can use this one in development) 54 | 55 | ```sh 56 | LOCKBOX_MASTER_KEY=0000000000000000000000000000000000000000000000000000000000000000 57 | ``` 58 | 59 | or create `config/initializers/lockbox.rb` with something like 60 | 61 | ```ruby 62 | Lockbox.master_key = Rails.application.credentials.lockbox_master_key 63 | ``` 64 | 65 | Next, let’s replace the email field with an encrypted version. Create a migration: 66 | 67 | ```sh 68 | rails generate migration add_email_ciphertext_to_users 69 | ``` 70 | 71 | And add: 72 | 73 | ```ruby 74 | class AddEmailCiphertextToUsers < ActiveRecord::Migration[5.2] 75 | def change 76 | # encrypted data 77 | add_column :users, :email_ciphertext, :string 78 | 79 | # blind index 80 | add_column :users, :email_bidx, :string 81 | add_index :users, :email_bidx, unique: true 82 | 83 | # drop original here unless we have existing users 84 | remove_column :users, :email 85 | end 86 | end 87 | ``` 88 | 89 | Then migrate: 90 | 91 | ```sh 92 | rails db:migrate 93 | ``` 94 | 95 | Add to your user model: 96 | 97 | ```ruby 98 | class User < ApplicationRecord 99 | encrypts :email 100 | blind_index :email 101 | end 102 | ``` 103 | 104 | Create a new user and confirm it works. 105 | 106 | ## Existing Users 107 | 108 | If you have existing users, we need to backfill the data before dropping the email column. 
109 | Temporarily update the model: 110 | ```ruby 111 | class User < ApplicationRecord 112 | encrypts :email, migrating: true 113 | blind_index :email, migrating: true 114 | end 115 | ``` 116 | 117 | Backfill the data in the Rails console: 118 | 119 | ```ruby 120 | Lockbox.migrate(User) 121 | ``` 122 | 123 | Then update the model to the desired state: 124 | 125 | ```ruby 126 | class User < ApplicationRecord 127 | encrypts :email 128 | blind_index :email 129 | 130 | # remove this line after dropping email column 131 | self.ignored_columns = ["email"] 132 | end 133 | ``` 134 | 135 | Finally, drop the email column. 136 | 137 | ## Reconfirmable 138 | 139 | If you use the confirmable module with `reconfirmable`, you should also encrypt the `unconfirmed_email` field. 140 | 141 | ```ruby 142 | class AddUnconfirmedEmailToUsers < ActiveRecord::Migration[5.2] 143 | def change 144 | add_column :users, :unconfirmed_email_ciphertext, :text 145 | end 146 | end 147 | ``` 148 | 149 | And add `unconfirmed_email` to the list of encrypted fields: 150 | 151 | ```ruby 152 | class User < ApplicationRecord 153 | encrypts :email, :unconfirmed_email 154 | end 155 | ``` 156 | 157 | ## Logging 158 | 159 | We also need to make sure email addresses aren’t logged. Add to `config/initializers/filter_parameter_logging.rb`: 160 | 161 | ```ruby 162 | Rails.application.config.filter_parameters += [:email] 163 | ``` 164 | 165 | Use [Logstop](https://github.com/ankane/logstop) to filter anything that looks like an email address as an extra line of defense. Add to your Gemfile: 166 | 167 | ```ruby 168 | gem 'logstop' 169 | ``` 170 | 171 | And create `config/initializers/logstop.rb` with: 172 | 173 | ```ruby 174 | Logstop.guard(Rails.logger) 175 | ``` 176 | 177 | ## Summary 178 | 179 | We now have a way to encrypt emails and query for exact matches. You can apply this same approach to other fields as well.
For more security, consider a [key management service](https://github.com/ankane/kms_encrypted) to manage your keys. 180 | -------------------------------------------------------------------------------- /archive/modern-encryption-rails.md: -------------------------------------------------------------------------------- 1 | # Modern Encryption for Rails 2 | 3 |

*[Image: Lockbox]*

4 | 5 | Encrypting sensitive data at the application-level is crucial for data security. Since writing [Securing Sensitive Data in Rails](https://ankane.org/sensitive-data-rails), I haven’t been able to shake the feeling that encryption in Rails could be easier and cleaner. 6 | 7 | To address this, I created a library called [Lockbox](https://github.com/ankane/lockbox). Here are some of the principles behind it. 8 | 9 | ## Easy to Use, Hard to Misuse 10 | 11 | Many cryptography mistakes happen during implementation. Lockbox provides good defaults and is designed to be hard to misuse. You don’t need to deal with initialization vectors and it only supports secure algorithms. 12 | 13 | ## Popular Integrations 14 | 15 | Sensitive data can appear in many places, like database fields, file uploads, and strings. You shouldn’t need different libraries for each of these. 16 | 17 | Lockbox can encrypt your data in all of these forms. It has built-in integrations with Active Record, Active Storage, and CarrierWave. 18 | 19 | ## Zero Downtime Migrations 20 | 21 | At some point, you may want to encrypt existing data. This should be easy to do, and most importantly, not require any downtime. Lockbox provides a single method you can use for this once your model is configured: 22 | 23 | ```ruby 24 | Lockbox.migrate(User) 25 | ``` 26 | 27 | No need to write one-off backfill scripts. 28 | 29 | ## Maximum Compatibility 30 | 31 | Encrypting attributes shouldn’t break existing code or libraries. To make this possible, methods like `attribute_changed?` and `attribute_was` should behave similarly regardless of whether or not an attribute is encrypted. Lockbox includes these methods in its test suite for maximum compatibility. 32 | 33 | This allows features like Devise’s ability to send email change notifications to work when the email attribute is encrypted, which is an important measure to prevent account hijacking. 
34 | 35 | ```ruby 36 | Devise.setup do |config| 37 | config.send_email_changed_notification = true 38 | end 39 | ``` 40 | 41 | You can even query encrypted attributes thanks to the [blind_index](https://github.com/ankane/blind_index) gem. 42 | 43 | ## Modern Algorithms 44 | 45 | Lockbox uses AES-GCM for [authenticated encryption](https://tonyarcieri.com/all-the-crypto-code-youve-ever-written-is-probably-broken). It also supports XSalsa20 (thanks to Libsodium), which is recommended by [some cryptographers](https://latacora.micro.blog/2018/04/03/cryptographic-right-answers.html). 46 | 47 | ## Less Keys To Manage 48 | 49 | It’s a good practice to use a different encryption key for each field to make it more difficult for attackers and to reduce the likelihood of a [nonce collision](https://www.cryptologie.net/article/402/is-symmetric-security-solved/). However, this can be burdensome for developers. 50 | 51 | Instead, we can use a single master key and derive separate keys for each field from it. This approach is taken from [CipherSweet](https://ciphersweet.paragonie.com/internals/key-hierarchy), an encryption library for PHP and Node.js. Now developers can safely add encrypted fields without having to worry about generating and storing additional secrets. 52 | 53 | You can still specify keys for certain fields if you prefer, but it’s no longer required. Lockbox also works with [KMS Encrypted](https://github.com/ankane/kms_encrypted) if you want to use a key management service to manage your keys. 54 | 55 | ## Built-In Key Rotation 56 | 57 | It’s good security hygiene to rotate your encryption keys from time-to-time. Lockbox makes this easy by allowing you to specify previous versions of keys and algorithm: 58 | 59 | ```ruby 60 | class User < ApplicationRecord 61 | encrypts :email, previous_versions: [{key: previous_key}] 62 | end 63 | ``` 64 | 65 | New data is encrypted with the new key and algorithm, while older data can still be decrypted. 
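The rotation mechanics can be sketched with plain OpenSSL. This is not Lockbox's wire format, just an illustration of the idea: tag each ciphertext with a key version, so new writes use the current key while old ciphertexts remain decryptable:

```ruby
require "openssl"

# Conceptual sketch of key rotation (not Lockbox's actual format): a
# one-byte key version is prepended to each ciphertext. Keys are generated
# here for illustration; real keys come from your secret store.
KEYS = {
  1 => OpenSSL::Random.random_bytes(32), # previous key
  2 => OpenSSL::Random.random_bytes(32)  # current key
}
CURRENT_VERSION = 2

def encrypt(plaintext, version: CURRENT_VERSION)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = KEYS.fetch(version)
  iv = cipher.random_iv # 12-byte nonce for GCM
  ciphertext = cipher.update(plaintext) + cipher.final
  [version].pack("C") + iv + ciphertext + cipher.auth_tag
end

def decrypt(data)
  version = data[0].unpack1("C")
  iv, ciphertext, tag = data[1, 12], data[13..-17], data[-16..]
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = KEYS.fetch(version)
  cipher.iv = iv
  cipher.auth_tag = tag # GCM authenticates; tampering raises on final
  cipher.update(ciphertext) + cipher.final
end

old_message = encrypt("old data", version: 1)
new_message = encrypt("new data")
puts decrypt(old_message) # old ciphertexts are still readable
puts decrypt(new_message)
```

With Lockbox itself, none of this bookkeeping is needed; `previous_versions` handles it for you.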
66 | 67 | ## Cleaner Schema 68 | 69 | [attr_encrypted](https://github.com/attr-encrypted/attr_encrypted), the de facto encryption library for database fields, uses two fields for each encrypted attribute: one for the ciphertext and another for the initialization vector. 70 | 71 | ```ruby 72 | encrypted_email 73 | encrypted_email_iv 74 | ``` 75 | 76 | However, it’s possible to store both in a single field for a cleaner schema. 77 | 78 | ```ruby 79 | email_ciphertext 80 | ``` 81 | 82 | ## Hybrid Cryptography 83 | 84 | Hybrid cryptography allows certain servers to encrypt data without the ability to decrypt it. This can do a better job [protecting data](https://ankane.org/decryption-keys) than symmetric cryptography when you can use it. Lockbox makes it just as easy to use hybrid cryptography. 85 | 86 | ```ruby 87 | class User < ApplicationRecord 88 | encrypts :email, algorithm: "hybrid", encryption_key: encryption_key, decryption_key: decryption_key 89 | end 90 | ``` 91 | 92 | ## Updates 93 | 94 | Since this post was originally published: 95 | 96 | - Lockbox also supports [types](https://ankane.org/lockbox-types) 97 | - Here’s how to [encrypt user email addresses](https://ankane.org/securing-user-emails-lockbox) 98 | - Lockbox supports [Mongoid](https://ankane.org/modern-encryption-mongoid) 99 | 100 | ## Summary 101 | 102 | You’ve now seen what Lockbox brings to encryption for Rails. To summarize, it: 103 | 104 | - Is hard to misuse 105 | - Works with database fields, files, and strings 106 | - Makes it easy to migrate existing data without downtime 107 | - Maximizes compatibility with existing code and libraries 108 | - Uses modern algorithms 109 | - Requires you to only manage a single encryption key 110 | - Makes key rotation easy 111 | - Stores encrypted data in a single field 112 | - Supports hybrid cryptography 113 | 114 | Try out [Lockbox](https://github.com/ankane/lockbox) today. 115 | 116 | *Already use a library for encryption? 
No worries, it’s [easy to migrate](https://github.com/ankane/lockbox#migrating-from-another-library).* 117 | -------------------------------------------------------------------------------- /archive/decryption-keys.md: -------------------------------------------------------------------------------- 1 | # Why and How to Keep Your Decryption Keys Off Web Servers 2 | 3 |

*[Image: Keys]*

4 | 5 | Suppose a worst-case scenario happens: an attacker finds a remote code execution vulnerability and creates a [reverse shell](https://hackernoon.com/reverse-shell-cf154dfee6bd) on one of your web servers. They then find the database credentials, connect to your database, and steal the data. 6 | 7 | For unencrypted data and data encrypted at the storage level, it’s game over. The attacker has it all. If data is encrypted at the application level with symmetric encryption but the encryption key is accessible from the server, it’s exactly the same. The attacker has all they need to decrypt the data offline. 8 | 9 | This is the case whether you store the encryption key in configuration management, an environment variable, or dynamically load it from an outside source. If your app can access the key, it’s vulnerable to compromise. 10 | 11 | The best way to defend against this attack is to make sure the compromised server isn’t able to decrypt data. Web servers are typically the most exposed to attacks. If your web servers accept sensitive data but don’t need to show it in its entirety back to users, they should be able to encrypt the data and write it to the database, but not decrypt it. The data can be decrypted and processed by background workers that don’t allow inbound traffic. 12 | 13 | You likely can’t do this for all of your data, but you should do it for all of the data you can. Sometimes it’s possible to show just partial information back to users. This is the universal practice for saved credit cards. 14 | 15 |

*[Image: Credit cards]*

16 | 17 | In these cases, you can store the partial data in a separate field which web servers can decrypt, while not allowing them to decrypt the full data. 18 | 19 | ## Practical Example 20 | 21 | Suppose we have a service that sends text messages to customers. Customers enter their phone number through the website or mobile app. 22 | 23 | We can set up web servers so they can only encrypt phone numbers. Text messages can be sent through background jobs which run on a different set of servers - ones that can decrypt and don’t allow inbound traffic. If internal employees need to view full phone numbers, they can use a separate set of web servers that are only accessible through the company VPN. 24 | 25 |   | Encrypt | Decrypt |   26 | --- | --- | --- | --- 27 | Customer web servers | ✓ | 28 | Background workers | ✓ | ✓ | No inbound traffic 29 | Internal web servers | ✓ | ✓ | Requires VPN 30 | 31 | If customers need to see their saved phone numbers, you can show them the last 4 digits, which are stored in a separate field. 32 | 33 | ## Approaches 34 | 35 | Two approaches you can take to accomplish this are: 36 | 37 | 1. Hybrid cryptography 38 | 2. Cryptography as a service 39 | 40 | ## Hybrid Cryptography 41 | 42 | Public key cryptography, or asymmetric cryptography, uses different keys to perform encryption and decryption. Servers that need to encrypt have the encryption key and servers that need to decrypt have the decryption key. 43 | 44 | However, public key cryptography is much less efficient than symmetric cryptography, so most implementations combine the two. They use public key cryptography to exchange a symmetric key, and symmetric cryptography to encrypt the data. This is called hybrid cryptography, and it’s how TLS and GPG work. 
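A minimal sketch of the hybrid pattern is below. It wraps a fresh per-message symmetric key with RSA-OAEP because that ships with Ruby's standard library; in practice, the X25519/Libsodium approach recommended later in this post is preferable. All names and key sizes here are illustrative:

```ruby
require "openssl"

# Hybrid encryption sketch: web servers hold only the public (encryption)
# key; background workers hold the private (decryption) key.
# RSA-OAEP stands in for a real key exchange like X25519.
private_key = OpenSSL::PKey::RSA.new(2048)
public_key = private_key.public_key

def hybrid_encrypt(public_key, plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  dek = cipher.random_key # fresh symmetric key per message
  iv = cipher.random_iv
  ciphertext = cipher.update(plaintext) + cipher.final
  wrapped_dek = public_key.public_encrypt(dek, OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING)
  { wrapped_dek: wrapped_dek, iv: iv, ciphertext: ciphertext, tag: cipher.auth_tag }
end

def hybrid_decrypt(private_key, msg)
  dek = private_key.private_decrypt(msg[:wrapped_dek], OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = dek
  cipher.iv = msg[:iv]
  cipher.auth_tag = msg[:tag]
  cipher.update(msg[:ciphertext]) + cipher.final
end

message = hybrid_encrypt(public_key, "555-555-0123")
puts hybrid_decrypt(private_key, message)
```

A server configured with only `public_key` can run `hybrid_encrypt` but has no way to reverse it, which is exactly the property we want for the customer-facing web tier.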
45 | 46 | X25519 is a modern key exchange algorithm that’s [widely deployed](https://ianix.com/pub/curve25519-deployment.html) and [currently recommended](https://paragonie.com/blog/2019/03/definitive-2019-guide-cryptographic-key-sizes-and-algorithm-recommendations#after-fold). 47 | 48 | [Libsodium](https://libsodium.gitbook.io/doc/), which uses X25519, is a great option for hybrid cryptography in applications. It has [libraries](https://libsodium.gitbook.io/doc/bindings_for_other_languages) for most languages. 49 | 50 | ## Cryptography as a Service 51 | 52 | Another approach is to use a service to perform encryption and decryption. This service can allow some sets of servers to encrypt and others to decrypt. You could write your own (micro)service, but there are a number of existing solutions, often called key management services (KMS). 53 | 54 | - [Vault](https://www.vaultproject.io/) 55 | - [AWS KMS](https://aws.amazon.com/kms/) 56 | - [Google Cloud KMS](https://cloud.google.com/kms/) 57 | 58 | These services don’t store the encrypted data - they just encrypt and decrypt on-demand. You can either encrypt data directly with the KMS or use envelope encryption. 59 | 60 | ### Direct Encryption 61 | 62 | With direct encryption, you don’t need to set up encryption in your app. Whenever you need to encrypt or decrypt data, simply send the data to the KMS. 63 | 64 | However, this has a few downsides. It exposes the unencrypted data to the KMS, which is disastrous if the KMS alone is breached. It’s also less efficient for large files and hosted services have a fairly low limit on the size of data you can encrypt. 65 | 66 | ### Envelope Encryption 67 | 68 | Another approach is envelope encryption, which addresses the issues above but requires encryption in your app. 69 | 70 | To encrypt, generate a random encryption key, known as a data encryption key (DEK), and use it to encrypt the data. Then encrypt the DEK with the KMS and store the encrypted version. 
71 | 72 | To decrypt, decrypt the DEK with the KMS and then use it to decrypt the data. This way, the KMS only ever sees the DEK. 73 | 74 | ### Auditing 75 | 76 | Another benefit of cryptography as a service is auditing. You can see exactly when data or DEKs are decrypted, and there’s no way to get around the auditing without compromising the KMS. This makes it easy to tell which information was accessed during a breach. 77 | 78 | ## Conclusion 79 | 80 | We don’t encrypt data for a sunny day. You’ve now seen two approaches to limit damage in the event of a web server breach. 81 | 82 | If you use Ruby on Rails, I’ve written a companion piece on [hybrid cryptography](/hybrid-cryptography-rails) with code for how to do this. 83 | -------------------------------------------------------------------------------- /archive/gem-patterns.md: -------------------------------------------------------------------------------- 1 | # Gem Patterns 2 | 3 | I’ve created [a few](https://ankane.org/opensource?language=Ruby) Ruby gems over the years, and there are a number of patterns I’ve found myself repeating that I wanted to share. I didn’t invent them, but have long forgotten where I first saw them. They are: 4 | 5 | - [Rails Migrations](#rails-migrations) 6 | - [Rails Dependencies](#rails-dependencies) 7 | - [Testing Against Multiple Dependency Versions](#testing-against-multiple-dependency-versions) 8 | - [Testing Against Rails](#testing-against-rails) 9 | - [Coding Your Gemspec](#coding-your-gemspec) 10 | 11 | Let’s dig into each of them. In the examples, the gem is called `hello`. 12 | 13 | ## Rails Migrations 14 | 15 | Create a template in `lib/generators/hello/templates/migration.rb.tt`: 16 | 17 | ```ruby 18 | class <%= migration_class_name %> < ActiveRecord::Migration<%= migration_version %> 19 | def change 20 | # your migration 21 | end 22 | end 23 | ``` 24 | 25 | The `.tt` extension denotes Thor template. 
[Thor](https://github.com/erikhuda/thor) is what Rails uses under the hood. 26 | 27 | Add `lib/generators/hello/install_generator.rb` 28 | 29 | ```ruby 30 | require "rails/generators/active_record" 31 | 32 | module Hello 33 | module Generators 34 | class InstallGenerator < Rails::Generators::Base 35 | include ActiveRecord::Generators::Migration 36 | source_root File.join(__dir__, "templates") 37 | 38 | def copy_migration 39 | migration_template "migration.rb", "db/migrate/install_hello.rb", migration_version: migration_version 40 | end 41 | 42 | def migration_version 43 | "[#{ActiveRecord::VERSION::MAJOR}.#{ActiveRecord::VERSION::MINOR}]" 44 | end 45 | end 46 | end 47 | end 48 | ``` 49 | 50 | This lets you run: 51 | 52 | ```sh 53 | rails generate hello:install 54 | ``` 55 | 56 | Change the generator path and class name to match your gem. They must match exactly what Rails expects to work. 57 | 58 | [Example](https://github.com/ankane/archer/blob/master/lib/generators/archer/install_generator.rb) 59 | 60 | ## Rails Dependencies 61 | 62 | If your gem depends on Rails, add `railties` and any other Rails libraries it needs. 63 | 64 | ```ruby 65 | spec.add_dependency "railties", ">= 5" 66 | spec.add_dependency "activerecord", ">= 5" 67 | ``` 68 | 69 | I typically require a [supported version](https://rubyonrails.org/security/) of Rails. 70 | 71 | In code, don’t require Rails gems directly, as this can cause them to load early and introduce issues. 72 | 73 | ```ruby 74 | require "active_record" # bad!! 75 | 76 | ActiveRecord::Base.include(Hello::Model) 77 | ``` 78 | 79 | Instead, do: 80 | 81 | ```ruby 82 | require "active_support" 83 | 84 | ActiveSupport.on_load(:active_record) do 85 | include Hello::Model 86 | end 87 | ``` 88 | 89 | [Example](https://github.com/ankane/hightop/blob/master/lib/hightop.rb) 90 | 91 | ## Testing Against Multiple Dependency Versions 92 | 93 | If your gem has dependencies, you may want to test against multiple versions of a dependency. 
For instance, you may want to test against multiple versions of Active Record. 94 | 95 | To do this, create a `test/gemfiles` directory (or `spec/gemfiles` if you use RSpec). 96 | 97 | Create `test/gemfiles/activerecord50.gemfile` with: 98 | 99 | ```ruby 100 | source "https://rubygems.org" 101 | 102 | gemspec path: "../../" 103 | 104 | gem "activerecord", "~> 5.0.0" 105 | ``` 106 | 107 | Install with: 108 | 109 | ```sh 110 | BUNDLE_GEMFILE=test/gemfiles/activerecord50.gemfile bundle install 111 | ``` 112 | 113 | And run with: 114 | 115 | ```sh 116 | BUNDLE_GEMFILE=test/gemfiles/activerecord50.gemfile bundle exec rake 117 | ``` 118 | 119 | [Example](https://github.com/ankane/groupdate/tree/master/test/gemfiles) 120 | 121 | On Travis CI, you can add to `.travis.yml`: 122 | 123 | ```yml 124 | gemfile: 125 | - Gemfile 126 | - test/gemfiles/activerecord50.gemfile 127 | ``` 128 | 129 | You can also use a library like [Appraisal](https://github.com/thoughtbot/appraisal) to help generate and run these files. 130 | 131 | ## Testing Against Rails 132 | 133 | To test against Rails, use a library like [Combustion](https://github.com/pat/combustion). It’s designed to be used with RSpec, but I haven’t had any issues with Minitest. Combustion generates some files that aren’t needed, so I just delete them. 134 | 135 | ```ruby 136 | Combustion.initialize! :all 137 | ``` 138 | 139 | [Example](https://github.com/ankane/field_test/tree/master/test) 140 | 141 | ## Coding Your Gemspec 142 | 143 | There are a variety of ways to code your gemspec. 
Here’s the one I like to use: 144 | 145 | ```ruby 146 | require_relative "lib/hello/version" 147 | 148 | Gem::Specification.new do |spec| 149 | spec.name = "hello" 150 | spec.version = Hello::VERSION 151 | spec.summary = "Hello world" 152 | spec.homepage = "https://github.com/you/hello" 153 | spec.license = "MIT" 154 | 155 | spec.author = "Your Name" 156 | spec.email = "you@example.com" 157 | 158 | spec.files = Dir["*.{md,txt}", "{lib}/**/*"] 159 | spec.require_path = "lib" 160 | 161 | spec.required_ruby_version = ">= 2.4" 162 | 163 | spec.add_dependency "activesupport", ">= 5" 164 | 165 | spec.add_development_dependency "bundler" 166 | spec.add_development_dependency "rake" 167 | end 168 | ``` 169 | 170 | Change `files` if your gem has `app`, `config`, or `vendor` directories. I typically use the [last supported version](https://www.ruby-lang.org/en/downloads/branches/) for the minimum Ruby version. 171 | 172 | If your gem has an executable file, add: 173 | 174 | ```ruby 175 | spec.bindir = "exe" 176 | spec.executables = ["hello"] 177 | ``` 178 | 179 | [Don’t check in](https://yehudakatz.com/2010/12/16/clarifying-the-roles-of-the-gemspec-and-gemfile/) `Gemfile.lock`. 180 | 181 | Some gems have moved development dependencies entirely out of the gemspec and into the Gemfile, which is another option. 182 | 183 | ## Summary 184 | 185 | You’ve now seen five patterns that can be useful for Ruby gems. Now go build something awesome! 186 | -------------------------------------------------------------------------------- /archive/securing-user-emails-in-rails.md: -------------------------------------------------------------------------------- 1 | # Securing User Emails in Rails 2 | 3 | --- 4 | 5 | *There is an [updated version](https://ankane.org/securing-user-emails-lockbox) of this post.* 6 | 7 | --- 8 | 9 | The GDPR goes into effect next Friday. 
Whether or not you serve European residents, it’s a great reminder that we have the responsibility to build systems in a way that protects user privacy. 10 | 11 | Email addresses are a common form of personal data, and they’re often stored unencrypted. If an attacker gains access to the database or backups, emails will be compromised. 12 | 13 | This post will walk you through a practical approach to protecting emails. It works with [Devise](https://github.com/plataformatec/devise), the most popular authentication framework for Rails, and is general enough to work with others. 14 | 15 | ## Strategy 16 | 17 | We’ll use two concepts to make this happen: encryption and blind indexing. Encryption gives us a way to securely store the data, and blind indexing provides a way to look it up. 18 | 19 | Blind indexing works by computing a hash of the data. You’re probably familiar with hash functions like MD5 and SHA1. Rather than one of these, we use a hash function that takes a secret key and uses [key stretching](https://en.wikipedia.org/wiki/Key_stretching) to slow down brute force attempts. You can read more about [blind indexing here](https://www.sitepoint.com/how-to-search-on-securely-encrypted-database-fields/). 20 | 21 | We’ll use the [attr_encrypted gem](https://github.com/attr-encrypted/attr_encrypted) for encryption and the [blind_index gem](https://github.com/ankane/blind_index) for blind indexing. 22 | 23 | ## Instructions 24 | 25 | Let’s assume you have a `User` model with an email field. 26 | 27 | Add to your Gemfile: 28 | 29 | ```ruby 30 | gem 'attr_encrypted' 31 | gem 'blind_index' 32 | ``` 33 | 34 | And run: 35 | 36 | ```sh 37 | bundle install 38 | ``` 39 | 40 | Next, let’s replace the email field with an encrypted version. 
Create a migration: 41 | 42 | ```sh 43 | rails g migration add_encrypted_email_to_users 44 | ``` 45 | 46 | And add: 47 | 48 | ```ruby 49 | class AddEncryptedEmailToUsers < ActiveRecord::Migration[5.2] 50 | def change 51 | # encrypted data 52 | add_column :users, :encrypted_email, :string 53 | add_column :users, :encrypted_email_iv, :string 54 | add_index :users, :encrypted_email_iv, unique: true 55 | 56 | # blind index 57 | add_column :users, :encrypted_email_bidx, :string 58 | add_index :users, :encrypted_email_bidx, unique: true 59 | 60 | # drop original here unless we have existing users 61 | remove_column :users, :email 62 | end 63 | end 64 | ``` 65 | 66 | We use one column to store the encrypted data, one to store [the IV](http://www.cryptofails.com/post/70059609995/crypto-noobs-1-initialization-vectors), and another to store the blind index. 67 | 68 | We add a unique index on the IV since reusing an IV with the same key in AES-GCM (the default algorithm for attr_encrypted) will [leak the key](https://csrc.nist.gov/csrc/media/projects/block-cipher-techniques/documents/bcm/joux_comments.pdf). 69 | 70 | Then migrate: 71 | 72 | ```sh 73 | rails db:migrate 74 | ``` 75 | 76 | Next, generate keys. We use environment variables to store the keys as hex-encoded strings ([dotenv](https://github.com/bkeepers/dotenv) is great for this). [Here’s an explanation](https://ankane.org/encryption-keys) of why `pack` is used. *Do not commit them to source control.* Generate one key for encryption and one key for hashing. 
You can generate keys in the Rails console with: 77 | 78 | ```ruby 79 | SecureRandom.hex(32) 80 | ``` 81 | 82 | For development, you can use these: 83 | 84 | ```sh 85 | EMAIL_ENCRYPTION_KEY=0000000000000000000000000000000000000000000000000000000000000000 86 | EMAIL_BLIND_INDEX_KEY=ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff 87 | ``` 88 | 89 | Add to your user model: 90 | 91 | ```ruby 92 | class User < ApplicationRecord 93 | attr_encrypted :email, key: [ENV["EMAIL_ENCRYPTION_KEY"]].pack("H*") 94 | blind_index :email, key: [ENV["EMAIL_BLIND_INDEX_KEY"]].pack("H*") 95 | end 96 | ``` 97 | 98 | > `pack` is used to decode the hex value 99 | 100 | Create a new user and confirm it works. 101 | 102 | ## Existing Users 103 | 104 | If you have existing users, we need to backfill the data before dropping the email column. We temporarily use a virtual attribute - `protected_email` - so we can backfill without downtime. 105 | 106 | ```ruby 107 | class User < ApplicationRecord 108 | attr_encrypted :protected_email, key: [ENV["EMAIL_ENCRYPTION_KEY"]].pack("H*"), attribute: "encrypted_email" 109 | blind_index :protected_email, key: [ENV["EMAIL_BLIND_INDEX_KEY"]].pack("H*"), attribute: "email", bidx_attribute: "encrypted_email_bidx" 110 | 111 | before_validation :protect_email, if: -> { email_changed? } 112 | 113 | def protect_email 114 | self.protected_email = email 115 | compute_protected_email_bidx 116 | end 117 | end 118 | ``` 119 | 120 | Backfill the data in the Rails console: 121 | 122 | ```ruby 123 | User.where(encrypted_email: nil).find_each do |user| 124 | user.protect_email 125 | user.save! 
126 | end 127 | ``` 128 | 129 | Then update the model to the desired state: 130 | 131 | ```ruby 132 | class User < ApplicationRecord 133 | attr_encrypted :email, key: [ENV["EMAIL_ENCRYPTION_KEY"]].pack("H*") 134 | blind_index :email, key: [ENV["EMAIL_BLIND_INDEX_KEY"]].pack("H*") 135 | 136 | # remove this line after dropping email column 137 | self.ignored_columns = ["email"] 138 | end 139 | ``` 140 | 141 | Finally, drop the email column. 142 | 143 | ## Logging 144 | 145 | We also need to make sure email addresses aren’t logged. Add to `config/initializers/filter_parameter_logging.rb`: 146 | 147 | ```ruby 148 | Rails.application.config.filter_parameters += [:email] 149 | ``` 150 | 151 | Use [Logstop](https://github.com/ankane/logstop) to filter anything that looks like an email address as an extra line of defense. Add to your Gemfile: 152 | 153 | ```ruby 154 | gem 'logstop' 155 | ``` 156 | 157 | And create `config/initializers/logstop.rb` with: 158 | 159 | ```ruby 160 | Logstop.guard(Rails.logger) 161 | ``` 162 | 163 | ## Summary 164 | 165 | We now have a way to encrypt data and query for exact matches. You can apply this same approach to other fields as well. For more security, consider a [key management service](https://github.com/ankane/kms_encrypted) to manage your keys. 166 | -------------------------------------------------------------------------------- /archive/new-ml-gems.md: -------------------------------------------------------------------------------- 1 | # 16 New ML Gems for Ruby 2 | 3 |

4 | *[Image: New ML Gems]* 5 |

6 | 7 | In August, I set out to improve the machine learning ecosystem for Ruby. I wasn’t sure where it would go. Over the next 5 months, I ended up releasing 16 libraries and learned a lot along the way. I wanted to share some of that knowledge and introduce some of the libraries you can now use in Ruby. 8 | 9 | ## The Theme 10 | 11 | There are many great machine learning libraries for Python, so a natural place to start was to see what it’d take to bring them to Ruby. It turned out to be a lot less work than expected, thanks to a common theme. 12 | 13 | ML libraries want to be fast. This means less time waiting and more time iterating. However, interpreted languages like Python and Ruby are relatively slow. How do libraries overcome this? 14 | 15 | The key is they do most of the work in a compiled language - typically C++ - and have wrappers for other languages like Python. 16 | 17 | This was really great news. The same approach and code could be used for Ruby. 18 | 19 | ## The Patterns 20 | 21 | Ruby has a number of ways to call C and C++ code. 22 | 23 | Native extensions are one method. They’re written in C or C++ and use [Ruby’s C API](https://silverhammermba.github.io/emberb/c/). You may have noticed gems with native extensions taking longer to install, as they need to compile. 24 | 25 | ```c 26 | void Init_stats() 27 | { 28 | VALUE mStats = rb_define_module("Stats"); 29 | rb_define_module_function(mStats, "mean", mean, 2); 30 | } 31 | ``` 32 | 33 | A more general way for one language to call another is a foreign function interface, or FFI. It requires a C API (due to C++ name mangling), which many machine learning libraries had. An advantage of FFI is you can define the interface in the host language - in our case, Ruby. 34 | 35 | Ruby supports FFI with Fiddle.
It was added in Ruby 1.9, but appears to be [“the Ruby standard library’s best kept secret.”](https://www.honeybadger.io/blog/use-any-c-library-from-ruby-via-fiddle-the-ruby-standard-librarys-best-kept-secret/) 36 | 37 | ```ruby 38 | module Stats 39 | extend Fiddle::Importer 40 | dlload "libstats.so" 41 | extern "double mean(int a, int b)" 42 | end 43 | ``` 44 | 45 | There’s also the [FFI](https://github.com/ffi/ffi) gem, which provides higher-level functionality and overcomes some limitations of Fiddle (like the ability to pass structs by value). 46 | 47 | ```ruby 48 | module Stats 49 | extend FFI::Library 50 | ffi_lib "stats" 51 | attach_function :mean, [:int, :int], :double 52 | end 53 | ``` 54 | 55 | For libraries without a C API, [Rice](https://github.com/jasonroelofs/rice) provides a really nice way to bind C++ code (similar to Python’s pybind11). 56 | 57 | ```cpp 58 | void Init_stats() 59 | { 60 | Module mStats = define_module("Stats"); 61 | mStats.define_singleton_method("mean", &mean); 62 | } 63 | ``` 64 | 65 | Another approach is SWIG (Simplified Wrapper and Interface Generator). You create an interface file and then run SWIG to generate the bindings. Gusto has a [good tutorial](https://engineering.gusto.com/simple-ruby-c-extensions-with-swig/) on this. 66 | 67 | ```swig 68 | %module stats 69 | 70 | double mean(int, int); 71 | ``` 72 | 73 | There’s also [Rubex](https://github.com/SciRuby/rubex), which lets you write Ruby-like code that compiles to C (similar to Python’s Cython). It also provides the ability to interface with C libraries. 74 | 75 | ```ruby 76 | lib "<stats.h>" 77 | double mean(int, int) 78 | end 79 | ``` 80 | 81 | None of the approaches above are specific to machine learning, so you can use them with any C or C++ library. 82 | 83 | ## The Libraries 84 | 85 | Libraries were chosen based on popularity and performance. Many have a similar interface to their Python counterpart to make it easy to follow existing tutorials.
Libraries are broken down into categories below with brief descriptions. 86 | 87 | ### Gradient Boosting 88 | 89 | [XGBoost](https://github.com/ankane/xgb) and [LightGBM](https://github.com/ankane/lightgbm) are gradient boosting libraries. Gradient boosting is a powerful technique for building predictive models that fits many small decision trees that together make robust predictions, even with outliers and missing values. Gradient boosting performs well on tabular data. 90 | 91 | ### Deep Learning 92 | 93 | [Torch-rb](https://github.com/ankane/torch-rb) and [TensorFlow](https://github.com/ankane/tensorflow) are deep learning libraries. Torch-rb is built on LibTorch, the library that powers PyTorch. Deep learning has been very successful in areas like image recognition and natural language processing. 94 | 95 | ### Recommendations 96 | 97 | [Disco](https://github.com/ankane/disco) is a recommendation library. It looks at ratings or actions from users to predict other items they might like, known as collaborative filtering. Matrix factorization is a common way to accomplish this. 98 | 99 | [LIBMF](https://github.com/ankane/libmf) is a high-performance matrix factorization library. 100 | 101 | Collaborative filtering can also find similar users and items. If you have a large number of users or items, an approximate nearest neighbor algorithm can speed up the search. Spotify [does this](https://github.com/spotify/annoy#background) for music recommendations. 102 | 103 | [NGT](https://github.com/ankane/ngt) is an approximate nearest neighbor library that performs extremely well on benchmarks (in Python/C++). 104 | 105 |

106 | ANN Benchmarks 107 |

108 | 109 |

110 | Image from ANN Benchmarks, MIT license 111 |

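For intuition, here’s the exact computation these libraries approximate - a brute-force nearest neighbor search in plain Ruby. The item vectors are made up for illustration; ANN libraries like NGT trade a tiny amount of accuracy for far better speed on large datasets.

```ruby
# Brute-force (exact) nearest neighbor search, for illustration only.
items = {
  "song_a" => [1.0, 0.0, 0.0],
  "song_b" => [0.9, 0.1, 0.0],
  "song_c" => [0.0, 1.0, 0.0]
}

# Euclidean distance between two vectors
def distance(u, v)
  Math.sqrt(u.zip(v).sum { |x, y| (x - y)**2 })
end

query = [1.0, 0.0, 0.1]
nearest = items.min_by { |_, vector| distance(query, vector) }
nearest[0] # => "song_a"
```

This scans every item, so it’s linear in the number of items; approximate indexes answer the same kind of query in far less than linear time.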
112 | 113 | Another promising technique for recommendations is factorization machines. The traditional approach to collaborative filtering builds a model exclusively from past ratings or actions. However, you may have additional *side information* about users or items. Factorization machines can incorporate this data. They can also perform classification and regression. 114 | 115 | [xLearn](https://github.com/ankane/xlearn) is a high-performance library for factorization machines. 116 | 117 | ### Optimization 118 | 119 | Optimization finds the best solution to a problem out of many possible solutions. Scheduling and vehicle routing are two common tasks. Optimization problems have an objective function to minimize (or maximize) and a set of constraints. 120 | 121 | Linear programming is an approach you can use when the objective function and constraints are linear. Here’s a really good [introductory series](https://www.youtube.com/watch?v=0TD9EQcheZM) if you want to learn more. 122 | 123 | [SCS](https://github.com/ankane/scs) is a library that can solve [many types](https://www.cvxpy.org/tutorial/advanced/index.html#choosing-a-solver) of optimization problems. 124 | 125 | [OSQP](https://github.com/ankane/osqp) is another that’s specifically designed for quadratic problems. 126 | 127 | ### Text Classification 128 | 129 | [fastText](https://github.com/ankane/fasttext) is a text classification and word representation library. It can label documents with one or more categories, which is useful for content tagging, spam filtering, and language detection. It can also compute word vectors, which can be compared to find similar words and analogies. 130 | 131 | ### Interoperability 132 | 133 | It’s nice when languages play nicely together. 134 | 135 | [ONNX Runtime](https://github.com/ankane/onnxruntime) is a scoring engine for ML models. You can build a model in one language, save it in the ONNX format, and run it in another. Here’s [an example](/tensorflow-ruby). 
136 | 137 | [Npy](https://github.com/ankane/npy) is a library for saving and loading NumPy `npy` and `npz` files. It uses [Numo](/numo) for multi-dimensional arrays. 138 | 139 | ### Others 140 | 141 | [Vowpal Wabbit](https://github.com/ankane/vowpalwabbit) specializes in online learning. It’s great for reinforcement learning as well as supervised learning where you want to train a model incrementally instead of all at once. This is nice when you have a lot of data. 142 | 143 | [ThunderSVM](https://github.com/ankane/thundersvm) is an SVM library that runs in parallel on either CPUs or GPUs. 144 | 145 | [GSLR](https://github.com/ankane/gslr) is a linear regression library powered by GSL that supports both ordinary least squares and ridge regression. It can be used alone or to improve the performance of [Eps](https://github.com/ankane/eps). 146 | 147 | ## Shout-out 148 | 149 | I also wanted to give a shout-out to another library that entered the scene in 2019. 150 | 151 | [Rumale](https://github.com/yoshoku/rumale) is a machine learning library that supports many, many algorithms, similar to Python’s Scikit-learn. Thanks [@yoshoku](https://github.com/yoshoku) for the amazing work! 152 | 153 | ## Final Word 154 | 155 | There are now many state-of-the-art machine learning libraries available for Ruby. If you’re a Ruby engineer who’s interested in machine learning, now’s a good time to try it. Also, if you come across a C or C++ library you want to use in Ruby, you’ve seen a few ways to do it. Let’s make Ruby a great language for machine learning. 156 | -------------------------------------------------------------------------------- /archive/rails-meet-data-science.md: -------------------------------------------------------------------------------- 1 | # Rails, Meet Data Science 2 | 3 |

Rails, Meet Data Science

4 | 5 | Organizations today have more data than ever. Predictive modeling is a powerful way to use this data to solve problems and create better experiences for customers. For instance, do a better job keeping items in stock by predicting demand or lower costs by predicting fraud. If you use Ruby on Rails, it can be tough to know how to incorporate this into your app. 6 | 7 | We’ll go over four patterns you can use for prediction with Rails. We used all four successfully during my time at [Instacart](https://www.instacart.com). They can work when you have no data scientists (when I started) as well as when you have a strong data science team. 8 | 9 | ## Patterns 10 | 11 | With predictive modeling, you first train a model and then use it to predict. The patterns can be grouped by the language used for each task: 12 | 13 | Pattern | Train | Predict 14 | --- | --- | --- 15 | 1 | 3rd Party | 3rd Party 16 | 2 | Ruby | Ruby 17 | 3 | Another Language | Ruby 18 | 4 | Another Language | Another Language 19 | 20 | Two popular languages for data science are Python and R. 21 | 22 | You can decide which pattern to use for each model you build. We’ll walk through the approaches and discuss the pros and cons of each. 23 | 24 | ## Pattern 1: Use a 3rd Party 25 | 26 | Before building a model in-house, it’s good to see what already exists. There are a number of external services you can use for specific problems. 
Here are a few: 27 | 28 | - Fraud - [Sift Science](https://siftscience.com/) 29 | - Recommendations - [Tamber](https://tamber.com/) 30 | - Anomaly Detection & Forecasting - [Trend](https://trendapi.org/) 31 | - NLP - [Amazon Comprehend](https://aws.amazon.com/comprehend/) and [Google Cloud Natural Language](https://cloud.google.com/natural-language/) 32 | - Vision - [AWS Rekognition](https://aws.amazon.com/rekognition/) and [Google Cloud Vision](https://cloud.google.com/vision/) 33 | 34 | Pros 35 | 36 | - Get domain knowledge from the company 37 | - Fast to implement and easy to maintain 38 | 39 | Cons 40 | 41 | - Not easy to iterate if it doesn’t fit your needs 42 | - Vendor lock-in 43 | 44 | ## Pattern 2: Train and Predict in Ruby 45 | 46 | Ruby has a number of libraries for building simple models. Simple models can perform very well since a large part of model building is [feature engineering](https://en.wikipedia.org/wiki/Feature_engineering). This is a great option if there are no data scientists in your company or on your team. A developer can own the model end-to-end, which is great for speed and iteration. 47 | 48 | Here are a few libraries for building models in Ruby: 49 | 50 | - [Eps](https://github.com/ankane/eps) - good for beginners 51 | - [Rumale](https://github.com/yoshoku/rumale) - good for advanced users 52 | - [Xgb](https://github.com/ankane/xgb) - XGBoost 53 | - [LightGBM](https://github.com/ankane/lightgbm) - LightGBM 54 | - And [many more](https://github.com/arbox/machine-learning-with-ruby) 55 | 56 | Once a model is trained, you’ll need to store it. You can use methods provided by the library, or marshal if none exist. You can store the models as files or in the database. 57 | 58 | Be sure to commit the code used to train models so you can update them with newer data in the future. 
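As a concrete example of the Marshal fallback, here’s a round-trip to a file. The `Model` struct is a made-up stand-in for whatever object your library returns - prefer the library’s own save and load methods when they exist.

```ruby
require "tmpdir"

# Store a trained model with Marshal (Model is a stand-in object)
Model = Struct.new(:coefficients)
model = Model.new([0.5, 1.2])

path = File.join(Dir.tmpdir, "model.bin")
File.binwrite(path, Marshal.dump(model))  # store as a file
loaded = Marshal.load(File.binread(path)) # load it back later
loaded.coefficients # => [0.5, 1.2]
```

Note that Marshal data is tied to the Ruby version and class definitions, so treat stored models as rebuildable artifacts rather than long-term archives.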
The Rails console is a decent place to create them, or use a [Jupyter notebook](https://jupyter.org/) running [IRuby](https://github.com/SciRuby/iruby) for better visualizations (see [setup instructions for Rails](https://ankane.org/jupyter-rails)). 59 | 60 | Pros 61 | 62 | - Simple models can perform well 63 | - No need to introduce a new language 64 | 65 | Cons 66 | 67 | - Limited tools for building models 68 | - Limited model selection 69 | - Many people who have experience building models don’t know Ruby 70 | 71 | ## Pattern 3: Train in Another Language, Predict in Ruby 72 | 73 | Ruby is getting better for data science thanks to [SciRuby](https://github.com/SciRuby/sciruby). However, languages like R and Python currently have much better tools. Also, many people who have experience building models don’t know Ruby. 74 | 75 | Luckily, you can build models in another language and predict in Ruby. This way, you can use more advanced tools for visualization, validation, and tuning without adding complexity to your production stack. If you don’t have data scientists, you can use this pattern to contract with one. 76 | 77 | Here are models that can currently predict in Ruby: 78 | 79 | - [Eps](https://github.com/ankane/eps) - Linear Regression, Naive Bayes 80 | - [Scoruby](https://github.com/asafschers/scoruby) - Random Forest, GBM, Decision Tree, Naive Bayes 81 | - [Xgb](https://github.com/ankane/xgb) - XGBoost 82 | - [LightGBM](https://github.com/ankane/lightgbm) - LightGBM 83 | 84 | For this to work, models need to be stored in a shared format that both languages understand. PMML and PFA are two interchange formats. PFA is newer but has less adoption than PMML. Andrey Melentyev has a [great post](https://www.andrey-melentyev.com/model-interoperability.html) on the topic. 85 | 86 | Once again, it’s important that models are reproducible. This allows you to update them with newer data in the future. 
Be sure to follow software engineering best practices like: 87 | 88 | - Use source control (create a new repo or add to your existing repo) 89 | - Use a package manager for a reproducible environment 90 | - Keep credentials out of source control (use `.env` or `.Renviron`) 91 | 92 | Here are some tools you can use: 93 | 94 | Function | Python | R 95 | --- | --- | --- 96 | Package management | [Pipenv](https://pipenv.readthedocs.io/en/latest/) | [Jetpack](https://github.com/ankane/jetpack) 97 | Database access | [SQLAlchemy](https://www.sqlalchemy.org/) | [dbx](https://github.com/ankane/dbx) 98 | PMML export | [sklearn2pmml](https://github.com/jpmml/sklearn2pmml) | [pmml](https://cran.r-project.org/package=pmml) 99 | 100 | One place to be careful is implementing the features in Ruby. They must be consistent with how they were implemented in training. To ensure this is correct, verify it programmatically. Create a CSV file with ids and predictions from the original model and confirm the Ruby predictions match. Here’s some [example code](https://github.com/ankane/eps#verifying). 101 | 102 | Pros 103 | 104 | - Better tools for model building 105 | - No need to operate a new language in production 106 | 107 | Cons 108 | 109 | - Need to introduce a new language in development 110 | - Limited model selection 111 | - Need to create features in two languages 112 | 113 | ## Pattern 4: Train and Predict in Another Language 114 | 115 | The last option we’ll cover is doing both training and prediction outside Ruby. This is great if you have a team of data scientists who specialize in another language. This pattern allows data scientists to own models end-to-end. 116 | 117 | It also gives you access to models that are not available in Ruby. For instance, there are forecasting libraries like [Prophet](https://facebook.github.io/prophet/) and deep learning libraries like [TensorFlow](https://www.tensorflow.org/).
118 | 119 | The implementation depends on how predictions are generated. Two common ways are batch and real-time. 120 | 121 | --- 122 | 123 | ### Batch Predictions 124 | 125 | Batch predictions are generated asynchronously and are typically run on a regular interval. This can be every minute or once a week. An example is a daily job that updates demand forecasts for the following weeks. Predictions can be stored and later used by the Rails app as needed. 126 | 127 | Don’t be afraid to read and write directly to the database. While microservice design patterns caution against using the database as an API, we didn’t have much issue with it. When updating records, it’s also a good idea to write audits to see how predictions change over time. 128 | 129 | Jobs can be scheduled with cron, or ideally a distributed scheduler like [Mani](https://github.com/sherinkurian/mani) for high availability. If you need to let the Rails app know a job has completed, you can do this through your messaging system. HTTP works great if you don’t have one. 130 | 131 | --- 132 | 133 | ### Real-Time Predictions 134 | 135 | Real-time predictions are generated synchronously and are triggered by calls from the Rails app. An example is recommending items to a user at checkout based off what’s in their cart. 136 | 137 | HTTP is a common choice for retrieving predictions, but you can use a messaging system or even pipes. Great tools for HTTP are [Django](https://www.djangoproject.com/) and [Flask](http://flask.pocoo.org/) for Python and [Plumber](https://www.rplumber.io/) for R. 138 | 139 | --- 140 | 141 | As with the other patterns, follow best engineering practices. In addition to ones previously mentioned: 142 | 143 | - Use a framework, or at the very least a consistent project structure 144 | - Keep code [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) 145 | 146 | Don’t be afraid to use Rails to manage the database schema. 
It’s easy enough for data scientists to learn to create and run migrations. Otherwise, you need to support another system for schema changes. 147 | 148 | To store models, you most likely won’t use an interchange format, since libraries can’t load them. Instead, use serialization specific to the language, like pickle in Python and serialize in R. 149 | 150 | If deciding between Python and R, Python has more general purpose libraries, so it’s easier to run in production. 151 | 152 | Pros 153 | 154 | - Larger selection of models available 155 | - Data scientists can own models end-to-end 156 | 157 | Cons 158 | 159 | - Need to run multiple languages in production 160 | 161 | ## Conclusion 162 | 163 | You’ve now seen four great patterns for bringing predictive models to Rails. Each has different trade-offs, so we recommend taking the simplest approach that works for you. No matter which you choose, make sure your models are reproducible. 164 | 165 | Happy modeling! 166 | 167 |
168 | 169 | Updates 170 | 171 | - May 2019: Added Rumale 172 | - August 2019: Added Xgb and LightGBM 173 | -------------------------------------------------------------------------------- /archive/scaling-the-monolith.md: -------------------------------------------------------------------------------- 1 | # Scaling the Monolith 2 | 3 | Many companies start out with a single web application. As the team and codebase grow, things feel less organized and common tasks like booting the app and running the test suite take longer and longer. It can be tempting to turn to microservices to alleviate some of this pain. However, distributed systems add a significant amount of complexity and mental overhead. 4 | 5 | Before you decide to split apart your app, there are a number of tactics you can use to scale it [majestically](https://m.signalvnoise.com/the-majestic-monolith-29166d022228#.bst5vwy6r). Spend a significant amount of time trying to solve your existing problems before making big changes. 6 | 7 | The topics we’ll cover are: 8 | 9 | - [Code](#code) 10 | - [Errors](#errors) 11 | - [Boot Times & Memory](#boot-times-memory) 12 | - [Testing](#testing) 13 | - [Databases](#databases) 14 | - [Stability](#stability) 15 | 16 | The examples are geared towards Rails apps, but the principles apply to any codebase. 17 | 18 | ## Code 19 | 20 | Rails models and controllers tend to get larger and larger. Rails introduced [concerns](https://signalvnoise.com/posts/3372-put-chubby-models-on-a-diet-with-concerns) as one way to address this. Concerns allow you to pull out related logic into a separate file. 21 | 22 | Service objects are another nice pattern for this. Here’s [an example](https://hackernoon.com/service-objects-in-ruby-on-rails-and-you-79ca8a1c946e) of a service object. There’s not a standard way to create service objects, but it’s a good idea to decide on a convention for your app. 
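For instance, a minimal plain-Ruby convention might look like this (the class names are made up for illustration):

```ruby
# A hypothetical convention: every service exposes a single `call`
# entry point and receives its inputs through the constructor.
class ApplicationService
  def self.call(...)
    new(...).call
  end
end

class CalculateCartTotal < ApplicationService
  def initialize(prices)
    @prices = prices
  end

  def call
    @prices.sum
  end
end

CalculateCartTotal.call([500, 1000, 250]) # => 1750
```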
You can use gems like [Interactor](https://github.com/collectiveidea/interactor) to establish one. 23 | 24 | Use namespaces to organize code. 25 | 26 | ```ruby 27 | class Admin::UsersController < Admin::BaseController 28 | end 29 | ``` 30 | 31 | Some teams also prefer to use Rails engines, although I’m not a fan of this approach. Here’s a [good comparison](https://stackoverflow.com/a/29641532/1177228) of the pros and cons of each. 32 | 33 | ## Errors 34 | 35 | As the team grows, it’s important that errors get routed to the right place. You can use the [ownership](https://github.com/ankane/ownership) gem to help with this. Add it to controllers, jobs, and rake tasks. 36 | 37 | ```ruby 38 | class WelcomeJob < ApplicationJob 39 | owner :growth 40 | end 41 | ``` 42 | 43 | `git blame` can help with assigning initial owners. 44 | 45 | ## Boot Times & Memory 46 | 47 | As your app accumulates more gems and files, its boot time and memory usage grow. There have been a number of projects over the years to speed up boot time. [Spring](https://github.com/rails/spring) was introduced in Rails 4.1 and keeps your app running in the background so it doesn’t have to boot every time you run a new command. 48 | 49 | Last year, Shopify released [Bootsnap](https://github.com/Shopify/bootsnap), which caches expensive loading computations. It’s now part of Rails 5.2 and can be used with earlier versions of Rails as well. With Bootsnap, “the core Shopify platform - a rather large monolithic application - boots about 75% faster, dropping from around 25s to 6.5s.” 50 | 51 | Another tactic is lazy loading files. Instead of incurring a speed and memory penalty at startup to load files, you can incur it the first time a request or job requires it. If it’s never needed, it’s never loaded. You can specify which gems to load in your Gemfile. 
52 | 53 | ```rb 54 | gem 'groupdate', require: false 55 | ``` 56 | 57 | You can also use different Bundler groups to selectively load gems for different environments. 58 | 59 | ```rb 60 | group :web do 61 | gem 'rack-attack' 62 | end 63 | 64 | group :admin_web do 65 | gem 'activeadmin' 66 | end 67 | 68 | group :worker do 69 | gem 'premailer-rails' 70 | end 71 | ``` 72 | 73 | Read how to [set it up here](https://engineering.harrys.com/2014/07/29/hacking-bundler-groups.html). 74 | 75 | Use [Bumbler](https://github.com/nevir/Bumbler) to see how long each gem takes to load and [Derailed Benchmarks](https://github.com/schneems/derailed_benchmarks) to see memory usage. Focus on the top ones and leave the rest. 76 | 77 | If a gem is slow, there’s a chance it may be doing a lot of work upfront. You can try to debug the gem and fix it. Here’s an [example](https://github.com/ankane/area/commit/2c8cc47d151828ebdcce0e7060b7ac77a4c2f9ce) of speeding up initial load time by only reading a CSV file when it’s needed. 78 | 79 | ## Testing 80 | 81 | As the number of tests grows, the test suite can become slow. [TestProf](https://test-prof.evilmartians.io) provides a number of tools to profile and optimize your tests. You can also use a library like [Database Cleaner](https://github.com/DatabaseCleaner/database_cleaner) to quickly clean the database after tests. 82 | 83 | In development, you can use Guard for [Minitest](https://github.com/guard/guard-minitest) or [RSpec](https://github.com/guard/guard-rspec) to automatically run tests when relevant files are modified. Also make sure it’s easy to manually run common subsets of tests. You can use tags in RSpec for this. 84 | 85 | ```sh 86 | rspec --tag growth 87 | ``` 88 | 89 | The key to speeding up the entire test suite is parallelization. Stripe has a [great post](https://stripe.com/blog/distributed-ruby-testing) about how they were able to get three hours of tests to run in three minutes.
With continuous integration, split tests across multiple machines. Both [Travis](https://docs.travis-ci.com/user/speeding-up-the-build/#parallelizing-your-builds-across-virtual-machines) and [Circle](https://circleci.com/docs/2.0/parallelism-faster-jobs/) support this. You can use [ParallelTests](https://github.com/grosser/parallel_tests) in development to use all the cores on your machine. Rails 6 will run tests in parallel by default. 90 | 91 | Another way to speed up tests is to change your schema dump format to SQL. 92 | 93 | ```ruby 94 | config.active_record.schema_format = :sql 95 | ``` 96 | 97 | This allows you to load the database schema for tests without booting the Rails app. With Postgres, you can use: 98 | 99 | ```sh 100 | psql < db/structure.sql 101 | ``` 102 | 103 | To prevent slow tests from being added, automatically fail tests that take too long. With RSpec, you can do: 104 | 105 | ```ruby 106 | RSpec.configure do |config| 107 | config.around(:each) do |example| 108 | duration = Benchmark.realtime(&example) 109 | raise "Test took over 2 seconds to run" if duration > 2 110 | end 111 | end 112 | ``` 113 | 114 | Start with a higher value and ratchet it down as you fix tests that are slow. You can see the slowest tests with: 115 | 116 | ```sh 117 | rspec --profile 118 | ``` 119 | 120 | As the number of tests grows, there’s a higher chance of a random network issue causing an individual test to fail. Automatically retry failing tests to cut down on noise. With RSpec, you can use [RSpec::Retry](https://github.com/NoRedInk/rspec-retry) for this. 121 | 122 | ```ruby 123 | require "rspec/retry" 124 | 125 | RSpec.configure do |config| 126 | config.around(:each) do |example| 127 | example.run_with_retry retry: 2 # must be 2 to retry once (shrug) 128 | end 129 | end 130 | ``` 131 | 132 | For test failures, make sure they get routed to the committer. You can use webhooks from your CI platform to do this. 
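The splitting mentioned above is straightforward at its core: give each CI node a deterministic slice of the spec files. A sketch, with a made-up file list and env var names modeled on what CI services typically expose:

```ruby
# Assign each CI node a deterministic slice of the spec files
files = ["a_spec.rb", "b_spec.rb", "c_spec.rb", "d_spec.rb", "e_spec.rb"]
node_total = Integer(ENV.fetch("CI_NODE_TOTAL", "2"))
node_index = Integer(ENV.fetch("CI_NODE_INDEX", "0"))

assigned = files.each_with_index
                .select { |_, i| i % node_total == node_index }
                .map(&:first)
```

Splitting by historical timing data instead of file count evens out node run times further, which is what services like Circle support out of the box.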
133 | 134 | ## Databases 135 | 136 | Modern relational databases can scale extremely well if you follow best practices. 137 | 138 | One of the most important things you can do is set a [statement timeout](https://github.com/ankane/the-ultimate-guide-to-ruby-timeouts#statement-timeouts-1) to prevent bad queries from taking too many resources. 139 | 140 | ```yml 141 | production: 142 | variables: 143 | statement_timeout: 250 # ms 144 | ``` 145 | 146 | It’s also good to track which queries consume the most CPU time. With Postgres, you can use [PgHero](https://github.com/ankane/pghero) for this. 147 | 148 |

PgHero

149 | 150 | Use [Marginalia](https://github.com/basecamp/marginalia) to make it easy to identify the origin of queries. This adds a comment to the end of queries like `/*application:Datakick,controller:items,action:edit*/` so you can see where they’re coming from. 151 | 152 | Add defensive measures as well. For instance, pause low priority job queues automatically when the database CPU gets too high. 153 | 154 | ```ruby 155 | Sidekiq::Queue.new("low").pause! 156 | ``` 157 | 158 | As the team grows, so does the chance of someone accidentally running a migration that takes down the site. [Strong Migrations](https://github.com/ankane/strong_migrations) can help prevent downtime due to database migrations. It raises an error if you try to run an unsafe operation and gives instructions for a better way to do it. 159 | 160 |

Strong Migrations

161 | 162 | Some tables can accumulate a lot of columns. You can split them into multiple tables based on concern, with a 1-to-1 relationship between them. 163 | 164 | Scale reads by fixing N+1 queries and caching frequent queries. [Bullet](https://github.com/flyerhzm/bullet) can help you identify N+1 queries. If you still have high load after spending a good amount of time on these, use [Distribute Reads](https://github.com/ankane/distribute_reads) for replicas. 165 | 166 | Scale writes and space with additional databases. Use [Multiverse](https://github.com/ankane/multiverse) to manage them. This can also be good if you have business domains with different workloads. It adds complexity and removes the ability to join certain tables, but can increase stability. 167 | 168 | Partitioning is another strategy to save space for tables that only need recent data. You can use [pgslice](https://github.com/ankane/pgslice) for Postgres. 169 | 170 | While Rails has built-in connection pooling, connections can become an issue when you have a lot of servers. With Postgres, use a connection pooler like [PgBouncer](https://ankane.org/pgbouncer-setup) when you start to hit 500 connections. 171 | 172 | Be hesitant to introduce new data stores. Most of the time you can [just table data](https://ankane.org/just-table-it). It’s often not worth having another technology to manage if your current stack can do the job. 173 | 174 | ## Stability 175 | 176 | Your monolith is one codebase, but you can increase stability by isolating different parts of the app in production. Have separate load balancers and web servers for your customer site and admin site so customers aren’t impacted if the admin site goes down. Use separate workers for different groups of queues so a backed-up queue or bad job won’t affect the whole system. 177 | 178 | You can separate by business domain, which will be aligned with teams if you have vertical teams.
This also allows you to scale different parts of your app independently as if they were different services. 179 | 180 | ## Conclusion 181 | 182 | As you’ve seen, there are a number of things you can do to scale your monolith. Focus on developer happiness and productivity as well as system stability. Keep track of metrics over time that impact developers, like boot time, test suite time, and deploy time. It’s also good to invest in projects that make it comfortable to ship code fast, like quick rollbacks. Overall, spend a decent amount of time trying to solve your exact pain points before breaking your app apart to solve them. 183 | --------------------------------------------------------------------------------