├── .gitignore ├── README.md ├── Vagrantfile ├── dev-runner.sh ├── playbook ├── ansible.cfg ├── inventories │ ├── production │ └── vagrant ├── main.yml ├── roles │ ├── airflow │ │ ├── defaults │ │ │ └── main.yml │ │ ├── files │ │ │ ├── aws-config │ │ │ ├── etl-load-events.key │ │ │ ├── etl-organization-stats.key │ │ │ └── ssh-config │ │ ├── tasks │ │ │ ├── aws-config.yml │ │ │ ├── initd.yml │ │ │ ├── install.yml │ │ │ ├── main.yml │ │ │ └── sys-time.yml │ │ └── templates │ │ │ ├── airflow.cfg │ │ │ ├── aws-credentials │ │ │ ├── initd.template │ │ │ ├── insert_connection.sh │ │ │ └── my.cnf │ ├── mysql │ │ ├── .travis.yml │ │ ├── README.md │ │ ├── defaults │ │ │ └── main.yml │ │ ├── handlers │ │ │ └── main.yml │ │ ├── meta │ │ │ └── main.yml │ │ ├── tasks │ │ │ ├── configure.yml │ │ │ ├── databases.yml │ │ │ ├── main.yml │ │ │ ├── replication.yml │ │ │ ├── secure-installation.yml │ │ │ ├── setup-Debian.yml │ │ │ ├── setup-RedHat.yml │ │ │ └── users.yml │ │ ├── templates │ │ │ ├── my.cnf.j2 │ │ │ └── user-my.cnf.j2 │ │ ├── tests │ │ │ ├── Dockerfile.centos-6 │ │ │ ├── Dockerfile.centos-7 │ │ │ ├── Dockerfile.ubuntu-12.04 │ │ │ ├── Dockerfile.ubuntu-14.04 │ │ │ ├── centos-7-test.yml │ │ │ ├── initctl_faker │ │ │ ├── inventory │ │ │ └── test.yml │ │ └── vars │ │ │ ├── Debian.yml │ │ │ └── RedHat.yml │ ├── nginx-proxy │ │ ├── tasks │ │ │ └── main.yml │ │ └── templates │ │ │ └── nginx-default-conf │ └── nginx │ │ ├── tasks │ │ └── main.yml │ │ └── templates │ │ └── nginx.conf └── vars │ ├── airflow-dev.yml │ └── airflow-prod.yml └── workflows ├── __init__.py ├── dags ├── __init__.py └── daily_etls.py ├── operators ├── __init__.py ├── lambda_operator.py └── s3_sensor.py └── settings ├── __init__.py └── job_properties.py /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | .idea/ 3 | *.pyc 4 | .vagrant -------------------------------------------------------------------------------- /README.md: 
-------------------------------------------------------------------------------- 1 | # Airflow 2 | 3 | ## Introduction 4 | This project is part of an example data pipeline, presented as part of a talk at ACM's 5 | [Applicative 2016 Conference](http://applicative.acm.org/speakers/ypodeswa.html). Slides are available 6 | [here](https://docs.google.com/presentation/d/1hX_fPTu92YBIny6LwvUfyF597YT6Bu0F7TgLr6focGk/edit?usp=sharing); they describe 7 | the data pipeline. This pipeline is made of 3 projects, all meant to be stitched together: 8 | 9 | * An [event loading job](https://github.com/yashap/etl-load-events), which reads JSON events from S3, and loads them 10 | into different database tables based on the class of event (organization payments, generic organization events, generic 11 | user events, and unknown events) 12 | * A job that calculates [organization statistics](https://github.com/yashap/etl-organization-stats), including key stats 13 | like how much each organization is paying, how active the users in the org are, etc. These stats could be used by an 14 | Account Manager to monitor the health of an organization. It depends on the output of the event loading job 15 | * This project, an implementation of [Airbnb's Airflow system](http://nerds.airbnb.com/airflow/), which acts as a 16 | communication and orchestration layer. It runs the jobs, making sure the **event loading** job runs before the 17 | **organization statistics** job, and also handles things like job retries, job concurrency levels, and 18 | monitoring/alerting on failures 19 | 20 | Note that this is meant to be somewhat of a skeleton pipeline - fork it and use the code as a starting point, tweaking 21 | it for your own needs, with your own business logic. 22 | 23 | ## Airflow 24 | Airflow expresses relationships between jobs as a Directed Acyclic Graph. It lets you set dependencies for jobs, so they 25 | only run when their dependencies complete successfully.
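The dependency between the two ETL jobs above can be expressed in a DAG file along these lines — a minimal sketch against the Airflow 1.x API. The real DAG lives in `workflows/dags/daily_etls.py` and uses this repo's custom operators (`lambda_operator`, `s3_sensor`), so the `BashOperator` tasks and task ids here are illustrative stand-ins:

```python
# Hypothetical sketch of a DAG for the pipeline described above; the real
# DAG in workflows/dags/daily_etls.py uses custom operators instead of
# BashOperator, and these task ids/commands are illustrative.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2016, 5, 1),
    'retries': 1,                           # retry a failed task once
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('daily_etls_sketch', default_args=default_args,
          schedule_interval='@daily')

load_events = BashOperator(
    task_id='etl_load_events',
    bash_command='echo "run the event loading job"',
    dag=dag)

org_stats = BashOperator(
    task_id='etl_organization_stats',
    bash_command='echo "run the organization stats job"',
    dag=dag)

# organization stats only runs once event loading has succeeded
org_stats.set_upstream(load_events)
```

Airflow picks DAG files up from `dags_folder` (`$AIRFLOW_HOME/workflows` in this playbook's `airflow.cfg`), and the scheduler will only queue `etl_organization_stats` once `etl_load_events` has completed successfully.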
It also lets you define retry logic for jobs, monitor job 26 | completion/errors, view job runs in a web UI, and more. Full docs [here](https://pythonhosted.org/airflow/). 27 | 28 | ## Configuration/Setup 29 | Change the default production inventory for the playbook (`playbook/inventories/production`) to whichever host you want 30 | to deploy Airflow to. Update `playbook/vars/airflow-dev.yml` and `playbook/vars/airflow-prod.yml` with your choice of 31 | credentials/settings (mysql users, fernet keys, aws credentials that can be used to run Lambda jobs, etc.). 32 | 33 | ## Running the App in Dev 34 | `vagrant up`, then visit **192.168.33.11** in your browser to see the Airflow web interface. 35 | 36 | Airflow consists of 3 Python services: a scheduler, a set of workers, and a web app. The scheduler determines what 37 | tasks airflow should perform, and when (i.e. what to monitor), the workers actually perform the tasks, and the web server 38 | gives you a web interface where you can view the statuses of all your jobs. 39 | 40 | The logs for these services are located at: 41 | 42 | $AIRFLOW_HOME/airflow-worker.log 43 | $AIRFLOW_HOME/airflow-scheduler.log 44 | $AIRFLOW_HOME/airflow-webserver.log 45 | 46 | And you can start/stop/restart any of them with: 47 | 48 | $ sudo service airflow-worker {start|stop|restart} 49 | $ sudo service airflow-scheduler {start|stop|restart} 50 | $ sudo service airflow-webserver {start|stop|restart} 51 | 52 | You can also start/stop services with the `dev-runner.sh` script; run `./dev-runner.sh -h` for usage. 53 | 54 | The DAG definitions can be found in the `workflows` dir. 55 | 56 | ## Running the App in Prod 57 | Run the playbook against the prod inventory: 58 | 59 | $ ansible-playbook main.yml -i inventories/production 60 | 61 | Deploy the airflow dir via your favourite means to `$AIRFLOW_HOME` on the prod server. For a quick MVP, if you don't 62 | want to use a more formal build/deploy tool, you can just tar and scp the dir up to the server.
Restart Airflow 63 | services, and the jobs should run. 64 | -------------------------------------------------------------------------------- /Vagrantfile: -------------------------------------------------------------------------------- 1 | # -*- mode: ruby -*- 2 | # vi: set ft=ruby : 3 | 4 | Vagrant.configure(2) do |config| 5 | BASEDIR = "#{`git rev-parse --show-toplevel`.strip}/playbook" 6 | EMAIL = `git config user.email`.strip 7 | 8 | config.vm.box = "ubuntu/trusty64" 9 | config.vm.network :private_network, ip: "192.168.33.11" 10 | config.vm.hostname = "airflow" 11 | 12 | config.ssh.username = "vagrant" 13 | config.ssh.password = "vagrant" 14 | 15 | config.vm.provider :virtualbox do |vb| 16 | vb.customize ["modifyvm", :id, "--memory", 6192] 17 | vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] 18 | end 19 | 20 | config.vm.synced_folder "workflows", "/home/vagrant/airflow/workflows", create: true 21 | 22 | config.vm.provision :ansible do |ansible| 23 | ansible.playbook = "playbook/main.yml" 24 | ansible.inventory_path = "playbook/inventories/vagrant" 25 | ansible.extra_vars = { base_dir: BASEDIR, airflow_email_to: EMAIL } 26 | end 27 | end 28 | -------------------------------------------------------------------------------- /dev-runner.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # Settings 4 | VAGRANT_DIR="." 
5 | 6 | # Definitions 7 | stop_airflow() { 8 | (cd "$VAGRANT_DIR" && vagrant ssh -c " 9 | sudo service airflow-webserver stop; 10 | sudo service airflow-scheduler stop; 11 | sudo service airflow-worker stop; 12 | ") 13 | } 14 | 15 | start_airflow() { 16 | (cd "$VAGRANT_DIR" && vagrant ssh -c " 17 | sudo service airflow-worker start; 18 | sudo service airflow-scheduler start; 19 | sudo service airflow-webserver start; 20 | ") 21 | } 22 | 23 | reset_db() { 24 | (cd "$VAGRANT_DIR" && yes | vagrant ssh -c "airflow resetdb") 25 | } 26 | 27 | vagrant_up() { 28 | (cd "$VAGRANT_DIR" && vagrant up) 29 | } 30 | 31 | vagrant_provision() { 32 | (cd "$VAGRANT_DIR" && vagrant provision) 33 | } 34 | 35 | vagrant_ssh() { 36 | (cd "$VAGRANT_DIR" && vagrant ssh) 37 | } 38 | 39 | main() { 40 | vagrant_up 41 | 42 | if [ "$CLEAN" ]; then 43 | stop_airflow 44 | reset_db 45 | start_airflow 46 | elif [ "$START" ]; then 47 | start_airflow 48 | elif [ "$STOP" ]; then 49 | stop_airflow 50 | elif [ "$RESTART" ]; then 51 | stop_airflow 52 | start_airflow 53 | fi 54 | 55 | if [ "$SSH" ]; then 56 | vagrant_ssh 57 | fi 58 | } 59 | 60 | # Parse command line args 61 | while getopts ":csprh" opt; do 62 | case "$opt" in 63 | c) CLEAN=true;; 64 | s) START=true;; 65 | p) STOP=true;; 66 | r) RESTART=true;; 67 | h) 68 | echo "Usage: 69 | -c stop airflow, clear airflow's db, start airflow 70 | -s start all airflow services 71 | -p stop all airflow services 72 | -r restart all airflow services" 73 | exit 0 74 | ;; 75 | \?)
76 | echo "Invalid option: -$OPTARG" >&2 77 | exit 1 78 | ;; 79 | esac 80 | done 81 | 82 | # Main run 83 | main 84 | -------------------------------------------------------------------------------- /playbook/ansible.cfg: -------------------------------------------------------------------------------- 1 | # config file for ansible -- http://ansible.com/ 2 | # ============================================== 3 | 4 | [ssh_connection] 5 | control_path = %(directory)s/%%h-%%r -------------------------------------------------------------------------------- /playbook/inventories/production: -------------------------------------------------------------------------------- 1 | [default] 2 | some.prod.host.com 3 | 4 | [all:vars] 5 | env=prod 6 | os_user=ubuntu 7 | airflow_email_to=some_email@some_domain.com -------------------------------------------------------------------------------- /playbook/inventories/vagrant: -------------------------------------------------------------------------------- 1 | [default] 2 | 192.168.33.11 3 | 4 | [all:vars] 5 | env=dev 6 | os_user=vagrant -------------------------------------------------------------------------------- /playbook/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: airflow 3 | user: "{{ os_user }}" 4 | hosts: all 5 | sudo: true 6 | sudo_user: root 7 | 8 | vars: 9 | project: airflow 10 | os_user_home: "/home/{{ os_user }}" 11 | 12 | vars_files: 13 | - "vars/{{ project }}-{{ env }}.yml" 14 | 15 | roles: 16 | - { role: mysql } 17 | - { role: airflow } 18 | - { role: nginx } 19 | - { role: nginx-proxy, proxy_from: "{{ inventory_hostname }}", proxy_to: "{{ airflow_web_server_url }}" } 20 | 21 | environment: 22 | AIRFLOW_HOME: "{{ service_home }}" 23 | -------------------------------------------------------------------------------- /playbook/roles/airflow/defaults/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | 
service_home: "{{ os_user_home }}/airflow" 3 | airflow_version: "1.7.1.2" 4 | aws_config_home: "{{ os_user_home }}/.aws" 5 | 6 | db_name: "airflow" 7 | db_port: 3306 8 | db_host: "{{ airflow_db_host }}" 9 | 10 | services: 11 | - { service_name: "airflow-worker", start_command: "airflow worker", dependencies: "mysql" } 12 | - { service_name: "airflow-scheduler", start_command: "airflow scheduler", dependencies: "mysql airflow-worker" } 13 | - { service_name: "airflow-webserver", start_command: "airflow webserver", dependencies: "mysql" } 14 | -------------------------------------------------------------------------------- /playbook/roles/airflow/files/aws-config: -------------------------------------------------------------------------------- 1 | [default] 2 | output = json 3 | region = us-east-1 -------------------------------------------------------------------------------- /playbook/roles/airflow/files/etl-load-events.key: -------------------------------------------------------------------------------- 1 | -----BEGIN RSA PRIVATE KEY----- 2 | MIIEpQIBAAKCAQEAqFSRUSuT/rLdgRqiXXU4O6dMt29otL814nD0eCeawFfECVUO 3 | 5TpjwE4kSLPN4JkZ7lFy8pCuaeVimTkiofya6OUgBAgrOCagCTUE+zC9V00GW3Nr 4 | Bnjyzk8e6vJ4DR7sXdceoP2M9/eiQxfmIkOTiZqwaMc1rvbp6x/sgvHvBkzv8yDb 5 | T9f9F/aUTdDlnQAZdU0alageeRhKYMGRehIHy7Kw4QMqb3DYkoB/jU8dxFO8DBho 6 | qfxf69y1ka2TBNYSET09F0W8wu3IlN5S1X3PM+evwFelVCdeJfBv/EMyoq8z+SOj 7 | pLr8bVXqrTFHnzu+9eZIBwgd/JvW3L3DHkoO1wIDAQABAoIBABgFOVdj6JKH46Pw 8 | sQq1F3krvn7OnxprzrypcblTrXmVDJxoTt/XHTTr8bGONuq97j1b0hNbIghqe09y 9 | H8cNzO0/BPqOT9yLZnrp3fQaWGqEy0txhOw/hiZ6k+bjAs6cgw5BDFXc2Kjp5XW9 10 | i1GIYx1XvaS2CKsXAPpUa+OjsoArbym9gvpAFRF2rFnJr3hbgyYnzuxJKGtbjSqc 11 | eDpABc/KWXNigJwG1xa61wdUasGScMFSLtDBiyt28YZYn/fMCR1UHf/ENZtz34Py 12 | Tsisl1K1H0t653kx4LbUi1UG3x2EqT0QLpwunz3gCaaqZ5FILvo9KZ643PNdujWC 13 | dVdB2EECgYEA1x3saqFPXSe4EOvcRIw63hwSm0CsWji1RLniroDmUjfFh6MuFr9I 14 | 30sGHpRIwKfLLSDUKCexDIHJg27VEqAo3dmDsHNF5u+XPNx3yHQVGOvjHhSpSFsn 15 | 
9WVXUi4Eto0xh54YlwiYxHe6SeFyGv9z6Kz1O7tAKhqiwjLBzq9e+TsCgYEAyFJW 16 | 6VojeEJDZA3qvXp2FIqjsV5l2San/WFn5aLhsFPB/GkVkd3ws+z7bPU1K6y0uw1Z 17 | RXcdqxetOogt9DpNQtpKpif9f41whkAf82oThldxyl9phO+LSrsyyeOlE/F5G8ZP 18 | RhW/zE/aDatxszPTN6tfwGs/a8YWfejs/EUdBxUCgYEAipXHkmZ7x6roBVa+IBcr 19 | cZ4qSoTexH0WIsGSjROTzlIJ2rlA3vy6yMf/mEG1oRA4b8lfhMMVZ9ZRaXBEquwt 20 | h8cy9ME+vmmKSHZMYQKP+O828VEkQe4gDxMLr3DgLm9GAnXSp7KtIJ11kVVBeq0q 21 | EjMjBik7TCS+yFeEzk/i4rMCgYEApqTQxkkFDsrY8wgcyklwp3/50th7k5zzzPZW 22 | DxNj+mKDEq58kh72WUeAlVCbTdzbcGwXYpFH7gfBRNr6l5xUn5Om2/iSiqSoAhag 23 | Pcd1vKFL+RVMW5lG4AFAq/CjaCbOIAvl8KCxMI8RD6Qa7v/i6wG2owTU+pwMI+w8 24 | EVSRZE0CgYEAiUl0KhJ9GG0hm63j7UXJCLubk7hOyVYn03zpd+vAxup4iTljG7aQ 25 | 6aL9GQcI/elTHXoQrv+bQdzrfau0/oSs6JQxeUHe1+ygnt29FvxArrxsJTi7ezfv 26 | +zCq6h3xKgtXAtGFvuhKOoG1GdUZpxKljImwoIwKnJ3VTHK/Prl4UsE= 27 | -----END RSA PRIVATE KEY----- 28 | -------------------------------------------------------------------------------- /playbook/roles/airflow/files/etl-organization-stats.key: -------------------------------------------------------------------------------- 1 | -----BEGIN RSA PRIVATE KEY----- 2 | MIIEpQIBAAKCAQEA2mCsoQ7j8MrqrElJgLRN6V93VsKWWIAeoa81FpnYomALJJlE 3 | wHZHP3r+2a8bvR7DmIE5KpGajW7W4eyLy8Tx2p/cWe3MbUEDnb1V85u2VUnk243O 4 | 3xB8GieyaLdsafEkdl7Yhvl/aVc6r9mf1/fjSN3f7zq6bvS03bWt9a3LTKWKmZAN 5 | uFSzWG9KcteFeRpgzxsQKm/jaBtIC9py2UcLt+vTptGlwH5iDc1ZcMlgvROpF9VH 6 | G6g0EGUucXs1qCLxiHaqWTmMETXxE2bDJH7+THP0J2sHUM8i5hQF2rHJBM1JPszo 7 | QI6ZZVsaKHWja7A47AEt4VnPnK2S7K++l0sFCQIDAQABAoIBACe2WO5ZFN6fKBn4 8 | oeND5r8/2yXt8QVbFzbz88WOaLTunlgjfzs4xzAmH95aV8MGqy86oLi7Dc4WkAE5 9 | 0RpXUFwfoiTAd+KOZifzXIQWlwvfijzbBvnNt6PSAEHGyXJipezYxquVB7SSZlvA 10 | Sa8upyiDIMwSdADlg1amSWJaHpRrWDzODTDJh2KU4DViyKSvLedpgHwG5VENra+W 11 | AAbI940QsM0VEssIZwc6r3BJ7w6p6xUlqd1tVfjW7LeMN9lrfkBPIlgCsryJ6VS5 12 | unrqLVbMBfHFV37IefhMwQcpFsu1X97gcIlcf1M2N1FBzdEbeFk64UPgSOiUZIGf 13 | kuRZKuECgYEA9pMMV9rmPM/qWGB9ypicSeGiI+LBE8OOJIlN4tBeECo/pLtu2fTd 14 | 
naEQqE1cjncL1rR/gGCCjUer49RuuiidepFo+ZwlaswFid/6m45nRzX7REI3x5+h 15 | Ryy4Y/QN+F8KNb4giNxi+GMjGAxnErP6frp/7evsFxBxPHtD8WKuFW0CgYEA4rmy 16 | BYnFwScfdOFEw8f3T4lYRoGxRiDQWwfGhjkXEl1WTwQBPG9S4QtPNwZEwOegQAro 17 | KbsfDGyZCbC/VrthCp+kgn0hpXmsLKHkt9U65OfJkLXqk2s9kMjk3KEgs4vadKFc 18 | AbMF8CXMuaWiaU2zCFKvzt8DVaKBDPsQXwOjGI0CgYEAjn74mSUl/WrisWRCDf0Y 19 | BRJiU37NuhA/axn2auekFI917EttioQaNuhH6hubK7Hco534OUaM6/zJd4bi3q5u 20 | I9E461ezv/5cDQvlllQ7l0m5Bf+GoNS9rZZIkWsPT8QM8HYJ80353DXeqB0yy/o6 21 | /1XkbKj07XdRGXTbFPrERBkCgYEAsBeUGdMQwd02BFx2QS1NevvskQ5n9lTEHv+i 22 | BFvQ/JV71HEC2MKJ93oGM3Ft8vmzsCoIeWj5S3gJQMqDQcTVMSAe8K5pdJFU1XGE 23 | J/e3/1O7bOat44O2VH6DqoyGzoy/xjgRMsytvwBMyp/HzcvoUn2OSLlTaK6HVuKk 24 | q3cytH0CgYEApDB3FFD8QLl+GzbdvpryxcJVZ7pRm3VRkfn8Q6MrInRFJ8QevFqE 25 | lURiM9BbDUQWqR5XlMAmt2o//ySnZGtOHRWhrSda0NbOyTv1oKA61RZutaNrFkUn 26 | lJ9neL/QGO7W87iJryAWjMAPY8sBd3AgjoswxrSAun0dMBbsjN55Vkc= 27 | -----END RSA PRIVATE KEY----- 28 | -------------------------------------------------------------------------------- /playbook/roles/airflow/files/ssh-config: -------------------------------------------------------------------------------- 1 | Host etl-load-events 2 | HostName 192.168.33.14 3 | User vagrant 4 | UserKnownHostsFile /dev/null 5 | StrictHostKeyChecking no 6 | PasswordAuthentication no 7 | IdentityFile /home/vagrant/.ssh/etl-load-events 8 | IdentitiesOnly yes 9 | LogLevel FATAL 10 | 11 | Host etl-organization-stats 12 | HostName 192.168.33.13 13 | User vagrant 14 | UserKnownHostsFile /dev/null 15 | StrictHostKeyChecking no 16 | PasswordAuthentication no 17 | IdentityFile /home/vagrant/.ssh/etl-organization-stats 18 | IdentitiesOnly yes 19 | LogLevel FATAL 20 | -------------------------------------------------------------------------------- /playbook/roles/airflow/tasks/aws-config.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: "ensure ~/.aws dir exists" 3 | file: 4 | path="{{ aws_config_home }}" 5 | 
state=directory 6 | mode=0755 7 | owner="{{ os_user }}" 8 | group="{{ os_user }}" 9 | 10 | - name: "ensure aws credentials are copied" 11 | template: 12 | src="aws-credentials" 13 | dest="{{ aws_config_home }}/credentials" 14 | mode=0600 15 | owner="{{ os_user }}" 16 | group="{{ os_user }}" 17 | 18 | - name: "ensure aws config is copied" 19 | copy: 20 | src="aws-config" 21 | dest="{{ aws_config_home }}/config" 22 | mode=0600 23 | owner="{{ os_user }}" 24 | group="{{ os_user }}" 25 | -------------------------------------------------------------------------------- /playbook/roles/airflow/tasks/initd.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: "ensure airflow init scripts are copied" 3 | template: src=initd.template dest="/etc/init.d/{{item.service_name}}" mode=0755 4 | with_items: services 5 | 6 | - name: "register airflow services to be started on server startup" 7 | command: sudo update-rc.d "{{ item.service_name }}" defaults 8 | with_items: services 9 | 10 | - name: "ensure all airflow services are started" 11 | service: name="{{ item.service_name }}" state=started 12 | with_items: services 13 | when: env == 'dev' 14 | -------------------------------------------------------------------------------- /playbook/roles/airflow/tasks/install.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: "ensure AIRFLOW_HOME env var is set for os user" 3 | lineinfile: 4 | dest="{{ os_user_home }}/.profile" 5 | line="export AIRFLOW_HOME={{ service_home }}" 6 | 7 | - name: "ensure required apt packages are installed and upgraded" 8 | apt: name={{ item }} update_cache=yes state=latest 9 | with_items: 10 | - python2.7 11 | - python2.7-dev 12 | - python-setuptools 13 | - python-pip 14 | - libkrb5-dev 15 | - libsasl2-dev 16 | - libmysqlclient-dev 17 | - libpq-dev 18 | - libffi-dev 19 | 20 | - name: "ensure airflow is installed with python dependencies" 21 | pip: name={{ 
item }} 22 | with_items: 23 | - "boto3==1.3.1" 24 | - "awscli==1.10.32" 25 | - "amqp==1.4.9" 26 | - "anyjson==0.3.3" 27 | - "celery==3.1.23" 28 | - "airflow=={{ airflow_version }}" 29 | - "airflow[mysql]=={{ airflow_version }}" 30 | - "airflow[crypto]=={{ airflow_version }}" 31 | 32 | - name: "ensure airflow home dir exists" 33 | file: 34 | path="{{ service_home }}" 35 | state=directory 36 | mode=0755 37 | owner="{{ os_user }}" 38 | group="{{ os_user }}" 39 | 40 | - name: "ensure airflow config is copied" 41 | template: 42 | src=airflow.cfg 43 | dest="{{ service_home }}/airflow.cfg" 44 | mode=0644 45 | owner="{{ os_user }}" 46 | group="{{ os_user }}" 47 | 48 | - name: "initialize airflow database" 49 | command: airflow initdb 50 | -------------------------------------------------------------------------------- /playbook/roles/airflow/tasks/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - include: "sys-time.yml" 3 | - include: "aws-config.yml" 4 | - include: "install.yml" 5 | - include: "initd.yml" 6 | -------------------------------------------------------------------------------- /playbook/roles/airflow/tasks/sys-time.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: "remove existing localtime" 3 | file: 4 | path="/etc/localtime" 5 | state=absent 6 | 7 | - name: "set localtime" 8 | file: 9 | src="/usr/share/zoneinfo/{{ airflow_timezone }}" 10 | dest="/etc/localtime" 11 | owner="{{ os_user }}" 12 | group="{{ os_user }}" 13 | state=link 14 | -------------------------------------------------------------------------------- /playbook/roles/airflow/templates/airflow.cfg: -------------------------------------------------------------------------------- 1 | [core] 2 | # The home folder for airflow, default is ~/airflow 3 | airflow_home = {{ service_home }} 4 | 5 | # The folder where your airflow pipelines live, most likely a 6 | # subfolder in a code repository 7 
| dags_folder = {{ service_home }}/workflows 8 | 9 | # The folder where airflow should store its log files 10 | base_log_folder = {{ service_home }}/logs 11 | 12 | # The executor class that airflow should use. Choices include 13 | # SequentialExecutor, LocalExecutor, CeleryExecutor 14 | executor = CeleryExecutor 15 | 16 | # The SQLAlchemy connection string to the metadata database. 17 | # SQLAlchemy supports many different database engines; more information on 18 | # their website 19 | sql_alchemy_conn = mysql://{{ mysql_airflow_user }}:{{ mysql_airflow_password }}@{{ db_host }}:{{ db_port }}/{{ db_name }} 20 | 21 | # The amount of parallelism as a setting to the executor. This defines 22 | # the max number of task instances that should run simultaneously 23 | # on this airflow installation 24 | parallelism = 32 25 | 26 | # Whether to load the examples that ship with Airflow. It's good to 27 | # get started, but you probably want to set this to False in a production 28 | # environment 29 | load_examples = False 30 | 31 | # Where your Airflow plugins are stored 32 | plugins_folder = {{ service_home }}/plugins 33 | 34 | # Secret key to save connection passwords in the db 35 | fernet_key = {{ airflow_fernet_key }} 36 | 37 | [webserver] 38 | # The base url of your website as airflow cannot guess what domain or 39 | # cname you are using.
This is used in automated emails that 40 | # airflow sends to point links to the right web server 41 | base_url = {{ airflow_web_server_url }} 42 | 43 | # The ip specified when starting the web server 44 | web_server_host = 0.0.0.0 45 | 46 | # The port on which to run the web server 47 | web_server_port = {{ airflow_web_server_port }} 48 | 49 | # Secret key used to run your flask app 50 | secret_key = temporary_key 51 | 52 | 53 | [smtp] 54 | # If you want airflow to send emails on retries and failures, and you want to use 55 | # the airflow.utils.send_email function, you have to configure an smtp 56 | # server here 57 | smtp_host = localhost 58 | smtp_port = 25 59 | smtp_user = 60 | smtp_password = 61 | smtp_starttls = False 62 | smtp_mail_from = {{ airflow_mail_from }} 63 | 64 | [celery] 65 | # This section only applies if you are using the CeleryExecutor in 66 | # [core] section above 67 | 68 | # The app name that will be used by celery 69 | celery_app_name = airflow.executors.celery_executor 70 | 71 | # The concurrency that will be used when starting workers with the 72 | # "airflow worker" command. This defines the number of task instances that 73 | # a worker will take, so size up your workers based on the resources on 74 | # your worker box and the nature of your tasks 75 | celeryd_concurrency = 16 76 | 77 | # When you start an airflow worker, airflow starts a tiny web server 78 | # subprocess to serve the worker's local log files to the airflow main 79 | # web server, which then builds pages and sends them to users. This defines 80 | # the port on which the logs are served. It needs to be unused, and 81 | # visible from the main web server, which connects to the workers. 82 | worker_log_server_port = 8793 83 | 84 | # The Celery broker URL. Celery supports RabbitMQ, Redis and experimentally 85 | # a sqlalchemy database. Refer to the Celery documentation for more 86 | # information.
87 | broker_url = sqla+mysql://{{ mysql_airflow_user }}:{{ mysql_airflow_password }}@{{ db_host }}:{{ db_port }}/{{ db_name }} 88 | 89 | # Another key Celery setting 90 | celery_result_backend = db+mysql://{{ mysql_airflow_user }}:{{ mysql_airflow_password }}@{{ db_host }}:{{ db_port }}/{{ db_name }} 91 | 92 | # Celery Flower is a sweet UI for Celery. Airflow has a shortcut to start 93 | # it `airflow flower`. This defines the port that Celery Flower runs on 94 | flower_port = 8383 95 | 96 | # Default queue that tasks get assigned to and that workers listen on. 97 | default_queue = default 98 | 99 | [scheduler] 100 | # Task instances listen for external kill signals (when you clear tasks 101 | # from the CLI or the UI), this defines the frequency at which they should 102 | # listen (in seconds). 103 | job_heartbeat_sec = 5 104 | 105 | # The scheduler constantly tries to trigger new tasks (look at the 106 | # scheduler section in the docs for more information). This defines 107 | # how often the scheduler should run (in seconds).
108 | scheduler_heartbeat_sec = 5 109 | 110 | # Statsd (https://github.com/etsy/statsd) integration settings 111 | # statsd_on = False 112 | # statsd_host = localhost 113 | # statsd_port = 8125 114 | # statsd_prefix = airflow 115 | 116 | [app_specific] 117 | # CSV list (no quotes) of email addresses to notify on task failure 118 | email_to = {{ airflow_email_to }} 119 | -------------------------------------------------------------------------------- /playbook/roles/airflow/templates/aws-credentials: -------------------------------------------------------------------------------- 1 | [default] 2 | aws_access_key_id = {{ aws_access_key_id }} 3 | aws_secret_access_key = {{ aws_secret_access_key }} -------------------------------------------------------------------------------- /playbook/roles/airflow/templates/initd.template: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ### BEGIN INIT INFO 3 | # Provides: {{ item.service_name }} 4 | # Required-Start: $local_fs $network {{ item.dependencies }} 5 | # Required-Stop: $local_fs $network 6 | # Default-Start: 2 3 4 5 7 | # Default-Stop: 0 1 6 8 | ### END INIT INFO 9 | 10 | APP_NAME="{{ item.service_name }}" 11 | NAME="{{ item.service_name }}" 12 | SCRIPT="/usr/local/bin/{{ item.start_command }}" 13 | RUN_AS="{{ os_user }}" 14 | 15 | PIDFILE="{{ service_home }}/{{ item.service_name }}.pid" 16 | LOGFILE="{{ service_home }}/{{ item.service_name }}.log" 17 | 18 | AIRFLOW_HOME="{{ service_home }}" 19 | 20 | running() { 21 | if [ -s "$PIDFILE" ] && [ ! 
-z $(pgrep --pidfile "$PIDFILE") ]; then 22 | return 0 23 | fi 24 | return 1 25 | } 26 | 27 | 28 | echo_error() { 29 | echo "$1" 30 | } 31 | 32 | 33 | status() { 34 | if running; then 35 | echo "Service $APP_NAME running" 36 | else 37 | echo "Service $APP_NAME not running" 38 | fi 39 | } 40 | 41 | 42 | start() { 43 | if running; then 44 | echo_error "Service $APP_NAME already running" 45 | return 0 46 | fi 47 | 48 | echo "Starting service $APP_NAME..." 49 | local CMD="$SCRIPT > \"$LOGFILE\" 2>&1 & echo \$! > $PIDFILE" 50 | sudo su -c "$CMD" "$RUN_AS" 51 | 52 | if running; then 53 | echo "Service $APP_NAME started" 54 | else 55 | echo "Failed to start service $APP_NAME" 56 | return 1 57 | fi 58 | } 59 | 60 | 61 | stop() { 62 | if ! running; then 63 | echo_error 'Service not running' 64 | return 0 65 | fi 66 | 67 | echo "Stopping service $APP_NAME..." 68 | sudo kill -15 $(cat "$PIDFILE") && rm -f "$PIDFILE" 69 | echo "Service $APP_NAME stopped" 70 | } 71 | 72 | check() { 73 | if running; then 74 | echo 'up' 75 | return 0 76 | else 77 | echo 'down' 78 | return 1 79 | fi 80 | } 81 | 82 | case "$1" in 83 | status) 84 | status ;; 85 | start) 86 | start ;; 87 | stop) 88 | stop ;; 89 | restart) 90 | stop 91 | start ;; 92 | check) 93 | check ;; 94 | *) 95 | echo "Usage: $0 {status|start|stop|restart|check}" 96 | esac -------------------------------------------------------------------------------- /playbook/roles/airflow/templates/insert_connection.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | password_hash=$(python -c "from cryptography.fernet import Fernet; \ 4 | print Fernet('{{ airflow_fernet_key }}').encrypt('{{ item.password }}')") 5 | 6 | mysql --user={{ mysql_airflow_user }} \ 7 | --password={{ mysql_airflow_password }} \ 8 | --database={{ db_name }} \ 9 | --execute="insert into connection select \ 10 | null, \ 11 | '{{ item.conn_id }}', \ 12 | '{{ item.conn_type }}', \ 13 | '{{ item.host }}', \ 14 | '{{ 
item.schema }}', \ 15 | '{{ item.login }}', \ 16 | '$password_hash', \ 17 | '{{ item.port }}', \ 18 | null, \ 19 | 1, \ 20 | 0 \ 21 | from dual where not exists (select conn_id from connection where conn_id = '{{ item.conn_id }}');" 22 | -------------------------------------------------------------------------------- /playbook/roles/airflow/templates/my.cnf: -------------------------------------------------------------------------------- 1 | [client] 2 | user={{ mysql_airflow_user }} 3 | password={{ mysql_airflow_password }} 4 | database={{ db_name }} 5 | host={{ db_host }} 6 | port={{ db_port }} 7 | -------------------------------------------------------------------------------- /playbook/roles/mysql/.travis.yml: -------------------------------------------------------------------------------- 1 | --- 2 | sudo: required 3 | 4 | env: 5 | - distribution: centos 6 | version: 6 7 | init: /sbin/init 8 | run_opts: "" 9 | playbook: test.yml 10 | - distribution: centos 11 | version: 7 12 | init: /usr/lib/systemd/systemd 13 | run_opts: "--privileged --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro" 14 | playbook: centos-7-test.yml 15 | - distribution: ubuntu 16 | version: 14.04 17 | init: /sbin/init 18 | run_opts: "" 19 | playbook: test.yml 20 | # - distribution: ubuntu 21 | # version: 12.04 22 | # init: /sbin/init 23 | # run_opts: "" 24 | # playbook: test.yml 25 | 26 | services: 27 | - docker 28 | 29 | before_install: 30 | # Pull container 31 | - 'sudo docker pull ${distribution}:${version}' 32 | # Customize container 33 | - 'sudo docker build --rm=true --file=tests/Dockerfile.${distribution}-${version} --tag=${distribution}-${version}:ansible tests' 34 | 35 | script: 36 | - container_id=$(mktemp) 37 | # Run container in detached state 38 | - 'sudo docker run --detach --volume="${PWD}":/etc/ansible/roles/role_under_test:ro ${run_opts} ${distribution}-${version}:ansible "${init}" > "${container_id}"' 39 | 40 | # Ansible syntax check. 
41 | - 'sudo docker exec --tty "$(cat ${container_id})" env TERM=xterm ansible-playbook /etc/ansible/roles/role_under_test/tests/${playbook} --syntax-check' 42 | 43 | # Test role. 44 | - 'sudo docker exec --tty "$(cat ${container_id})" env TERM=xterm ansible-playbook /etc/ansible/roles/role_under_test/tests/${playbook}' 45 | 46 | # Test role idempotence. 47 | - > 48 | sudo docker exec "$(cat ${container_id})" ansible-playbook /etc/ansible/roles/role_under_test/tests/${playbook} 49 | | grep -q 'changed=0.*failed=0' 50 | && (echo 'Idempotence test: pass' && exit 0) 51 | || (echo 'Idempotence test: fail' && exit 1) 52 | 53 | # Some MySQL debugging (show all the logs). 54 | - sudo docker exec --tty "$(cat ${container_id})" env TERM=xterm ls -lah /var/log 55 | - sudo docker exec --tty "$(cat ${container_id})" env TERM=xterm cat /var/log/mysql/error.log || true 56 | - sudo docker exec --tty "$(cat ${container_id})" env TERM=xterm cat /var/log/mysql.err || true 57 | 58 | # Check to make sure we can connect to MySQL via Unix socket. 59 | - > 60 | sudo docker exec "$(cat ${container_id})" mysql -u root -proot -e 'show databases;' 61 | | grep -q 'information_schema' 62 | && (echo 'MySQL running normally' && exit 0) 63 | || (echo 'MySQL not running' && exit 1) 64 | 65 | # Check to make sure we can connect to MySQL via TCP. 
66 | - > 67 | sudo docker exec "$(cat ${container_id})" mysql -u root -proot -h 127.0.0.1 -e 'show databases;' 68 | | grep -q 'information_schema' 69 | && (echo 'MySQL running normally' && exit 0) 70 | || (echo 'MySQL not running' && exit 1) 71 | 72 | # Clean up 73 | - sudo docker stop "$(cat ${container_id})" 74 | 75 | notifications: 76 | webhooks: https://galaxy.ansible.com/api/v1/notifications/ 77 | -------------------------------------------------------------------------------- /playbook/roles/mysql/README.md: -------------------------------------------------------------------------------- 1 | # Ansible Role: MySQL 2 | 3 | [![Build Status](https://travis-ci.org/geerlingguy/ansible-role-mysql.svg?branch=master)](https://travis-ci.org/geerlingguy/ansible-role-mysql) 4 | 5 | Installs and configures MySQL or MariaDB server on RHEL/CentOS or Debian/Ubuntu servers. 6 | 7 | ## Requirements 8 | 9 | None. 10 | 11 | ## Role Variables 12 | 13 | Available variables are listed below, along with default values (see `defaults/main.yml`): 14 | 15 | mysql_user_home: /root 16 | 17 | The home directory inside which Python MySQL settings will be stored, which Ansible will use when connecting to MySQL. This should be the home directory of the user which runs this Ansible role. 18 | 19 | mysql_root_password: root 20 | 21 | The MySQL root user account password. 22 | 23 | mysql_root_password_update: no 24 | 25 | Whether to force update the MySQL root user's password. By default, this role will only change the root user's password when MySQL is first configured. You can force an update by setting this to `yes`. 26 | 27 | mysql_enabled_on_startup: yes 28 | 29 | Whether MySQL should be enabled on startup. 30 | 31 | overwrite_global_mycnf: yes 32 | 33 | Whether the global my.cnf should be overwritten each time this role is run. Setting this to `no` tells Ansible to only create the `my.cnf` file if it doesn't exist. 
This should be left at its default value (`yes`) if you'd like to use this role's variables to configure MySQL. 34 | 35 | mysql_config_include_files: [] 36 | 37 | A list of files that should override the default global my.cnf. Each item in the array requires a "src" parameter which is a path to a file. An optional "force" parameter can force the file to be updated each time ansible runs. 38 | 39 | mysql_databases: [] 40 | 41 | The MySQL databases to create. A database has the values `name`, `encoding` (defaults to `utf8`), `collation` (defaults to `utf8_general_ci`) and `replicate` (defaults to `1`, only used if replication is configured). The formats of these are the same as in the `mysql_db` module. 42 | 43 | mysql_users: [] 44 | 45 | The MySQL users and their privileges. A user has the values `name`, `host` (defaults to `localhost`), `password`, `priv` (defaults to `*.*:USAGE`), and `append_privs` (defaults to `no`). The formats of these are the same as in the `mysql_user` module. 46 | 47 | mysql_packages: 48 | - mysql 49 | - mysql-server 50 | 51 | (OS-specific, RedHat/CentOS defaults listed here) Packages to be installed. In some situations, you may need to add additional packages, like `mysql-devel`. 52 | 53 | mysql_enablerepo: "" 54 | 55 | (RedHat/CentOS only) If you have enabled any additional repositories (might I suggest geerlingguy.repo-epel or geerlingguy.repo-remi), those repositories can be listed under this variable (e.g. `remi,epel`). This can be handy, as an example, if you want to install later versions of MySQL. 56 | 57 | mysql_port: "3306" 58 | mysql_bind_address: '0.0.0.0' 59 | mysql_datadir: /var/lib/mysql 60 | 61 | Default MySQL connection configuration. 62 | 63 | mysql_log: "" 64 | mysql_log_error: /var/log/mysqld.log 65 | mysql_syslog_tag: mysqld 66 | 67 | MySQL logging configuration. Setting `mysql_log` (the general query log) or `mysql_log_error` to `syslog` will make MySQL log to syslog using the `mysql_syslog_tag`. 
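The syslog switch described above is implemented by a conditional in the role's `my.cnf.j2` template. As a quick illustration, here is a small pure-Python sketch of that same logic — the function name and signature are mine, but the parameter names mirror the role variables:

```python
def logging_lines(mysql_log="", mysql_log_error="/var/log/mysqld.log",
                  mysql_syslog_tag="mysqld"):
    """Mirror the my.cnf.j2 logging block: if either setting is 'syslog',
    MySQL logs to syslog with the given tag; otherwise file paths are used."""
    if mysql_log == "syslog" or mysql_log_error == "syslog":
        return ["syslog", "syslog-tag = {}".format(mysql_syslog_tag)]
    lines = []
    if mysql_log:  # the general query log is only emitted when set
        lines.append("log = {}".format(mysql_log))
    lines.append("log-error = {}".format(mysql_log_error))
    return lines

print(logging_lines(mysql_log_error="syslog"))  # syslog mode wins over file paths
```

Note that syslog mode takes over both logs: once either variable is `syslog`, no file-based `log`/`log-error` lines are rendered.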
68 | 69 | mysql_slow_query_log_enabled: no 70 | mysql_slow_query_log_file: /var/log/mysql-slow.log 71 | mysql_slow_query_time: 2 72 | 73 | Slow query log settings. Note that the log file will be created by this role, but if you're running on a server with SELinux or AppArmor, you may need to add this path to the allowed paths for MySQL, or disable the mysql profile. For example, on Debian/Ubuntu, you can run `sudo ln -s /etc/apparmor.d/usr.sbin.mysqld /etc/apparmor.d/disable/usr.sbin.mysqld && sudo service apparmor restart`. 74 | 75 | mysql_key_buffer_size: "256M" 76 | mysql_max_allowed_packet: "64M" 77 | mysql_table_open_cache: "256" 78 | [...] 79 | 80 | The rest of the settings in `defaults/main.yml` control MySQL's memory usage. The default values are tuned for a server where MySQL can consume ~512 MB RAM, so you should consider adjusting them to suit your particular server better. 81 | 82 | mysql_server_id: "1" 83 | mysql_max_binlog_size: "100M" 84 | mysql_expire_logs_days: "10" 85 | mysql_replication_role: '' 86 | mysql_replication_master: '' 87 | mysql_replication_user: [] 88 | 89 | Replication settings. Set `mysql_server_id` and `mysql_replication_role` by server (e.g. the master would be ID `1`, with the `mysql_replication_role` of `master`, and the slave would be ID `2`, with the `mysql_replication_role` of `slave`). The `mysql_replication_user` uses the same keys as `mysql_users`, and is created on master servers, and used to replicate on all the slaves. 90 | 91 | ### MariaDB usage 92 | 93 | This role works with either MySQL or a compatible version of MariaDB. On RHEL/CentOS 7+, the mariadb database engine was substituted as the default MySQL replacement package, so you should override the `mysql_packages` variable with the below configuration to make sure MariaDB is installed correctly. 
94 | 95 | #### RHEL/CentOS 7 MariaDB configuration 96 | 97 | Set the following variables (at a minimum): 98 | 99 | mysql_packages: 100 | - mariadb 101 | - mariadb-server 102 | - mariadb-libs 103 | - MySQL-python 104 | - perl-DBD-MySQL 105 | mysql_daemon: mariadb 106 | mysql_log_error: /var/log/mariadb/mariadb.log 107 | mysql_syslog_tag: mariadb 108 | mysql_pid_file: /var/run/mariadb/mariadb.pid 109 | 110 | #### Ubuntu 14.04 MariaDB configuration 111 | 112 | Set the following variables (at a minimum): 113 | 114 | mysql_packages: 115 | - mariadb-client 116 | - mariadb-server 117 | - python-mysqldb 118 | 119 | ## Dependencies 120 | 121 | None. 122 | 123 | ## Example Playbook 124 | 125 | - hosts: db-servers 126 | vars_files: 127 | - vars/main.yml 128 | roles: 129 | - { role: geerlingguy.mysql } 130 | 131 | *Inside `vars/main.yml`*: 132 | 133 | mysql_root_password: super-secure-password 134 | mysql_databases: 135 | - name: example_db 136 | encoding: latin1 137 | collation: latin1_general_ci 138 | mysql_users: 139 | - name: example_user 140 | host: "%" 141 | password: similarly-secure-password 142 | priv: "example_db.*:ALL" 143 | 144 | ## License 145 | 146 | MIT / BSD 147 | 148 | ## Author Information 149 | 150 | This role was created in 2014 by [Jeff Geerling](http://jeffgeerling.com/), author of [Ansible for DevOps](http://ansiblefordevops.com/). 151 | -------------------------------------------------------------------------------- /playbook/roles/mysql/defaults/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | mysql_user_home: /root 3 | mysql_root_username: root 4 | mysql_root_password: root 5 | 6 | # Set this to `yes` to forcibly update the root password. 7 | mysql_root_password_update: no 8 | 9 | mysql_enabled_on_startup: yes 10 | 11 | # update my.cnf. each time role is run? yes | no 12 | overwrite_global_mycnf: yes 13 | 14 | # Pass in a comma-separated list of repos to use (e.g. "remi,epel"). 
Used only 15 | # for RedHat systems (and derivatives). 16 | mysql_enablerepo: "" 17 | 18 | # Define a custom list of packages to install; if none provided, the default 19 | # package list from vars/[OS-family].yml will be used. 20 | # mysql_packages: 21 | #   - mysql 22 | #   - mysql-server 23 | #   - MySQL-python 24 | 25 | # MySQL connection settings. 26 | mysql_port: "3306" 27 | mysql_bind_address: '0.0.0.0' 28 | mysql_datadir: /var/lib/mysql 29 | mysql_pid_file: /var/run/mysqld/mysqld.pid 30 | 31 | # Slow query log settings. 32 | mysql_slow_query_log_enabled: no 33 | mysql_slow_query_log_file: /var/log/mysql-slow.log 34 | mysql_slow_query_time: 2 35 | 36 | # Memory settings (default values optimized for ~512MB RAM). 37 | mysql_key_buffer_size: "256M" 38 | mysql_max_allowed_packet: "64M" 39 | mysql_table_open_cache: "256" 40 | mysql_sort_buffer_size: "1M" 41 | mysql_read_buffer_size: "1M" 42 | mysql_read_rnd_buffer_size: "4M" 43 | mysql_myisam_sort_buffer_size: "64M" 44 | mysql_thread_cache_size: "8" 45 | mysql_query_cache_size: "16M" 46 | mysql_max_connections: 151 47 | 48 | # Other settings. 49 | mysql_wait_timeout: 28800 50 | 51 | # Try number of CPUs * 2 for thread_concurrency. 52 | mysql_thread_concurrency: 2 53 | 54 | # InnoDB settings. 55 | # Set .._buffer_pool_size up to 80% of RAM, but beware of setting it too high. 56 | mysql_innodb_file_per_table: "1" 57 | mysql_innodb_buffer_pool_size: "256M" 58 | mysql_innodb_additional_mem_pool_size: "20M" 59 | # Set .._log_file_size to 25% of buffer pool size. 60 | mysql_innodb_log_file_size: "64M" 61 | mysql_innodb_log_buffer_size: "8M" 62 | mysql_innodb_flush_log_at_trx_commit: "1" 63 | mysql_innodb_lock_wait_timeout: 50 64 | 65 | # mysqldump settings. 66 | mysql_mysqldump_max_allowed_packet: "64M" 67 | 68 | # Logging settings.
69 | mysql_log: "" 70 | mysql_log_error: /var/log/mysql.err 71 | mysql_syslog_tag: mysql 72 | 73 | mysql_config_include_files: [] 74 | # - src: path/relative/to/playbook/file.cnf 75 | # - { src: path/relative/to/playbook/anotherfile.cnf, force: yes } 76 | 77 | # Databases. 78 | mysql_databases: [] 79 | # - name: example 80 | # collation: utf8_general_ci 81 | # encoding: utf8 82 | # replicate: 1 83 | 84 | # Users. 85 | mysql_users: [] 86 | # - name: example 87 | # host: 127.0.0.1 88 | # password: secret 89 | # priv: *.*:USAGE 90 | 91 | # Replication settings (replication is only enabled if master/user have values). 92 | mysql_server_id: "1" 93 | mysql_max_binlog_size: "100M" 94 | mysql_expire_logs_days: "10" 95 | mysql_replication_role: '' 96 | mysql_replication_master: '' 97 | # Same keys as `mysql_users` above. 98 | mysql_replication_user: [] 99 | -------------------------------------------------------------------------------- /playbook/roles/mysql/handlers/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: restart mysql 3 | service: "name={{ mysql_daemon }} state=restarted sleep=5" 4 | -------------------------------------------------------------------------------- /playbook/roles/mysql/meta/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | dependencies: [] 3 | 4 | galaxy_info: 5 | author: geerlingguy 6 | description: MySQL server for RHEL/CentOS and Debian/Ubuntu. 
7 | company: "Midwestern Mac, LLC" 8 | license: "license (BSD, MIT)" 9 | min_ansible_version: 1.9 10 | platforms: 11 | - name: EL 12 | versions: 13 | - 6 14 | - 7 15 | - name: Ubuntu 16 | versions: 17 | - all 18 | - name: Debian 19 | versions: 20 | - all 21 | galaxy_tags: 22 | - database 23 | -------------------------------------------------------------------------------- /playbook/roles/mysql/tasks/configure.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: Copy my.cnf global MySQL configuration. 3 | template: 4 | src: my.cnf.j2 5 | dest: "{{ mysql_config_file }}" 6 | owner: root 7 | group: root 8 | mode: 0644 9 | force: "{{ overwrite_global_mycnf }}" 10 | notify: restart mysql 11 | 12 | - name: Verify mysql include directory exists. 13 | file: 14 | path: "{{ mysql_config_include_dir }}" 15 | state: directory 16 | owner: root 17 | group: root 18 | mode: 0755 19 | when: mysql_config_include_files | length 20 | 21 | - name: Copy my.cnf override files into include directory. 22 | template: 23 | src: "{{ item.src }}" 24 | dest: "{{ mysql_config_include_dir }}/{{ item.src | basename }}" 25 | owner: root 26 | group: root 27 | mode: 0644 28 | force: "{{ item.force | default(False) }}" 29 | with_items: "{{ mysql_config_include_files }}" 30 | notify: restart mysql 31 | 32 | - name: Create slow query log file (if configured). 33 | file: "path={{ mysql_slow_query_log_file }} state=touch" 34 | when: mysql_slow_query_log_enabled 35 | 36 | - name: Set ownership on slow query log file (if configured). 37 | file: 38 | path: "{{ mysql_slow_query_log_file }}" 39 | state: file 40 | owner: mysql 41 | group: mysql 42 | mode: 0644 43 | when: mysql_slow_query_log_enabled 44 | 45 | - name: Ensure MySQL is started and enabled on boot. 
46 | service: "name={{ mysql_daemon }} state=started enabled={{ mysql_enabled_on_startup }}" 47 | register: mysql_service_configuration 48 | -------------------------------------------------------------------------------- /playbook/roles/mysql/tasks/databases.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: Ensure MySQL databases are present. 3 | mysql_db: 4 | name: "{{ item.name }}" 5 | collation: "{{ item.collation | default('utf8_general_ci') }}" 6 | encoding: "{{ item.encoding | default('utf8') }}" 7 | state: present 8 | with_items: "{{ mysql_databases }}" 9 | -------------------------------------------------------------------------------- /playbook/roles/mysql/tasks/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # Include variables and define needed variables. 3 | - name: Include OS-specific variables. 4 | include_vars: "{{ ansible_os_family }}.yml" 5 | 6 | - name: Define mysql_packages. 7 | set_fact: 8 | mysql_packages: "{{ __mysql_packages | list }}" 9 | when: mysql_packages is not defined 10 | 11 | - name: Define mysql_daemon. 12 | set_fact: 13 | mysql_daemon: "{{ __mysql_daemon }}" 14 | when: mysql_daemon is not defined 15 | 16 | - name: Define mysql_slow_query_log_file. 17 | set_fact: 18 | mysql_slow_query_log_file: "{{ __mysql_slow_query_log_file }}" 19 | when: mysql_slow_query_log_file is not defined 20 | 21 | # Setup/install tasks. 22 | - include: setup-RedHat.yml 23 | when: ansible_os_family == 'RedHat' 24 | 25 | - include: setup-Debian.yml 26 | when: ansible_os_family == 'Debian' 27 | 28 | - name: Check if MySQL packages were installed. 29 | set_fact: 30 | mysql_install_packages: "{{ (rh_mysql_install_packages is defined and rh_mysql_install_packages.changed) or (deb_mysql_install_packages is defined and deb_mysql_install_packages.changed) }}" 31 | 32 | # Configure MySQL. 
33 | - include: configure.yml 34 | - include: secure-installation.yml 35 | - include: databases.yml 36 | - include: users.yml 37 | - include: replication.yml 38 | -------------------------------------------------------------------------------- /playbook/roles/mysql/tasks/replication.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: Ensure replication user exists on master. 3 | mysql_user: 4 | name: "{{ mysql_replication_user.name }}" 5 | host: "{{ mysql_replication_user.host | default('%') }}" 6 | password: "{{ mysql_replication_user.password }}" 7 | priv: "{{ mysql_replication_user.priv | default('*.*:REPLICATION SLAVE') }}" 8 | state: present 9 | when: > 10 | (mysql_replication_role == 'master') 11 | and mysql_replication_user 12 | and (mysql_replication_master != '') 13 | 14 | - name: Check slave replication status. 15 | mysql_replication: mode=getslave 16 | ignore_errors: true 17 | register: slave 18 | when: > 19 | mysql_replication_role == 'slave' 20 | and (mysql_replication_master != '') 21 | 22 | - name: Check master replication status. 23 | mysql_replication: mode=getmaster 24 | delegate_to: "{{ mysql_replication_master }}" 25 | register: master 26 | when: > 27 | slave|failed 28 | and (mysql_replication_role == 'slave') 29 | and (mysql_replication_master != '') 30 | 31 | - name: Configure replication on the slave. 32 | mysql_replication: 33 | mode: changemaster 34 | master_host: "{{ mysql_replication_master }}" 35 | master_user: "{{ mysql_replication_user.name }}" 36 | master_password: "{{ mysql_replication_user.password }}" 37 | master_log_file: "{{ master.File }}" 38 | master_log_pos: "{{ master.Position }}" 39 | ignore_errors: True 40 | when: > 41 | slave|failed 42 | and (mysql_replication_role == 'slave') 43 | and (mysql_replication_master != '') 44 | and mysql_replication_user 45 | 46 | - name: Start replication. 
47 | mysql_replication: mode=startslave 48 | when: > 49 | slave|failed 50 | and (mysql_replication_role == 'slave') 51 | and (mysql_replication_master != '') 52 | -------------------------------------------------------------------------------- /playbook/roles/mysql/tasks/secure-installation.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: Disallow root login remotely 3 | command: 'mysql -NBe "{{ item }}"' 4 | with_items: 5 | - DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1') 6 | changed_when: False 7 | 8 | - name: Get list of hosts for the root user. 9 | command: mysql -NBe 'SELECT Host FROM mysql.user WHERE User = "root" ORDER BY (Host="localhost") ASC' 10 | register: mysql_root_hosts 11 | changed_when: false 12 | 13 | # Note: We do not use mysql_user for this operation, as it doesn't always update 14 | # the root password correctly. See: https://goo.gl/MSOejW 15 | - name: Update MySQL root password for localhost root account. 16 | shell: > 17 | mysql -u root -NBe 18 | 'SET PASSWORD FOR "{{ mysql_root_username }}"@"{{ item }}" = PASSWORD("{{ mysql_root_password }}");' 19 | with_items: "{{ mysql_root_hosts.stdout_lines }}" 20 | when: mysql_install_packages | bool or mysql_root_password_update 21 | 22 | # Has to be after the root password assignment, for idempotency. 23 | - name: Copy .my.cnf file with root password credentials. 24 | template: 25 | src: "user-my.cnf.j2" 26 | dest: "{{ mysql_user_home }}/.my.cnf" 27 | owner: root 28 | group: root 29 | mode: 0600 30 | 31 | - name: Get list of hosts for the anonymous user. 32 | command: mysql -NBe 'SELECT Host FROM mysql.user WHERE User = ""' 33 | register: mysql_anonymous_hosts 34 | changed_when: false 35 | 36 | - name: Remove anonymous MySQL users. 
37 | mysql_user: 38 | name: "" 39 | host: "{{ item }}" 40 | state: absent 41 | with_items: "{{ mysql_anonymous_hosts.stdout_lines }}" 42 | 43 | - name: Remove MySQL test database. 44 | mysql_db: "name='test' state=absent" 45 | -------------------------------------------------------------------------------- /playbook/roles/mysql/tasks/setup-Debian.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: Check if MySQL is already installed. 3 | stat: path=/etc/init.d/mysql 4 | register: mysql_installed 5 | 6 | - name: Update apt cache if MySQL is not yet installed. 7 | apt: update_cache=yes 8 | when: mysql_installed.stat.exists == false 9 | 10 | - name: Ensure MySQL Python libraries are installed. 11 | apt: "name=python-mysqldb state=installed" 12 | 13 | - name: Ensure MySQL packages are installed. 14 | apt: "name={{ item }} state=installed" 15 | with_items: "{{ mysql_packages }}" 16 | register: deb_mysql_install_packages 17 | 18 | # Because Ubuntu starts MySQL as part of the install process, we need to stop 19 | # mysql and remove the logfiles in case the user set a custom log file size. 20 | - name: Ensure MySQL is stopped after initial install. 21 | service: "name={{ mysql_daemon }} state=stopped" 22 | when: mysql_installed.stat.exists == false 23 | 24 | - name: Delete innodb log files created by apt package after initial install. 25 | shell: "rm -f {{ mysql_datadir }}/ib_logfile[01]" 26 | when: mysql_installed.stat.exists == false 27 | -------------------------------------------------------------------------------- /playbook/roles/mysql/tasks/setup-RedHat.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: Ensure MySQL packages are installed. 3 | yum: "name={{ item }} state=installed enablerepo={{ mysql_enablerepo }}" 4 | with_items: "{{ mysql_packages }}" 5 | register: rh_mysql_install_packages 6 | 7 | - name: Ensure MySQL Python libraries are installed. 
8 | yum: "name=MySQL-python state=installed enablerepo={{ mysql_enablerepo }}" 9 | -------------------------------------------------------------------------------- /playbook/roles/mysql/tasks/users.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: Ensure MySQL users are present. 3 | mysql_user: 4 | name: "{{ item.name }}" 5 | host: "{{ item.host | default('localhost') }}" 6 | password: "{{ item.password }}" 7 | priv: "{{ item.priv | default('*.*:USAGE') }}" 8 | state: present 9 | append_privs: "{{ item.append_privs | default('no') }}" 10 | with_items: "{{ mysql_users }}" 11 | -------------------------------------------------------------------------------- /playbook/roles/mysql/templates/my.cnf.j2: -------------------------------------------------------------------------------- 1 | [client] 2 | #password = your_password 3 | port = {{ mysql_port }} 4 | socket = {{ mysql_socket }} 5 | 6 | [mysqld] 7 | port = {{ mysql_port }} 8 | bind-address = {{ mysql_bind_address }} 9 | datadir = {{ mysql_datadir }} 10 | socket = {{ mysql_socket }} 11 | pid-file = {{ mysql_pid_file }} 12 | 13 | # Logging configuration. 14 | {% if mysql_log_error == 'syslog' or mysql_log == 'syslog' %} 15 | syslog 16 | syslog-tag = {{ mysql_syslog_tag }} 17 | {% else %} 18 | {% if mysql_log %} 19 | log = {{ mysql_log }} 20 | {% endif %} 21 | log-error = {{ mysql_log_error }} 22 | {% endif %} 23 | 24 | {% if mysql_slow_query_log_enabled %} 25 | # Slow query log configuration. 
26 | slow_query_log = 1 27 | slow_query_log_file = {{ mysql_slow_query_log_file }} 28 | long_query_time = {{ mysql_slow_query_time }} 29 | {% endif %} 30 | 31 | {% if mysql_replication_master %} 32 | # Replication 33 | server-id = {{ mysql_server_id }} 34 | 35 | {% if mysql_replication_role == 'master' %} 36 | log_bin = mysql-bin 37 | log-bin-index = mysql-bin.index 38 | expire_logs_days = {{ mysql_expire_logs_days }} 39 | max_binlog_size = {{ mysql_max_binlog_size }} 40 | 41 | {% for db in mysql_databases %} 42 | {% if db.replicate|default(1) %} 43 | binlog_do_db = {{ db.name }} 44 | {% else %} 45 | binlog_ignore_db = {{ db.name }} 46 | {% endif %} 47 | {% endfor %} 48 | {% endif %} 49 | 50 | {% if mysql_replication_role == 'slave' %} 51 | read_only 52 | relay-log = relay-bin 53 | relay-log-index = relay-bin.index 54 | {% endif %} 55 | {% endif %} 56 | 57 | # Disabling symbolic-links is recommended to prevent assorted security risks 58 | symbolic-links = 0 59 | 60 | # User is ignored when systemd is used (fedora >= 15). 61 | user = mysql 62 | 63 | # http://dev.mysql.com/doc/refman/5.5/en/performance-schema.html 64 | ;performance_schema 65 | 66 | # Memory settings. 67 | key_buffer_size = {{ mysql_key_buffer_size }} 68 | max_allowed_packet = {{ mysql_max_allowed_packet }} 69 | table_open_cache = {{ mysql_table_open_cache }} 70 | sort_buffer_size = {{ mysql_sort_buffer_size }} 71 | read_buffer_size = {{ mysql_read_buffer_size }} 72 | read_rnd_buffer_size = {{ mysql_read_rnd_buffer_size }} 73 | myisam_sort_buffer_size = {{ mysql_myisam_sort_buffer_size }} 74 | thread_cache_size = {{ mysql_thread_cache_size }} 75 | query_cache_size = {{ mysql_query_cache_size }} 76 | max_connections = {{ mysql_max_connections }} 77 | 78 | # Other settings. 79 | wait_timeout = {{ mysql_wait_timeout }} 80 | 81 | # Try number of CPU's * 2 for thread_concurrency. 82 | thread_concurrency = {{ mysql_thread_concurrency }} 83 | 84 | # InnoDB settings. 
85 | innodb_file_per_table = {{ mysql_innodb_file_per_table }} 86 | innodb_buffer_pool_size = {{ mysql_innodb_buffer_pool_size }} 87 | innodb_additional_mem_pool_size = {{ mysql_innodb_additional_mem_pool_size }} 88 | innodb_log_file_size = {{ mysql_innodb_log_file_size }} 89 | innodb_log_buffer_size = {{ mysql_innodb_log_buffer_size }} 90 | innodb_flush_log_at_trx_commit = {{ mysql_innodb_flush_log_at_trx_commit }} 91 | innodb_lock_wait_timeout = {{ mysql_innodb_lock_wait_timeout }} 92 | 93 | [mysqldump] 94 | quick 95 | max_allowed_packet = {{ mysql_mysqldump_max_allowed_packet }} 96 | 97 | [mysqld_safe] 98 | pid-file = {{ mysql_pid_file }} 99 | 100 | {% if mysql_config_include_files | length %} 101 | # * IMPORTANT: Additional settings that can override those from this file! 102 | # The files must end with '.cnf', otherwise they'll be ignored. 103 | # 104 | !includedir {{ mysql_config_include_dir }} 105 | {% endif %} 106 | 107 | -------------------------------------------------------------------------------- /playbook/roles/mysql/templates/user-my.cnf.j2: -------------------------------------------------------------------------------- 1 | [client] 2 | user={{ mysql_root_username }} 3 | password={{ mysql_root_password }} 4 | -------------------------------------------------------------------------------- /playbook/roles/mysql/tests/Dockerfile.centos-6: -------------------------------------------------------------------------------- 1 | FROM centos:6 2 | 3 | # Install Ansible 4 | RUN yum -y update; yum clean all; 5 | RUN yum -y install epel-release 6 | RUN yum -y install git ansible sudo 7 | RUN yum clean all 8 | 9 | # Disable requiretty 10 | RUN sed -i -e 's/^\(Defaults\s*requiretty\)/#--- \1/' /etc/sudoers 11 | 12 | # Install Ansible inventory file 13 | RUN echo -e '[local]\nlocalhost ansible_connection=local' > /etc/ansible/hosts 14 | 15 | CMD ["/usr/sbin/init"] 16 | -------------------------------------------------------------------------------- 
/playbook/roles/mysql/tests/Dockerfile.centos-7: -------------------------------------------------------------------------------- 1 | FROM centos:7 2 | 3 | # Install systemd -- See https://hub.docker.com/_/centos/ 4 | RUN yum -y swap -- remove fakesystemd -- install systemd systemd-libs 5 | RUN yum -y update; yum clean all; \ 6 | (cd /lib/systemd/system/sysinit.target.wants/; for i in *; do [ $i == systemd-tmpfiles-setup.service ] || rm -f $i; done); \ 7 | rm -f /lib/systemd/system/multi-user.target.wants/*; \ 8 | rm -f /etc/systemd/system/*.wants/*; \ 9 | rm -f /lib/systemd/system/local-fs.target.wants/*; \ 10 | rm -f /lib/systemd/system/sockets.target.wants/*udev*; \ 11 | rm -f /lib/systemd/system/sockets.target.wants/*initctl*; \ 12 | rm -f /lib/systemd/system/basic.target.wants/*; \ 13 | rm -f /lib/systemd/system/anaconda.target.wants/*; 14 | 15 | # Install Ansible 16 | RUN yum -y install epel-release 17 | RUN yum -y install git ansible sudo 18 | RUN yum clean all 19 | 20 | # Disable requiretty 21 | RUN sed -i -e 's/^\(Defaults\s*requiretty\)/#--- \1/' /etc/sudoers 22 | 23 | # Install Ansible inventory file 24 | RUN echo -e '[local]\nlocalhost ansible_connection=local' > /etc/ansible/hosts 25 | 26 | VOLUME ["/sys/fs/cgroup"] 27 | CMD ["/usr/sbin/init"] 28 | -------------------------------------------------------------------------------- /playbook/roles/mysql/tests/Dockerfile.ubuntu-12.04: -------------------------------------------------------------------------------- 1 | FROM ubuntu:12.04 2 | RUN apt-get update 3 | 4 | # Install Ansible 5 | RUN apt-get install -y software-properties-common python-software-properties git 6 | RUN apt-add-repository -y ppa:ansible/ansible 7 | RUN apt-get update 8 | RUN apt-get install -y ansible 9 | 10 | COPY initctl_faker . 
11 | RUN chmod +x initctl_faker && rm -fr /sbin/initctl && ln -s /initctl_faker /sbin/initctl 12 | 13 | # Install Ansible inventory file 14 | RUN echo "[local]\nlocalhost ansible_connection=local" > /etc/ansible/hosts 15 | -------------------------------------------------------------------------------- /playbook/roles/mysql/tests/Dockerfile.ubuntu-14.04: -------------------------------------------------------------------------------- 1 | FROM ubuntu:14.04 2 | RUN apt-get update 3 | 4 | # Install Ansible 5 | RUN apt-get install -y software-properties-common git 6 | RUN apt-add-repository -y ppa:ansible/ansible 7 | RUN apt-get update 8 | RUN apt-get install -y ansible 9 | 10 | COPY initctl_faker . 11 | RUN chmod +x initctl_faker && rm -fr /sbin/initctl && ln -s /initctl_faker /sbin/initctl 12 | 13 | # Install Ansible inventory file 14 | RUN echo "[local]\nlocalhost ansible_connection=local" > /etc/ansible/hosts 15 | -------------------------------------------------------------------------------- /playbook/roles/mysql/tests/centos-7-test.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - hosts: all 3 | vars: 4 | mysql_packages: 5 | - mariadb 6 | - mariadb-server 7 | - mariadb-libs 8 | - MySQL-python 9 | - perl-DBD-MySQL 10 | mysql_daemon: mariadb 11 | mysql_log_error: /var/log/mariadb/mariadb.log 12 | mysql_syslog_tag: mariadb 13 | mysql_pid_file: /var/run/mariadb/mariadb.pid 14 | roles: 15 | - role_under_test 16 | -------------------------------------------------------------------------------- /playbook/roles/mysql/tests/initctl_faker: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | ALIAS_CMD="$(echo ""$0"" | sed -e 's?/sbin/??')" 3 | 4 | case "$ALIAS_CMD" in 5 | start|stop|restart|reload|status) 6 | exec service $1 $ALIAS_CMD 7 | ;; 8 | esac 9 | 10 | case "$1" in 11 | list ) 12 | exec service --status-all 13 | ;; 14 | reload-configuration ) 15 | exec service $2 
restart 16 | ;; 17 | start|stop|restart|reload|status) 18 | exec service $2 $1 19 | ;; 20 | \?) 21 | exit 0 22 | ;; 23 | esac 24 | -------------------------------------------------------------------------------- /playbook/roles/mysql/tests/inventory: -------------------------------------------------------------------------------- 1 | localhost 2 | -------------------------------------------------------------------------------- /playbook/roles/mysql/tests/test.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - hosts: all 3 | roles: 4 | - role_under_test 5 | -------------------------------------------------------------------------------- /playbook/roles/mysql/vars/Debian.yml: -------------------------------------------------------------------------------- 1 | --- 2 | __mysql_daemon: mysql 3 | __mysql_packages: 4 | - mysql-common 5 | - mysql-server 6 | __mysql_slow_query_log_file: /var/log/mysql/mysql-slow.log 7 | mysql_config_file: /etc/mysql/my.cnf 8 | mysql_config_include_dir: /etc/mysql/conf.d 9 | mysql_socket: /var/run/mysqld/mysqld.sock 10 | -------------------------------------------------------------------------------- /playbook/roles/mysql/vars/RedHat.yml: -------------------------------------------------------------------------------- 1 | --- 2 | __mysql_daemon: mysqld 3 | __mysql_packages: 4 | - mysql 5 | - mysql-server 6 | __mysql_slow_query_log_file: /var/log/mysql-slow.log 7 | mysql_config_file: /etc/my.cnf 8 | mysql_config_include_dir: /etc/my.cnf.d 9 | mysql_socket: /var/lib/mysql/mysql.sock 10 | -------------------------------------------------------------------------------- /playbook/roles/nginx-proxy/tasks/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: "install nginx proxy config" 3 | template: 4 | src=nginx-default-conf 5 | dest=/etc/nginx/sites-available/default 6 | mode=0644 7 | owner=root 8 | group=root 9 | force=yes 10 | 11 | - name: 
"restart nginx" 12 | service: name=nginx state=restarted 13 | -------------------------------------------------------------------------------- /playbook/roles/nginx-proxy/templates/nginx-default-conf: -------------------------------------------------------------------------------- 1 | server { 2 | listen 80; 3 | server_name {{ proxy_from }}; 4 | 5 | location / { 6 | proxy_buffering off; 7 | proxy_set_header X-Real-IP $remote_addr; 8 | proxy_set_header X-Forwarded-Proto $scheme; 9 | proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; 10 | proxy_set_header Host $http_host; 11 | proxy_http_version 1.1; 12 | proxy_pass {{ proxy_to }}; 13 | } 14 | } 15 | -------------------------------------------------------------------------------- /playbook/roles/nginx/tasks/main.yml: -------------------------------------------------------------------------------- 1 | --- 2 | - name: "apt-get update" 3 | action: apt update-cache=yes 4 | 5 | - name: "ensure python-software package required for apt_repository" 6 | action: apt pkg=python-software-properties state=latest 7 | 8 | - name: "add nginx apt repository" 9 | action: apt_repository repo='ppa:nginx/stable' 10 | 11 | - name: "apt-get update" 12 | action: apt update-cache=yes 13 | 14 | - name: "install nginx" 15 | action: apt pkg=nginx state=latest 16 | 17 | - name: "install nginx configuration" 18 | template: src=nginx.conf dest=/etc/nginx/nginx.conf 19 | 20 | - name: "start nginx" 21 | service: name=nginx state=started 22 | 23 | - name: "create the backends-available dir" 24 | file: path=/etc/nginx/backends-available state=directory 25 | 26 | - name: "create the backends-enabled dir" 27 | file: path=/etc/nginx/backends-enabled state=directory 28 | 29 | - name: "register nginx to be started on server startup" 30 | command: sudo update-rc.d nginx defaults -------------------------------------------------------------------------------- /playbook/roles/nginx/templates/nginx.conf: 
-------------------------------------------------------------------------------- 1 | user www-data; 2 | worker_processes {{ nginx_worker_processes|default("4") }}; 3 | pid /run/nginx.pid; 4 | 5 | events { 6 | worker_connections {{ nginx_worker_connections|default("768") }}; 7 | # multi_accept on; 8 | } 9 | 10 | http { 11 | 12 | ## 13 | # Basic Settings 14 | ## 15 | 16 | sendfile on; 17 | tcp_nopush on; 18 | tcp_nodelay on; 19 | keepalive_timeout 65; 20 | types_hash_max_size 2048; 21 | # server_tokens off; 22 | 23 | # server_names_hash_bucket_size 64; 24 | # server_name_in_redirect off; 25 | 26 | include /etc/nginx/mime.types; 27 | default_type application/octet-stream; 28 | 29 | ## 30 | # Logging Settings 31 | ## 32 | 33 | access_log {{ nginx_access_log|default("/var/log/nginx/access.log") }}; 34 | error_log {{ nginx_error_log|default("/var/log/nginx/error.log") }}; 35 | 36 | ## 37 | # Gzip Settings 38 | ## 39 | 40 | gzip on; 41 | gzip_disable "msie6"; 42 | 43 | # gzip_vary on; 44 | # gzip_proxied any; 45 | # gzip_comp_level 6; 46 | # gzip_buffers 16 8k; 47 | # gzip_http_version 1.1; 48 | # gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript; 49 | 50 | ## 51 | # nginx-naxsi config 52 | ## 53 | # Uncomment it if you installed nginx-naxsi 54 | ## 55 | 56 | #include /etc/nginx/naxsi_core.rules; 57 | 58 | ## 59 | # nginx-passenger config 60 | ## 61 | # Uncomment it if you installed nginx-passenger 62 | ## 63 | 64 | #passenger_root /usr; 65 | #passenger_ruby /usr/bin/ruby; 66 | 67 | ## 68 | # Virtual Host Configs 69 | ## 70 | 71 | include /etc/nginx/conf.d/*.conf; 72 | include /etc/nginx/sites-enabled/*; 73 | include /etc/nginx/backends-enabled/*; 74 | } 75 | 76 | 77 | #mail { 78 | # # See sample authentication script at: 79 | # # http://wiki.nginx.org/ImapAuthenticateWithApachePhpScript 80 | # 81 | # # auth_http localhost/auth.php; 82 | # # pop3_capabilities "TOP" "USER"; 83 | # # 
imap_capabilities "IMAP4rev1" "UIDPLUS"; 84 | # 85 | # server { 86 | # listen localhost:110; 87 | # protocol pop3; 88 | # proxy on; 89 | # } 90 | # 91 | # server { 92 | # listen localhost:143; 93 | # protocol imap; 94 | # proxy on; 95 | # } 96 | #} -------------------------------------------------------------------------------- /playbook/vars/airflow-dev.yml: -------------------------------------------------------------------------------- 1 | --- 2 | mysql_airflow_user: "airflow" 3 | mysql_airflow_password: "im-also-so-secret" 4 | 5 | mysql_root_password: im-so-secret 6 | mysql_databases: 7 | - name: airflow 8 | encoding: utf8 9 | collation: utf8_general_ci 10 | mysql_users: 11 | - name: "{{ mysql_airflow_user }}" 12 | host: "%" 13 | password: "{{ mysql_airflow_password }}" 14 | priv: "airflow.*:ALL" 15 | 16 | airflow_timezone: "US/Pacific" 17 | airflow_web_server_port: "8201" 18 | airflow_web_server_url: "http://127.0.0.1:{{ airflow_web_server_port }}" 19 | airflow_fernet_key: "uyxvMw36azr14ICoZ8j-nL1Zcr30tGaNfUWep1D6phY=" 20 | airflow_db_host: "127.0.0.1" 21 | airflow_mail_from: "airflow@somewhere.com" 22 | 23 | aws_access_key_id: "some_id" 24 | aws_secret_access_key: "some_key" -------------------------------------------------------------------------------- /playbook/vars/airflow-prod.yml: -------------------------------------------------------------------------------- 1 | --- 2 | mysql_airflow_user: "airflow" 3 | mysql_airflow_password: "im-also-so-secret" 4 | 5 | mysql_root_password: im-so-secret 6 | mysql_databases: 7 | - name: airflow 8 | encoding: utf8 9 | collation: utf8_general_ci 10 | mysql_users: 11 | - name: "{{ mysql_airflow_user }}" 12 | host: "%" 13 | password: "{{ mysql_airflow_password }}" 14 | priv: "airflow.*:ALL" 15 | 16 | airflow_timezone: "US/Pacific" 17 | airflow_web_server_port: "8201" 18 | airflow_web_server_url: "http://127.0.0.1:{{ airflow_web_server_port }}" 19 | airflow_fernet_key: "uyxvMw36azr14ICoZ8j-nL1Zcr30tGaNfUWep1D6phY=" 20 | 
airflow_db_host: "127.0.0.1" 21 | airflow_mail_from: "airflow@somewhere.com" 22 | 23 | aws_access_key_id: "some_id" 24 | aws_secret_access_key: "some_key" -------------------------------------------------------------------------------- /workflows/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashap/airflow/b141d6c008bf3677a45ee18b793368a2bfdb23dd/workflows/__init__.py -------------------------------------------------------------------------------- /workflows/dags/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashap/airflow/b141d6c008bf3677a45ee18b793368a2bfdb23dd/workflows/dags/__init__.py -------------------------------------------------------------------------------- /workflows/dags/daily_etls.py: -------------------------------------------------------------------------------- 1 | from settings.job_properties import * 2 | from operators.lambda_operator import LambdaOperator 3 | from operators.s3_sensor import S3DatepartSensor 4 | from airflow import DAG 5 | 6 | from datetime import timedelta 7 | 8 | 9 | # DAG 10 | daily_etls = DAG( 11 | dag_id='daily_etls', 12 | schedule_interval=timedelta(days=1), 13 | concurrency=4, 14 | start_date=DAG_START, 15 | default_args=DEFAULT_ARGS 16 | ) 17 | 18 | 19 | # Nodes 20 | s3_sensor = S3DatepartSensor( 21 | task_id='s3_sensor', 22 | context_to_datepart=context_to_datepart, 23 | s3_bucket=S3_BUCKET, 24 | s3_prefix=S3_PREFIX, 25 | dag=daily_etls 26 | ) 27 | 28 | etl_load_events = LambdaOperator( 29 | task_id='etl_load_events', 30 | context_to_payload=context_to_payload, 31 | lambda_function_name='etl-load-events', 32 | dag=daily_etls 33 | ) 34 | 35 | etl_organization_stats = LambdaOperator( 36 | task_id='etl_organization_stats', 37 | context_to_payload=context_to_payload, 38 | lambda_function_name='etl-organization-stats', 39 | dag=daily_etls 40 | ) 41 | 42 | 43 | # 
Edges 44 | etl_load_events.set_upstream(s3_sensor) 45 | etl_organization_stats.set_upstream(etl_load_events) 46 | -------------------------------------------------------------------------------- /workflows/operators/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashap/airflow/b141d6c008bf3677a45ee18b793368a2bfdb23dd/workflows/operators/__init__.py -------------------------------------------------------------------------------- /workflows/operators/lambda_operator.py: -------------------------------------------------------------------------------- 1 | from airflow.operators import BaseOperator 2 | from airflow.utils import apply_defaults 3 | from airflow.exceptions import AirflowException 4 | import boto3 5 | 6 | import logging 7 | import json 8 | import base64 9 | 10 | 11 | class LambdaOperator(BaseOperator): 12 | 13 | @apply_defaults 14 | def __init__(self, context_to_payload, lambda_function_name, *args, **kwargs): 15 | super(LambdaOperator, self).__init__(*args, **kwargs) 16 | self.context_to_payload = context_to_payload 17 | self.lambda_function_name = lambda_function_name 18 | self.lambda_client = boto3.client('lambda') 19 | 20 | def execute(self, context): 21 | request_payload = self.context_to_payload(context) 22 | 23 | logging.info('Making the following request against AWS Lambda %s' % request_payload) 24 | 25 | response = self.lambda_client.invoke( 26 | FunctionName=self.lambda_function_name, 27 | InvocationType='RequestResponse', 28 | Payload=bytearray(request_payload), 29 | LogType='Tail' 30 | ) 31 | 32 | response_log_tail = base64.b64decode(response.get('LogResult')) 33 | response_payload = json.loads(response.get('Payload').read()) 34 | response_code = response_payload.get('code') 35 | 36 | log_msg_logs = 'Tail of logs from AWS Lambda:\n{logs}'.format(logs=response_log_tail) 37 | log_msg_payload = 'Response payload from AWS Lambda:\n{resp}'.format(resp=response_payload) 
38 | 39 | if response_code == 200: 40 | logging.info(log_msg_logs) 41 | logging.info(log_msg_payload) 42 | return response_code 43 | else: 44 | logging.error(log_msg_logs) 45 | logging.error(log_msg_payload) 46 | raise AirflowException('Lambda invoke failed') 47 | -------------------------------------------------------------------------------- /workflows/operators/s3_sensor.py: -------------------------------------------------------------------------------- 1 | from airflow.operators.sensors import BaseSensorOperator 2 | from airflow.utils import apply_defaults 3 | import boto3 4 | 5 | 6 | class S3DatepartSensor(BaseSensorOperator): 7 | 8 | @apply_defaults 9 | def __init__(self, context_to_datepart, s3_bucket, s3_prefix, *args, **kwargs): 10 | super(S3DatepartSensor, self).__init__(*args, **kwargs) 11 | self.context_to_datepart = context_to_datepart 12 | self.s3_bucket = s3_bucket 13 | self.s3_prefix = s3_prefix 14 | self.s3_client = boto3.client('s3') 15 | 16 | def poke(self, context): 17 | datepart = self.context_to_datepart(context) 18 | prefix_to_check = '{prefix}/{datepart}'.format( 19 | prefix=self.s3_prefix, 20 | datepart=datepart 21 | ) 22 | 23 | s3_objects = self.s3_client.list_objects(Bucket=self.s3_bucket, Prefix=prefix_to_check) 24 | 25 | return bool(s3_objects.get('Contents')) 26 | -------------------------------------------------------------------------------- /workflows/settings/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yashap/airflow/b141d6c008bf3677a45ee18b793368a2bfdb23dd/workflows/settings/__init__.py -------------------------------------------------------------------------------- /workflows/settings/job_properties.py: -------------------------------------------------------------------------------- 1 | from airflow.configuration import conf 2 | 3 | from datetime import timedelta, datetime 4 | import json 5 | 6 | def days_ago(days): 7 | now = 
datetime.now() - timedelta(days=days) 8 | return datetime(now.year, now.month, now.day) 9 | 10 | def split_and_trim(value): 11 | return [s.strip() for s in value.split(',') if s.strip()] 12 | 13 | DAG_START = datetime(2016, 2, 22) 14 | S3_BUCKET = 'sodp' 15 | S3_PREFIX = 'talk' 16 | 17 | DEFAULT_ARGS = { 18 | 'owner': 'airflow', 19 | 'depends_on_past': True, 20 | 'email': split_and_trim(conf.get('app_specific', 'email_to')), 21 | 'email_on_failure': True, 22 | 'email_on_retry': False, 23 | 'retries': 3, 24 | 'retry_delay': timedelta(seconds=10), 25 | 'poke_interval': 30, 26 | } 27 | 28 | def context_to_datepart(context): 29 | """ 30 | Creates a partial S3 path based on an Airflow job context 31 | 32 | :param context: a dictionary with details about the job run, where 'tomorrow_ds' is a string holding the day after 33 | the job run, in YYYY-MM-DD format 34 | :return: a partial S3 path 35 | """ 36 | return context['tomorrow_ds'].replace('-', '/') 37 | 38 | 39 | def context_to_payload(context): 40 | """ 41 | Creates a payload used to run an AWS Lambda function, based on an Airflow job context 42 | 43 | :param context: a dictionary with details about the job run, where 'ds' is a string holding the day of the job run, 44 | in YYYY-MM-DD format 45 | :return: a payload used to run an AWS Lambda function 46 | """ 47 | return json.dumps({'runDate': context['ds']}) 48 | --------------------------------------------------------------------------------
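The helpers in `workflows/settings/job_properties.py` are pure functions, so their behaviour is easy to check outside Airflow. Below is a minimal standalone sketch — the context dict values are made up for illustration (in a real run Airflow supplies `ds` and `tomorrow_ds`), while the `'talk'` prefix matches `S3_PREFIX` in the settings:

```python
import json


def context_to_datepart(context):
    # Day after the run (YYYY-MM-DD) -> S3-style path fragment (YYYY/MM/DD)
    return context['tomorrow_ds'].replace('-', '/')


def context_to_payload(context):
    # JSON payload handed to the AWS Lambda function, keyed by the run day
    return json.dumps({'runDate': context['ds']})


# Hypothetical context for a run on 2016-02-22
context = {'ds': '2016-02-22', 'tomorrow_ds': '2016-02-23'}

datepart = context_to_datepart(context)
# Same prefix construction that S3DatepartSensor.poke performs
prefix_to_check = '{prefix}/{datepart}'.format(prefix='talk', datepart=datepart)
payload = context_to_payload(context)

print(prefix_to_check)  # talk/2016/02/23
print(payload)          # {"runDate": "2016-02-22"}
```

Note the sensor uses `tomorrow_ds` (the data for day N is only complete once day N+1 has started), while the Lambda payload uses `ds`, the run day itself.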
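`LambdaOperator.execute` decodes two fields of the `invoke()` response: the base64-encoded log tail and the streamed JSON payload. That decoding can be exercised without AWS by faking the response shape — the dict below is fabricated for illustration and only mimics the two fields the operator reads:

```python
import base64
import io
import json

# Fabricated stand-in for the dict boto3's Lambda client returns from invoke()
fake_response = {
    'LogResult': base64.b64encode(b'START RequestId ...\nEND RequestId ...').decode('ascii'),
    'Payload': io.BytesIO(json.dumps({'code': 200}).encode('utf-8')),
}

# The same decoding steps LambdaOperator.execute performs
response_log_tail = base64.b64decode(fake_response.get('LogResult'))
response_payload = json.loads(fake_response.get('Payload').read().decode('utf-8'))
response_code = response_payload.get('code')

print(response_code)  # 200
```

Any `response_code` other than 200 takes the `else` branch in `execute` and raises `AirflowException`, which is what marks the task instance failed and lets Airflow apply the configured retries.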