├── generic_conf ├── setup_logging.conf ├── basic_vts_location.conf ├── basic_vts_setup.conf ├── lua_path_setup.conf ├── define_cache.conf ├── setup_cache.conf └── backend_definition.conf ├── load_test.sh ├── img ├── .DS_Store ├── 2.2.0_wrk.webp ├── add_source.webp ├── cache_hit.webp ├── cache_lock.webp ├── set_source.webp ├── 2.2.0_metrics.webp ├── 2.2.1_wrk_1s.webp ├── 2.2.1_wrk_60s.webp ├── 3.0.0_metrics.webp ├── 3.1.0_metrics.webp ├── 3.1.1_metrics.webp ├── 4.0.0_metrics.webp ├── 4.0.1_metrics.webp ├── edge_backend.webp ├── metrics_status.webp ├── 2.2.1_metrics_1s.webp ├── 2.2.1_metrics_60s.webp ├── initial_architecture.webp ├── metrics_architecture.webp ├── nginx_directive_restriction.webp └── simplified_workers_nginx_architecture.webp ├── data └── grafana │ ├── grafana.db │ └── alerting │ └── 1 │ └── __default__.tmpl ├── .gitignore ├── src ├── edge.lua ├── backend.lua ├── load_tests.lua ├── loadbalancer.lua └── simulations.lua ├── Dockerfile ├── config └── prometheus.yml ├── nginx_backend.conf ├── nginx_edge.conf ├── nginx_loadbalancer.conf ├── LICENSE ├── docker-compose.yaml └── README.md /generic_conf/setup_logging.conf: -------------------------------------------------------------------------------- 1 | #access_log /dev/stdout; 2 | access_log off; 3 | -------------------------------------------------------------------------------- /load_test.sh: -------------------------------------------------------------------------------- 1 | wrk -c10 -t2 -d600s -s ./src/load_tests.lua --latency http://localhost:18080 2 | -------------------------------------------------------------------------------- /img/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/.DS_Store -------------------------------------------------------------------------------- /img/2.2.0_wrk.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/2.2.0_wrk.webp -------------------------------------------------------------------------------- /img/add_source.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/add_source.webp -------------------------------------------------------------------------------- /img/cache_hit.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/cache_hit.webp -------------------------------------------------------------------------------- /img/cache_lock.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/cache_lock.webp -------------------------------------------------------------------------------- /img/set_source.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/set_source.webp -------------------------------------------------------------------------------- /img/2.2.0_metrics.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/2.2.0_metrics.webp -------------------------------------------------------------------------------- 
/img/2.2.1_wrk_1s.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/2.2.1_wrk_1s.webp -------------------------------------------------------------------------------- /img/2.2.1_wrk_60s.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/2.2.1_wrk_60s.webp -------------------------------------------------------------------------------- /img/3.0.0_metrics.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/3.0.0_metrics.webp -------------------------------------------------------------------------------- /img/3.1.0_metrics.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/3.1.0_metrics.webp -------------------------------------------------------------------------------- /img/3.1.1_metrics.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/3.1.1_metrics.webp -------------------------------------------------------------------------------- /img/4.0.0_metrics.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/4.0.0_metrics.webp -------------------------------------------------------------------------------- /img/4.0.1_metrics.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/4.0.1_metrics.webp -------------------------------------------------------------------------------- /img/edge_backend.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/edge_backend.webp -------------------------------------------------------------------------------- /data/grafana/grafana.db: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/data/grafana/grafana.db -------------------------------------------------------------------------------- /img/metrics_status.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/metrics_status.webp -------------------------------------------------------------------------------- /img/2.2.1_metrics_1s.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/2.2.1_metrics_1s.webp -------------------------------------------------------------------------------- /img/2.2.1_metrics_60s.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/2.2.1_metrics_60s.webp -------------------------------------------------------------------------------- /img/initial_architecture.webp: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/initial_architecture.webp -------------------------------------------------------------------------------- /img/metrics_architecture.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/metrics_architecture.webp -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | !edge/.gitkeep 2 | !pop/.gitkeep 3 | !data/grafana/grafana.db 4 | edge/* 5 | pop/* 6 | data/prometheus/* 7 | .DS_Store/* 8 | -------------------------------------------------------------------------------- /img/nginx_directive_restriction.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/nginx_directive_restriction.webp -------------------------------------------------------------------------------- /generic_conf/basic_vts_location.conf: -------------------------------------------------------------------------------- 1 | location /status { 2 | vhost_traffic_status_display; 3 | vhost_traffic_status_display_format html; 4 | } 5 | -------------------------------------------------------------------------------- /img/simplified_workers_nginx_architecture.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leandromoreira/cdn-up-and-running/HEAD/img/simplified_workers_nginx_architecture.webp -------------------------------------------------------------------------------- /src/edge.lua: -------------------------------------------------------------------------------- 1 | local simulations = require "simulations" 2 | local edge = {} 3 | 4 | edge.simulate_load = function() 5 | simulations.for_work_longtail(simulations.profiles.edge) 6 | end 7 | 8 | return edge 9 | -------------------------------------------------------------------------------- /generic_conf/basic_vts_setup.conf: -------------------------------------------------------------------------------- 1 | vhost_traffic_status_zone shared:vhost_traffic_status:12m; 2 | vhost_traffic_status_filter_by_set_key $status status::*; 3 | vhost_traffic_status_histogram_buckets 0.005 0.01 0.05 0.1 0.5 1 5 10; # buckets are in seconds 4 | -------------------------------------------------------------------------------- /generic_conf/lua_path_setup.conf: -------------------------------------------------------------------------------- 1 | lua_package_path "/usr/local/openresty/lualib/?.lua;/usr/local/openresty/luajit/share/lua/5.1/?.lua;/lua/src/?.lua"; 2 | lua_package_cpath "/usr/local/openresty/lualib/?.so;/usr/local/openresty/luajit/lib/lua/5.1/?.so;"; 3 | -------------------------------------------------------------------------------- /generic_conf/define_cache.conf: -------------------------------------------------------------------------------- 1 | proxy_cache zone_1; 2 | proxy_cache_key $cache_key; 3 | proxy_cache_lock on; 4 | proxy_http_version 1.1; 5 | proxy_set_header Connection ""; 6 | proxy_buffering on; 7 | proxy_buffers 16 16k; 8 | add_header X-Cache-Status $upstream_cache_status; 9 | -------------------------------------------------------------------------------- /generic_conf/setup_cache.conf: -------------------------------------------------------------------------------- 1 | proxy_cache_path 
/cache/ levels=2:2 keys_zone=zone_1:10m max_size=10m inactive=10m use_temp_path=off; 2 | proxy_cache_lock_timeout 2s; 3 | proxy_cache_use_stale error timeout updating; 4 | proxy_read_timeout 2s; 5 | proxy_send_timeout 2s; 6 | proxy_ignore_client_abort on; 7 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM igorbarinov/openresty-nginx-module-vts 2 | 3 | RUN apk --no-cache --virtual .build-deps add build-base git \ 4 | && git clone https://github.com/openresty/lua-resty-balancer.git \ 5 | && cd lua-resty-balancer/ \ 6 | && make \ 7 | && cp -r lib/resty/* /usr/local/openresty/lualib/resty/ \ 8 | && cp librestychash.so /usr/local/openresty/lualib/ \ 9 | && apk del .build-deps 10 | -------------------------------------------------------------------------------- /src/backend.lua: -------------------------------------------------------------------------------- 1 | local simulations = require "simulations" 2 | local backend = {} 3 | 4 | backend.generate_content = function() 5 | simulations.for_work_longtail(simulations.profiles.backend) 6 | 7 | ngx.header['Content-Type'] = 'application/json' 8 | ngx.header['Cache-Control'] = 'public, max-age=' .. (ngx.var.arg_max_age or 10) 9 | 10 | ngx.say('{"service": "api", "value": 42, "request": "' .. ngx.var.uri .. '"}') 11 | end 12 | 13 | return backend 14 | -------------------------------------------------------------------------------- /config/prometheus.yml: -------------------------------------------------------------------------------- 1 | global: 2 | scrape_interval: 10s # By default, scrape targets every 15 seconds. 3 | evaluation_interval: 10s # By default, scrape targets every 15 seconds. 4 | scrape_timeout: 2s # the global default (10s). 5 | 6 | external_labels: 7 | monitor: 'CDN' 8 | 9 | scrape_configs: 10 | - job_name: 'prometheus' 11 | metrics_path: '/status/format/prometheus' 12 | static_configs: 13 | - targets: ['loadbalancer:8080', 'edge:8080', 'edge1:8080', 'edge2:8080', 'backend:8080', 'backend1:8080'] 14 | -------------------------------------------------------------------------------- /nginx_backend.conf: -------------------------------------------------------------------------------- 1 | # vi:syntax=nginx 2 | events { 3 | worker_connections 1024; 4 | } 5 | 6 | error_log stderr; 7 | 8 | http { 9 | include generic_conf/setup_logging.conf; 10 | 11 | include generic_conf/lua_path_setup.conf; 12 | include generic_conf/basic_vts_setup.conf; 13 | 14 | server { 15 | listen 8080; 16 | 17 | location / { 18 | content_by_lua_block { 19 | local backend = require "backend" 20 | backend.generate_content() 21 | } 22 | } 23 | 24 | include generic_conf/basic_vts_location.conf; 25 | } 26 | } 27 | 28 | 29 | -------------------------------------------------------------------------------- /src/load_tests.lua: -------------------------------------------------------------------------------- 1 | math.randomseed(os.time()) 2 | local random = math.random 3 | 4 | local popular_percentage = 96 5 | local popular_items_quantity = 5 6 | local max_total_items = 200 7 | 8 | -- trying to model the long tail 9 | request = function() 10 | local is_popular = random(1, 100) <= popular_percentage 11 | local item = "" 12 | 13 | if is_popular then 14 | item = "item-" .. random(1, popular_items_quantity) 15 | else 16 | item = "item-" .. 
random(popular_items_quantity + 1, popular_items_quantity + max_total_items) 17 | end 18 | 19 | return wrk.format(nil, "/path/" .. item .. ".ext") 20 | end 21 | 22 | -------------------------------------------------------------------------------- /generic_conf/backend_definition.conf: -------------------------------------------------------------------------------- 1 | env BACKEND_HOST; # allow list for os.getenv 2 | env BACKEND_PORT; # allow list for os.getenv 3 | 4 | upstream backend { 5 | server 0.0.0.1; # just an invalid address as a place holder 6 | 7 | balancer_by_lua_block { 8 | local balancer = require "ngx.balancer" 9 | local host = os.getenv("BACKEND_HOST") 10 | local port = number(os.getenv("BACKEND_PORT")) 11 | 12 | local ok, err = balancer.set_current_peer(host, port) 13 | if not ok then 14 | ngx.log(ngx.ERR, "failed to set the current peer: ", err) 15 | return ngx.exit(500) 16 | end 17 | } 18 | 19 | keepalive 10; # connection pool 20 | } 21 | 22 | -------------------------------------------------------------------------------- /nginx_edge.conf: -------------------------------------------------------------------------------- 1 | # vi:syntax=nginx 2 | events { 3 | worker_connections 1024; 4 | } 5 | 6 | error_log stderr; 7 | 8 | http { 9 | resolver 127.0.0.11 ipv6=off; 10 | include generic_conf/setup_logging.conf; 11 | 12 | include generic_conf/lua_path_setup.conf; 13 | include generic_conf/basic_vts_setup.conf; 14 | include generic_conf/setup_cache.conf; 15 | 16 | upstream backend { 17 | server backend:8080; 18 | server backend1:8080; 19 | keepalive 10; # connection pool 20 | } 21 | 22 | server { 23 | listen 8080; 24 | 25 | location / { 26 | set_by_lua_block $cache_key { 27 | return ngx.var.uri 28 | } 29 | 30 | access_by_lua_block { 31 | local edge = require "edge" 32 | edge.simulate_load() 33 | } 34 | 35 | proxy_pass http://backend; 36 | include generic_conf/define_cache.conf; 37 | add_header X-Edge Server; 38 | } 39 | 40 | include generic_conf/basic_vts_location.conf; 41 | } 42 | 43 | } 44 | 45 | -------------------------------------------------------------------------------- /nginx_loadbalancer.conf: -------------------------------------------------------------------------------- 1 | # vi:syntax=nginx 2 | events { 3 | worker_connections 1024; 4 | } 5 | 6 | error_log stderr; 7 | 8 | http { 9 | resolver 127.0.0.11 ipv6=off; 10 | include generic_conf/setup_logging.conf; 11 | 12 | include generic_conf/lua_path_setup.conf; 13 | include generic_conf/basic_vts_setup.conf; 14 | include generic_conf/setup_cache.conf; 15 | 16 | init_by_lua_block { 17 | loadbalancer = require "loadbalancer" 18 | loadbalancer.setup_server_list() 19 | } 20 | 21 | upstream backend { 22 | server 0.0.0.1; 23 | balancer_by_lua_block { 24 | loadbalancer.set_proper_server() 25 | } 26 | keepalive 60; 27 | } 28 | 29 | server { 30 | listen 8080; 31 | 32 | location / { 33 | access_by_lua_block { 34 | loadbalancer.resolve_name_for_upstream() 35 | } 36 | 37 | proxy_pass http://backend; 38 | add_header X-Edge LoadBalancer; 39 | } 40 | 41 | include generic_conf/basic_vts_location.conf; 42 | } 43 | 44 | } 45 | 46 | -------------------------------------------------------------------------------- /src/loadbalancer.lua: -------------------------------------------------------------------------------- 1 | local resty_chash = require "resty.chash" 2 | 3 | local loadbalancer = {} 4 | 5 | loadbalancer.setup_server_list = function() 6 | local server_list = { 7 | ["edge"] = 1, 8 | ["edge1"] = 1, 9 | ["edge2"] = 1, 10 | } 11 | 
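-- resty.chash (from lua-resty-balancer) builds a consistent-hashing ring over the weighted server list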
local chash_up = resty_chash:new(server_list) 12 | 13 | package.loaded.my_chash_up = chash_up 14 | package.loaded.my_servers = server_list 15 | end 16 | 17 | loadbalancer.set_proper_server = function() 18 | local b = require "ngx.balancer" 19 | local chash_up = package.loaded.my_chash_up 20 | local servers = package.loaded.my_ip_servers 21 | local id = chash_up:find(ngx.var.uri) -- hashing based on uri 22 | 23 | assert(b.set_current_peer(servers[id] .. ":8080")) 24 | end 25 | 26 | loadbalancer.resolve_name_for_upstream = function() 27 | local resolver = require "resty.dns.resolver" 28 | local r, err = resolver:new{ 29 | nameservers = {"127.0.0.11", {"127.0.0.11", 53} }, 30 | retrans = 5, 31 | timeout = 1000, 32 | no_random = true, 33 | } 34 | -- quick hack, we could use ips already 35 | -- or resolve names on background 36 | if package.loaded.my_ip_servers ~= nil then 37 | return 38 | end 39 | 40 | local servers = package.loaded.my_servers 41 | local ip_servers = {} 42 | 43 | for host, weight in pairs(servers) do 44 | local answers, err, tries = r:query(host, nil, {}) 45 | ip_servers[host] = answers[1].address 46 | end 47 | 48 | package.loaded.my_ip_servers = ip_servers 49 | end 50 | 51 | return loadbalancer 52 | -------------------------------------------------------------------------------- /src/simulations.lua: -------------------------------------------------------------------------------- 1 | local simulations = {} 2 | local random = math.random 3 | local sleep = ngx.sleep 4 | local second = 0.001 -- a millisecond in second 5 | 6 | -- setup entropy 7 | math.randomseed(ngx.time() + ngx.worker.pid()) 8 | 9 | -- a percentile distribution based on a percentiles map 10 | -- { 11 | -- { 12 | -- p=50, min=1, max=400, 13 | -- } 14 | -- } 15 | -- for instance, for 50% we'll wait min 1ms and max 400ms 16 | simulations.for_work_longtail = function(percentiles) 17 | -- sort by percentile 18 | table.sort(percentiles, function(a,b) return a.p < b.p end) 19 | 20 | local current_percentage = random(1, 100) 21 | local min_wait_ms = 1 22 | local max_wait_ms = 1000 23 | 24 | for _, percentile in pairs(percentiles) do 25 | if current_percentage <= percentile.p then 26 | min_wait_ms = percentile.min 27 | max_wait_ms = percentile.max 28 | break 29 | end 30 | end 31 | 32 | local sleep_seconds = random(min_wait_ms, max_wait_ms) * second -- sleep expects seconds 33 | ngx.header["X-Latency"] = "simulated=" .. sleep_seconds .. "s, min=" .. min_wait_ms .. ", max=" .. max_wait_ms .. ", profile=" .. (ngx.var.arg_profile or "empty") 34 | 35 | sleep(sleep_seconds) 36 | end 37 | 38 | -- the percentile latency configuation in ms 39 | simulations.profiles = { 40 | edge={ 41 | {p=50, min=1, max=20,}, {p=90, min=21, max=50,}, {p=95, min=51, max=150,}, {p=99, min=151, max=500,}, 42 | }, 43 | backend={ 44 | {p=50, min=100, max=400,}, {p=90, min=401, max=500,}, {p=95, min=501, max=1500,}, {p=99, min=1501, max=3000,}, 45 | }, 46 | } 47 | 48 | return simulations 49 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2021, Leandro Moreira 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | 1. Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 
11 | 12 | 2. Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | 3. Neither the name of the copyright holder nor the names of its 17 | contributors may be used to endorse or promote products derived from 18 | this software without specific prior written permission. 19 | 20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 30 | -------------------------------------------------------------------------------- /data/grafana/alerting/1/__default__.tmpl: -------------------------------------------------------------------------------- 1 | 2 | {{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }} 3 | 4 | {{ define "__text_alert_list" }}{{ range . }} 5 | Value: {{ or .ValueString "" }} 6 | Labels: 7 | {{ range .Labels.SortedPairs }} - {{ .Name }} = {{ .Value }} 8 | {{ end }}Annotations: 9 | {{ range .Annotations.SortedPairs }} - {{ .Name }} = {{ .Value }} 10 | {{ end }}{{ if gt (len .GeneratorURL) 0 }}Source: {{ .GeneratorURL }} 11 | {{ end }}{{ if gt (len .SilenceURL) 0 }}Silence: {{ .SilenceURL }} 12 | {{ end }}{{ if gt (len .DashboardURL) 0 }}Dashboard: {{ .DashboardURL }} 13 | {{ end }}{{ if gt (len .PanelURL) 0 }}Panel: {{ .PanelURL }} 14 | {{ end }}{{ end }}{{ end }} 15 | 16 | {{ define "default.title" }}{{ template "__subject" . }}{{ end }} 17 | 18 | {{ define "default.message" }}{{ if gt (len .Alerts.Firing) 0 }}**Firing** 19 | {{ template "__text_alert_list" .Alerts.Firing }}{{ if gt (len .Alerts.Resolved) 0 }} 20 | 21 | {{ end }}{{ end }}{{ if gt (len .Alerts.Resolved) 0 }}**Resolved** 22 | {{ template "__text_alert_list" .Alerts.Resolved }}{{ end }}{{ end }} 23 | 24 | 25 | {{ define "__teams_text_alert_list" }}{{ range . 
}} 26 | Value: {{ or .ValueString "" }} 27 | Labels: 28 | {{ range .Labels.SortedPairs }} - {{ .Name }} = {{ .Value }} 29 | {{ end }} 30 | Annotations: 31 | {{ range .Annotations.SortedPairs }} - {{ .Name }} = {{ .Value }} 32 | {{ end }} 33 | {{ if gt (len .GeneratorURL) 0 }}Source: {{ .GeneratorURL }} 34 | 35 | {{ end }}{{ if gt (len .SilenceURL) 0 }}Silence: {{ .SilenceURL }} 36 | 37 | {{ end }}{{ if gt (len .DashboardURL) 0 }}Dashboard: {{ .DashboardURL }} 38 | 39 | {{ end }}{{ if gt (len .PanelURL) 0 }}Panel: {{ .PanelURL }} 40 | 41 | {{ end }} 42 | {{ end }}{{ end }} 43 | 44 | 45 | {{ define "teams.default.message" }}{{ if gt (len .Alerts.Firing) 0 }}**Firing** 46 | {{ template "__teams_text_alert_list" .Alerts.Firing }}{{ if gt (len .Alerts.Resolved) 0 }} 47 | 48 | {{ end }}{{ end }}{{ if gt (len .Alerts.Resolved) 0 }}**Resolved** 49 | {{ template "__teams_text_alert_list" .Alerts.Resolved }}{{ end }}{{ end }} 50 | -------------------------------------------------------------------------------- /docker-compose.yaml: -------------------------------------------------------------------------------- 1 | version: "3.9" 2 | services: 3 | nginx_base: 4 | build: 5 | context: . 6 | volumes: 7 | - "./generic_conf/:/usr/local/openresty/nginx/conf/generic_conf/" 8 | - "./src/:/lua/src/" 9 | loadbalancer: 10 | extends: 11 | service: nginx_base 12 | volumes: 13 | - "./nginx_loadbalancer.conf:/usr/local/openresty/nginx/conf/nginx.conf" 14 | ports: 15 | - "18080:8080" 16 | depends_on: 17 | - edge 18 | - edge1 19 | - edge2 20 | 21 | backend: 22 | extends: 23 | service: nginx_base 24 | volumes: 25 | - "./nginx_backend.conf:/usr/local/openresty/nginx/conf/nginx.conf" 26 | ports: 27 | - "8080:8080" 28 | 29 | backend1: 30 | extends: 31 | service: nginx_base 32 | volumes: 33 | - "./nginx_backend.conf:/usr/local/openresty/nginx/conf/nginx.conf" 34 | ports: 35 | - "8180:8080" 36 | 37 | edge: 38 | extends: 39 | service: nginx_base 40 | volumes: 41 | - "./nginx_edge.conf:/usr/local/openresty/nginx/conf/nginx.conf" 42 | depends_on: 43 | - backend 44 | - backend1 45 | ports: 46 | - "8081:8080" 47 | 48 | edge1: 49 | extends: 50 | service: nginx_base 51 | volumes: 52 | - "./nginx_edge.conf:/usr/local/openresty/nginx/conf/nginx.conf" 53 | depends_on: 54 | - backend 55 | - backend1 56 | ports: 57 | - "8082:8080" 58 | 59 | edge2: 60 | extends: 61 | service: nginx_base 62 | volumes: 63 | - "./nginx_edge.conf:/usr/local/openresty/nginx/conf/nginx.conf" 64 | depends_on: 65 | - backend 66 | - backend1 67 | ports: 68 | - "8083:8080" 69 | 70 | prometheus: 71 | image: prom/prometheus:v2.17.1 72 | container_name: prometheus 73 | volumes: 74 | - ./config:/etc/prometheus 75 | - ./data/prometheus:/prometheus 76 | command: 77 | - '--config.file=/etc/prometheus/prometheus.yml' 78 | - '--storage.tsdb.path=/prometheus' 79 | - '--web.console.libraries=/etc/prometheus/console_libraries' 80 | - '--web.console.templates=/etc/prometheus/consoles' 81 | - '--storage.tsdb.retention.time=24h' 82 | - '--web.enable-lifecycle' 83 | restart: unless-stopped 84 | ports: 85 | - "9090:9090" 86 | labels: 87 | org.label-schema.group: "monitoring" 88 | depends_on: 89 | - edge 90 | - edge1 91 | - edge2 92 | - backend 93 | - backend1 94 | - loadbalancer 95 | 96 | grafana: 97 | image: grafana/grafana:latest 98 | container_name: monitoring_grafana 99 | restart: unless-stopped 100 | links: 101 | - prometheus 102 | volumes: 103 | - ./data/grafana:/var/lib/grafana 104 | environment: 105 | - GF_SECURITY_ADMIN_USER=admin 106 | - 
GF_SECURITY_ADMIN_PASSWORD=admin 107 | - GF_USERS_ALLOW_SIGN_UP=false 108 | ports: 109 | - "9091:3000" 110 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # CDN Up and Running 2 | 3 | The objective of this repo is to build a body of knowledge on how CDNs work by coding one from "scratch". The CDN we're going to design uses: nginx, lua, docker, docker-compose, Prometheus, grafana, and wrk. 4 | 5 | We'll start creating a single backend service and expand from there to a multi-node, latency simulated, observable, and testable CDN. In each section, there are discussions regarding the challenges and trade-offs of building/managing/operating a CDN. 6 | 7 | ![grafana screenshot](/img/4.0.1_metrics.webp "grafana screenshot") 8 | 9 | ## What is a CDN? 10 | 11 | A Content Delivery Network is a set of computers, spatially distributed in order to provide high availability and **better performance** for systems that have their **work cached** on this network. 12 | 13 | ## Why do you need a CDN? 14 | 15 | A CDN helps to improve: 16 | * loading times (smoother streaming, instant page to buy, quick friends feed, etc) 17 | * accommodate traffic spikes (black friday, popular streaming release, breaking news, etc) 18 | * decrease costs (traffic offloading) 19 | * scalability for millions 20 | 21 | ## How does a CDN work? 22 | 23 | CDNs are able to make services faster by placing the content (media files, pages, games, javascript, a json response, etc) closer to the users. 24 | 25 | When a user wants to consume a service, the CDN routing system will deliver the "best" node where the content is likely **already cached and closer to the client**. Don't worry about the loose use of the word best in here. I hope that throughout the reading, the understanding of what is the best node will be elucidated. 26 | 27 | ## The CDN stack 28 | 29 | The CDN we'll build relies on: 30 | * [`Linux/GNU/Kernel`](https://www.linux.org/) - a kernel / operating system with outstanding networking capabilities as well as IO excellence. 31 | * [`Nginx`](http://nginx.org/) - an excellent web server that can be used as a reverse proxy providing caching capability. 32 | * [`Lua(jit)`](https://luajit.org/) - a simple powerful language to add features into nginx. 33 | * [`Prometheus`](https://prometheus.io/) - A system with a dimensional data model, flexible query language, efficient time series database. 34 | * [`Grafana`](https://github.com/grafana/grafana) - An open source analytics & monitoring tool that plugs with many sources, including prometheus. 35 | * [`Containers`](https://www.docker.com/) - technology to package, deploy, and isolate applications, we'll use docker and docker compose. 36 | 37 | # Origin - the backend service 38 | 39 | Origin is the system where the content is created - or at least it's the source to the CDN. The sample service we're going to build will be a straightforward JSON API. The backend service could be returning an image, video, javascript, HTML page, game, or anything you want to deliver to your clients. 40 | 41 | We'll use Nginx and Lua to design the backend service. It's a great excuse to introduce Nginx and Lua since we're going to use them a lot here. 
42 | 43 | > **Heads up: the backend service could be written in any language you like.** 44 | 45 | ## Nginx - quick introduction 46 | 47 | Nginx is a web server that will follow its [configuration](http://nginx.org/en/docs/beginners_guide.html#conf_structure). The config file uses [directives](http://nginx.org/en/docs/dirindex.html) as the dominant factor. A directive is a simple construction to set properties in nginx. There are two types of directives: **simple and block (context)**. 48 | 49 | A **simple directive** is formed by its name followed by parameters ending with a semicolon. 50 | 51 | ```nginx 52 | # Syntax: ; 53 | # Example 54 | add_header X-Header AnyValue; 55 | ``` 56 | 57 | The **block directive** follows the same pattern, but instead of a semicolon, it ends surrounded by curly braces. A block directive can also have directives within it. This block is also known as context. 58 | 59 | ```nginx 60 | # Syntax: 61 | location / { 62 | add_header X-Header AnyValue; 63 | } 64 | ``` 65 | 66 | Nginx uses workers (processes) to handle the requests. The [nginx architecture](https://www.aosabook.org/en/nginx.html) plays a crucial role in its performance. 67 | 68 | ![simplified workers nginx architecture](/img/simplified_workers_nginx_architecture.webp "simplified workers nginx architecture") 69 | 70 | > **Heads up: Although a single accept queue serving multiple workers is common, there are other models to [load balance the incoming requests](https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/).** 71 | 72 | ## Backend service conf 73 | 74 | Let's walk through the backend JSON API nginx configuration. I think it'll be much easier if we see it in action. 75 | 76 | ```nginx 77 | events { 78 | worker_connections 1024; 79 | } 80 | error_log stderr; 81 | 82 | http { 83 | access_log /dev/stdout; 84 | 85 | server { 86 | listen 8080; 87 | 88 | location / { 89 | content_by_lua_block { 90 | ngx.header['Content-Type'] = 'application/json' 91 | ngx.say('{"service": "api", "value": 42}') 92 | } 93 | } 94 | } 95 | } 96 | ``` 97 | 98 | Were you able to understand what this config is doing? In any case, let's break it down by making comments on each directive. 99 | 100 | The [`events`](http://nginx.org/en/docs/ngx_core_module.html#events) provides context for [connection processing configurations](http://nginx.org/en/docs/events.html), and the [`worker_connections`](http://nginx.org/en/docs/ngx_core_module.html#worker_connections) defines the maximum number of simultaneous connections that can be opened by a worker process. 101 | ```nginx 102 | events { 103 | worker_connections 1024; 104 | } 105 | ``` 106 | 107 | The [`error_log`](http://nginx.org/en/docs/ngx_core_module.html#error_log) configures logging for error. Here we just send all the errors to the stdout (error) 108 | 109 | ```nginx 110 | error_log stderr; 111 | ``` 112 | 113 | The [`http`](http://nginx.org/en/docs/http/ngx_http_core_module.html#http) provides a root context to set up all the http/s servers. 114 | 115 | ```nginx 116 | http {} 117 | ``` 118 | 119 | The [`access_log`](http://nginx.org/en/docs/http/ngx_http_log_module.html#access_log) configures the path (and optionally format, etc) for the access logging. 120 | 121 | ```nginx 122 | access_log /dev/stdout; 123 | ``` 124 | 125 | The [`server`](http://nginx.org/en/docs/http/ngx_http_core_module.html#server) sets the root configuration for a server, aka where we're going to setup specific behavior to the server. 
You can have multiple `server` blocks per `http` context. 126 | 127 | ```nginx 128 | server {} 129 | ``` 130 | 131 | Within the `server` we can set the [`listen`](http://nginx.org/en/docs/http/ngx_http_core_module.html#listen) directive controlling the address and/or the port on which the [server will accept requests](http://nginx.org/en/docs/http/request_processing.html). 132 | 133 | ```nginx 134 | listen 8080; 135 | ```` 136 | 137 | In the server configuration, we can specify a route by using the [`location`](http://nginx.org/en/docs/http/ngx_http_core_module.html#location) directive. This will be used to provide specific configuration for that matching request path. 138 | 139 | ```nginx 140 | location / {} 141 | ``` 142 | 143 | Within this location (by the way, `/` will handle all the requests) we'll use Lua to create the response. There's a directive called [`content_by_lua_block`](https://github.com/openresty/lua-nginx-module#content_by_lua_block) which provides a context where the Lua code will run. 144 | 145 | ```nginx 146 | content_by_lua_block {} 147 | ``` 148 | 149 | Finally, we'll use Lua and the basic [Nginx Lua API](https://github.com/openresty/lua-nginx-module#nginx-api-for-lua) to set the desired behavior. 150 | 151 | ```lua 152 | -- ngx.header sets the current response header that is to be sent. 153 | ngx.header['Content-Type'] = 'application/json' 154 | -- ngx.say will write the response body 155 | ngx.say('{"service": "api", "value": 42}') 156 | ``` 157 | 158 | Notice that most of the directives contain their scope. For instance, the `location` is only applicable within the `location` (recursively) and `server` context. 159 | 160 | ![directive restriction](/img/nginx_directive_restriction.webp "directive restriction") 161 | 162 | > **Heads up: we won't comment on each directive we add from now on, we'll only describe the most relevant for the section.** 163 | 164 | ## CDN 1.0.0 Demo time 165 | 166 | Let's see what we did. 167 | 168 | ```bash 169 | git checkout 1.0.0 # going back to specific configuration 170 | docker-compose run --rm --service-ports backend # run the containers exposing the service 171 | http http://localhost:8080/path/to/my/content.ext # consuming the service, I used httpie but you can use curl or anything you like 172 | 173 | # you should see the json response :) 174 | ``` 175 | 176 | ## Adding caching capabilities 177 | 178 | For the backend service to be cacheable we need to set up the caching policy. We'll use the HTTP header [Cache-Control](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control) to setup what caching behavior we want. 179 | 180 | ```Lua 181 | -- we want the content to be cached by 10 seconds OR the provided max_age (ex: /path/to/service?max_age=40 for 40 seconds) 182 | ngx.header['Cache-Control'] = 'public, max-age=' .. (ngx.var.arg_max_age or 10) 183 | ``` 184 | 185 | And, if you want, make sure to check the returned response header `Cache-Control`. 186 | 187 | ```bash 188 | git checkout 1.0.1 # going back to specific configuration 189 | docker-compose run --rm --service-ports backend 190 | http "http://localhost:8080/path/to/my/content.ext?max_age=30" 191 | ``` 192 | 193 | ## Adding metrics 194 | 195 | Checking the logging is fine for debugging. But once we're reaching more traffic, it'll be nearly impossible to understand how the service is operating. To tackle this case, we're going to use [VTS](https://github.com/vozlt/nginx-module-vts), an nginx module which adds metrics measurements. 
196 | 197 | ```nginx 198 | vhost_traffic_status_zone shared:vhost_traffic_status:12m; 199 | vhost_traffic_status_filter_by_set_key $status status::*; 200 | vhost_traffic_status_histogram_buckets 0.005 0.01 0.05 0.1 0.5 1 5 10; # buckets are in seconds 201 | ``` 202 | 203 | The [`vhost_traffic_status_zone`](https://github.com/vozlt/nginx-module-vts#vhost_traffic_status_zone) sets a memory space required for the metrics. The [`vhost_traffic_status_filter_by_set_key`](https://github.com/vozlt/nginx-module-vts#vhost_traffic_status_filter_by_set_key) groups metrics by a given variable (for instance, we decided to group metrics by `status`) and finally, the [`vhost_traffic_status_histogram_buckets`](https://github.com/vozlt/nginx-module-vts#vhost_traffic_status_histogram_buckets) provides a way to bucketize the metrics in seconds. We decided to create buckets varying from `0.005` to `10` seconds, because they will help us to create percentiles (`p99`, `p50`, etc). 204 | 205 | ```nginx 206 | location /status { 207 | vhost_traffic_status_display; 208 | vhost_traffic_status_display_format html; 209 | } 210 | ``` 211 | 212 | We also must expose the metrics in a location. We will use the `/status` to do it. 213 | 214 | ```bash 215 | git checkout 1.1.0 216 | docker-compose run --rm --service-ports backend 217 | # if you go to http://localhost:8080/status/format/html you'll see information about the server 8080 218 | # notice that VTS also provides other formats such as status/format/prometheus, which will be pretty helpful for us in near future 219 | ``` 220 | 221 | ![nginx vts status page](/img/metrics_status.webp "nginx vts status page") 222 | 223 | With metrics, we can run (load) tests and see if the configuration changes we made are resulting in a better performance or not. 224 | 225 | > **Heads up**: You can [group the metrics under a custom namespace](https://github.com/leandromoreira/cdn-up-and-running/commit/105f54a27d1b58b88659789ae024d70c89d4a478). This is useful when you have a single location that behaves differently depending on the context. 226 | 227 | ## Refactoring the nginx conf 228 | 229 | As the configuration becomes bigger, it also gets harder to comprehend. Nginx offers a neat directive called [`include`](http://nginx.org/en/docs/ngx_core_module.html#include) which allows us to create partial config files and include them into the root configuration file. 230 | 231 | ```diff 232 | - location /status { 233 | - vhost_traffic_status_display; 234 | - vhost_traffic_status_display_format html; 235 | - } 236 | + include basic_vts_location.conf; 237 | 238 | ``` 239 | 240 | We can extract location, group configurations per similarities, or anything that makes sense to a file. We can do [a similar thing for the Lua code](https://github.com/openresty/lua-nginx-module#lua_package_path) as well. 241 | 242 | ```diff 243 | content_by_lua_block { 244 | - ngx.header['Content-Type'] = 'application/json' 245 | - ngx.header['Cache-Control'] = 'public, max-age=' .. (ngx.var.arg_max_age or 10) 246 | - 247 | - ngx.say('{"service": "api", "value": 42, "request": "' .. ngx.var.uri .. '"}') 248 | + local backend = require "backend" 249 | + backend.generate_content() 250 | } 251 | ``` 252 | 253 | All these modifications were made to improve readability, but it also promotes reuse. 254 | 255 | 256 | # The CDN - siting in front of the backend 257 | 258 | ## Proxying 259 | 260 | What we did so far has nothing to do with the CDN. Now it's time to start building the CDN. 
For that, we'll create another node with nginx, just adding a few new directives to connect the `edge` (CDN) node with the `backend` node. 261 | 262 | ![backend edge architecture](/img/edge_backend.webp "backend edge architecture") 263 | 264 | There's really nothing fancy here, it's just an [`upstream`](http://nginx.org/en/docs/http/ngx_http_upstream_module.html#upstream) block with a server pointing to our `backend` endpoint. In the location, we do not provide the content, but instead we point to the upstream, using the [`proxy_pass`](http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_pass), we just created. 265 | 266 | ```nginx 267 | upstream backend { 268 | server backend:8080; 269 | keepalive 10; # connection pool for reuse 270 | } 271 | 272 | server { 273 | listen 8080; 274 | 275 | location / { 276 | proxy_pass http://backend; 277 | add_header X-Cache-Status $upstream_cache_status; 278 | } 279 | } 280 | ```` 281 | 282 | We also added a new header (X-Cache-Status) to indicate whether the [cache was used or not](http://nginx.org/en/docs/http/ngx_http_upstream_module.html#variables). 283 | * **HIT**: when the content is in the CDN, the `X-Cache-Status` should return a hit. 284 | * **MISS**: when the content isn't in the CDN, the `X-Cache-Status` should return a miss. 285 | 286 | ```bash 287 | git checkout 2.0.0 288 | docker-compose up 289 | # we still can fetch the content from the backend 290 | http "http://localhost:8080/path/to/my/content.ext" 291 | # but we really want to access the content through the edge (CDN) 292 | http "http://localhost:8081/path/to/my/content.ext" 293 | ``` 294 | 295 | ## Caching 296 | 297 | When we try to fetch content, the `X-Cache-Status` header is absent. It seems that the edge node is always invariably requesting the backend. This is not the way a CDN should work, right? 298 | 299 | ```log 300 | backend_1 | 172.22.0.4 - - [05/Jan/2022:17:24:48 +0000] "GET /path/to/my/content.ext HTTP/1.0" 200 70 "-" "HTTPie/2.6.0" 301 | edge_1 | 172.22.0.1 - - [05/Jan/2022:17:24:48 +0000] "GET /path/to/my/content.ext HTTP/1.1" 200 70 "-" "HTTPie/2.6.0" 302 | ```` 303 | 304 | The edge is just proxying the clients to the backend. What are we missing? Is there any reason to use a "simple" proxy at all? Well, it does, maybe you want to provide throttling, authentication, authorization, tls termination, or a gateway for multiple services, but that's not what we want. 305 | 306 | We need to create a cache area on nginx through the directive [`proxy_cache_path`](http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_path). It's setting up the path where the cached content will reside, the shared memory `key_zone`, and policies such as `inactive`, `max_size`, among others, to control how we want the cache to behave. 307 | 308 | ```nginx 309 | proxy_cache_path /cache/ levels=2:2 keys_zone=zone_1:10m max_size=10m inactive=10m use_temp_path=off; 310 | ``` 311 | 312 | Once we've configured a proper cache, we must also set up the [`proxy_cache`](http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache) pointing to the right zone (via `proxy_cache_path keys_zone=:size`), and the [`proxy_pass`](http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_pass) linking to the upstream we've created. 313 | 314 | ```nginx 315 | location / { 316 | # ... 
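# zone_1 below refers to the keys_zone name declared in proxy_cache_path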
317 | proxy_pass http://backend; 318 | proxy_cache zone_1; 319 | } 320 | ``` 321 | 322 | There is another important aspect of caching which is managed by the directive [`proxy_cache_key`](http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_key). 323 | When a client requests content from nginx, it will (highly simplified): 324 | 325 | * Receive the request (let's say: `GET /path/to/something.txt`) 326 | * Apply a hash md5 function over the cache key value (let's assume that the cache key is the `uri`) 327 | * md5("/path/to/something.txt") => `b3c4c5e7dc10b13dc2e3f852e52afcf3` 328 | * you can check that on your terminarl `echo -n "/path/to/something.txt" | md5` 329 | * It checks whether the content (hash `b3c4..`) is cached or not 330 | * If it's cached, it just returns the object otherwise it fetches the content from the backend 331 | * It also saves locally (in memory and on disk) to avoid future requests 332 | 333 | Let's create a variable called `cache_key` using the lua directive [`set_by_lua_block`](https://github.com/openresty/lua-nginx-module#set_by_lua_block). It will, for each incoming request, fill the `cache_key` with the `uri` **value**. Beyond that, we also need to update the [`proxy_cache_key`](http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_key). 334 | 335 | ```nginx 336 | location / { 337 | set_by_lua_block $cache_key { 338 | return ngx.var.uri 339 | } 340 | # ... 341 | proxy_cache_key $cache_key; 342 | } 343 | ``` 344 | 345 | > **Heads up**: Using `uri` as cache key will make the following two requests http://example.com/path/to/content.ext and http://example.edu/path/to/content.ext (if they're using the same cache proxy) as if they were a single object. If you do not provide a cache key, nginx will use a reasonable **default value** `$scheme$proxy_host$request_uri`. 346 | 347 | Now we can see the caching properly working. 348 | 349 | ```bash 350 | git checkout 2.1.0 351 | docker-compose up 352 | http "http://localhost:8081/path/to/my/content.ext" 353 | # the second request must get the content from the CDN without leaving to the backend 354 | http "http://localhost:8081/path/to/my/content.ext" 355 | ``` 356 | 357 | ![cache hit header](/img/cache_hit.webp "cache hit header") 358 | 359 | ## Monitoring Tools 360 | 361 | Checking the cache effectiveness by looking at the command line isn't efficient. It's better if we use a tool for that. **Prometheus** will be used to scrape metrics on all servers, and **Grafana** will show graphics based on the metrics collected by the prometheus. 362 | 363 | ![instrumentalization architecture](/img/metrics_architecture.webp "instrumentalization architecture") 364 | 365 | Prometheus configuration will look like this. 366 | 367 | ```yaml 368 | global: 369 | scrape_interval: 10s # each 10s prometheus will scrape targets 370 | evaluation_interval: 10s 371 | scrape_timeout: 2s 372 | 373 | external_labels: 374 | monitor: 'CDN' 375 | 376 | scrape_configs: 377 | - job_name: 'prometheus' 378 | metrics_path: '/status/format/prometheus' 379 | static_configs: 380 | - targets: ['edge:8080', 'backend:8080'] # the server list to be scrapped by the scrap_path 381 | ``` 382 | 383 | Now, we need to add a prometheus source for Grafana. 384 | 385 | ![grafana source](/img/add_source.webp "grafana source") 386 | 387 | And set the proper prometheus server. 
388 | 389 | ![grafana source set](/img/set_source.webp "grafana source set") 390 | 391 | ## Simulated Work (latency) 392 | 393 | The backend server is artificially creating responses. We'll add simulated latency using Lua; the idea is to make it closer to real-world situations. We're going to model the latency using [percentiles](https://www.mathsisfun.com/data/percentiles.html). 394 | 395 | ```lua 396 | percentile_config={ 397 | {p=50, min=1, max=20,}, {p=90, min=21, max=50,}, {p=95, min=51, max=150,}, {p=99, min=151, max=500,}, 398 | } 399 | ``` 400 | 401 | We randomly pick a number from 1 to 100, and then we apply another random pick using the respective `percentile profile`, ranging from the min to the max. Finally, we [`sleep`](https://github.com/openresty/lua-nginx-module#ngxsleep) for that duration. 402 | 403 | ```lua 404 | local current_percentage = random(1, 100) -- decide which percentile this request will fall into 405 | -- let's assume we picked 94 406 | -- therefore we'll use the percentile_config entry for p90 407 | local sleep_duration = random(p90.min, p90.max) 408 | sleep(sleep_duration * 0.001) -- the profile is in ms; ngx.sleep expects seconds 409 | ``` 410 | 411 | This model lets us emulate something closer to [real-world observed latencies](https://research.google/pubs/pub40801/). 412 | 413 | ## Load Testing 414 | 415 | We'll run some load testing to learn more about the solution we're building. Wrk is an HTTP benchmarking tool that you can dynamically configure using Lua. We pick a random number from 1 to 100 and request that item. 416 | 417 | ```lua 418 | request = function() 419 | local item = "item_" .. random(1, 100) 420 | 421 | return wrk.format(nil, "/" .. item .. ".ext") 422 | end 423 | ``` 424 | 425 | The command line below runs the tests for 10 minutes (600s), using two threads and 10 connections. 426 | 427 | ```bash 428 | wrk -c10 -t2 -d600s -s ./src/load_tests.lua --latency http://localhost:8081 429 | ``` 430 | 431 | Of course, you can run this on your machine: 432 | 433 | ```bash 434 | git checkout 2.2.0 435 | docker-compose up 436 | 437 | # run the tests 438 | ./load_test.sh 439 | 440 | # go check on grafana, how the system is behaving 441 | http://localhost:9091 442 | ``` 443 | 444 | The `wrk` output is shown below. There were **37k** requests with **674** failing requests in total. 445 | 446 | ```bash 447 | Running 10m test @ http://localhost:8081 448 | 2 threads and 10 connections 449 | Thread Stats Avg Stdev Max +/- Stdev 450 | Latency 218.31ms 236.55ms 1.99s 84.32% 451 | Req/Sec 35.14 29.02 202.00 79.15% 452 | Latency Distribution 453 | 50% 162.73ms 454 | 75% 350.33ms 455 | 90% 519.56ms 456 | 99% 1.02s 457 | 37689 requests in 10.00m, 15.50MB read 458 | Non-2xx or 3xx responses: 674 459 | Requests/sec: 62.80 460 | Transfer/sec: 26.44KB 461 | ``` 462 | 463 | Grafana showed that in a given instant, **68** requests were served by the `edge`. Of these, **16** went through to the `backend`. The [cache efficiency](https://www.cloudflare.com/learning/cdn/what-is-a-cache-hit-ratio/) was **76%**, 1% of the requests had a latency longer than **3.6s**, 5% observed more than **786ms**, and the median was around **73ms**. 464 | 465 | 466 | > Be aware! Latency measurements depend heavily on the bucket sizes, and [using histograms to measure performance](https://medium.com/mercari-engineering/have-you-been-using-histogram-metrics-correctly-730c9547a7a9) might not be adequate.
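As a quick sanity check of the cache efficiency figure above: (68 - 16) / 68 ≈ 0.76, i.e. roughly 76% of the edge responses were served without hitting the backend.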
467 | 468 | ![grafana result for 2.2.0](/img/2.2.0_metrics.webp "grafana result for 2.2.0") 469 | 470 | ## Learning by testing - let's change the cache ttl (max age) 471 | 472 | This project should encourage you to experiment, change parameter values, run load tests, and check the results. I think this loop can be a great way to learn. Let's try to see what happens when we change the cache behavior. 473 | 474 | ### 1s 475 | 476 | Using 1s for the cache validity. 477 | 478 | ```lua 479 | request = function() 480 | local item = "item_" .. random(1, 100) 481 | 482 | return wrk.format(nil, "/" .. item .. ".ext?max_age=1") 483 | end 484 | ``` 485 | 486 | Run the tests, and the result is: only 16k requests with 773 errors. 487 | 488 | ``` 489 | Running 10m test @ http://localhost:8081 490 | 2 threads and 10 connections 491 | Thread Stats Avg Stdev Max +/- Stdev 492 | Latency 378.72ms 254.21ms 1.46s 68.40% 493 | Req/Sec 15.11 9.98 90.00 74.18% 494 | Latency Distribution 495 | 50% 396.15ms 496 | 75% 507.22ms 497 | 90% 664.18ms 498 | 99% 1.05s 499 | 16643 requests in 10.00m, 6.83MB read 500 | Non-2xx or 3xx responses: 773 501 | Requests/sec: 27.74 502 | Transfer/sec: 11.66KB 503 | ``` 504 | 505 | We also noticed that the cache hit ratio went down significantly `(23%)`, and many more requests leaked to the backend. 506 | 507 | ![grafana result for 2.2.1 1 second](/img/2.2.1_metrics_1s.webp "grafana result for 2.2.1 1 second") 508 | 509 | ### 60s 510 | 511 | What if, instead, we increase the cache expiry to a whole minute?! 512 | 513 | ```lua 514 | request = function() 515 | local item = "item_" .. random(1, 100) 516 | 517 | return wrk.format(nil, "/" .. item .. ".ext?max_age=60") 518 | end 519 | ``` 520 | 521 | Run the tests, and the result now is: 45k requests with 551 errors. 522 | 523 | ```bash 524 | Running 10m test @ http://localhost:8081 525 | 2 threads and 10 connections 526 | Thread Stats Avg Stdev Max +/- Stdev 527 | Latency 196.27ms 223.43ms 1.79s 84.74% 528 | Req/Sec 42.31 34.80 242.00 78.01% 529 | Latency Distribution 530 | 50% 79.67ms 531 | 75% 321.06ms 532 | 90% 494.41ms 533 | 99% 1.01s 534 | 45695 requests in 10.00m, 18.79MB read 535 | Non-2xx or 3xx responses: 551 536 | Requests/sec: 76.15 537 | Transfer/sec: 32.06KB 538 | ``` 539 | 540 | We see a much better **cache efficiency (80% vs 23%)** and **throughput (45k vs 16k requests)**. 541 | 542 | ![grafana result for 2.2.1 60 seconds](/img/2.2.1_metrics_60s.webp "grafana result for 2.2.1 60 seconds") 543 | 544 | > **Heads up**: caching for longer helps improve performance, but at the cost of serving stale content. 545 | 546 | ## Fine-tuning - cache lock, stale, timeout, network 547 | 548 | Using the default configurations for Nginx, Linux, and the rest will be sufficient for many small workloads. But when your goal is more ambitious, you will inevitably need to fine-tune the CDN for your needs. 549 | 550 | The process of fine-tuning a web server is gigantic. It goes from managing how [`nginx/Linux process sockets`](https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/), to [`linux network queuing`](https://github.com/leandromoreira/linux-network-performance-parameters), how [`io`](https://serverfault.com/questions/796665/what-are-the-performance-implications-for-millions-of-files-in-a-modern-file-sys) affects performance, among other aspects.
There is a lot of symbiosis between the [application and OS](https://nginx.org/en/docs/http/ngx_http_core_module.html#sendfile) with direct implications for performance, for instance [saving user-land context switches with kTLS](https://docs.kernel.org/networking/tls-offload.html). 551 | 552 | You'll be reading a lot of man pages, mostly tweaking timeouts and buffers. The test loop can help you build confidence in your ideas; let's see. 553 | 554 | * You have a hypothesis or have observed something weird and want to test a parameter value 555 | * Stick to a single set of related parameters each time 556 | * Set the new value 557 | * Run the tests 558 | * Check results against the same server with the old parameter 559 | 560 | > **Heads up**: doing tests locally is fine for learning, but most of the time you'll only trust your production results. Be prepared to do a partial deployment, comparing the old system/config to the new test parameters. 561 | 562 | Did you notice that the errors were all related to timeout? It seems that the `backend` is taking longer to respond than the `edge` is willing to wait. 563 | 564 | ```log 565 | edge_1 | 2021/12/29 11:52:45 [error] 8#8: *3 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 172.25.0.1, server: , request: "GET /item_34.ext HTTP/1.1", upstream: "http://172.25.0.3:8080/item_34.ext", host: "localhost:8081" 566 | ``` 567 | 568 | To solve this problem we can try to increase the proxy timeouts. We're also using a neat directive, [`proxy_cache_use_stale`](http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_use_stale), that serves `stale content` when nginx is dealing with `errors, timeout, or even updating the cache`. 569 | 570 | ```nginx 571 | proxy_cache_lock_timeout 2s; 572 | proxy_read_timeout 2s; 573 | proxy_send_timeout 2s; 574 | proxy_cache_use_stale error timeout updating; 575 | ``` 576 | 577 | While we were reading about proxy caching, something caught our attention. There's a directive called [`proxy_cache_lock`](http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock) that collapses multiple user requests for the same content into a single upstream request at a time. This is very often known as [coalescing](https://cloud.google.com/cdn/docs/caching#request-coalescing). 578 | 579 | ```nginx 580 | proxy_cache_lock on; 581 | ``` 582 | 583 | ![caching lock](/img/cache_lock.webp "caching lock") 584 | 585 | Running the tests, we observed that we decreased the timeout errors but we also got less throughput. Why? Maybe it's because of lock contention. The big benefit of this feature is to avoid the [thundering herd](https://alexpareto.com/2020/06/15/thundering-herds.html) in the backend. Traffic went down from **6k to 3k** and requests from **16 to 8**. 586 | 587 | ![grafana result for test 3.0.0](/img/3.0.0_metrics.webp "grafana result for test 3.0.0") 588 | 589 | ## From normal to long tail distribution 590 | 591 | We've been running load tests assuming a [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) but that's far from reality. What we might see in production is that [most of the requests will be towards a few items](https://en.wikipedia.org/wiki/Long_tail). To simulate that more closely, we'll tweak our code to randomly pick a number from 1 to 100 and then decide if it's a popular item or not.
589 | ## From normal to long tail distribution
590 | 
591 | We've been running the load tests assuming a [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution), but that's far from reality. What we are likely to see in production is that [most of the requests target a few items](https://en.wikipedia.org/wiki/Long_tail). To simulate that more closely, we'll tweak our code to randomly pick a number from 1 to 100 and then decide whether it's a popular item or not.
592 | 
593 | ```lua
594 | local popular_percentage = 96 -- 96% of users are requesting the top 5 content
595 | local popular_items_quantity = 5 -- top content quantity
596 | local max_total_items = 200 -- total items clients are requesting
597 | 
598 | request = function()
599 |   local is_popular = random(1, 100) <= popular_percentage
600 |   local item = ""
601 | 
602 |   if is_popular then -- if it's popular, let's pick one of the top content items
603 |     item = "item-" .. random(1, popular_items_quantity)
604 |   else -- otherwise, let's pick one of the remaining items
605 |     item = "item-" .. random(popular_items_quantity + 1, popular_items_quantity + max_total_items)
606 |   end
607 | 
608 |   return wrk.format(nil, "/path/" .. item .. ".ext")
609 | end
610 | ```
611 | 
612 | > **Heads up**: we could model the long tail using [a formula](https://firstmonday.org/ojs/index.php/fm/article/view/1832/1716), but for the purposes of this repo, this extrapolation might be good enough.
613 | 
614 | Now, let's test again with `proxy_cache_lock` `off` and `on`.
615 | 
616 | ### Long tail `proxy_cache_lock` off
617 | ![grafana result for test 3.1.0](/img/3.1.0_metrics.webp "grafana result for test 3.1.0")
618 | ### Long tail `proxy_cache_lock` on
619 | ![grafana result for test 3.1.1](/img/3.1.1_metrics.webp "grafana result for test 3.1.1")
620 | 
621 | The results are pretty close, even though `lock off` is still marginally better. This feature might have to go to production to prove whether it's worth it or not.
622 | 
623 | > **Heads up**: the `proxy_cache_lock_timeout` is dangerous but necessary: once the configured time has passed, all the waiting requests will go to the backend.
624 | 
625 | ## Routing challenges
626 | 
627 | We've been testing a single edge, but in reality there will be hundreds of nodes. Having more edge nodes is necessary for scalability, resilience, and also to provide responses closer to the users. Introducing multiple nodes brings another challenge: clients somehow need to figure out which node to fetch the content from.
628 | 
629 | There are many ways to overcome this complication, and we'll try to explore some of them.
630 | 
631 | ### Load balancing
632 | 
633 | A load balancer spreads the clients' requests among all the edges.
634 | 
635 | #### Round-robin
636 | 
637 | Round-robin is a balancing policy that takes an ordered list of edges, picks the next server for each request, and wraps around when the list ends.
638 | 
639 | ```nginx
640 | # on nginx, if we do not specify anything, the default policy is weighted round-robin
641 | # http://nginx.org/en/docs/http/ngx_http_upstream_module.html#upstream
642 | upstream backend {
643 |   server edge:8080;
644 |   server edge1:8080;
645 |   server edge2:8080;
646 | }
647 | 
648 | server {
649 |   listen 8080;
650 | 
651 |   location / {
652 |     proxy_pass http://backend;
653 |     add_header X-Edge LoadBalancer;
654 |   }
655 | }
656 | ```
657 | 
658 | What's good about `round-robin`? The requests are shared almost equally among all servers. Slower servers or slower responses, though, may end up queueing lots of requests; for that there is [`least_conn`](http://nginx.org/en/docs/http/ngx_http_upstream_module.html#least_conn), which also takes the number of active connections into account (sketched below, after the next paragraph).
659 | 
660 | What's not good about it? It's not caching-aware, meaning multiple clients will face higher latencies because they're asking servers that haven't cached the content yet.
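For reference, switching from the default policy to `least_conn` is a one-line change in the upstream block; a sketch reusing the same edge hosts as above:

```nginx
upstream backend {
  # pick the server with the fewest active connections instead of plain round-robin
  least_conn;
  server edge:8080;
  server edge1:8080;
  server edge2:8080;
}
```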
661 | 
662 | > [See more about when to use and when to avoid the `rr` policy.](https://github.com/leandromoreira/cdn-up-and-running/issues/10)
663 | 
664 | ```bash
665 | # demo time
666 | git checkout 4.0.0
667 | docker-compose up
668 | ./load_test.sh
669 | ```
670 | 
671 | ![round-robin grafana](/img/4.0.0_metrics.webp "round-robin grafana")
672 | 
673 | > **Heads up**: the load balancer itself plays the role of a single point of failure here. [Facebook has a great talk explaining](https://www.youtube.com/watch?v=bxhYNfFeVF4) how they created a load balancer that is resilient, maintainable, and scalable.
674 | 
675 | #### Consistent Hashing
676 | 
677 | Knowing that caching awareness is important for a CDN, it's hard to use round-robin as it is. There is a balancing method known as [`consistent hashing`](https://en.wikipedia.org/wiki/Consistent_hashing) which tries to solve this problem by choosing a signal (the `uri`, for instance) and hashing it, so that all requests for the same content are consistently sent to the same server.
678 | 
679 | There is a directive for that in nginx as well; it's called [`hash`](http://nginx.org/en/docs/http/ngx_http_upstream_module.html#hash).
680 | 
681 | ```nginx
682 | upstream backend {
683 |   hash $request_uri consistent;
684 |   server edge:8080;
685 |   server edge1:8080;
686 |   server edge2:8080;
687 | }
688 | 
689 | server {
690 |   listen 8080;
691 | 
692 |   location / {
693 |     proxy_pass http://backend;
694 |     add_header X-Edge LoadBalancer;
695 |   }
696 | }
697 | ```
698 | 
699 | What's good about `consistent hashing`? It enforces a policy that will increase the chances of a cache hit.
700 | 
701 | What's not good about it? Imagine a single piece of content (a video, a game) is peaking: now a small number of servers has to respond to most of the clients.
702 | 
703 | > **Heads up**: [Consistent Hashing With Bounded Load](https://medium.com/vimeo-engineering-blog/improving-load-balancing-with-a-new-consistent-hashing-algorithm-9f1bd75709ed) was born to solve this problem.
704 | 
705 | ```bash
706 | # demo time
707 | git checkout 4.0.1
708 | docker-compose up
709 | ./load_test.sh
710 | ```
711 | 
712 | ![consistent hashing grafana](/img/4.0.1_metrics.webp "consistent hashing grafana")
713 | 
714 | > **Heads up**: initially I used a Lua library because I thought consistent hashing was only available in the commercial version of nginx.
715 | 
716 | #### Load balancer bottleneck
717 | 
718 | There are at least two problems with a load balancer (beyond it being a [SPoF](https://en.wikipedia.org/wiki/Single_point_of_failure)):
719 | 
720 | * Network egress - the input/output bandwidth capacity of the load balancer must be at least the sum of all its servers'.
721 |   * one could use [DSR](https://www.loadbalancer.org/blog/yahoos-l3-direct-server-return-an-alternative-to-lvs-tun-explored/) or a [307](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/307) redirect to work around it.
722 | * Distributed edges - nodes might be geographically spread out, which makes fronting them with a single load balancer hard.
723 | 
724 | ### Network reachability
725 | 
726 | Many of the problems we saw in the load balancer section are about network reachability. Here we're going to discuss some of the ways to tackle that, each with its ups and downs.
727 | 
728 | #### API
729 | 
730 | We could introduce an `API (cdn routing)`: clients only learn where to find a given content (`a specific edge node`) after asking this API. Clients might also need to deal with failover themselves.
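To make the idea concrete, here is a minimal, hypothetical sketch of such a routing endpoint in nginx; the port and the hard-coded edge choice are made up for illustration, and a real router would pick the node based on the client's location, node health, and load:

```nginx
# hypothetical "cdn routing" API: the client asks this endpoint first,
# then follows the 307 redirect straight to the chosen edge node
server {
  listen 8090;

  location / {
    # hard-coded choice for illustration only
    return 307 http://edge:8080$request_uri;
  }
}
```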
731 | 
732 | > **Heads up**: solving it on the software side, one could mix the best of both worlds: start balancing using `consistent hashing` and then, when a given piece of content becomes popular, switch to [a better natural distribution](https://brooker.co.za/blog/2012/01/17/two-random.html).
733 | 
734 | #### DNS
735 | 
736 | We could use DNS for that. It looks pretty similar to the API approach, but here we rely on the DNS caching TTL. Failover in this case is even harder.
737 | 
738 | #### Anycast
739 | 
740 | We could also use a single [domain/IP, announcing that IP](https://en.wikipedia.org/wiki/Anycast) from all the places where we have nodes, and leave it to the [network routing protocols](https://www.youtube.com/watch?v=O6tCoD5c_U0) to find the closest node for a given user.
741 | 
742 | ## Miscellaneous
743 | 
744 | We didn't talk about lots of other important aspects of a CDN, such as:
745 | 
746 | * [Peering](https://www.peeringdb.com/) - CDNs will host their nodes/content at ISPs, public peering points, and private facilities.
747 | * Security - CDNs suffer a lot of attacks: DDoS, [cache poisoning](https://youst.in/posts/cache-poisoning-at-scale/), and others.
748 | * [Caching strategies](https://netflixtechblog.com/netflix-and-fill-c43a32b490c0) - in some cases, instead of the edge pulling the content from the backend, the backend pushes the content to the edge.
749 | * [Tenants](https://en.wikipedia.org/wiki/Multitenancy)/Isolation - CDNs will host multiple clients on the same nodes, so isolation is a must:
750 |   * metrics, caching area, configurations (caching policies, backend), etc.
751 | * Tokens - CDNs offer some form of [token protection](https://en.wikipedia.org/wiki/JSON_Web_Token) to keep content away from unauthorized clients.
752 | * [Health check (fault detection)](https://youtu.be/1TIzPL4878Q?t=782) - determining whether a node is functional or not.
753 | * HTTP headers - very often (e.g. [CORS](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS)) a client wants to add some headers (sometimes dynamically).
754 | * [Geoblocking](https://github.com/leev/ngx_http_geoip2_module#example-usage) - to save money or enforce contractual restrictions, your CDN will employ some policy regarding the locality of users.
755 | * Purging - the ability to [purge content from the cache](https://docs.nginx.com/nginx/admin-guide/content-cache/content-caching/#purging-content-from-the-cache).
756 | * [Throttling](https://github.com/leandromoreira/nginx-lua-redis-rate-measuring#use-case-distributed-throttling) - limiting the number of concurrent requests.
757 | * [Edge computing](https://leandromoreira.com/2020/04/19/building-an-edge-computing-platform/) - the ability to run code as a filter over the hosted content.
758 | * and so on...
759 | 
760 | ## Conclusion
761 | 
762 | I hope you learned a little bit about how a CDN works. It's a complex endeavor, highly dependent on how close your nodes are to the clients and how well you can distribute the load, taking caching into account, to accommodate spikes and lulls in traffic alike.
763 | 
--------------------------------------------------------------------------------