├── .gitignore
├── LICENSE
├── k8s
│   ├── deployment.yaml
│   ├── echo
│   │   ├── Dockerfile
│   │   ├── echo.py
│   │   ├── requirements.txt
│   │   └── templates
│   │       └── index.html
│   ├── helm
│   │   ├── Chart.yaml
│   │   └── templates
│   │       ├── NOTES.txt
│   │       ├── deployment.yaml
│   │       ├── ingress.yaml
│   │       ├── namespace.yaml
│   │       ├── rbac.yaml
│   │       └── service.yaml
│   ├── k8s-101.md
│   └── k8s-102.md
├── mysql
│   ├── mysql-101-0.md
│   ├── mysql-101-1.md
│   └── mysql-102.md
└── terraform
    └── tf-101.md

/.gitignore:
--------------------------------------------------------------------------------
1 | k8s/cert/*
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Mozilla Public License Version 2.0
2 | ==================================
3 |
4 | 1. Definitions
5 | --------------
6 |
7 | 1.1. "Contributor"
8 |     means each individual or legal entity that creates, contributes to
9 |     the creation of, or owns Covered Software.
10 |
11 | 1.2. "Contributor Version"
12 |     means the combination of the Contributions of others (if any) used
13 |     by a Contributor and that particular Contributor's Contribution.
14 |
15 | 1.3. "Contribution"
16 |     means Covered Software of a particular Contributor.
17 |
18 | 1.4. "Covered Software"
19 |     means Source Code Form to which the initial Contributor has attached
20 |     the notice in Exhibit A, the Executable Form of such Source Code
21 |     Form, and Modifications of such Source Code Form, in each case
22 |     including portions thereof.
23 |
24 | 1.5. "Incompatible With Secondary Licenses"
25 |     means
26 |
27 |     (a) that the initial Contributor has attached the notice described
28 |         in Exhibit B to the Covered Software; or
29 |
30 |     (b) that the Covered Software was made available under the terms of
31 |         version 1.1 or earlier of the License, but not also under the
32 |         terms of a Secondary License.
33 |
34 | 1.6. "Executable Form"
35 |     means any form of the work other than Source Code Form.
36 |
37 | 1.7. "Larger Work"
38 |     means a work that combines Covered Software with other material, in
39 |     a separate file or files, that is not Covered Software.
40 |
41 | 1.8. "License"
42 |     means this document.
43 |
44 | 1.9. "Licensable"
45 |     means having the right to grant, to the maximum extent possible,
46 |     whether at the time of the initial grant or subsequently, any and
47 |     all of the rights conveyed by this License.
48 |
49 | 1.10. "Modifications"
50 |     means any of the following:
51 |
52 |     (a) any file in Source Code Form that results from an addition to,
53 |         deletion from, or modification of the contents of Covered
54 |         Software; or
55 |
56 |     (b) any new file in Source Code Form that contains any Covered
57 |         Software.
58 |
59 | 1.11. "Patent Claims" of a Contributor
60 |     means any patent claim(s), including without limitation, method,
61 |     process, and apparatus claims, in any patent Licensable by such
62 |     Contributor that would be infringed, but for the grant of the
63 |     License, by the making, using, selling, offering for sale, having
64 |     made, import, or transfer of either its Contributions or its
65 |     Contributor Version.
66 |
67 | 1.12. "Secondary License"
68 |     means either the GNU General Public License, Version 2.0, the GNU
69 |     Lesser General Public License, Version 2.1, the GNU Affero General
70 |     Public License, Version 3.0, or any later versions of those
71 |     licenses.
72 |
73 | 1.13. "Source Code Form"
74 |     means the form of the work preferred for making modifications.
75 |
76 | 1.14. "You" (or "Your")
77 |     means an individual or a legal entity exercising rights under this
78 |     License. For legal entities, "You" includes any entity that
79 |     controls, is controlled by, or is under common control with You. For
80 |     purposes of this definition, "control" means (a) the power, direct
81 |     or indirect, to cause the direction or management of such entity,
82 |     whether by contract or otherwise, or (b) ownership of more than
83 |     fifty percent (50%) of the outstanding shares or beneficial
84 |     ownership of such entity.
85 |
86 | 2. License Grants and Conditions
87 | --------------------------------
88 |
89 | 2.1. Grants
90 |
91 | Each Contributor hereby grants You a world-wide, royalty-free,
92 | non-exclusive license:
93 |
94 | (a) under intellectual property rights (other than patent or trademark)
95 |     Licensable by such Contributor to use, reproduce, make available,
96 |     modify, display, perform, distribute, and otherwise exploit its
97 |     Contributions, either on an unmodified basis, with Modifications, or
98 |     as part of a Larger Work; and
99 |
100 | (b) under Patent Claims of such Contributor to make, use, sell, offer
101 |     for sale, have made, import, and otherwise transfer either its
102 |     Contributions or its Contributor Version.
103 |
104 | 2.2. Effective Date
105 |
106 | The licenses granted in Section 2.1 with respect to any Contribution
107 | become effective for each Contribution on the date the Contributor first
108 | distributes such Contribution.
109 |
110 | 2.3. Limitations on Grant Scope
111 |
112 | The licenses granted in this Section 2 are the only rights granted under
113 | this License. No additional rights or licenses will be implied from the
114 | distribution or licensing of Covered Software under this License.
115 | Notwithstanding Section 2.1(b) above, no patent license is granted by a
116 | Contributor:
117 |
118 | (a) for any code that a Contributor has removed from Covered Software;
119 |     or
120 |
121 | (b) for infringements caused by: (i) Your and any other third party's
122 |     modifications of Covered Software, or (ii) the combination of its
123 |     Contributions with other software (except as part of its Contributor
124 |     Version); or
125 |
126 | (c) under Patent Claims infringed by Covered Software in the absence of
127 |     its Contributions.
128 |
129 | This License does not grant any rights in the trademarks, service marks,
130 | or logos of any Contributor (except as may be necessary to comply with
131 | the notice requirements in Section 3.4).
132 |
133 | 2.4. Subsequent Licenses
134 |
135 | No Contributor makes additional grants as a result of Your choice to
136 | distribute the Covered Software under a subsequent version of this
137 | License (see Section 10.2) or under the terms of a Secondary License (if
138 | permitted under the terms of Section 3.3).
139 |
140 | 2.5. Representation
141 |
142 | Each Contributor represents that the Contributor believes its
143 | Contributions are its original creation(s) or it has sufficient rights
144 | to grant the rights to its Contributions conveyed by this License.
145 |
146 | 2.6. Fair Use
147 |
148 | This License is not intended to limit any rights You have under
149 | applicable copyright doctrines of fair use, fair dealing, or other
150 | equivalents.
151 |
152 | 2.7. Conditions
153 |
154 | Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted
155 | in Section 2.1.
156 |
157 | 3. Responsibilities
158 | -------------------
159 |
160 | 3.1. Distribution of Source Form
161 |
162 | All distribution of Covered Software in Source Code Form, including any
163 | Modifications that You create or to which You contribute, must be under
164 | the terms of this License. You must inform recipients that the Source
165 | Code Form of the Covered Software is governed by the terms of this
166 | License, and how they can obtain a copy of this License. You may not
167 | attempt to alter or restrict the recipients' rights in the Source Code
168 | Form.
169 |
170 | 3.2. Distribution of Executable Form
171 |
172 | If You distribute Covered Software in Executable Form then:
173 |
174 | (a) such Covered Software must also be made available in Source Code
175 |     Form, as described in Section 3.1, and You must inform recipients of
176 |     the Executable Form how they can obtain a copy of such Source Code
177 |     Form by reasonable means in a timely manner, at a charge no more
178 |     than the cost of distribution to the recipient; and
179 |
180 | (b) You may distribute such Executable Form under the terms of this
181 |     License, or sublicense it under different terms, provided that the
182 |     license for the Executable Form does not attempt to limit or alter
183 |     the recipients' rights in the Source Code Form under this License.
184 |
185 | 3.3. Distribution of a Larger Work
186 |
187 | You may create and distribute a Larger Work under terms of Your choice,
188 | provided that You also comply with the requirements of this License for
189 | the Covered Software. If the Larger Work is a combination of Covered
190 | Software with a work governed by one or more Secondary Licenses, and the
191 | Covered Software is not Incompatible With Secondary Licenses, this
192 | License permits You to additionally distribute such Covered Software
193 | under the terms of such Secondary License(s), so that the recipient of
194 | the Larger Work may, at their option, further distribute the Covered
195 | Software under the terms of either this License or such Secondary
196 | License(s).
197 |
198 | 3.4. Notices
199 |
200 | You may not remove or alter the substance of any license notices
201 | (including copyright notices, patent notices, disclaimers of warranty,
202 | or limitations of liability) contained within the Source Code Form of
203 | the Covered Software, except that You may alter any license notices to
204 | the extent required to remedy known factual inaccuracies.
205 |
206 | 3.5. Application of Additional Terms
207 |
208 | You may choose to offer, and to charge a fee for, warranty, support,
209 | indemnity or liability obligations to one or more recipients of Covered
210 | Software. However, You may do so only on Your own behalf, and not on
211 | behalf of any Contributor. You must make it absolutely clear that any
212 | such warranty, support, indemnity, or liability obligation is offered by
213 | You alone, and You hereby agree to indemnify every Contributor for any
214 | liability incurred by such Contributor as a result of warranty, support,
215 | indemnity or liability terms You offer. You may include additional
216 | disclaimers of warranty and limitations of liability specific to any
217 | jurisdiction.
218 |
219 | 4. Inability to Comply Due to Statute or Regulation
220 | ---------------------------------------------------
221 |
222 | If it is impossible for You to comply with any of the terms of this
223 | License with respect to some or all of the Covered Software due to
224 | statute, judicial order, or regulation then You must: (a) comply with
225 | the terms of this License to the maximum extent possible; and (b)
226 | describe the limitations and the code they affect. Such description must
227 | be placed in a text file included with all distributions of the Covered
228 | Software under this License. Except to the extent prohibited by statute
229 | or regulation, such description must be sufficiently detailed for a
230 | recipient of ordinary skill to be able to understand it.
231 |
232 | 5. Termination
233 | --------------
234 |
235 | 5.1. The rights granted under this License will terminate automatically
236 | if You fail to comply with any of its terms. However, if You become
237 | compliant, then the rights granted under this License from a particular
238 | Contributor are reinstated (a) provisionally, unless and until such
239 | Contributor explicitly and finally terminates Your grants, and (b) on an
240 | ongoing basis, if such Contributor fails to notify You of the
241 | non-compliance by some reasonable means prior to 60 days after You have
242 | come back into compliance. Moreover, Your grants from a particular
243 | Contributor are reinstated on an ongoing basis if such Contributor
244 | notifies You of the non-compliance by some reasonable means, this is the
245 | first time You have received notice of non-compliance with this License
246 | from such Contributor, and You become compliant prior to 30 days after
247 | Your receipt of the notice.
248 |
249 | 5.2. If You initiate litigation against any entity by asserting a patent
250 | infringement claim (excluding declaratory judgment actions,
251 | counter-claims, and cross-claims) alleging that a Contributor Version
252 | directly or indirectly infringes any patent, then the rights granted to
253 | You by any and all Contributors for the Covered Software under Section
254 | 2.1 of this License shall terminate.
255 |
256 | 5.3. In the event of termination under Sections 5.1 or 5.2 above, all
257 | end user license agreements (excluding distributors and resellers) which
258 | have been validly granted by You or Your distributors under this License
259 | prior to termination shall survive termination.
260 |
261 | ************************************************************************
262 | *                                                                      *
263 | *  6. Disclaimer of Warranty                                           *
264 | *  -------------------------                                           *
265 | *                                                                      *
266 | *  Covered Software is provided under this License on an "as is"       *
267 | *  basis, without warranty of any kind, either expressed, implied, or  *
268 | *  statutory, including, without limitation, warranties that the       *
269 | *  Covered Software is free of defects, merchantable, fit for a        *
270 | *  particular purpose or non-infringing. The entire risk as to the     *
271 | *  quality and performance of the Covered Software is with You.        *
272 | *  Should any Covered Software prove defective in any respect, You     *
273 | *  (not any Contributor) assume the cost of any necessary servicing,   *
274 | *  repair, or correction. This disclaimer of warranty constitutes an   *
275 | *  essential part of this License. No use of any Covered Software is   *
276 | *  authorized under this License except under this disclaimer.         *
277 | *                                                                      *
278 | ************************************************************************
279 |
280 | ************************************************************************
281 | *                                                                      *
282 | *  7. Limitation of Liability                                          *
283 | *  --------------------------                                          *
284 | *                                                                      *
285 | *  Under no circumstances and under no legal theory, whether tort      *
286 | *  (including negligence), contract, or otherwise, shall any           *
287 | *  Contributor, or anyone who distributes Covered Software as          *
288 | *  permitted above, be liable to You for any direct, indirect,         *
289 | *  special, incidental, or consequential damages of any character      *
290 | *  including, without limitation, damages for lost profits, loss of    *
291 | *  goodwill, work stoppage, computer failure or malfunction, or any    *
292 | *  and all other commercial damages or losses, even if such party      *
293 | *  shall have been informed of the possibility of such damages. This   *
294 | *  limitation of liability shall not apply to liability for death or   *
295 | *  personal injury resulting from such party's negligence to the       *
296 | *  extent applicable law prohibits such limitation. Some               *
297 | *  jurisdictions do not allow the exclusion or limitation of           *
298 | *  incidental or consequential damages, so this exclusion and          *
299 | *  limitation may not apply to You.                                    *
300 | *                                                                      *
301 | ************************************************************************
302 |
303 | 8. Litigation
304 | -------------
305 |
306 | Any litigation relating to this License may be brought only in the
307 | courts of a jurisdiction where the defendant maintains its principal
308 | place of business and such litigation shall be governed by laws of that
309 | jurisdiction, without reference to its conflict-of-law provisions.
310 | Nothing in this Section shall prevent a party's ability to bring
311 | cross-claims or counter-claims.
312 |
313 | 9. Miscellaneous
314 | ----------------
315 |
316 | This License represents the complete agreement concerning the subject
317 | matter hereof. If any provision of this License is held to be
318 | unenforceable, such provision shall be reformed only to the extent
319 | necessary to make it enforceable. Any law or regulation which provides
320 | that the language of a contract shall be construed against the drafter
321 | shall not be used to construe this License against a Contributor.
322 |
323 | 10. Versions of the License
324 | ---------------------------
325 |
326 | 10.1. New Versions
327 |
328 | Mozilla Foundation is the license steward. Except as provided in Section
329 | 10.3, no one other than the license steward has the right to modify or
330 | publish new versions of this License. Each version will be given a
331 | distinguishing version number.
332 |
333 | 10.2. Effect of New Versions
334 |
335 | You may distribute the Covered Software under the terms of the version
336 | of the License under which You originally received the Covered Software,
337 | or under the terms of any subsequent version published by the license
338 | steward.
339 |
340 | 10.3. Modified Versions
341 |
342 | If you create software not governed by this License, and you want to
343 | create a new license for such software, you may create and use a
344 | modified version of this License if you rename the license and remove
345 | any references to the name of the license steward (except to note that
346 | such modified license differs from this License).
347 |
348 | 10.4. Distributing Source Code Form that is Incompatible With Secondary
349 | Licenses
350 |
351 | If You choose to distribute Source Code Form that is Incompatible With
352 | Secondary Licenses under the terms of this version of the License, the
353 | notice described in Exhibit B of this License must be attached.
354 |
355 | Exhibit A - Source Code Form License Notice
356 | -------------------------------------------
357 |
358 | This Source Code Form is subject to the terms of the Mozilla Public
359 | License, v. 2.0. If a copy of the MPL was not distributed with this
360 | file, You can obtain one at http://mozilla.org/MPL/2.0/.
361 |
362 | If it is not possible or desirable to put the notice in a particular
363 | file, then You may include the notice in a location (such as a LICENSE
364 | file in a relevant directory) where a recipient would be likely to look
365 | for such a notice.
366 |
367 | You may add additional accurate notices of copyright ownership.
368 |
369 | Exhibit B - "Incompatible With Secondary Licenses" Notice
370 | ---------------------------------------------------------
371 |
372 | This Source Code Form is "Incompatible With Secondary Licenses", as
373 | defined by the Mozilla Public License, v. 2.0.
374 |
--------------------------------------------------------------------------------
/k8s/deployment.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: v1
2 | kind: Namespace
3 | metadata:
4 |   name: echo
5 | ---
6 | apiVersion: apps/v1
7 | kind: Deployment
8 | metadata:
9 |   name: echo
10 |   namespace: echo
11 | spec:
12 |   selector:
13 |     matchLabels:
14 |       app: echo
15 |   replicas: 1
16 |   template:
17 |     metadata:
18 |       labels:
19 |         app: echo
20 |     spec:
21 |       containers:
22 |       - name: echo
23 |         image: localhost:5000/echo:latest
24 |         imagePullPolicy: Always
25 |         stdin: true
26 |         tty: true
27 |
--------------------------------------------------------------------------------
/k8s/echo/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM python:3.10-alpine
2 |
3 | WORKDIR /app
4 |
5 | COPY . /app
6 |
7 | RUN pip install -r requirements.txt
8 |
9 | EXPOSE 8080
10 |
11 | CMD ["python", "/app/echo.py"]
12 |
--------------------------------------------------------------------------------
/k8s/echo/echo.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | from flask import Flask, request, render_template
4 |
5 | app = Flask(__name__)
6 |
7 | @app.route("/")
8 | def form():
9 |     return render_template("index.html")
10 |
11 | @app.route("/", methods=["POST"])
12 | def form_post():
13 |     return request.form["echo_input"]
14 |
15 | if __name__ == "__main__":
16 |     app.run(host="0.0.0.0", port=8080, debug=True)
17 |
--------------------------------------------------------------------------------
/k8s/echo/requirements.txt:
--------------------------------------------------------------------------------
1 | flask
2 |
--------------------------------------------------------------------------------
/k8s/echo/templates/index.html:
--------------------------------------------------------------------------------
1 | <html>
2 | <head>
3 | <title>Echo (echo...)</title>
4 | </head>
5 | <body>
6 | <h1>Echo (echo...)</h1>
7 | <form method="POST">
8 | <input type="text" name="echo_input">
9 | <input type="submit" value="Submit">
10 | </form>
11 | </body></html>
--------------------------------------------------------------------------------
/k8s/helm/Chart.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: v2
2 | name: echo
3 | description: A Helm chart for a simple echo app
4 | version: 0.1.0
5 | appVersion: 0.1.0
6 |
--------------------------------------------------------------------------------
/k8s/helm/templates/NOTES.txt:
--------------------------------------------------------------------------------
1 | To access, please run the following command:
2 |
3 | echo "$(minikube ip) echo.internal" | sudo tee -a /etc/hosts
4 |
5 | Then go to http://echo.internal in your browser.
6 |
7 | To clean up, run the following command:
8 |
9 | sudo sed -i'' '$d' /etc/hosts
10 |
--------------------------------------------------------------------------------
/k8s/helm/templates/deployment.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: apps/v1
2 | kind: Deployment
3 | metadata:
4 |   name: echo
5 |   namespace: echo
6 | spec:
7 |   selector:
8 |     matchLabels:
9 |       app: echo
10 |   replicas: 1
11 |   template:
12 |     metadata:
13 |       labels:
14 |         app: echo
15 |     spec:
16 |       containers:
17 |       - name: echo
18 |         image: localhost:5000/echo:latest
19 |         imagePullPolicy: Always
20 |         ports:
21 |         - containerPort: 8080
22 |         stdin: true
23 |         tty: true
24 |         resources:
25 |           limits:
26 |             cpu: 100m
27 |             memory: 128Mi
28 |           requests:
29 |             cpu: 50m
30 |             memory: 50Mi
31 |
--------------------------------------------------------------------------------
/k8s/helm/templates/ingress.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: networking.k8s.io/v1
2 | kind: Ingress
3 | metadata:
4 |   name: echo
5 |   namespace: echo
6 |   annotations:
7 |     nginx.ingress.kubernetes.io/rewrite-target: /
8 | spec:
9 |   rules:
10 |   - host: echo.internal
11 |     http:
12 |       paths:
13 |       - path: /
14 |         pathType: Prefix
15 |         backend:
16 |           service:
17 |             name: echo
18 |             port:
19 |               number: 8080
20 |
--------------------------------------------------------------------------------
/k8s/helm/templates/namespace.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: v1
2 | kind: Namespace
3 | metadata:
4 |   name: echo
5 |
--------------------------------------------------------------------------------
/k8s/helm/templates/rbac.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: rbac.authorization.k8s.io/v1
2 | kind: Role
3 | metadata:
4 |   name: echo-rw
5 |   namespace: echo
6 | rules:
7 | - apiGroups: [""]
8 |   resources: ["pods"]
9 |   verbs: ["get", "list", "watch"]
10 | - apiGroups: [""]
11 |   resources: ["pods/exec"]
12 |   verbs: ["create"]
13 | ---
14 | apiVersion: rbac.authorization.k8s.io/v1
15 | kind: RoleBinding
16 | metadata:
17 |   name: echo-rw
18 |   namespace: echo
19 | subjects:
20 | - kind: User
21 |   name: echo-user
22 |   apiGroup: rbac.authorization.k8s.io
23 | roleRef:
24 |   kind: Role
25 |   name: echo-rw
26 |   apiGroup: rbac.authorization.k8s.io
27 |
--------------------------------------------------------------------------------
/k8s/helm/templates/service.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: v1
2 | kind: Service
3 | metadata:
4 |   name: echo
5 |   namespace: echo
6 | spec:
7 |   ports:
8 |   - protocol: TCP
9 |     port: 8080
10 |     targetPort: 8080
11 |   selector:
12 |     app: echo
13 |   type: NodePort
14 |
--------------------------------------------------------------------------------
/k8s/k8s-101.md:
--------------------------------------------------------------------------------
1 | # Introduction
2 |
3 | ## What is Kubernetes?
4 |
5 | Kubernetes is a container orchestration system: it manages the scheduling and execution of containers. Similar platforms exist, such as Docker Swarm, Apache Mesos, and HashiCorp Nomad. Kubernetes has by far the dominant market share, and is also the most complex of those listed.
6 |
7 | ## Kubernetes components
8 |
9 | ### High level
10 | * Cluster: a logical grouping of one or more nodes.
11 | * Node: a server running Kubernetes - can be bare metal, a VM, or a container.
12 | * Pod: a logical grouping of one or more containers.
13 | * Container: the same as Docker.
14 |
15 | #### Node roles
16 | * Control plane: schedules pods, detects and responds to cluster events, and maintains cluster state in a database.
17 | * Worker: runs user-defined workloads via DaemonSets, StatefulSets, or Deployments.
18 |   * Note that while not recommended in production, a single-node cluster can serve both of these roles for development purposes.
19 |
20 | ### Low[er] level
21 |
22 | For a more thorough examination of Kubernetes components, [the official documentation](https://kubernetes.io/docs/concepts/overview/components/) is recommended. A brief overview of some components follows:
23 |
24 | * kube-apiserver: handles requests to the API, typically via kubectl (see the sketch after this list).
25 | * etcd: a key/value store utilizing the Raft algorithm for consensus; frequently used as the store for Kubernetes cluster data.
26 |   * K3s (and thus K3d) uses an embedded SQLite database as its backing store by default; in general any database may be used, but in production etcd is the standard.
27 | * kube-scheduler: assigns workloads to a node, constrained by resource limits, affinity/anti-affinity rules, etc.
28 | * kubelet: an agent running on every node, ensuring that a Pod's containers are running.
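
Since kubectl is ultimately just a client for kube-apiserver, you can talk to the API directly and see that it's plain HTTP. A quick sketch, assuming any running cluster (`kubectl get --raw` performs an authenticated GET against whatever path you hand it):

    # The same endpoint that `kubectl get namespaces` uses, as raw JSON
    kubectl get --raw /api/v1/namespaces | head -c 300
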
29 |
30 | ## Kubernetes distributions
31 |
32 | Each cloud provider has their own - Amazon has EKS, Azure has AKS, Google has GKE, DigitalOcean has DOKS, etc. Vanilla Kubernetes can be installed [either manually](https://github.com/kelseyhightower/kubernetes-the-hard-way), or with a tool like [kubeadm](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/). Various distributions also exist, much like Linux distributions. [Minikube](https://minikube.sigs.k8s.io/docs/) is a popular way to bootstrap a single-node cluster for development in an existing operating system. K3d is based on [k3s](https://k3s.io/), which is a lightweight single-binary distribution of Kubernetes. Rancher Labs (owned by SuSE) also makes a full single-purpose OS called [k3os](https://k3os.io/), which is designed to run k3s, and only k3s. A similar (albeit running vanilla Kubernetes) but even more extreme example is [Talos](https://www.talos.dev/), which is completely immutable, has no shell access, and allows access only via its API. Amazon has a similar offering called [Bottlerocket](https://aws.amazon.com/bottlerocket/).
33 |
34 | In general, any Kubernetes distribution will be perfectly adequate for learning, and it comes down to personal preference. For production, there are arguments to be made for managed services like EKS, but that's beyond the scope of this document.
35 |
36 | # Getting started
37 |
38 | ## Install
39 |
40 | ### Prerequisites:
41 |
42 | - Docker
43 |   - There are many ways to do this, pick your favorite
44 |   - If you're using Minikube, you can just `eval` its docker-env (shown below)
45 | - helm
46 |   - `brew install helm`
47 | - kubectl
48 |   - `brew install kubectl`
49 | - Optional:
50 |   - `brew install hidetatz/tap/kubecolor`
51 | - Optional, but please do it:
52 |   - `brew install gnu-sed`
53 | - Add the following to your shell rc file:
54 |   - `alias kubectl=kubecolor` (if kubecolor was installed)
55 |   - `alias k=kubectl`
56 | - Install your shell's plugin for kubectl
57 |
58 | ### Install and verification
59 |
60 | #### M1 Macs (ARM)
61 |
62 | Install Docker Desktop, and launch minikube as below, but with `--driver docker` instead. Additionally, skip all steps regarding using a registry, and whenever the image path is referenced, use these instead:
63 |
64 | # For the first part, which has no networking functionality
65 | stephangarland/echo:local
66 | # For the second part, which exposes a container port
67 | stephangarland/echo:web
68 |
69 | #### Intel Macs (x86-64)
70 |
71 | Install the `hyperkit` driver with `brew install hyperkit`, and minikube with `brew install minikube`.
72 |
73 | Then, run `minikube start` with a few options: `minikube start --memory 8GB --cpus 4 --driver hyperkit`. Assuming you have the memory and CPU to spare, this ensures we won't run into any backing hardware issues. `hyperkit` as the driver means we don't have to download anything additional to spin up the VM that runs the cluster.
74 |
75 | ❯ minikube start --memory 8GB --cpus 4 --driver hyperkit
76 | 😄  minikube v1.25.2 on Darwin 11.6.5
77 |     ▪ KUBECONFIG=/Users/sgarland/.kube/.switch_tmp/config.1541775917.tmp
78 |     ▪ MINIKUBE_ACTIVE_DOCKERD=minikube
79 | ✨  Using the hyperkit driver based on user configuration
80 | 👍  Starting control plane node minikube in cluster minikube
81 | 🔥  Creating hyperkit VM (CPUs=4, Memory=8192MB, Disk=20000MB) ...
82 | 🐳  Preparing Kubernetes v1.23.3 on Docker 20.10.12 ...
83 |     ▪ kubelet.housekeeping-interval=5m
84 |     ▪ Generating certificates and keys ...
85 |     ▪ Booting up control plane ...
86 |     ▪ Configuring RBAC rules ...
87 | 🔎  Verifying Kubernetes components...
88 |     ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
89 | 🌟  Enabled addons: storage-provisioner, default-storageclass
90 | 🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
91 |
92 | Next, we'll enable the registry addon:
93 |
94 | ❯ minikube addons enable registry
95 |     ▪ Using image registry:2.7.1
96 |     ▪ Using image gcr.io/google_containers/kube-registry-proxy:0.4
97 | 🔎  Verifying registry addon...
98 | 🌟  The 'registry' addon is enabled
99 |
100 | Also, since we'll need it later, let's enable the ingress addon now:
101 |
102 | ❯ minikube addons enable ingress
103 |     ▪ Using image k8s.gcr.io/ingress-nginx/controller:v1.1.1
104 |     ▪ Using image k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.1.1
105 |     ▪ Using image k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.1.1
106 | 🔎  Verifying ingress addon...
107 | 🌟  The 'ingress' addon is enabled
108 |
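
These two are far from the only addons available; if you're curious what else minikube ships with, you can list them all, along with their current status:

    # Show every addon and whether it's enabled
    minikube addons list
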
109 | Next, if you don't already have a docker daemon running (hint: does `docker version` return anything?), we'll hook into Minikube's:
110 |
111 | eval $(minikube -p minikube docker-env)
112 |
113 | Finally, we need to modify networking a little bit, using `socat` to forward local port 5000 to the registry inside Minikube, so that our local docker daemon can reach it:
114 |
115 | ❯ docker run --rm -it -d --network=host alpine ash -c "apk add socat && socat TCP-LISTEN:5000,reuseaddr,fork TCP:$(minikube ip):5000"
116 | Unable to find image 'alpine:latest' locally
117 | latest: Pulling from library/alpine
118 | df9b9388f04a: Already exists
119 | Digest: sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454
120 | Status: Downloaded newer image for alpine:latest
121 | fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/main/x86_64/APKINDEX.tar.gz
122 | fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/community/x86_64/APKINDEX.tar.gz
123 | (1/4) Installing ncurses-terminfo-base (6.3_p20211120-r0)
124 | (2/4) Installing ncurses-libs (6.3_p20211120-r0)
125 | (3/4) Installing readline (8.1.1-r0)
126 | (4/4) Installing socat (1.7.4.2-r0)
127 | Executing busybox-1.34.1-r5.trigger
128 | OK: 7 MiB in 18 packages
129 |
130 | Let's verify the cluster:
131 |
132 | ❯ kubectl get nodes
133 | NAME       STATUS   ROLES                  AGE     VERSION
134 | minikube   Ready    control-plane,master   6m26s   v1.23.3
135 |
136 | For more detail, use `describe`. There's a lot here, but I'll highlight some pertinent information.
137 |
138 | ❯ kubectl describe nodes
139 | Name:               minikube
140 | Roles:              control-plane,master
141 | Labels:             beta.kubernetes.io/arch=amd64
142 | ...
143 | Capacity:
144 |   cpu:                4
145 |   ephemeral-storage:  17784752Ki
146 |   hugepages-2Mi:      0
147 |   memory:             8161900Ki
148 |   pods:               110
149 | ...
150 | Events:
151 |   Type    Reason                   Age                    From        Message
152 |   ----    ------                   ----                   ----        -------
153 |   Normal  Starting                 6m28s                  kube-proxy
154 |   Normal  NodeHasSufficientMemory  6m52s (x5 over 6m52s)  kubelet     Node minikube status is now: NodeHasSufficientMemory
155 |   Normal  NodeHasNoDiskPressure    6m52s (x5 over 6m52s)  kubelet     Node minikube status is now: NodeHasNoDiskPressure
156 |   Normal  NodeHasSufficientPID     6m52s (x4 over 6m52s)  kubelet     Node minikube status is now: NodeHasSufficientPID
157 |   Normal  Starting                 6m42s                  kubelet     Starting kubelet.
158 |   Normal  NodeHasNoDiskPressure    6m42s                  kubelet     Node minikube status is now: NodeHasNoDiskPressure
159 |   Normal  NodeHasSufficientPID     6m42s                  kubelet     Node minikube status is now: NodeHasSufficientPID
160 |   Normal  NodeNotReady             6m42s                  kubelet     Node minikube status is now: NodeNotReady
161 |   Normal  NodeAllocatableEnforced  6m42s                  kubelet     Updated Node Allocatable limit across pods
162 |   Normal  NodeHasSufficientMemory  6m42s                  kubelet     Node minikube status is now: NodeHasSufficientMemory
163 |   Normal  NodeReady                6m31s                  kubelet     Node minikube status is now: NodeReady
164 |
165 | At the top, we can see the name, role, labels, and annotations. The name is self-explanatory. The role shows two entries - control-plane, and master. These are the same thing, and appear in parallel since Kubernetes v1.20: `master` is being deprecated in favor of `control-plane`, and will be fully removed in a future release. The purpose and limitations of this, along with taints, will be discussed later. Labels are key/value pairs that can be arbitrarily applied, but usually carry semantic meaning for either the user or an application. For example, `kubernetes.io/arch=amd64` tells us that this node has `amd64` architecture. Clusters can be of mixed architecture, so it's good to be able to easily tell apart `x86` and `arm` nodes for scheduling purposes.
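
As a taste of how such labels get used for scheduling: a Pod (or a Deployment's Pod template) can pin itself to matching nodes with `nodeSelector`. A minimal sketch - the pod name and image here are just illustrative:

    apiVersion: v1
    kind: Pod
    metadata:
      name: amd64-only
    spec:
      # Only schedule onto nodes whose labels match all of these
      nodeSelector:
        kubernetes.io/arch: amd64
      containers:
      - name: app
        image: alpine:latest
        command: ["sleep", "infinity"]
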
166 |
167 | Let's label the node, for fun:
168 |
169 | ❯ kubectl label node --all "my.name.is=$(whoami)"
170 | node/minikube labeled
171 | Using the `--all` flag applies it to all nodes; without it, you'd need to add the node's name (`minikube` here).
172 |
173 | We can then see the new label with the `--show-labels` flag:
174 |
175 | ❯ kubectl get nodes --show-labels
176 | NAME       STATUS   ROLES                  AGE     VERSION   LABELS
177 | minikube   Ready    control-plane,master   9m32s   v1.23.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=minikube,kubernetes.io/os=linux,minikube.k8s.io/commit=362d5fdc0a3dbee389b3d3f1034e8023e72bd3a7,minikube.k8s.io/name=minikube,minikube.k8s.io/primary=true,minikube.k8s.io/updated_at=2022_05_06T14_14_05_0700,minikube.k8s.io/version=v1.25.2,my.name.is=sgarland,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
178 |
179 | Unfortunately it's a bit messy in the default comma-separated form, but you should be able to spot your change in there. To delete the label, the syntax is somewhat confusing; you use the key and a `-` sign to indicate that it should be removed:
180 |
181 | ❯ kubectl label node --all "my.name.is-"
182 | node/minikube labeled
183 |
184 | Next is Capacity. We can see things like the amount of ephemeral storage and memory available (16 GB and 8 GB, respectively) to the cluster, as well as allocatable pods. The 110-pod limit is not actually resource-based, but networking-based: with a `/24` block assigned to each node, there are 256 addresses available. Having slightly more than double the number of addresses as the maximum number of pods reduces IP address reuse as pods come and go from the node.
185 |
186 | Finally, Events. In this section, the kubelet reports the status of the node, here showing that it has sufficient memory, disk, and PID, and is ready.
187 |
188 | # Exploration
189 |
190 | ## Imperative vs Declarative
191 |
192 | Ideally, everything is maintained in code, and changes are made with some form of state management, be it ArgoCD, Flux, or others. Less optimally, you can issue commands with `kubectl apply`, which reads your input file, compares it to the existing state, and then makes changes. Even less optimally, you can directly issue `kubectl` commands.
193 |
194 | ## Kubectl verbs
195 |
196 | So far we've used a few - `get`, `describe`, and `label`. Kubernetes [loosely follows HTTP verbs](https://kubernetes.io/docs/reference/access-authn-authz/authorization/#determine-the-request-verb), with some extras thrown in. One important note is that since you are directly communicating with the API, there are no warnings for destructive actions. If you tell it to delete a Persistent Volume, it will do so (with some exceptions for finalizers).
197 |
198 | ## Create a deployment
199 |
200 | Let's deploy a simple application. If you have a small Dockerized app you'd like to run, you're welcome to use it here, but otherwise, we'll focus on this simple echo app that echoes the user's input.
201 |
202 | ### Building the application
203 | Use your own, or copy/paste the following into a shell to write the two files, `echo.py` and `Dockerfile`, respectively.
204 |
205 | cat << EOF > echo.py
206 | #!/usr/bin/env python
207 |
208 | def main():
209 |     while True:
210 |         user_input = input("Hi, say something, or type 'quit' to quit: ")
211 |         if user_input == "quit":
212 |             break
213 |         else:
214 |             print(user_input)
215 |
216 | if __name__ == "__main__":
217 |     main()
218 |
219 | EOF
220 |
221 |
222 | ---
223 | cat << EOF > Dockerfile
224 | FROM python:3.10-alpine
225 |
226 | WORKDIR /app
227 |
228 | COPY ./echo.py /app/echo.py
229 |
230 | CMD ["python", "/app/echo.py"]
231 |
232 | EOF
233 |
234 | Then, build it:
235 |
236 | docker build -t echo .
237 |
238 | To test that it works, you can use:
239 |
240 | docker run --rm -i --name echo echo
241 |
242 | Bonus question: what does the `-i` flag do, and what happens if you neglect to include it here?
243 |
244 | ### Writing a Deployment
245 | A Deployment is a basic Kubernetes structure that defines a workload in the form of a Pod, to be run with n replicas. Via the kubelet, a Deployment can ensure that an app is restarted if it fails, is reachable (assuming you've set up liveness and readiness probes), and more.
246 |
247 | #### YAML
248 | cat << EOF > deployment.yaml
249 | apiVersion: apps/v1
250 | kind: Deployment
251 | metadata:
252 |   name: echo
253 | spec:
254 |   selector:
255 |     matchLabels:
256 |       app: echo
257 |   replicas: 1
258 |   template:
259 |     metadata:
260 |       labels:
261 |         app: echo
262 |     spec:
263 |       containers:
264 |       - name: echo
265 |         image: localhost:5000/echo:latest
266 |         imagePullPolicy: Always
267 |         stdin: true
268 |         tty: true
269 |
270 | EOF
271 |
272 | Let's break down what's going on here, line by excruciating line:
273 |
274 | # This refers to a specific API version for the code
275 | # that follows - these are regularly updated and
276 | # deprecated, but you're warned well in advance
277 | apiVersion: apps/v1
278 |
279 | # This specifies what it is you're defining - could
280 | # also be a StatefulSet, an Ingress, a Service, etc.
281 | kind: Deployment
282 |
283 | # You can put multiple things here; the two most
284 | # common are the name of the application, and
285 | # a namespace in which to install it
286 | metadata:
287 |   name: echo
288 |
289 | # This tells the Deployment what application
290 | # it should manage - in this case, it's looking
291 | # for those with the label `app: echo`
292 | spec:
293 |   selector:
294 |     matchLabels:
295 |       app: echo
296 |   # The number of replicas to deploy - note that
297 |   # this is even with `spec.selector`, and like Python,
298 |   # whitespace is extremely important
299 |   replicas: 1
300 |
301 |   # This gives the Pods a template to apply
302 |   # In this case, the label `app: echo`
303 |   template:
304 |     metadata:
305 |       labels:
306 |         app: echo
307 |     # Now we define the Pod's containers - note that this
308 |     # is even with `template.metadata`, as it is part
309 |     # of the template
310 |     spec:
311 |       containers:
312 |       # The name of your application
313 |       - name: echo
314 |         # The image, optionally as a FQDN
315 |         # If not specified as a FQDN, it will first
316 |         # be searched for locally, and then on Docker Hub
317 |         image: localhost:5000/echo:latest
318 |         # When to pull - can also use Never or IfNotPresent
319 |         imagePullPolicy: Always
320 |         # Technically only stdin is needed, but
321 |         # if you don't also give it a pseudo-TTY
322 |         # it will complain (but still run) when
323 |         # you attach to the container
324 |         stdin: true
325 |         tty: true
326 |
327 | ### Applying the Deployment
328 |
329 | #### Pushing the build
330 | But first, we have to tag and push to our registry. Note that if you're using an M1 Mac, you will not do this, but can substitute in `docker pull` commands to verify that the images are available for you, e.g. `docker pull stephangarland/echo:web`
331 |
332 | docker tag echo:latest localhost:5000/echo:latest
333 | ---
334 | docker push localhost:5000/echo:latest
335 | ---
336 | The push refers to repository [localhost:5000/echo]
337 | 43358167f05b: Layer already exists
338 | 96568c21d3ac: Layer already exists
339 | b02dd59d34c0: Layer already exists
340 | 0b800261971d: Layer already exists
341 | 16e3ab2d4dee: Layer already exists
342 | fbd7d5451c69: Layer already exists
343 | 4fc242d58285: Layer already exists
344 | latest: digest: sha256:36450f0ec0febf8daf800f24ab81363211dc52dd6bfc3e50d5d54c508f8d89ed size: 1782
345 |
346 | #### Deploy!
347 | As stated, there are far better ways to deploy applications, but this is the most basic, and gives the most insight into what Kubernetes is doing to get your app running.
348 |
349 | If you run all of these in quick succession, you should see the following:
350 |
351 | ❯ kubectl apply -f deployment.yaml
352 | deployment.apps/echo created
353 |
354 | ❯ kubectl get deployments
355 | NAME   READY   UP-TO-DATE   AVAILABLE   AGE
356 | echo   0/1     1            0           1s
357 |
358 | ❯ kubectl get pods
359 | NAME                    READY   STATUS              RESTARTS   AGE
360 | echo-746cdbd89c-hrzds   0/1     ContainerCreating   0          2s
361 |
362 | Once the pod creates and deploys (which, for this app, takes a very short amount of time), the latter two commands should show this:
363 |
364 | ❯ kubectl get deployments
365 | NAME   READY   UP-TO-DATE   AVAILABLE   AGE
366 | echo   1/1     1            1           2m6s
367 |
368 | ❯ kubectl get pods
369 | NAME                    READY   STATUS    RESTARTS   AGE
370 | echo-746cdbd89c-hrzds   1/1     Running   0          2m35s
371 |
372 | ### Exploring the Deployment
373 | Let's apply some of the verbs available to us. (If you're curious what kubectl is actually sending over the wire as we do, see the aside below.)
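
Every one of these verbs becomes an HTTP call against kube-apiserver, and you can watch that happen by raising kubectl's log verbosity - at `-v=6` it logs each request's method, URL, and response code (higher levels add headers and bodies):

    # Each API round trip is logged, roughly: GET https://<apiserver>/api/v1/namespaces/default/pods 200 OK
    kubectl get pods -v=6
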
374 |
375 | #### Attach
376 | ❯ kubectl attach -i echo-746cdbd89c-hrzds
377 | If you don't see a command prompt, try pressing enter.
378 |
379 |
380 | Hi, say something, or type 'quit' to quit: Hello!
381 | Hello!
382 | Hi, say something, or type 'quit' to quit: quit
383 | Session ended, resume using 'kubectl attach echo-746cdbd89c-hrzds -c echo -i -t' command when the pod is running
384 |
385 | `attach` lets us attach to a container's default process, which in this case, is our app.
386 |
387 | #### Exec
388 |
389 | *Note: this early termination may not occur for you; if your pod is still running, the `exec` below will simply succeed.*
390 |
391 | You could also use `exec` to get a shell into the pod, like this:
392 |
393 | ❯ kubectl exec -it echo-74bf7cdf5c-9rhxd -- sh
394 | Error from server (NotFound): pods "echo-74bf7cdf5c-9rhxd" not found
395 |
396 | #### Describe
397 | What's this? Our pod went away already? Let's `describe` the new pod to see why.
398 |
399 | ❯ kubectl describe pod echo-746cdbd89c-hrzds
400 | Name:         echo-746cdbd89c-hrzds
401 | Namespace:    default
402 | ... (not shown for conciseness)
403 | Containers:
404 |   echo:
405 |     ...
406 |     Last State:     Terminated
407 |       Reason:       Completed
408 |       Exit Code:    0
409 |       Started:      Fri, 25 Mar 2022 15:03:57 -0500
410 |       Finished:     Fri, 25 Mar 2022 15:05:24 -0500
411 |
412 | Ah, there we are - since our program runs in a loop until it receives `quit` as input, once that was passed, the program exited. The kubelet noticed that the deployment no longer had a running pod, and spawned a new one.
413 |
414 | #### Get
415 |
416 | We can see this if we `get` pods:
417 |
418 | ❯ kubectl get pods
419 | NAME                    READY   STATUS    RESTARTS      AGE
420 | echo-746cdbd89c-hrzds   1/1     Running   1 (62s ago)   8m
421 |
422 | #### Exec (again)
423 | Now let's exec into the pod.
424 |
425 | ❯ kubectl exec -it echo-746cdbd89c-hrzds -- sh
426 | /app # ls
427 | echo.py
428 | /app # python echo.py
429 | Hi, say something, or type 'quit' to quit: Hello
430 | Hello
431 | Hi, say something, or type 'quit' to quit: quit
432 | /app #
433 | Note that here, quitting the app didn't kill the pod - that's because we spawned a new shell to exec into, and created a new instance of the app. Look at what's running:
434 |
435 | /app # ps
436 | PID   USER     TIME  COMMAND
437 |     1 root      0:00 python /app/echo.py
438 |    27 root      0:00 sh
439 |    41 root      0:00 ps
440 |
441 | Our app is running as the `init` process, PID 1. Kill it and watch what happens. Just kidding - `init` ignores most `kill` signals on reasonable *nix systems, for good reason; but you can send it `INT` aka `2` if you'd like to see what happens (you could also kill the shell, if you'd like).
442 |
443 | #### Delete
444 |
445 | This is how you canonically restart a pod, in case you weren't aware.
446 |
447 | ❯ kubectl delete pod -l app=echo
448 | pod "echo-746cdbd89c-hrzds" deleted
449 |
450 | What's this `-l` flag? Why didn't we have to specify the entire name? Welcome to selectors - also available with their longhand flag, `--selector`. Remember the `template.metadata.labels.app` we assigned to the Deployment? That's how this is finding it.
451 | And we can see that we now have a new pod, thanks to the Deployment:
452 |
453 | ❯ kubectl get pods
454 | NAME                    READY   STATUS    RESTARTS   AGE
455 | echo-746cdbd89c-x9k2m   1/1     Running   0          36s
456 |
457 | ### Scaling workloads
458 |
459 | If you have a given workload, be it a Deployment or StatefulSet, you can horizontally scale it using the command `kubectl scale`, and the flag `--replicas` (or have the cluster do it for you - see the aside below).
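
An aside: manual `kubectl scale` is fine for experimenting, but production workloads more often pair a Deployment with a HorizontalPodAutoscaler, and let the cluster adjust the replica count from metrics. A minimal sketch - this assumes the metrics-server addon is enabled, which we haven't done here:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: echo
    spec:
      # Points at the workload to scale
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: echo
      minReplicas: 1
      maxReplicas: 5
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 80

For this walkthrough, we'll scale by hand.
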
Go ahead and scale ours up to, say, 3 replicas:
460 |
461 | ❯ kubectl scale deployment echo --replicas=3
462 | deployment.apps/echo scaled
463 |
464 | Now let's look at our deployment (if you aren't quick, you might just see 3/3 ready, but that's OK):
465 |
466 | ❯ kubectl get deployments
467 | NAME   READY   UP-TO-DATE   AVAILABLE   AGE
468 | echo   1/3     3            1           68m
469 |
470 | Once the pods are all up, this will change to 3/3 ready.
471 |
472 | ❯ kubectl get pods
473 | NAME                    READY   STATUS    RESTARTS   AGE
474 | echo-746cdbd89c-8v5qb   1/1     Running   0          3s
475 | echo-746cdbd89c-ns9kk   1/1     Running   0          3s
476 | echo-746cdbd89c-x9k2m   1/1     Running   0          26m
477 |
478 | Of note, all this time we haven't been specifying a deployment (or pod) for `get`, which is fine since we're only running the one. If this were a real cluster, though, there would likely be many deployments and pods, and we'd want to be more specific:
479 |
480 | ❯ kubectl get deployment echo
481 | NAME   READY   UP-TO-DATE   AVAILABLE   AGE
482 | echo   3/3     3            3           71m
483 |
484 | ### Scaling (down) workloads
485 |
486 | To horizontally scale to zero, AKA delete the pods and prevent them from coming back, use `--replicas` again, but specify 0 pods: `--replicas=0`. Alternatively, if you want to completely get rid of the deployment, use either `kubectl delete deployment/echo` (imperative) or `kubectl delete -f deployment.yaml` (declarative). With the latter, kubectl is reading the deployment manifest we wrote, and removing what it defines.
487 |
488 | Either way, once done, we can verify that it's gone:
489 |
490 | ❯ kubectl get deployment
491 | No resources found in default namespace.
492 |
493 | ## Namespaces
494 |
495 | We've briefly mentioned namespaces so far, but all the work has been done in the default namespace. This is generally a bad idea - namespaces are a way of organizing and restricting resources. We can limit a given namespace to X CPUs and Y memory, we can restrict the rights of workloads inside that namespace, and it makes it easier to keep track of things when running various `kubectl` commands if they're scoped to a namespace.
496 |
497 | Let's create one imperatively and use it, and then create another declaratively.
498 |
499 | ❯ kubectl create namespace echo
500 | namespace/echo created
501 |
502 | Now let's deploy our echo app in the new namespace:
503 |
504 | ❯ kubectl apply -f deployment.yaml -n echo
505 | deployment.apps/echo created
506 |
507 | Note that the pod isn't in the default namespace anymore:
508 |
509 | ❯ kubectl get pods
510 | No resources found in default namespace.
511 |
512 | ❯ kubectl get pods -n echo
513 | NAME                   READY   STATUS    RESTARTS   AGE
514 | echo-d97d96459-s2bvk   1/1     Running   0          79s
515 |
516 | Now, let's delete it and then do it again declaratively (delete the deployment however you'd like, as described earlier).
517 |
518 | sed -i '/^spec:/i \ \ namespace: echo' deployment.yaml
519 |
520 | This adds a properly spaced `.metadata.namespace` line to the deployment manifest, by finding the target `^spec:` line and inserting immediately before it. Re-apply the edited manifest with `kubectl apply -f deployment.yaml`, and:
521 |
522 | ❯ kubectl get pods -n echo
523 | NAME                   READY   STATUS    RESTARTS   AGE
524 | echo-d97d96459-9l2jw   1/1     Running   0          4s
525 |
526 | There's our pod! What if we wanted to declaratively create the namespace, as well?
Let's delete the namespace, which will also delete the deployment (not recommended in prod due to finalizers, but for this example it's fine):
527 |
528 | ❯ kubectl delete namespace echo
529 | namespace/echo deleted
530 |
531 | ---
532 | Declaratively, a Namespace is just another manifest - and since a YAML file can hold multiple documents separated by `---`, we can prepend one to `deployment.yaml` itself, so that the file begins:

apiVersion: v1
kind: Namespace
metadata:
  name: echo
---

with the Deployment definition following as before. Applying that single file now creates both the namespace and the deployment. Manifests multiply quickly, though, which brings us to Helm - a package manager for Kubernetes that bundles manifests into a versioned chart. Create a directory for ours with `mkdir -p helm/templates`, and start with the chart's metadata:

cat << EOF > helm/Chart.yaml
575 | apiVersion: v2
576 | name: echo
577 | description: A Helm chart for a simple echo app
578 | version: 0.1.0
579 | appVersion: 0.1.0
580 | EOF
581 |
582 | `version` is the version of the Helm chart, whereas `appVersion` is the version of the application. They should both use semantic versioning. `apiVersion` would be `v1` if you needed Helm v2 compatibility, but no one should be using Helm v2 these days, so stick with `apiVersion: v2`.
583 |
584 | cat << EOF > helm/templates/deployment.yaml
585 | apiVersion: apps/v1
586 | kind: Deployment
587 | metadata:
588 |   name: echo
589 |   namespace: echo
590 | spec:
591 |   selector:
592 |     matchLabels:
593 |       app: echo
594 |   replicas: 1
595 |   template:
596 |     metadata:
597 |       labels:
598 |         app: echo
599 |     spec:
600 |       containers:
601 |       - name: echo
602 |         image: localhost:5000/echo:latest
603 |         imagePullPolicy: Always
604 |         ports:
605 |         - containerPort: 8080
606 |         stdin: true
607 |         tty: true
608 | EOF
609 |
610 | The eagle-eyed among you will note that this is largely the same, except that we've added a `containerPort` that we'll be talking to.
611 |
612 | cat << EOF > helm/templates/service.yaml
613 | apiVersion: v1
614 | kind: Service
615 | metadata:
616 |   name: echo
617 |   namespace: echo
618 | spec:
619 |   ports:
620 |   - protocol: TCP
621 |     port: 8080
622 |     targetPort: 8080
623 |   selector:
624 |     app: echo
625 |   type: NodePort
626 | EOF
627 |
628 | The Service exposes our app to the rest of the cluster - in this case via a NodePort, which means that a random high port will be opened on every node and forwarded to the pod. `targetPort` is actually redundant here, as it defaults to the same port as `port`, but it's shown for education. In production, you would typically use a `LoadBalancer` service.
629 |
630 | cat << EOF > helm/templates/ingress.yaml
631 | apiVersion: networking.k8s.io/v1
632 | kind: Ingress
633 | metadata:
634 |   name: echo
635 |   namespace: echo
636 |   annotations:
637 |     nginx.ingress.kubernetes.io/rewrite-target: /
638 | spec:
639 |   rules:
640 |   - host: echo.internal
641 |     http:
642 |       paths:
643 |       - path: /
644 |         pathType: Prefix
645 |         backend:
646 |           service:
647 |             name: echo
648 |             port:
649 |               number: 8080
650 | EOF
651 |
652 | We're using an Ingress here to route traffic to the service, and ultimately, to the pod.
653 |
654 | cat << EOF > helm/templates/namespace.yaml
655 | apiVersion: v1
656 | kind: Namespace
657 | metadata:
658 |   name: echo
659 | EOF
660 |
661 | The Namespace definition hasn't changed. We could also rely on Helm to do this for us, with its `--create-namespace` flag.
662 |
663 | cat << 'EOF' > helm/templates/NOTES.txt
664 | To access, please run the following command:
665 |
666 | echo "$(minikube ip) echo.internal" | sudo tee -a /etc/hosts
667 |
668 | Then go to http://echo.internal in your browser.
669 |
670 | To clean up, run the following command:
671 |
672 | sudo sed -i'' '$d' /etc/hosts
673 | EOF
674 |
675 | `NOTES.txt` is a special file for Helm, which it will render as helpful tips to the user when you run `helm install`. In this case, we're explaining how to edit the `/etc/hosts` file so that the URI resolves. Note the quoted `'EOF'`: it keeps your shell from expanding `$(minikube ip)` and `$d` now, so that they land in the file literally.
676 |
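
Everything in `templates/` so far is static YAML - we haven't actually used Helm's templating. As a hedged sketch of what that would look like (the file and key names are ours to invent), you'd add a `helm/values.yaml`:

    # helm/values.yaml (hypothetical)
    replicaCount: 1
    image: localhost:5000/echo:latest

and then reference those values from the templates, e.g. in `helm/templates/deployment.yaml`:

    replicas: {{ .Values.replicaCount }}
    ...
    image: {{ .Values.image }}

Defaults could then be overridden at install time with `--set replicaCount=3`, or with an alternate values file via `-f`.
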
677 | ### App
678 |
679 | But wait, I hear you saying, the app didn't have a web server! You're correct, so let's remedy that quickly:
680 |
681 | mkdir -p echo/templates && \
682 | cat << EOF > echo/echo.py
683 | #!/usr/bin/env python
684 |
685 | from flask import Flask, request, render_template
686 |
687 | app = Flask(__name__)
688 |
689 | @app.route("/")
690 | def form():
691 |     return render_template("index.html")
692 |
693 | @app.route("/", methods=["POST"])
694 | def form_post():
695 |     return request.form["echo_input"]
696 |
697 | if __name__ == "__main__":
698 |     app.run(host="0.0.0.0", port=8080, debug=True)
699 | EOF
700 |
701 | ---
702 |
703 | cat << EOF > echo/templates/index.html
704 | <html>
705 | <head>
706 | <title>Echo (echo...)</title>
707 | </head>
708 | <body>
709 | <h1>Echo (echo...)</h1>
710 | <form method="POST">
711 | <input type="text" name="echo_input">
712 | <input type="submit" value="Submit">
713 | </form>
714 | </body>
715 | </html>
716 | EOF
717 |
718 | (No one will ever accuse me of being a frontend dev. I regret nothing.)
719 |
720 | We need to make sure Docker can install Flask (ideally this would be pinned to a specific version): `echo "flask" > echo/requirements.txt`
721 |
722 | Next, we need to update the `Dockerfile`.
723 |
724 | cat << EOF > echo/Dockerfile
725 | FROM python:3.10-alpine
726 |
727 | WORKDIR /app
728 |
729 | COPY . /app
730 |
731 | RUN pip install -r requirements.txt
732 |
733 | EXPOSE 8080
734 |
735 | CMD ["python", "/app/echo.py"]
736 | EOF
737 |
738 | If you're using a local registry, you'll also need to build, tag, and push this new image:
739 |
740 | docker build -t echo echo && \
741 | docker tag echo:latest localhost:5000/echo:latest && \
742 | docker push localhost:5000/echo:latest
743 |
744 | ### Installation
745 |
746 | To install the Chart, let's first see what it would do:
747 |
748 | ❯ helm install --dry-run --debug echo helm/
749 | install.go:178: [debug] Original chart version: ""
750 | install.go:195: [debug] CHART PATH: /Users/sgarland/git/zapier/intro-to-x/k8s/helm
751 |
752 | NAME: echo
753 | LAST DEPLOYED: Fri May 6 17:02:50 2022
754 | NAMESPACE: default
755 | STATUS: pending-install
756 | REVISION: 1
757 | TEST SUITE: None
758 | USER-SUPPLIED VALUES:
759 | {}
760 |
761 | COMPUTED VALUES:
762 | {}
763 |
764 | HOOKS:
765 | MANIFEST:
766 | ---
767 | # Source: echo/templates/namespace.yaml
768 | apiVersion: v1
769 | kind: Namespace
770 | metadata:
771 |   name: echo
772 | ---
773 | # Source: echo/templates/service.yaml
774 | apiVersion: v1
775 | kind: Service
776 | metadata:
777 |   name: echo
778 |   namespace: echo
779 | spec:
780 |   ports:
781 |   - protocol: TCP
782 |     port: 8080
783 |     targetPort: 8080
784 |   selector:
785 |     app: echo
786 |   type: NodePort
787 | ---
788 | # Source: echo/templates/deployment.yaml
789 | apiVersion: apps/v1
790 | kind: Deployment
791 | metadata:
792 |   name: echo
793 |   namespace: echo
794 | spec:
795 |   selector:
796 |     matchLabels:
797 |       app: echo
798 |   replicas: 1
799 |   template:
800 |     metadata:
801 |       labels:
802 |         app: echo
803 |     spec:
804 |       containers:
805 |       - name: echo
806 |         image: localhost:5000/echo:latest
807 |         imagePullPolicy: Always
808 |         ports:
809 |         - containerPort: 8080
810 |         stdin: true
811 |         tty: true
812 | ---
813 | # Source: echo/templates/ingress.yaml
814 | apiVersion: networking.k8s.io/v1
815 | kind: Ingress
816 | metadata:
817 |   name: echo
818 |   namespace: echo
819 |   annotations:
820 |     nginx.ingress.kubernetes.io/rewrite-target: /
821 | spec:
822 |   rules:
823 |   - host: echo.internal
824 |     http:
825 |       paths:
826 |       - path: /
827 |         pathType: Prefix
828 |         backend:
829 |           service:
830 |             name: echo
831 |             port:
832 |               number: 8080
833 |
834 | NOTES:
835 | To access, please run the following command:
836 |
837 | echo "$(minikube ip) echo.internal" | sudo tee -a /etc/hosts
838 |
839 | Then go to http://echo.internal in your browser.
840 |
841 | To clean up, run the following command:
842 |
843 | sudo sed -i'' '$d' /etc/hosts
844 |
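
An aside: `helm template echo helm/` renders the same manifests to stdout without talking to the cluster at all, which is handy in CI or for diffing; `--dry-run --debug` additionally validates the rendered output against the API server.

    # Render the chart locally; nothing is installed
    helm template echo helm/
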
    NAME: echo
    LAST DEPLOYED: Fri May 6 17:04:43 2022
    NAMESPACE: default
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    NOTES:
    To access, please run the following command:

    sudo echo "$(minikube ip) echo.internal" >> /etc/hosts

    Then go to http://echo.internal in your browser.

    To clean up, run the following command:

    sudo sed -i '$d' /etc/hosts

Let's add the `/etc/hosts` entry, then we can test it out! (A plain `sudo echo foo >> /etc/hosts` will fail, since the redirection happens in your unprivileged shell - `tee -a` avoids that.) Note that if you're using the `docker` driver, you'll need to first run `minikube service echo -n echo --url` in a separate terminal, and keep it open for the next step. Also, replace the address you cURL to with the one minikube gives you (the ingress is largely useless here, although you could add it to `/etc/hosts` if you'd like).

    echo "$(minikube ip) echo.internal" | sudo tee -a /etc/hosts

---

    ❯ curl -d 'echo_input=Hello, world!' -X POST http://echo.internal
    Hello, world!

# RBAC

RBAC is Role-based Access Control. It's a way to control access to resources based on a user's (or group's) role, rather than their identity. The assumption is that you have something else (like Okta) to authenticate the user, and then RBAC controls that user's ability to access or modify resources.

## Example

### Generating a certificate

We're going to create a certificate and user to demonstrate how RBAC works.

    mkdir cert && openssl genrsa -out cert/echo-user.key 4096 && openssl req -new \
    -key cert/echo-user.key -out cert/echo-user.csr -subj "/CN=echo-user/O=echo-group" \
    && openssl x509 -req -in cert/echo-user.csr -CA ~/.minikube/ca.crt \
    -CAkey ~/.minikube/ca.key -CAcreateserial -out cert/echo-user.crt -days 365 \
    || echo "Failed to create cert! Please check that ~/.minikube/ca.{crt,key} exist."

This should result in the following:

    Generating RSA private key, 4096 bit long modulus
    ...........................++
    ...........................++
    e is 65537 (0x10001)
    Signature ok
    subject=/CN=echo-user/O=echo-group
    Getting CA Private Key

This one-liner uses the `openssl` tool to first create a 4096-bit RSA private key, then create a Certificate Signing Request (CSR) using that key, and finally create a certificate signed by the Minikube Certificate Authority, with an expiry of 365 days. The ending part, if you're not familiar with shell, is an `OR` that only executes if the previous command fails - since that command relies on two files existing in `~/.minikube`, there's a pretty good chance that they're the reason for the failure, hence the message.

### Creating a user

Now, we're going to create a user entry in our kubeconfig, then create a context using it.

    ❯ kubectl config set-credentials echo-user --client-certificate=cert/echo-user.crt \
        --client-key=cert/echo-user.key
    User "echo-user" set.

    ❯ kubectl config set-context echo-user-context --cluster=minikube --user=echo-user
    Context "echo-user-context" created.

    ❯ kubectl config use-context echo-user-context
    Switched to context "echo-user-context".
### Testing out the user

Let's create a namespace again:

    ❯ kubectl create ns foobar
    Error from server (Forbidden): namespaces is forbidden: User "echo-user" cannot create resource "namespaces" in API group "" at the cluster scope

Since Minikube is installed with its default context of `minikube`, this additional user we've added has no permissions to do, well, anything. Try `kubectl get pods` or some other read-only action, and check the result.

### Adding RBAC

RBAC definitions consist of two parts - a Role, and a RoleBinding - and are either scoped to a cluster, or to a namespace. Helpfully, cluster-scoped RBAC objects are named ClusterRoles and ClusterRoleBindings.

    cat << EOF > helm/templates/rbac.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: echo-ro
      namespace: echo
    rules:
    - apiGroups: [""]
      resources: ["pods"]
      verbs: ["get", "list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: echo-ro
      namespace: echo
    subjects:
    - kind: User
      name: echo-user
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: Role
      name: echo-ro
      apiGroup: rbac.authorization.k8s.io
    EOF

This is two RBAC objects in one file - a Role, and a RoleBinding. They're both scoped to the `echo` namespace, and as the name implies, they create a read-only role for the `echo-user` user we previously created. Note that you'll need to switch back to the `minikube` context to apply this (do you remember how?).

    ❯ helm upgrade --install echo helm
    Release "echo" has been upgraded. Happy Helming!
    NAME: echo
    LAST DEPLOYED: Wed May 18 10:51:41 2022
    NAMESPACE: default
    STATUS: deployed
    REVISION: 2
    TEST SUITE: None
    NOTES:
    To access, please run the following command:

    sudo echo "$(minikube ip) echo.internal" >> /etc/hosts

    Then go to http://echo.internal in your browser.

    To clean up, run the following command:

    sudo sed -i '$d' /etc/hosts

### Verifying RBAC

We can of course use `kubectl get role -n echo` and `kubectl get rolebinding -n echo` to view our newly-available RBAC, but `kubectl` includes a very useful feature called `kubectl auth can-i`, which allows you to check whether you have the ability to do a given action, as a given user and/or group. Cluster administrators can impersonate another user (this is very useful for SREs) with the `--as user.name` flag, but anyone can use it to check their current ability.

    ❯ kubectl config use-context echo-user-context
    Switched to context "echo-user-context".

    ❯ kubectl auth can-i get pods -n echo
    yes

    ❯ kubectl auth can-i get pods -n echo --as foobar
    Error from server (Forbidden): users "foobar" is forbidden: User "echo-user" cannot impersonate resource "users" in API group "" at the cluster scope

    ❯ kubectl auth can-i create pods -n echo
    no

    ❯ kubectl auth can-i get pods -n kube-system
    no

    ❯ kubectl auth can-i create pods --subresource exec -n echo
    no

This last one can be problematic, and indeed, is/was the source of much pain in SRE land as developers were unable to exec into bastion pods.
`exec` is a subresource of `pods`, and the `create` verb on `pods/exec` must be specifically granted. If you try without having the requisite permission, you'll see this:

    ❯ kubectl exec -it -n echo echo-75897c68fd-nhn64 -- sh
    Error from server (Forbidden): pods "echo-75897c68fd-nhn64" is forbidden: User "echo-user" cannot create resource "pods/exec" in API group "" in the namespace "echo"

### Modifying RBAC

To grant it, add a rule for the `pods/exec` subresource to the Role in `helm/templates/rbac.yaml`:

    cat << EOF > helm/templates/rbac.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: echo-ro
      namespace: echo
    rules:
    - apiGroups: [""]
      resources: ["pods"]
      verbs: ["get", "list", "watch"]
    - apiGroups: [""]
      resources: ["pods/exec"]
      verbs: ["create"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: echo-ro
      namespace: echo
    subjects:
    - kind: User
      name: echo-user
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: Role
      name: echo-ro
      apiGroup: rbac.authorization.k8s.io
    EOF

After another `helm upgrade` cycle (as the `minikube` context), switch back to `echo-user-context`, and both `kubectl auth can-i create pods --subresource exec -n echo` and an actual `kubectl exec` into the pod should now succeed.

## Setting resource limits and requests

    cat << EOF >> helm/templates/deployment.yaml
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 50m
        memory: 50Mi
    EOF

Now run a `helm upgrade` cycle (make sure you're back to the `minikube` context), then examine the deployment, and finally, the pod's YAML manifest to view changes.

## Exploring limits and requests

Play around (you can use `kubectl edit deployment` to speed things up) with limits and requests, and see how the scheduler and kubelet respond to combinations.

# More to explore

* This application could be put behind a load balancer (you could set up [MetalLB](https://metallb.universe.tf/) locally if you'd like), with additional replicas.
* This application runs as root, which is not recommended. How could you fix that?
* User entries could be captured and sent to a database stored in a dynamically generated Persistent Volume, with additional routes enabling historical views.
* HPA (Horizontal Pod Autoscaler) could be set up, along with some load testing mechanism, to demonstrate how Kubernetes will scale the application in response to demand.
* KEDA (Kubernetes Event-driven Autoscaling) could be set up to automatically scale on metrics other than CPU or Memory.

--------------------------------------------------------------------------------
/k8s/k8s-102.md:
--------------------------------------------------------------------------------

# WIP DRAFT

### Viewing resources in a container

    # Memory limits are in /sys/fs/cgroup/memory/memory.limit_in_bytes
    # We can use the bash-ism `(())` to do math, converting it to MiB
    # Alternately, if you have `bc`, you can use that, as well as `awk`
    ❯ echo $(($(< /sys/fs/cgroup/memory/memory.limit_in_bytes) / 1048576))
    2048

    # Memory requests would be in /sys/fs/cgroup/memory/memory.soft_limit_in_bytes if
    # Kubernetes followed normal Linux memory accounting practices, but it doesn't
    ❯ cat /sys/fs/cgroup/memory/memory.soft_limit_in_bytes
    9223372036854771712

    # Wondering what on earth 9223372036854771712 bytes is? Is this a hint?
    ❯ printf "%x\n" $(< /sys/fs/cgroup/memory/memory.soft_limit_in_bytes)
    7ffffffffffff000

    # CPU requests are in /sys/fs/cgroup/cpu/cpu.shares, with a single core/vCPU being equal to 1024
    # This is thus 256 / 1024 == 0.25
    ❯ cat /sys/fs/cgroup/cpu/cpu.shares
    256

    # CPU limits have to be calculated, as it's a combination of quota and period
    ❯ cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
    150000

    ❯ cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
    100000

    # So, CPU limits are:
    ❯ echo $(($(< /sys/fs/cgroup/cpu/cpu.cfs_quota_us) / $(< /sys/fs/cgroup/cpu/cpu.cfs_period_us)))
    1 # ???
37 | 38 | # Bash doesn't handle floats, as it turns out - the answer is 1.5 vCPUs 39 | ❯ awk -v quota="$(< /sys/fs/cgroup/cpu/cpu.cfs_quota_us)" \ 40 | -v period="$(< /sys/fs/cgroup/cpu/cpu.cfs_period_us)" \ 41 | '{print quota/period}' <(echo) 42 | 1.5 43 | 44 | ### Viewing resources on the host 45 | 46 | So what if you want to view a given container's resources from the host? More Linux internals, I'm afraid. 47 | 48 | This specific example comes from my homelab (so does the above), but once we have requests and limits set for our application, we can circle back and view them on the `minikube` node. 49 | 50 | # I'm going to look for an app called `radarr` that I know is running on this node 51 | 52 | dell01-k3s-worker-01 [~]$ ps -ax | grep radarr 53 | 3028 ? S 0:00 s6-supervise radarr 54 | 3030 ? Ssl 1159:40 /app/radarr/bin/Radarr -nobrowser -data=/config 55 | 10484 pts/0 R+ 0:00 grep radarr 56 | 57 | # Then, I'll look at its `/proc` filesystem entry 58 | dell01-k3s-worker-01 [~]$ cat /proc/3030/cgroup 59 | 15:name=openrc:/k3s-service 60 | 14:name=systemd:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543 61 | 13:rdma:/ 62 | 12:pids:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543 63 | 11:hugetlb:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543 64 | 10:net_prio:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543 65 | 9:perf_event:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543 66 | 8:net_cls:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543 67 | 7:freezer:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543 68 | 6:devices:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543 69 | 5:memory:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543 70 | 4:blkio:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543 71 | 3:cpuacct:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543 72 | 2:cpu:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543 73 | 1:cpuset:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543 74 | 0::/k3s-service 75 | 76 | # cgroups inherit from their parents, incidentally, so everything here is inheriting 77 | # from both the `burstable` and `kubepods` cgroups 78 | 79 | # We'll use `awk` to grab what we want from that list, then command substitution 80 | dell01-k3s-worker-01 [~]$ ls -l /sys/fs/cgroup/memory/$(awk -F: '/memory/ {print $NF}' /proc/3030/cgroup) 81 | total 0 82 | -rw-r--r-- 1 root root 0 May 18 15:39 cgroup.clone_children 83 | --w--w--w- 1 root root 0 May 5 17:50 cgroup.event_control 84 | -rw-r--r-- 1 root root 0 May 18 15:51 cgroup.procs 85 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.failcnt 86 | --w------- 1 root root 0 May 18 
15:51 memory.force_empty 87 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.failcnt 88 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.limit_in_bytes 89 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.max_usage_in_bytes 90 | -r--r--r-- 1 root root 0 May 18 15:51 memory.kmem.slabinfo 91 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.tcp.failcnt 92 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.tcp.limit_in_bytes 93 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.tcp.max_usage_in_bytes 94 | -r--r--r-- 1 root root 0 May 18 15:39 memory.kmem.tcp.usage_in_bytes 95 | -r--r--r-- 1 root root 0 May 18 15:39 memory.kmem.usage_in_bytes 96 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.limit_in_bytes 97 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.max_usage_in_bytes 98 | -rw-r--r-- 1 root root 0 May 18 15:51 memory.move_charge_at_immigrate 99 | -r--r--r-- 1 root root 0 May 18 15:39 memory.numa_stat 100 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.oom_control 101 | ---------- 1 root root 0 May 18 15:51 memory.pressure_level 102 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.soft_limit_in_bytes 103 | -r--r--r-- 1 root root 0 May 18 15:39 memory.stat 104 | -rw-r--r-- 1 root root 0 May 18 15:51 memory.swappiness 105 | -r--r--r-- 1 root root 0 May 18 15:39 memory.usage_in_bytes 106 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.use_hierarchy 107 | -rw-r--r-- 1 root root 0 May 18 15:51 notify_on_release 108 | -rw-r--r-- 1 root root 0 May 18 15:51 tasks 109 | 110 | # Looks familiar, right? 111 | 112 | dell01-k3s-worker-01 [~]$ echo $(($(< /sys/fs/cgroup/memory/$(awk -F: '/memory/ {print $NF}' /proc/3030/cgroup)/memory.limit_in_bytes) / 1048576)) 113 | 2048 114 | 115 | # Finding the CPU information from the host is left as an exercise for the reader. 116 | 117 | ## Setting resource limits and requests 118 | 119 | cat << EOF >> helm/templates/deployment.yaml 120 | resources: 121 | limits: 122 | cpu: 100m 123 | memory: 128Mi 124 | requests: 125 | cpu: 50m 126 | memory: 50Mi 127 | EOF 128 | 129 | Now run a `helm upgrade` cycle (make sure you're back to the `minikube` context), then exec back into the pod to examine it. 130 | 131 | /app # echo $(($(< /sys/fs/cgroup/memory/memory.usage_in_bytes) / 1048576)) 132 | sh: arithmetic syntax error 133 | 134 | # As it turns out, the $(< ) command is a bash-ism for `cat`, and this is `sh`, not `bash` 135 | 136 | /app # echo $(($(cat /sys/fs/cgroup/memory/memory.usage_in_bytes) / 1048576)) 137 | 37 138 | 139 | # So, our app is using about 37 MiB of memory. 140 | 141 | /app # echo $(($(cat /sys/fs/cgroup/memory/memory.limit_in_bytes) / 1048576)) 142 | 128 143 | 144 | And we can see that our 128 MiB limit has been set. 
--------------------------------------------------------------------------------
/mysql/mysql-101-0.md:
--------------------------------------------------------------------------------

# MySQL 101 Part I

- [MySQL 101 Part I](#mysql-101-part-i)
  - [Prerequisites](#prerequisites)
    - [MySQL Client](#mysql-client)
      - [GUI](#gui)
      - [TUI](#tui)
  - [Introduction](#introduction)
  - [What is SQL?](#what-is-sql)
  - [What is a relational database?](#what-is-a-relational-database)
  - [What is ACID?](#what-is-acid)
    - [What is MySQL?](#what-is-mysql)
      - [How is it pronounced?](#how-is-it-pronounced)
  - [Basic definitions](#basic-definitions)
    - [SQL sub-languages](#sql-sub-languages)
    - [Other definitions](#other-definitions)
- [MySQL Components](#mysql-components)
- [MySQL Operations](#mysql-operations)
  - [Assumptions](#assumptions)
  - [Notes](#notes)
  - [Schemata](#schemata)
  - [Schema spelunking](#schema-spelunking)
    - [String literals](#string-literals)
      - [SQL\_MODE](#sql_mode)
    - [Create a schema](#create-a-schema)
  - [Table operations](#table-operations)
    - [Create tables](#create-tables)
      - [Data types](#data-types)
    - [Foreign keys](#foreign-keys)
      - [Why you might want foreign keys](#why-you-might-want-foreign-keys)
      - [Creating a foreign key](#creating-a-foreign-key)
      - [Demonstrating a foreign key](#demonstrating-a-foreign-key)
    - [Determining table size](#determining-table-size)
  - [Column operations](#column-operations)
    - [Adding columns](#adding-columns)
    - [Modifying columns](#modfying-columns)
  - [Dropping tables with foreign keys](#dropping-tables-with-foreign-keys)
    - [Copied table definitions](#copied-table-definitions)
    - [Copied table data and truncating](#copied-table-data-and-truncating)
  - [Transactions](#transactions)
  - [Generated columns](#generated-columns)
  - [Invisible columns](#invisible-columns)

## Prerequisites

### MySQL Client

You'll need to have a MySQL client. In order of preference, some options for GUI (graphical) and TUI (terminal) are:

#### GUI

- [Sequel Ace](https://sequel-ace.com/)
  - Install from App Store, or with [Homebrew](https://brew.sh/): `HOMEBREW_NO_AUTO_UPDATE=1 brew install --cask sequel-ace`
- [MySQL Workbench](https://www.mysql.com/products/workbench/)
- [DBeaver](https://dbeaver.io/)

#### TUI

- [mysql-client](https://dev.mysql.com/doc/refman/8.0/en/mysql.html)
  - Install with [Homebrew](https://brew.sh/): `HOMEBREW_NO_AUTO_UPDATE=1 brew install mysql-client`

Note that the server is currently using a self-signed TLS certificate, which some clients may complain about. Sequel Ace, MySQL Workbench, and mysql-client are proven to work without issue. Also note that mysql-client is available via [Homebrew](https://formulae.brew.sh/formula/mysql-client), but it won't symlink by default, so you'll need to do something like `brew link --force mysql-client`.

WARNING: MySQL Workbench may not work with M1/M2 (ARM) Macs.

## Introduction

## What is SQL?

Structured Query Language. It's a domain-specific language designed to manage data in a Relational Database Management System (RDBMS). It's been extended and updated many times, both in its official ANSI definition, and in implementations of it like MySQL and PostgreSQL.
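To give a (hypothetical) taste of it, a SQL query is declarative - you describe the rows you want, not how to fetch them. The table and column names here are placeholders, though they happen to match the `users` table we'll build later:

```sql
-- Fetch one user's name by their ID
SELECT first_name, last_name
FROM
  users
WHERE
  user_id = 1;
```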
72 | 73 | ## What is a relational database? 74 | 75 | It's what most people probably think of when they think of a database. Broadly speaking, data is related to other data in some manner. For example, observe these two tables (tl;dr a logical grouping of data): 76 | 77 | ```sql 78 | SHOW COLUMNS FROM users; 79 | ``` 80 | 81 | ```sql 82 | +------------+----------+------+-----+---------+----------------+ 83 | | Field | Type | Null | Key | Default | Extra | 84 | +------------+----------+------+-----+---------+----------------+ 85 | | id | bigint | NO | PRI | NULL | auto_increment | 86 | | first_name | char(64) | YES | | NULL | | 87 | | last_name | char(64) | YES | | NULL | | 88 | | user_id | bigint | NO | UNI | NULL | | 89 | +------------+----------+------+-----+---------+----------------+ 90 | 4 rows in set (0.09 sec) 91 | ``` 92 | 93 | ```sql 94 | SHOW COLUMNS FROM zaps; 95 | ``` 96 | 97 | ```sql 98 | +-----------------+-----------------+------+-----+-------------------+-----------------------------+ 99 | | Field | Type | Null | Key | Default | Extra | 100 | +-----------------+-----------------+------+-----+-------------------+-----------------------------+ 101 | | id | bigint unsigned | NO | PRI | NULL | auto_increment | 102 | | zap_id | bigint unsigned | NO | UNI | NULL | | 103 | | created_at | timestamp | NO | | CURRENT_TIMESTAMP | DEFAULT_GENERATED | 104 | | last_updated_at | timestamp | YES | | NULL | on update CURRENT_TIMESTAMP | 105 | | owned_by | bigint unsigned | NO | MUL | NULL | | 106 | | shared_with | json | YES | | json_array() | DEFAULT_GENERATED | 107 | +-----------------+-----------------+------+-----+-------------------+-----------------------------+ 108 | 6 rows in set (0.01 sec) 109 | ``` 110 | 111 | Table `users` has four columns - `id`, `first_name`, `last_name`, and `user_id`. Table `zaps` has six columns - `id`, `zap_id`, `created_at`, `last_updated_at`, `owned_by`, and `shared_with`. 112 | 113 | Although it isn't explicitly defined or enforced, there is an implicit relationship between these two tables via `users.user_id` and `zaps.owned_by`. Thus, a query like `SELECT zap_id, owned_by FROM zaps JOIN users ON user_id = owned_by;` could use that relationship. Ideally, there would be additional constraints like foreign keys established to ensure referential integrity, but this example suffices for now. 114 | 115 | Also, generally speaking, RDBMS are ACID-compliant (but not always). 116 | 117 | ## What is ACID? 118 | 119 | ACID is a set of four properties that, if implemented correctly, guarantee data validity: 120 | 121 | - Atomicity 122 | - In a given transaction, each statement must either completely succeed, or fail. If any statement in a transaction fails, the entire transaction must fail. 123 | - Consistency 124 | - A given transaction can only move a database from one valid and consistent state to another. 125 | - Isolation 126 | - Even with concurrent transactions executing, the database must end up in the same state as if each transaction were executed sequentially. 127 | - Durability 128 | - Once a transaction is committed, it must remain committed in the event of a system failure. 129 | 130 | Note that the lack of one or more of these properties does not necessarily mean that data committed is invalid, only that the guarantees granted by that particular property must be accounted for elsewhere. A common counter-example of this is Eventual Consistency with distributed systems. 131 | 132 | ### What is MySQL? 
It's an extremely popular row-based relational database implementing and extending ANSI SQL. It's unfortunately owned by Oracle, but if you'd prefer, the MariaDB fork is essentially the same thing.

#### How is it pronounced?

Officially, "My Ess Que Ell," but since the SQL language was originally called SEQUEL ("Structured English Query Language"), and only changed due to trademark issues, I feel at ease saying "My Sequel." However, this tends to bring out pedants who love to haughtily correct your pronunciation, so do what you will. For what it's worth, I also pronounce kubectl (the Kubernetes CLI tool) as "kube cuddle," so I may not be the greatest influence.

## Basic definitions

### SQL sub-languages

All of these can be grouped as SQL, and some of them can also be combined - `DQL` is often merged with `DML`, for example. Knowing that `DML` generally operates on a single record at a time (but may be batched), and that `DDL` generally operates on an entire table or schema at a time suffices for now.

- DCL
  - Data Control Language. `GRANT`, `REVOKE`.
- DDL
  - Data Definition Language. `ALTER`, `CREATE`, `DROP`, `TRUNCATE`.
- DML
  - Data Manipulation Language. `CALL`, `DELETE`, `INSERT`, `LOCK`, `SELECT (with FROM or WHERE)`, `UPDATE`.
- DQL
  - Data Query Language. `SELECT`.
- TCL
  - Transaction Control Language. `COMMIT`, `ROLLBACK`, `SAVEPOINT`.

### Other definitions

- B+ tree
  - An _m_-ary tree data structure that is self-balancing, with a variable number of children per node. It differs from the `B-tree` in that an individual data node can have either keys or children, but not both. It has `O(log(n))` time complexity for insertion, search, and deletion. It is frequently used both for filesystems and for RDBMS.
- Block
  - The lowest reasonable level of data storage (above individual bits). Historically sized at 512 bytes due to hard drive sector sizes, but generally sized at 4 KiB in modern drives, and SSDs. Enterprise drives sometimes have 520 byte block sizes (or 4160 bytes for the 4 KiB-adjacent size), with the extra 8 bytes being used for data integrity calculations.
- Filesystem
  - A method for the operating system to store data. May include features like copy-on-write, encryption, journaling, pre-allocation, SSD management, volume management, and more. Modern examples include APFS (default for Apple products), ext4 (default for most Linux distributions), NTFS (default for Windows), XFS (default for Red Hat and its downstream), and ZFS (default for FreeBSD).
- Schema
  - A logical grouping of database objects, e.g. tables, indices, etc. Often called a database, but technically, the database may contain any number of schemas, each with its own unique (or shared!) set of data, access policies, etc.
- Table
  - A logical grouping of data, of varying or similar types. May contain constraints, indices, etc.
- Tablespace
  - The link between the logical storage layer (tables, indices) and the physical storage layer (the disk's filesystem). This is an actual file that exists on the disk, contained in `$MYSQL_DATA_DIR`, nominally `/var/lib/mysql`.
171 | - As an aside, this fact, combined with [RDS MySQL file size limits](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.KnownIssuesAndLimitations.html#MySQL.Concepts.Limits.FileSize) yields some interesting information about RDS. Since they used to (anything created before April 2014) limit a table to 2 TiB*, that means that they were using ext3, as that is its maximum file size. Instances created after April 2014 are limited to 16 TiB* files, indicating that they are probably now using ext4, as that is generally its maximum file size. 16 TB is also the limit for InnoDB with 4 KB InnoDB page sizes, so it's possible the underlying disk's filesystem is XFS or something else, but since that value defaults to 16 KB, it seems unlikely. 172 | 173 |
<details>
<summary>What's a TiB?</summary>

A TiB (or MiB, or GiB..) is how data is actually sized, in base-2. Written out, instead of Terabytes, it's Tebibytes, and is _2^40 bytes_ instead of _10^12 bytes_ (Terabytes are base-10). Base-10 caught on for storage marketing since the number is larger and thus sounds better, but in reality you're getting less. This is why a 1 TB hard drive shows up on your computer as having 931 GB - because it's actually 931 GiB, but it gets displayed as GB since GiB as a term never caught on.

In specific relation to this point, AWS' docs state that the limits are in TB (terabytes) instead of TiB (tebibytes). It's possible that their VM subsystem limits the size to n TB, but the actual filesystem is capable of n TiB.

</details>
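If you'd like to sanity-check that 931 number, SQL itself works fine as a calculator - this is just 10^12 bytes divided by 2^30 bytes-per-GiB, using the built-in `POW` and `ROUND` functions:

```sql
SELECT ROUND(POW(10, 12) / POW(2, 30)) AS gib_in_a_marketing_tb;
```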
# MySQL Components

As of MySQL 8.0, this is the official architecture drawing:

![MySQL 8.0 architecture](https://cdn.zappy.app/a92561fb248524eb0927cc0ed618de52.png)

* Connector
  * Also known as the Client, this is how you interact with the database, be it manually via a CLI client tool, or via a program using the DB.
* Server
  * Parser
    * This component receives a human-readable query, and translates it into machine-readable commands, via a lexical scanner and a grammar rule module.
  * Optimizer
    * This component attempts to optimize a given query using its knowledge of the stored data, such that the relative compute time of the query is minimized.
  * Caches/Buffers
    * This component has various caches to store frequently-accessed data, temporary tables created for use by other queries, etc.
  * SQL Interface
    * This component is the link between the Connector and the rest of the Server.
* Storage Engine
  * This component stores and manages the actual databases. Historically MySQL used the MyISAM engine, but switched to InnoDB as the default with version 5.5. Both (and others) remain available if desired, but unless you have an extremely specific use case, you should use InnoDB.

# MySQL Operations

## Assumptions

- All examples here are using MySQL 8.0.23, with the InnoDB engine.
- All examples here are using the mysql-client TUI program, but others may work as well.

## Notes

- MySQL is case-insensitive for most, but not all, operations. I'll use `UPPERCASE` to designate commands, and `lowercase` to designate arguments and schema, table, and column names, but you're welcome to use all lowercase.
- The `;` suffix to commands serves as both the command terminator, and specifies that the output should be in an ASCII table.
- The `\G` suffix to commands is an alternative terminator, and specifies that the output should be in a vertical, non-tabular format.
  - Not all clients support this. If you're using a GUI client like Sequel Ace, you can simply scroll the output window horizontally, or expand it to make it bigger.
- I'm formatting my queries with statements and clauses on the left, their arguments indented by two spaces, and any qualifiers on the same line, where possible.
- This was developed on a Debian VM with 16 cores of a Xeon E5-2650 v2, 64 GiB of DDR3 RAM, and a working directory which is an NFS export over a 1 GbE network, consisting of a ZFS RAIDZ2 array of spinning disks; ashift=12, blocksize=128K. Your times will vary, based mostly on the disk and RAM speed.

## Schemata

A brand-new installation of MySQL will typically have four schemata - `information_schema`, `mysql`, `performance_schema`, and `sys`.

- `information_schema` contains information about the schema in the database. This includes columns, column types, indices, foreign keys, and tables.
- `mysql` generally contains configuration and logs.
- `sys` generally contains information about the SQL engine (InnoDB here), including currently executing processes, and query metrics.
- `performance_schema` contains some specific performance information about the schema in the database, such as deadlocks, locks, memory consumption, mutexes, and threads.
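All four are queryable like any other schema. As a quick, read-only example, here's how you could list a few of the tables backing the `mysql` schema (the exact names you'll see vary by version):

```sql
SELECT
  table_name
FROM
  information_schema.tables
WHERE
  table_schema = 'mysql'
LIMIT 5;
```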
## Schema spelunking

As mentioned, `databases` is often used to mean `schema`, and in fact in MySQL they're synonyms for this statement - `SHOW schemas` results in the exact same output. You won't have the `test` database yet, but you should see the other four shown below. NOTE: I'll demonstrate both output formats here, and will switch as needed to easily display the information.

```sql
SHOW schemas;
```

```sql
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| northwind          |
| performance_schema |
| sys                |
| test               |
+--------------------+
6 rows in set (0.01 sec)
```

```sql
SHOW schemas\G
```

```sql
*************************** 1. row ***************************
Database: information_schema
*************************** 2. row ***************************
Database: mysql
*************************** 3. row ***************************
Database: northwind
*************************** 4. row ***************************
Database: performance_schema
*************************** 5. row ***************************
Database: sys
*************************** 6. row ***************************
Database: test
6 rows in set (0.01 sec)
```

The `SHOW` statement behind the scenes is gathering and formatting data in a way that's easy for humans to see and understand. Often, it comes from the `information_schema` or `performance_schema` schema, as seen below. This query also demonstrates the use of the `AS` statement, which allows you to alias a column or sub-query.

```sql
SELECT
  schema_name AS 'Database'
FROM
  information_schema.schemata;
```

```sql
+--------------------+
| Database           |
+--------------------+
| mysql              |
| information_schema |
| performance_schema |
| sys                |
| test               |
| northwind          |
+--------------------+
6 rows in set (0.01 sec)
```

### String literals

You may have noticed that in the above examples, sometimes a column or table name was enclosed with a single quote (`'`), sometimes a backtick ( \` ), and other times nothing at all. This is deliberate.

In ANSI SQL, string literals are represented with single quotation marks, e.g. 'test', and double quotation marks are reserved for identifiers. That mode (`ANSI_QUOTES`) is disabled by default in MySQL, so you're free to use double quotation marks for strings if you'd prefer; however, if you were trying to pass in a command to the client from a shell (e.g. `mysql -e 'SELECT foo FROM bar'`), you might run into shell expansion issues depending on your query. Also, since you'll probably be working with other SQL implementations like Postgres, it's best to try to stay as neutral as possible.

Backticks may be used at any time, and are called quoted identifiers. They tell the SQL parser to treat anything enclosed in them as an identifier rather than a keyword. This may be useful if, for example, you created a table named `table` (please don't), had a column named `count`, etc. The full list of keywords / reserved words [is here](https://dev.mysql.com/doc/refman/8.0/en/keywords.html) if you want to see what to avoid.
```sql
CREATE TABLE table (id INT);
```

```sql
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'table (id INT)' at line 1
```

vs.

```sql
CREATE TABLE `table` (id INT);
```

```sql
Query OK, 0 rows affected (0.15 sec)
```

#### SQL_MODE

As it turns out, you can alter this behavior. First, let's check the current `SQL_MODE`. System variables can be viewed with either `SHOW VARIABLES` or `SELECT @@[GLOBAL.|SESSION.]<variable_name>`.

```sql
SHOW VARIABLES LIKE 'sql_mode'\G
```

```sql
*************************** 1. row ***************************
Variable_name: sql_mode
        Value: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
1 row in set (0.01 sec)
```

If neither `GLOBAL` nor `SESSION` is specified when using the `@@` method, the session value is returned if it exists, otherwise the global value is returned.

```sql
SELECT @@sql_mode\G
```

```sql
*************************** 1. row ***************************
@@sql_mode: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
1 row in set (0.00 sec)
```

We'll use the `mysql.user` table for this example. First, no quotes of any kind. As expected, we get the rows from those two columns.

```sql
SELECT host, user FROM mysql.user;
```

```sql
+-------------+------------------+
| host        | user             |
+-------------+------------------+
| %           | zapier           |
| %           | zapier_training  |
| 192.168.1.% | sgarland         |
| localhost   | mysql.infoschema |
| localhost   | mysql.session    |
| localhost   | mysql.sys        |
| localhost   | root             |
+-------------+------------------+
7 rows in set (0.01 sec)
```

Now, we'll mix single and double quotes.

```sql
SELECT 'host', "user" FROM mysql.user;
```

```sql
+------+------+
| host | user |
+------+------+
| host | user |
| host | user |
| host | user |
| host | user |
| host | user |
| host | user |
| host | user |
+------+------+
7 rows in set (0.00 sec)
```

In MySQL's default mode, these two are treated the same, and you get the respective string literals printed as rows for the selected columns.

If single (or double) quotes are combined with backticks, you get partial results.

```sql
SELECT 'host', `user` FROM mysql.user;
```

```sql
+------+------------------+
| host | user             |
+------+------------------+
| host | zapier           |
| host | zapier_training  |
| host | sgarland         |
| host | mysql.infoschema |
| host | mysql.session    |
| host | mysql.sys        |
| host | root             |
+------+------------------+
7 rows in set (0.00 sec)
```

Now, we'll modify the session's `sql_mode`. You don't have permission to set any global variables, but you can set most session variables. Unlike the `SELECT` above, if you don't specify `GLOBAL` or `SESSION`, `SET` will always assume `SESSION`.
```sql
SET @@sql_mode = ANSI_QUOTES;
```

```sql
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @@sql_mode\G
*************************** 1. row ***************************
@@sql_mode: ANSI_QUOTES
1 row in set (0.00 sec)
```

Oh no, we've overridden all of the other settings! Luckily, the global variable hasn't been modified, so we can use it to build the correct setting. To do so, we'll use the `CONCAT_WS` function, which, as the name implies, concatenates things with a separator. It takes the form `CONCAT_WS(separator, string1, string2, ...)`. We'll also run a `SELECT` of the global variable, nesting it as a sub-query.

```sql
SET @@sql_mode = (SELECT CONCAT_WS(',', 'ANSI_QUOTES', (SELECT @@GLOBAL.sql_mode)));
```

```sql
Query OK, 0 rows affected (0.01 sec)
```

```sql
SELECT @@sql_mode\G
```

```sql
*************************** 1. row ***************************
@@sql_mode: ANSI_QUOTES,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
1 row in set (0.00 sec)
```

Whew. Now we can try out the quoting differences again.

```sql
SELECT 'host', "user" FROM mysql.user;
```

```sql
+------+------------------+
| host | USER             |
+------+------------------+
| host | zapier           |
| host | zapier_training  |
| host | sgarland         |
| host | mysql.infoschema |
| host | mysql.session    |
| host | mysql.sys        |
| host | root             |
+------+------------------+
7 rows in set (0.00 sec)
```

This time, only single quotes are treated as string literals, with double quotes being treated as identifiers.

Now, set the `SESSION.sql_mode` back to its original value, using a sub-query like before.

```sql
SET @@sql_mode = (SELECT @@GLOBAL.sql_mode);
```

```sql
Query OK, 0 rows affected (0.00 sec)
```

### Create a schema

Let's create some tables! First, we need a schema. There aren't a lot of options to cover here, so we can just create one. I'll be using `foo`, but you should substitute any name you'd like that's not already in use. Ideally, we would also enable encryption at rest. This can be globally set, or specified at schema creation - any tables in the schema inherit its setting. If you're curious, InnoDB uses AES, with ECB mode for tablespaces, and CBC mode for data. Also notably, [undo logs](https://dev.mysql.com/doc/refman/8.0/en/innodb-undo-logs.html) and [redo logs](https://dev.mysql.com/doc/refman/8.0/en/innodb-redo-log.html) have their encryption handled by separate variables. However, since this requires some additional work (all of the easy options are only available with MySQL Enterprise; MySQL Community requires you to generate and store the key yourself), we'll skip it.

```sql
CREATE SCHEMA foo;
```

```sql
Query OK, 1 row affected (0.02 sec)
```

## Table operations

### Create tables

First, we'll select our new schema so we don't have to constantly specify it. I'll be using `foo` here, but you should substitute whatever you created in the last step.

```sql
USE foo;
```

Now, we'll create the `users` table.
```sql
CREATE TABLE users (
  id BIGINT PRIMARY KEY,
  first_name CHAR(64),
  last_name CHAR(64),
  uid BIGINT
);
```

```sql
Query OK, 0 rows affected (0.17 sec)
```

```sql
SHOW COLUMNS FROM users;
```

```sql
+------------+----------+------+-----+---------+-------+
| Field      | Type     | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+-------+
| id         | bigint   | NO   | PRI | NULL    |       |
| first_name | char(64) | YES  |     | NULL    |       |
| last_name  | char(64) | YES  |     | NULL    |       |
| uid        | bigint   | YES  |     | NULL    |       |
+------------+----------+------+-----+---------+-------+
4 rows in set (0.02 sec)
```

Hmm, something's not quite right as compared to the original example - we're missing `AUTO_INCREMENT`! Without it, you'd have to manually specify the `id` value (which is this table's `PRIMARY KEY`), which is annoying. Additionally, while `id` was automatically made to be `NOT NULL` since it's the primary key, `uid` was not, so we need to change those (if you don't specify `NOT NULL`, MySQL defaults to `NULL`). Finally, `uid` should actually be named `user_id`, and it should have a `UNIQUE` constraint.

NOTE: when redefining a column, it's like a `PUT`, not a `PATCH` - the new definition replaces the old one entirely, so anything you don't re-specify is dropped.

```sql
ALTER TABLE users MODIFY uid BIGINT NOT NULL UNIQUE;
```

```sql
Query OK, 0 rows affected (0.27 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

```sql
ALTER TABLE users MODIFY id BIGINT AUTO_INCREMENT;
```

```sql
Query OK, 0 rows affected (0.34 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

```sql
SHOW COLUMNS FROM users;
```

```sql
+------------+----------+------+-----+---------+----------------+
| Field      | Type     | Null | Key | Default | Extra          |
+------------+----------+------+-----+---------+----------------+
| id         | bigint   | NO   | PRI | NULL    | auto_increment |
| first_name | char(64) | YES  |     | NULL    |                |
| last_name  | char(64) | YES  |     | NULL    |                |
| uid        | bigint   | NO   | UNI | NULL    |                |
+------------+----------+------+-----+---------+----------------+
4 rows in set (0.02 sec)
```

If you want to rename a column without specifying its definition, you can use `RENAME COLUMN`.

```sql
ALTER TABLE users RENAME COLUMN uid TO user_id;
```

```sql
Query OK, 0 rows affected (0.12 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

Now, we'll make the `zaps` table. You may have noticed by now that the primary key column `id` has been the first column in all of these definitions. While nothing stops you from placing it last, or in the middle, this is a bad idea for a variety of reasons, not least of which is that it's confusing for anyone used to normal ordering. There may be some small binpacking gains to be made by carefully matching column widths to page sizes (the default pagesize for InnoDB is 16 KB, and the default pagesize for most disks today is 4 KB), which can also impact performance on spinning disks.
Also, prior to MySQL 8.0.13, temporary tables (usually, tables that InnoDB creates as part of a query) would silently cast `VARCHAR` and `VARBINARY` columns to their respective `CHAR` or `BINARY`. If you had some `VARCHAR` columns with a large maximum size, this could cause the required space to store them to rapidly balloon, filling up the disk.

In general, column ordering in a table doesn't tremendously matter for MySQL (but it does for queries, as we'll see later), so stick to convention.

```sql
CREATE TABLE zaps (
  `id` BIGINT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  `zap_id` BIGINT UNSIGNED NOT NULL,
  `created_at` TIMESTAMP NOT NULL DEFAULT NOW(),
  `last_updated_at` TIMESTAMP NULL ON UPDATE NOW(),
  `owned_by` BIGINT UNSIGNED NOT NULL,
  UNIQUE(zap_id)
);
```

```sql
SHOW COLUMNS FROM zaps;
```

```sql
+-----------------+-----------------+------+-----+-------------------+-----------------------------+
| Field           | Type            | Null | Key | Default           | Extra                       |
+-----------------+-----------------+------+-----+-------------------+-----------------------------+
| id              | bigint unsigned | NO   | PRI | NULL              | auto_increment              |
| zap_id          | bigint unsigned | NO   | UNI | NULL              |                             |
| created_at      | timestamp       | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED           |
| last_updated_at | timestamp       | YES  |     | NULL              | on update CURRENT_TIMESTAMP |
| owned_by        | bigint unsigned | NO   |     | NULL              |                             |
+-----------------+-----------------+------+-----+-------------------+-----------------------------+
5 rows in set (0.00 sec)
```

We're introducing some new defaults here:

* DEFAULT NOW()
  * With this, much like an `AUTO_INCREMENT` column, the current timestamp will be added to the `created_at` column when a new row is created. NOTE: This doesn't make the column immutable, and nothing stops someone from altering this value manually later.
* ON UPDATE NOW()
  * For `last_updated_at`, while the default is `NULL`, whenever the row is updated, the current timestamp is added.

`NOW()` is an alias for `CURRENT_TIMESTAMP`, and no, I didn't forget the function call on the right. For historical reasons, `CURRENT_TIMESTAMP` may be called with or without parentheses, but `NOW()` requires them. Similarly, generally any default value being declared that isn't a literal (e.g. `0`, `NULL`, etc.) is required to be wrapped in parentheses - see `(JSON_ARRAY())`. Again, for historical reasons, `TIMESTAMP` and `DATETIME` columns don't require this. Also, `JSON` _requires_ its default value to be wrapped in parentheses, even if the default is a literal (as do `BLOB`, `GEOMETRY`, and `TEXT`). See [MySQL docs on defaults](https://dev.mysql.com/doc/refman/8.0/en/data-type-defaults.html) for more information on this behavior, and [MySQL docs on timestamp initialization](https://dev.mysql.com/doc/refman/8.0/en/timestamp-initialization.html) for more information on timestamp column defaults.

#### Data types

What is the difference between a `VARCHAR` and a `CHAR`, and what is the integer after it? `CHAR` allocates precisely the amount of space specified.
If you specify that a column is 64 bytes wide, then you can store 64 bytes in it, and no matter if you're storing 1 byte or 64 bytes, the actual column usage will take 64 bytes - this is because the value is right-padded with spaces, and the trailing spaces are then removed when retrieved (by default - the trimming behavior can be modified, if desired).

Let's try adding a 65-byte string to a column with a strict 64-byte limit - this can be done with the `LPAD` function, which takes the form `LPAD(string, length, pad_string)`.

```sql
INSERT INTO users
  (first_name, last_name, user_id)
VALUES
  ("Stephan",
  (SELECT LPAD("Garland", 65, " ")),
  1
);
```

```sql
ERROR 1406 (22001): Data too long for column 'last_name' at row 1
```

Since people in different cultures may have longer names than I'm used to, making this column allowed to be wider than 64 bytes is probably a good idea, especially if there isn't a storage penalty for doing so. While a `VARCHAR` can technically be up to `2^16 - 1` bytes - the same as the row width limit - it's still a good idea to have some kind of reasonable limit in place, lest someone exploit a security hole and start using your DB for Chia mining or something. 255 bytes was the historic maximum length allowed in older SQL implementations, and it's the maximum length a `VARCHAR` can have while being stored with a 1-byte length prefix. Thus, we'll modify our columns to this standard.

```sql
ALTER TABLE users
  MODIFY first_name VARCHAR(255),
  MODIFY last_name VARCHAR(255);
```

```sql
Query OK, 0 rows affected (0.13 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

```sql
SHOW COLUMNS FROM users;
```

```sql
+------------+--------------+------+-----+---------+----------------+
| Field      | Type         | Null | Key | Default | Extra          |
+------------+--------------+------+-----+---------+----------------+
| id         | bigint       | NO   | PRI | NULL    | auto_increment |
| first_name | varchar(255) | YES  |     | NULL    |                |
| last_name  | varchar(255) | YES  |     | NULL    |                |
| user_id    | bigint       | NO   | UNI | NULL    |                |
+------------+--------------+------+-----+---------+----------------+
4 rows in set (0.01 sec)
```

What about ints? You may sometimes see an integer following an integer-type column definition, like `int(4)`. Confusingly, this has nothing to do with the maximum amount of data that can be stored in that column, and is only used for display. Even more confusingly, the MySQL client itself will ignore it, and show the entire stored number. Applications can choose whether or not to use the display width. In general, there's little reason to use this feature, and if you want to constrain display width, do so in your application.

For floating points, MySQL supports `FLOAT` and `DOUBLE`, with the former being 4 bytes, and the latter 8 bytes.

For exact precision numbers, MySQL supports `DECIMAL` and `NUMERIC`, and they are identical.

There are also sub-types of `INT`, such as `SMALLINT` (2 bytes, storing a maximum value of `2^16 - 1` if unsigned), and `BIGINT`, as seen previously - it's 8 bytes, and stores a maximum value of `2^63 - 1` if signed, and `2^64 - 1` if unsigned.
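If you're curious what happens at the top of those ranges, MySQL throws an error rather than silently wrapping. A bare `SELECT` needs no table, so this is a harmless way to see it (the exact error text may vary slightly by version):

```sql
SELECT 9223372036854775807 + 1;
```

```sql
ERROR 1690 (22003): BIGINT value is out of range in '(9223372036854775807 + 1)'
```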
Since there's not much reason to have negative IDs, let's alter those definitions as well: 676 | 677 | ```sql 678 | ALTER TABLE users 679 | MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, 680 | MODIFY user_id BIGINT UNSIGNED NOT NULL UNIQUE; 681 | ``` 682 | 683 | ```sql 684 | Query OK, 0 rows affected, 1 warning (0.10 sec) 685 | Records: 0 Duplicates: 0 Warnings: 1 686 | ``` 687 | 688 | A warning? Huh? 689 | 690 |
<details>
<summary>I don't see any warnings!</summary>

Your client may not display warnings, in which case you can just follow along in this document.

</details>
```sql
SHOW WARNINGS\G
```

```sql
*************************** 1. row ***************************
  Level: Warning
   Code: 1831
Message: Duplicate index 'user_id' defined on the table 'test.users'. This is deprecated and will be disallowed in a future release.
1 row in set (0.00 sec)
```

Let's look at the table definition.

```sql
SHOW CREATE TABLE users\G
```

```sql
*************************** 1. row ***************************
       Table: users
Create Table: CREATE TABLE `users` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(255) DEFAULT NULL,
  `last_name` varchar(255) DEFAULT NULL,
  `user_id` bigint unsigned NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uid` (`user_id`),
  UNIQUE KEY `user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.01 sec)
```
<details>
<summary>What is SHOW CREATE TABLE?</summary>

`SHOW CREATE TABLE` is a command that lets you view the query that would be used to create the table in its current state. It's safe to do, and is a good way to view columns, their types, indexes, foreign keys, etc. for a given table.

</details>
Ah - constraints like `UNIQUE` don't have to be redefined along with the rest of the column definition, and by redefining the column, we've duplicated a constraint. While allowed for now, it's not a good practice, so we'll get rid of it.

```sql
ALTER TABLE users DROP CONSTRAINT uid;
```

```sql
Query OK, 0 rows affected (0.16 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

```sql
SHOW COLUMNS FROM users;
```

```sql
+------------+-----------------+------+-----+---------+----------------+
| Field      | Type            | Null | Key | Default | Extra          |
+------------+-----------------+------+-----+---------+----------------+
| id         | bigint unsigned | NO   | PRI | NULL    | auto_increment |
| first_name | varchar(255)    | YES  |     | NULL    |                |
| last_name  | varchar(255)    | YES  |     | NULL    |                |
| user_id    | bigint unsigned | NO   | UNI | NULL    |                |
+------------+-----------------+------+-----+---------+----------------+
4 rows in set (0.01 sec)
```

### Foreign keys

These tables seem fine to start with, but the columns that we are implicitly designing to have relationships don't have any method of enforcement. While this is a valid design - placing all referential integrity requirements onto the application - SQL was designed to handle this for us, so let's make use of it. NOTE: foreign keys bring with them a huge array of problems that will likely not be seen until your scale is large, so keep that in mind, and have a plan to migrate off of them if necessary.

#### Why you might want foreign keys

Let's create a user, and give them a Zap.

```sql
INSERT INTO users
  (first_name, last_name, user_id)
VALUES
  ('Stephan', 'Garland', 1);
```

```sql
Query OK, 1 row affected (0.02 sec)
```

```sql
INSERT INTO zaps (zap_id, owned_by) VALUES (1, 1);
```

```sql
Query OK, 1 row affected (0.03 sec)
```

```sql
TABLE zaps;
```
<details>
<summary>What is `TABLE`?</summary>

Syntactic sugar (a shortcut) for `SELECT * FROM <table>`.

```sql
+----+--------+---------------------+-----------------+----------+
| id | zap_id | created_at          | last_updated_at | owned_by |
+----+--------+---------------------+-----------------+----------+
|  1 |      1 | 2023-02-27 10:25:01 | NULL            |        1 |
+----+--------+---------------------+-----------------+----------+
1 row in set (0.00 sec)
```

</details>

We can `JOIN` on this if we want.

```sql
SELECT *
FROM
  users
JOIN zaps ON
  users.user_id = zaps.owned_by\G
```

```sql
*************************** 1. row ***************************
             id: 1
     first_name: Stephan
      last_name: Garland
        user_id: 1
          email: NULL
             id: 1
         zap_id: 1
     created_at: 2023-02-27 10:25:01
last_updated_at: NULL
       owned_by: 1
1 row in set (0.01 sec)
```

That's all well and good, but what if I want to delete my account? Wouldn't it be nice if devs didn't have to worry about deleting every trace of my existence? Or what if everyone's user ID has to change for a migration? Enter foreign keys.

#### Creating a foreign key

```sql
ALTER TABLE
  zaps
ADD FOREIGN KEY
  (owned_by)
REFERENCES users
  (user_id)
ON UPDATE CASCADE
ON DELETE CASCADE;
```

```sql
Query OK, 1 row affected (0.50 sec)
Records: 1  Duplicates: 0  Warnings: 0
```

```sql
SHOW CREATE TABLE zaps\G
```

```sql
*************************** 1. row ***************************
       Table: zaps
Create Table: CREATE TABLE `zaps` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `zap_id` bigint unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `last_updated_at` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
  `owned_by` bigint unsigned NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `zap_id` (`zap_id`),
  KEY `owned_by` (`owned_by`),
  CONSTRAINT `zaps_ibfk_1` FOREIGN KEY (`owned_by`) REFERENCES `users` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
```

Note that not only do we now have a `FOREIGN KEY` linking `zaps.owned_by` to `users.user_id`, but InnoDB has added an index on `zaps.owned_by` - this is required, and despite the documentation informing you that you must do this before adding the foreign key, it actually does it for you if you don't.

#### Demonstrating a foreign key

```sql
UPDATE users SET user_id = 9 WHERE id = 1;
```

Note the `WHERE` predicate - we'll go more into that later, but the most important thing to take away here is that there are very few instances where you should issue DML like `UPDATE` without a `WHERE`.
887 | Why not? 888 | 889 | If there was no predicate, the query would apply to everything in the table, e.g. every user would be modified. 890 | 891 |

```sql
Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0
```

```sql
SELECT *
FROM
    users
JOIN zaps ON
    users.user_id = zaps.owned_by\G
```

```sql
*************************** 1. row ***************************
             id: 1
     first_name: Stephan
      last_name: Garland
        user_id: 9
          email: NULL
             id: 1
         zap_id: 1
     created_at: 2023-02-27 10:25:01
last_updated_at: NULL
       owned_by: 9
1 row in set (0.01 sec)
```

And just like that, `zaps` has updated its `owned_by` value for that Zap to equal the new value in `users`. And if we delete the `users` entry, the same `CASCADE` action will follow.

```sql
DELETE FROM users WHERE id = 1;
```

```sql
Query OK, 1 row affected (0.02 sec)
```

```sql
SELECT * FROM zaps;
```

```sql
Empty set (0.00 sec)
```

### Determining table size

There are a few ways to find out how many rows are in a table. InnoDB maintains information about tables in the `INFORMATION_SCHEMA.TABLES` table, including an estimate of row count. However, it's just that - an estimate. It can be made accurate with `ANALYZE TABLE`, but don't run that casually in production (to be clear, it should be run, but carefully), since it places a table-wide read lock during the process. You can also use the query `SELECT COUNT(*)`, but that will perform a table scan (where the entire table is read sequentially, without indices), so it may have a performance impact on the database, as it's consuming a lot of available IOPS. Finally, assuming you have an auto-incrementing `id` field in the table, you can use `SELECT id FROM <table>
ORDER BY id DESC LIMIT 1` to get the last incremented value. This is also an estimate, since it doesn't take any deletions into account (auto-increment is monotonic), but it's extremely fast. 942 | 943 | ```sql 944 | SELECT table_name, table_rows 945 | FROM 946 | information_schema.tables 947 | WHERE 948 | table_schema = 'test'; 949 | ``` 950 | 951 | ```sql 952 | +---------------+------------+ 953 | | TABLE_NAME | TABLE_ROWS | 954 | +---------------+------------+ 955 | | gensql | 1000 | 956 | | ref_users | 1000 | 957 | | ref_users_big | 992839 | 958 | | ref_zaps | 0 | 959 | | ref_zaps_big | 0 | 960 | | users | 1000 | 961 | | zaps | 0 | 962 | +---------------+------------+ 963 | 7 rows in set (0.01 sec) 964 | ``` 965 | 966 | ```sql 967 | ANALYZE TABLE ref_zaps; ANALYZE TABLE ref_zaps_big; 968 | ``` 969 | 970 | ```sql 971 | +---------------+---------+----------+----------+ 972 | | Table | Op | Msg_type | Msg_text | 973 | +---------------+---------+----------+----------+ 974 | | test.ref_zaps | analyze | status | OK | 975 | +---------------+---------+----------+----------+ 976 | 1 row in set (0.03 sec) 977 | 978 | +-------------------+---------+----------+----------+ 979 | | Table | Op | Msg_type | Msg_text | 980 | +-------------------+---------+----------+----------+ 981 | | test.ref_zaps_big | analyze | status | OK | 982 | +-------------------+---------+----------+----------+ 983 | 1 row in set (0.05 sec) 984 | ``` 985 | 986 | ```sql 987 | SELECT table_name, table_rows 988 | FROM 989 | information_schema.tables 990 | WHERE table_schema = 'test'; 991 | ``` 992 | 993 | ```sql 994 | +---------------+------------+ 995 | | TABLE_NAME | TABLE_ROWS | 996 | +---------------+------------+ 997 | | gensql | 1000 | 998 | | ref_users | 1000 | 999 | | ref_users_big | 992839 | 1000 | | ref_zaps | 1000 | 1001 | | ref_zaps_big | 997211 | 1002 | | users | 1000 | 1003 | | zaps | 0 | 1004 | +---------------+------------+ 1005 | 7 rows in set (0.02 sec) 1006 | ``` 1007 | 1008 | Actual row count: 1009 | 1010 | ```sql 1011 | SELECT 1012 | 'ref_users_big' AS 'table_name', 1013 | COUNT(*) AS 'row_count' 1014 | FROM 1015 | ref_users_big 1016 | UNION 1017 | SELECT 1018 | 'ref_zaps_big', 1019 | COUNT(*) 1020 | FROM 1021 | ref_zaps_big; 1022 | ``` 1023 | 1024 | ```sql 1025 | +---------------+-----------+ 1026 | | table_name | row_count | 1027 | +---------------+-----------+ 1028 | | ref_users_big | 1000000 | 1029 | | ref_zaps_big | 1000000 | 1030 | +---------------+-----------+ 1031 | 2 rows in set (2.42 sec) 1032 | ``` 1033 | 1034 |
1035 | What's a UNION? 1036 | 1037 | A way to combine query results, regardless of any relation between tables or queries. 1038 |
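As an aside, `UNION` deduplicates the combined results, which isn't free; `UNION ALL` keeps everything and skips that work. A minimal sketch of the difference, reusing tables from above:

```sql
-- UNION removes duplicate rows from the combined result,
-- at the cost of an implicit DISTINCT over the output
SELECT user_id FROM ref_users
UNION
SELECT user_id FROM ref_users_big;

-- UNION ALL keeps duplicates, and is cheaper, since no
-- deduplication pass is needed
SELECT user_id FROM ref_users
UNION ALL
SELECT user_id FROM ref_users_big;
```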

## Column operations

### Adding columns

Adding columns is done with `ALTER TABLE`:

```sql
ALTER TABLE
    zaps
ADD COLUMN
    shared_with
    JSON;
```

```sql
Query OK, 0 rows affected (0.18 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

Just as with a table definition, the column's name (`shared_with`) and type (`JSON`) are required; additional qualifiers like `DEFAULT`, `UNIQUE`, etc. may be appended. To add some types of default values, like a JSON array, you must call the function that creates them.

* [MySQL supports JSON](https://dev.mysql.com/doc/refman/8.0/en/json.html) as a data type! While you can of course simply store JSON strings in a text column, there are some benefits to using the native JSON datatype; among them that you can index scalars from the JSON objects, and that you can extract specific keys/values from the objects instead of the entire string.
* Please don't use this as an excuse to treat MySQL as a Document DB, though. If you want NoSQL, you should use NoSQL. RDBMS are optimized for relations. Storing some information in JSON is fine, but it shouldn't be the default.

### Modifying columns

This was covered earlier during [table operations](#table-operations), but as a refresher, we'll again use `ALTER TABLE` to add a `DEFAULT` value of an empty JSON array, which must be called as its function:

```sql
ALTER TABLE
    zaps
MODIFY COLUMN
    shared_with
    JSON
DEFAULT (
    JSON_ARRAY()
);
```

```sql
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

### Dropping tables with foreign keys

If there are foreign keys relying on the table you're trying to drop, you will first need to either disable foreign key checks, or remove the foreign key constraint itself, before the drop can proceed.

```sql
DROP TABLE users;
```

```sql
ERROR 3730 (HY000): Cannot drop table 'users' referenced by a foreign key constraint 'zaps_ibfk_1' on table 'zaps'.
```

```sql
SET foreign_key_checks = 0;
```

```sql
Query OK, 0 rows affected (0.01 sec)
```

```sql
DROP TABLE users;
Query OK, 0 rows affected (0.30 sec)
```

```sql
SHOW CREATE TABLE zaps\G
```

```sql
*************************** 1. row ***************************
       Table: zaps
Create Table: CREATE TABLE `zaps` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `zap_id` bigint unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `last_updated_at` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
  `owned_by` bigint unsigned NOT NULL,
  `shared_with` json DEFAULT (json_array()),
  PRIMARY KEY (`id`),
  UNIQUE KEY `zap_id` (`zap_id`),
  KEY `owned_by` (`owned_by`),
  CONSTRAINT `zaps_ibfk_1` FOREIGN KEY (`owned_by`) REFERENCES `users` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
```

Just because MySQL let us drop the table, it doesn't mean it cleaned up after us.
1133 | How can we remove the FK? 1134 | 1135 | ```sql 1136 | ALTER TABLE zaps DROP CONSTRAINT `zaps_ibfk_1`; 1137 | ``` 1138 | 1139 | ```sql 1140 | Query OK, 0 rows affected (0.20 sec) 1141 | Records: 0 Duplicates: 0 Warnings: 0 1142 | ``` 1143 |
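If you'd like to verify the constraint is actually gone, `SHOW CREATE TABLE` works here too - the `CONSTRAINT zaps_ibfk_1` line from earlier should no longer be present (output elided):

```sql
SHOW CREATE TABLE zaps\G
```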

Also, don't forget to re-enable `foreign_key_checks` for your session.

```sql
SET foreign_key_checks = 1;
```

```sql
Query OK, 0 rows affected (0.00 sec)
```

But wait, how are we going to get back the `users` table? We could scroll back up and find the definition, but wouldn't it be nice if we could copy the definition from somewhere else?

### Copied table definitions

Luckily, this exists in the form of `CREATE TABLE LIKE` ([MySQL docs](https://dev.mysql.com/doc/refman/8.0/en/create-table-like.html)). You do need `SELECT` privileges on the schema/table you're copying from, which is enabled for `test.ref_%` with this user. You'll also need to specify the schema the table exists in, since it's outside of the currently selected schema.

NOTE: This schema is somewhat different from what we created before; most of it is additional, but one big change is that there is no longer an explicit `id` column; instead, the `user_id` column takes its place.

```sql
CREATE TABLE users LIKE test.ref_users;
```

```sql
Query OK, 0 rows affected (0.34 sec)
```

There are some restrictions. The documentation lists all of them, but the biggest one is that foreign keys aren't copied. We deleted ours so it doesn't really matter, but this could catch you by surprise if you expected them to come over with the schema definition. Also, depending on the version of MySQL you're using, a bug may exist where tables copied in this manner will logically reside (that is, within a given tablespace file) in the original table's tablespace. A way around this is the following alternative query:

```sql
CREATE TABLE users SELECT * FROM test.ref_users LIMIT 0;
```

**Warning**

This second form has a [large list of things](https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html) it does not do:

- Copy any indexes, including primary keys
- Maintain the `AUTO_INCREMENT` attribute
- Maintain data types - `VARCHAR` may become `CHAR`
- Maintain default values for columns that are expressions

Finally, note that both of these _only_ copy the schema definition, not the data. The table you're copying from actually has thousands of rows in it, but none of those will be in your table.

1189 | What if you wanted to copy data as well? 1190 | 1191 | The above alternative query hopefully hinted at it! Just take heed of the warning. 1192 | 1193 | ```sql 1194 | DROP TABLE users; CREATE TABLE users SELECT * FROM test.ref_users LIMIT 1000; 1195 | ``` 1196 | 1197 | ```sql 1198 | Query OK, 0 rows affected (0.30 sec) 1199 | 1200 | Query OK, 1000 rows affected (1.14 sec) 1201 | Records: 1000 Duplicates: 0 Warnings: 0 1202 | ``` 1203 |
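If you do copy data this way, remember the warning above - the new table will be missing its primary key and `AUTO_INCREMENT`, among other things. A sketch of the kind of repair you'd likely want to do afterwards, assuming `user_id` should be the auto-incrementing primary key, as it is in `test.ref_users`:

```sql
-- re-add the primary key and AUTO_INCREMENT that
-- CREATE TABLE ... SELECT silently dropped
ALTER TABLE users
    ADD PRIMARY KEY (user_id),
    MODIFY COLUMN user_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;
```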
1204 | 1205 | #### Copied table data and truncating 1206 | 1207 | Now that we have `users` back, let's actually fill it with more than just 1000 rows. `test.ref_users_big` has 1,000,000 rows. That would take a while to fill for everyone (my poor spinning disks), but 10,000 is reasonable. 1208 | 1209 | First, let's dump the existing values, but leave the table definition. While there are a few ways to do this, the fastest is `TRUNCATE` ([MySQL docs](https://dev.mysql.com/doc/refman/8.0/en/truncate-table.html)). This is a `DDL` operation vs. `DML`, as instead of iterating through the table and deleting each row, it stores the table definition, drops the table, then re-creates it. This does have several limitations, especially with foreign keys, but it works fine here. 1210 | 1211 | ```sql 1212 | TRUNCATE TABLE users; 1213 | ``` 1214 | 1215 | ```sql 1216 | Query OK, 0 rows affected (0.42 sec) 1217 | ``` 1218 | 1219 | `0 rows affected` may be confusing, as we in fact just affected 1000 rows, but remember that this is the same as a `DROP TABLE`, which similarly doesn't report on the number of rows removed. 1220 | 1221 | Now, we can copy into the table; but first, we're going to `DROP` the table and create it properly with `CREATE LIKE` so we don't have any issues with missing primary keys. 1222 | 1223 | ```sql 1224 | DROP TABLE users; 1225 | CREATE TABLE users LIKE test.ref_users; 1226 | INSERT INTO users SELECT * FROM test.ref_users_big LIMIT 10000; 1227 | ``` 1228 | 1229 | ```sql 1230 | Query OK, 10000 rows affected (5.33 sec) 1231 | Records: 10000 Duplicates: 0 Warnings: 0 1232 | ``` 1233 | 1234 | ### Transactions 1235 | 1236 | Remember the discussion about doing `DML` without a predicate? There's a fix for that. 1237 | 1238 | ```sql 1239 | START TRANSACTION; 1240 | ``` 1241 | 1242 | ```sql 1243 | Query OK, 0 rows affected (0.00 sec) 1244 | ``` 1245 | 1246 | ```sql 1247 | UPDATE users SET city = "Asheville"; 1248 | ``` 1249 | 1250 | ```sql 1251 | Query OK, 9999 rows affected (6.96 sec) 1252 | Rows matched: 10000 Changed: 9999 Warnings: 0 1253 | ``` 1254 | 1255 | Uh-oh. Looks like everyone has moved to Western North Carolina. 1256 | 1257 | ```sql 1258 | ROLLBACK; 1259 | ``` 1260 | 1261 | ```sql 1262 | Query OK, 0 rows affected (5.45 sec) 1263 | ``` 1264 | 1265 | Whew, not fired. 1266 | 1267 | NOTE: Canceling a query (`Ctrl-C`), _regardless of whether or not you're in a transaction_, has the same effect, assuming the InnoDB storage engine is being used. This is the `A` in `ACID` at work - either the entire query succeeds, or none of it does. However, the rollback may take some time depending on how many rows have been affected. Also, if you don't manage to cancel the query before it completes, you're out of luck. 1268 | 1269 | ### Generated columns 1270 | 1271 | What if you wanted a column that automatically created data for you based on other columns? 

```sql
ALTER TABLE
    users
ADD COLUMN
    full_name VARCHAR(510) GENERATED ALWAYS AS (
        CONCAT_WS(', ', last_name, first_name)
    );
```

```sql
Query OK, 0 rows affected (0.34 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

```sql
SELECT user_id, full_name, city, country
FROM users
LIMIT 10;
```

```sql
+---------+-------------------+-------------+----------------+
| user_id | full_name         | city        | country        |
+---------+-------------------+-------------+----------------+
|       1 | MacPherson, Addie | Latina      | Italy          |
|       2 | Airla, Valaree    | Pribram     | Czech Republic |
|       3 | Nett, Sheppard    | Hamada      | Japan          |
|       4 | Kirschner, Robby  | Bikaner     | India          |
|       5 | Bilski, Lewiss    | Vörderås    | Sweden         |
|       6 | Yamauchi, Marleah | Rotterdam   | Netherlands    |
|       7 | Calore, Ania      | Miyakojima  | Japan          |
|       8 | Breger, Gratiana  | Valkeakoski | Finland        |
|       9 | Serafina, Janith  | Morant Bay  | Jamaica        |
|      10 | Beckman, Pavla    | Wackersdorf | Germany        |
+---------+-------------------+-------------+----------------+
10 rows in set (0.01 sec)
```

Note that by default, this will create a `VIRTUAL` column (you can specify `STORED` after `AS` if you'd rather have a normal column), which is not actually stored, but instead calculated at query time. While this takes no storage space, it does add some amount of computational load, and more importantly comes with a [huge list](https://dev.mysql.com/doc/refman/8.0/en/create-table-generated-columns.html) of limitations. One large benefit, however, is that since the column's values aren't actually computed until they're queried, adding one takes only as long as any other metadata-only `ALTER TABLE` operation. If stored, the data must be written to the table, which will necessitate taking write locks. Also, since the column isn't actually being written anywhere, you can place the column in any table position (by default, adding a column just appends to the end of the table) while still using the `INSTANT` algorithm, despite what the docs imply.

The creation and deletion time in particular is markedly better when compared to `STORED` columns:

```sql
ALTER TABLE
    users
ADD COLUMN
    full_name VARCHAR(510) GENERATED ALWAYS AS (
        CONCAT_WS(', ', last_name, first_name)
    ) STORED;
```

```sql
Query OK, 10000 rows affected (7.23 sec)
Records: 10000  Duplicates: 0  Warnings: 0
```

```sql
ALTER TABLE
    users
DROP COLUMN full_name;
```

```sql
Query OK, 0 rows affected (2.24 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

Demonstrating column positioning:

```sql
ALTER TABLE
    users
ADD COLUMN
    full_name VARCHAR(510) GENERATED ALWAYS AS (
        CONCAT_WS(', ', last_name, first_name)
    )
AFTER
    last_name;
```

```sql
Query OK, 0 rows affected (0.27 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

```sql
SELECT * FROM users LIMIT 1\G
```

```sql
*************************** 1. row ***************************
        user_id: 1
     first_name: Addie
      last_name: MacPherson
      full_name: MacPherson, Addie
          email: addie.macpherson@lizard.com
           city: Latina
        country: Italy
     created_at: 2001-05-27 19:47:17
last_updated_at: NULL
1 row in set (0.01 sec)
```

### Invisible columns

You can make columns `INVISIBLE` if you'd rather they not show up unless specifically queried for. This is done with the `INVISIBLE` keyword after the type (`VARCHAR(510)` here) if being created, or modified later with `ALTER COLUMN`:

```sql
ALTER TABLE users ALTER COLUMN full_name SET INVISIBLE;
```

```sql
Query OK, 0 rows affected (0.19 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

```sql
SELECT * FROM users LIMIT 1\G
```

```sql
*************************** 1. row ***************************
        user_id: 1
     first_name: Addie
      last_name: MacPherson
          email: addie.macpherson@lizard.com
           city: Latina
        country: Italy
     created_at: 2001-05-27 19:47:17
last_updated_at: NULL
1 row in set (0.00 sec)
```

To set them back to visible, use `SET VISIBLE`:

```sql
ALTER TABLE users ALTER COLUMN full_name SET VISIBLE;
```

```sql
Query OK, 0 rows affected (0.08 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

--------------------------------------------------------------------------------
/mysql/mysql-101-1.md:
--------------------------------------------------------------------------------

# MySQL 101 Part II

- [MySQL 101 Part II](#mysql-101-part-ii)
  - [Queries](#queries)
    - [Predicates](#predicates)
      - [WHERE](#where)
    - [SELECT](#select)
      - [Working with JSON](#working-with-json)
        - [Finding non-null arrays](#finding-non-null-arrays)
        - [Checking for a value inside an array](#checking-for-a-value-inside-an-array)
        - [Extracting scalars from an object](#extracting-scalars-from-an-object)
    - [INSERT](#insert)
    - [TABLE](#table)
  - [Joins](#joins)
    - [Relational algebra](#relational-algebra)
    - [Types of joins](#types-of-joins)
      - [Cross](#cross)
      - [Inner Join](#inner-join)
      - [Left Outer Join](#left-outer-join)
      - [Right Outer Join](#right-outer-join)
      - [Full Outer Join](#full-outer-join)
    - [Specifying a column's table](#specifying-a-columns-table)
    - [Indices](#indices)
      - [Single indices](#single-indices)
      - [Partial indices](#partial-indices)
      - [Functional indices](#functional-indices)
      - [JSON / Longtext](#json--longtext)
      - [Composite indices](#composite-indices)
      - [Testing indices](#testing-indices)
      - [Descending indices](#descending-indices)
      - [When indices aren't helpful](#when-indicies-arent-helpful)
    - [HAVING](#having)
  - [Query optimization](#query-optimization)
    - [SELECT \*](#select-)
    - [OFFSET / LIMIT](#offset--limit)
    - [DISTINCT](#distinct)
  - [Cleanup](#cleanup)

## Queries

### Predicates

A predicate is a function which asserts that something is true or false. You can think of it like a filter.
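Predicates aren't limited to `WHERE`, either - `JOIN ... ON` and `HAVING` (both covered below) take them as well, and several predicates can be combined with `AND`, `OR`, and `NOT`. A quick sketch, using the `users` table from Part I:

```sql
SELECT user_id, first_name, last_name
FROM
    users
WHERE
    country IN ('Italy', 'Japan')
    AND email IS NOT NULL
    AND user_id BETWEEN 1 AND 500;
```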
44 | 45 | #### WHERE 46 | 47 | `WHERE` is the easiest to understand and apply, and will cover most of your needs. 48 | 49 | ```sql 50 | SELECT 51 | user_id, first_name, last_name 52 | FROM 53 | users 54 | WHERE 55 | country = 'Zimbabwe'; 56 | ``` 57 | 58 | ```sql 59 | +---------+------------+-----------+ 60 | | user_id | first_name | last_name | 61 | +---------+------------+-----------+ 62 | | 106 | Ivonne | Barmen | 63 | | 1149 | Myca | Flieger | 64 | | 2143 | Dallas | Nimesh | 65 | | 4401 | Jeana | Naga | 66 | | 4623 | Godiva | Adal | 67 | | 5582 | Lexie | Fenwick | 68 | | 5586 | Carrie | Nich | 69 | | 5793 | Marten | Casady | 70 | | 6072 | Feliza | Culhert | 71 | | 6467 | Wood | O'Connor | 72 | | 7093 | Miriam | Galliett | 73 | | 7669 | Cele | Belden | 74 | | 7675 | Araldo | Hoes | 75 | | 8106 | Imojean | Beaudoin | 76 | | 9438 | Sibby | Luedtke | 77 | | 9566 | Eb | Cattima | 78 | | 9606 | Alard | Frodina | 79 | +---------+------------+-----------+ 80 | 17 rows in set (0.22 sec) 81 | ``` 82 | 83 | Note that we filtered the results with a predicate that wasn't even in the result set (`country`). 84 | 85 | You may also have seen or used the wildcard `%` with `LIKE` and `NOT LIKE`. 86 | 87 | ```sql 88 | SELECT 89 | user_id, first_name, last_name 90 | FROM 91 | users 92 | WHERE 93 | country 94 | LIKE 'Zim%'; 95 | ``` 96 | 97 | ```sql 98 | +---------+------------+-----------+ 99 | | user_id | first_name | last_name | 100 | +---------+------------+-----------+ 101 | | 106 | Ivonne | Barmen | 102 | | 1149 | Myca | Flieger | 103 | | 2143 | Dallas | Nimesh | 104 | | 4401 | Jeana | Naga | 105 | | 4623 | Godiva | Adal | 106 | | 5582 | Lexie | Fenwick | 107 | | 5586 | Carrie | Nich | 108 | | 5793 | Marten | Casady | 109 | | 6072 | Feliza | Culhert | 110 | | 6467 | Wood | O'Connor | 111 | | 7093 | Miriam | Galliett | 112 | | 7669 | Cele | Belden | 113 | | 7675 | Araldo | Hoes | 114 | | 8106 | Imojean | Beaudoin | 115 | | 9438 | Sibby | Luedtke | 116 | | 9566 | Eb | Cattima | 117 | | 9606 | Alard | Frodina | 118 | +---------+------------+-----------+ 119 | 17 rows in set (0.22 sec) 120 | ``` 121 | 122 | These two are functionally equivalent queries. However, if there is an index on the predicate column, and you use a leading wildcard (e.g. `LIKE '%babwe'`), MySQL cannot use the index, and will instead perform a table scan. If you can avoid using leading wildcards on large tables, do so. It's also worth noting that there are many times when the query optimizer determines that the table scan would be faster than using an index, and so will do so anyway. [Index usage can be hinted](https://dev.mysql.com/doc/refman/8.0/en/index-hints.html), forced, and ignored, although as of MySQL 8.0.20, the old syntax (which included hints) [is deprecated](https://dev.mysql.com/doc/refman/8.0/en/optimizer-hints.html#optimizer-hints-index-level). Examples of both are below with an `EXPLAIN SELECT`. They're from a different schema and table, as I've already set up the index. 123 | 124 | ```sql 125 | EXPLAIN SELECT 126 | user_id, first_name, last_name 127 | FROM 128 | test.ref_users 129 | USE INDEX (country) 130 | WHERE 131 | country 132 | LIKE 'Zim%'\G 133 | ``` 134 | 135 | ```sql 136 | *************************** 1. 
row ***************************
           id: 1
  select_type: SIMPLE
        table: ref_users
   partitions: NULL
         type: range
possible_keys: country
          key: country
      key_len: 1023
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: Using index condition
1 row in set, 1 warning (0.01 sec)
```

```sql
EXPLAIN SELECT
    user_id, first_name, last_name
FROM
    test.ref_users
FORCE INDEX (country)
WHERE
    country
LIKE '%babwe'\G
```

```sql
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ref_users
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1000
     filtered: 11.11
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
```

Even when using `FORCE INDEX`, it's not being used, because it can't be.

```sql
EXPLAIN SELECT /*+ INDEX(ref_users country) */
    user_id, first_name, last_name
FROM
    test.ref_users
WHERE
    country
LIKE 'Zim%'\G
```

The new syntax, which looks like a C-style comment, goes immediately after the `SELECT` keyword, and requires both the table and the index to be listed.

```sql
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customers
   partitions: NULL
         type: range
possible_keys: city
          key: city
      key_len: 153
          ref: NULL
         rows: 2
     filtered: 100.00
        Extra: Using index condition
1 row in set, 1 warning (0.00 sec)
```

### SELECT

[MySQL docs.](https://dev.mysql.com/doc/refman/8.0/en/select.html)

You use it to select data from tables (or `/dev/stdin`). Any questions?

```sql
SELECT * FROM ref_zaps LIMIT 10 OFFSET 15;
```

```sql
+--------+----------+----------------------+---------------------+-----------------+
| zap_id | owned_by | shared_with          | created_at          | last_updated_at |
+--------+----------+----------------------+---------------------+-----------------+
|     16 |      788 | []                   | 2013-10-16 21:25:30 | NULL            |
|     17 |      689 | []                   | 2016-07-21 03:05:33 | NULL            |
|     18 |      735 | []                   | 2020-12-16 13:51:04 | NULL            |
|     19 |      802 | []                   | 2009-11-22 03:33:19 | NULL            |
|     20 |      297 | [529, 805, 541, 498] | 1997-07-11 15:05:07 | NULL            |
|     21 |      649 | []                   | 2015-05-18 20:08:31 | NULL            |
|     22 |      438 | []                   | 2006-12-14 15:28:30 | NULL            |
|     23 |      607 | []                   | 2013-04-15 17:57:19 | NULL            |
|     24 |      460 | []                   | 2018-01-28 02:05:59 | NULL            |
|     25 |      677 | []                   | 1995-06-07 21:46:30 | NULL            |
+--------+----------+----------------------+---------------------+-----------------+
10 rows in set (0.01 sec)
```

240 | Can you think of anything missing from this table? (HINT: SHOW CREATE TABLE) 241 | 242 | There's no foreign key linking `owned_by` to a given user! In fact, they're just randomly generated numbers, but there are pairings. Let's create a foreign key now: 243 | ```sql 244 | ALTER TABLE ref_zaps ADD CONSTRAINT zap_owner_id FOREIGN KEY (owned_by) REFERENCES ref_users (user_id); 245 | ``` 246 | 247 | ```sql 248 | Query OK, 1000 rows affected (0.90 sec) 249 | Records: 1000 Duplicates: 0 Warnings: 0 250 | ``` 251 |
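With the foreign key in place, InnoDB will now refuse any `ref_zaps` row whose `owned_by` doesn't match an existing `ref_users.user_id`. A sketch of what you'd expect to see - the exact error text varies a bit by version, but it should be `ERROR 1452`, a foreign key constraint failure (assuming no user with ID `999999` exists):

```sql
-- 999999 is a hypothetical, nonexistent user_id
INSERT INTO ref_zaps (zap_id, owned_by) VALUES (1001, 999999);
```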
252 | 253 | #### Working with JSON 254 | 255 | Both JSON arrays and objects can be stored in JSON columns. Using them in queries isn't as straight-forward as other column types. 256 | 257 | ##### Finding non-null arrays 258 | 259 | ```sql 260 | SELECT * 261 | FROM 262 | ref_zaps 263 | WHERE JSON_LENGTH(shared_with) > 0 264 | LIMIT 10; 265 | ``` 266 | 267 | ```sql 268 | +--------+----------+----------------------+---------------------+-----------------+ 269 | | zap_id | owned_by | shared_with | created_at | last_updated_at | 270 | +--------+----------+----------------------+---------------------+-----------------+ 271 | | 20 | 297 | [529, 805, 541, 498] | 1997-07-11 15:05:07 | NULL | 272 | | 40 | 312 | [395, 721, 397, 930] | 2016-11-15 03:42:41 | NULL | 273 | | 60 | 469 | [261, 565, 326, 637] | 2011-09-21 11:40:22 | NULL | 274 | | 80 | 505 | [753, 766, 812, 521] | 2001-07-04 15:28:08 | NULL | 275 | | 100 | 459 | [884, 23, 163, 654] | 2008-08-30 12:53:32 | NULL | 276 | | 120 | 411 | [730, 484, 530, 449] | 2012-09-02 00:42:20 | NULL | 277 | | 140 | 191 | [611, 798, 984, 583] | 2004-12-14 04:08:09 | NULL | 278 | | 160 | 310 | [941, 353, 499, 668] | 2003-01-22 01:05:04 | NULL | 279 | | 180 | 463 | [679, 639, 760, 784] | 2022-01-22 04:31:00 | NULL | 280 | | 200 | 36 | [308, 955, 485, 298] | 2015-10-17 21:42:16 | NULL | 281 | +--------+----------+----------------------+---------------------+-----------------+ 282 | 10 rows in set (0.02 sec) 283 | ``` 284 | 285 | ##### Checking for a value inside an array 286 | 287 | ```sql 288 | SELECT 289 | zap_id, 290 | owned_by, 291 | shared_with, 292 | user_id, 293 | full_name 294 | FROM ref_zaps 295 | JOIN 296 | ref_users ON 297 | JSON_CONTAINS(shared_with, JSON_ARRAY(ref_users.user_id)) 298 | LIMIT 10; 299 | ``` 300 | 301 | ```sql 302 | +--------+----------+---------------------+---------+--------------------+ 303 | | zap_id | owned_by | shared_with | user_id | full_name | 304 | +--------+----------+---------------------+---------+--------------------+ 305 | | 240 | 697 | [3, 854, 486, 907] | 3 | Gorlin, Alene | 306 | | 100 | 459 | [884, 23, 163, 654] | 23 | Schnurr, Sissie | 307 | | 700 | 947 | [28, 173, 33, 899] | 28 | Russi, Bab | 308 | | 560 | 869 | [258, 197, 724, 31] | 31 | Quince, Caryl | 309 | | 700 | 947 | [28, 173, 33, 899] | 33 | Langille, Tonya | 310 | | 740 | 888 | [41, 221, 402, 301] | 41 | Kruter, Bonni | 311 | | 460 | 566 | [45, 793, 553, 162] | 45 | Schuh, Gasparo | 312 | | 940 | 211 | [497, 973, 323, 48] | 48 | Aylsworth, Steffen | 313 | | 260 | 861 | [313, 52, 334, 457] | 52 | Delwyn, Karoline | 314 | | 420 | 667 | [524, 527, 948, 60] | 60 | Magen, Sherill | 315 | +--------+----------+---------------------+---------+--------------------+ 316 | 10 rows in set (0.88 sec) 317 | ``` 318 | 319 | ##### Extracting scalars from an object 320 | 321 | You can select a JSON column mixed in with non-JSON as you'd expect, and the entire contents will be displayed. 

```sql
SELECT
    user_id,
    email,
    user_json
FROM
    gensql
LIMIT 10;
```

```sql
+---------+-------------------------------+-----------------------------------------------------------------------------------------------+
| user_id | email                         | user_json                                                                                     |
+---------+-------------------------------+-----------------------------------------------------------------------------------------------+
|       1 | abba.wilder@bodacious.com     | {"a_key": "playable", "b_key": {"c_key": ["unscathed", "humongous", "surplus", "mousiness"]}} |
|       2 | antonetta.bosson@chaplain.com | {"a_key": "obedience", "b_key": {"c_key": ["depletion", "carve", "driveway", "primate"]}}     |
|       3 | cobb.fondea@contusion.com     | {"a_key": "activity", "b_key": {"c_key": ["famine", "huskiness", "unleash", "unknotted"]}}    |
|       4 | hanan.keelin@aspect.com       | {"a_key": "iron", "b_key": {"c_key": ["exact", "postcard", "sauciness", "dispatch"]}}         |
|       5 | kinna.lytle@epidermis.com     | {"a_key": "flannels", "b_key": {"c_key": ["sherry", "graded", "crusader", "rumble"]}}         |
|       6 | carolynn.sewoll@starch.com    | {"a_key": "extrude", "b_key": {"c_key": ["harmony", "ferris", "confirm", "elevate"]}}         |
|       7 | ola.pride@defile.com          | {"a_key": "blurt", "b_key": {"c_key": ["expectant", "half", "coming", "remover"]}}            |
|       8 | orella.acima@subwoofer.com    | {"a_key": "grape", "b_key": {"c_key": ["wrist", "galley", "fragment", "scurvy"]}}             |
|       9 | odilia.thorr@daredevil.com    | {"a_key": "numbing", "b_key": {"c_key": ["glutinous", "repacking", "reliant", "polygon"]}}    |
|      10 | berrie.marybella@undertow.com | {"a_key": "unadvised", "b_key": {"c_key": ["grove", "cornhusk", "darkening", "grazing"]}}     |
+---------+-------------------------------+-----------------------------------------------------------------------------------------------+
10 rows in set (0.01 sec)
```

You can also extract specific keys:

```sql
-- the ->> operator is shorthand for JSON_UNQUOTE(JSON_EXTRACT())
SELECT
    user_id,
    email,
    user_json->>'$.b_key'
FROM
    gensql
LIMIT 10;
```

```sql
+---------+-------------------------------+---------------------------------------------------------------+
| user_id | email                         | user_json->>'$.b_key'                                         |
+---------+-------------------------------+---------------------------------------------------------------+
|       1 | abba.wilder@bodacious.com     | {"c_key": ["unscathed", "humongous", "surplus", "mousiness"]} |
|       2 | antonetta.bosson@chaplain.com | {"c_key": ["depletion", "carve", "driveway", "primate"]}      |
|       3 | cobb.fondea@contusion.com     | {"c_key": ["famine", "huskiness", "unleash", "unknotted"]}    |
|       4 | hanan.keelin@aspect.com       | {"c_key": ["exact", "postcard", "sauciness", "dispatch"]}     |
|       5 | kinna.lytle@epidermis.com     | {"c_key": ["sherry", "graded", "crusader", "rumble"]}         |
|       6 | carolynn.sewoll@starch.com    | {"c_key": ["harmony", "ferris", "confirm", "elevate"]}        |
|       7 | ola.pride@defile.com          | {"c_key": ["expectant", "half", "coming", "remover"]}         |
|       8 | orella.acima@subwoofer.com    | {"c_key": ["wrist", "galley", "fragment", "scurvy"]}          |
|       9 | odilia.thorr@daredevil.com    | {"c_key": ["glutinous", "repacking", "reliant", "polygon"]}   |
|      10 | berrie.marybella@undertow.com | {"c_key": ["grove", "cornhusk", "darkening", "grazing"]}      |
+---------+-------------------------------+---------------------------------------------------------------+
10 rows in set (0.00 sec)
```

Or nest extractions:

```sql
SELECT
    user_id,
    email,
    user_json->>'$.b_key.c_key'
FROM
    gensql
LIMIT 10;
```

```sql
+---------+-------------------------------+----------------------------------------------------+
| user_id | email                         | user_json->>'$.b_key.c_key'                        |
+---------+-------------------------------+----------------------------------------------------+
|       1 | abba.wilder@bodacious.com     | ["unscathed", "humongous", "surplus", "mousiness"] |
|       2 | antonetta.bosson@chaplain.com | ["depletion", "carve", "driveway", "primate"]      |
|       3 | cobb.fondea@contusion.com     | ["famine", "huskiness", "unleash", "unknotted"]    |
|       4 | hanan.keelin@aspect.com       | ["exact", "postcard", "sauciness", "dispatch"]     |
|       5 | kinna.lytle@epidermis.com     | ["sherry", "graded", "crusader", "rumble"]         |
|       6 | carolynn.sewoll@starch.com    | ["harmony", "ferris", "confirm", "elevate"]        |
|       7 | ola.pride@defile.com          | ["expectant", "half", "coming", "remover"]         |
|       8 | orella.acima@subwoofer.com    | ["wrist", "galley", "fragment", "scurvy"]          |
|       9 | odilia.thorr@daredevil.com    | ["glutinous", "repacking", "reliant", "polygon"]   |
|      10 | berrie.marybella@undertow.com | ["grove", "cornhusk", "darkening", "grazing"]      |
+---------+-------------------------------+----------------------------------------------------+
10 rows in set (0.01 sec)
```

```sql
-- the -> operator is shorthand for JSON_EXTRACT()
-- arrays are 0-indexed, so this is a slice, like lst[1:3]
SELECT
    email,
    user_json->'$.b_key.c_key[1 to 2]'
FROM
    gensql
LIMIT 10;
```

```sql
+-------------------------------+------------------------------------+
| email                         | user_json->'$.b_key.c_key[1 to 2]' |
+-------------------------------+------------------------------------+
| abba.wilder@bodacious.com     | ["humongous", "surplus"]           |
| antonetta.bosson@chaplain.com | ["carve", "driveway"]              |
| cobb.fondea@contusion.com     | ["huskiness", "unleash"]           |
| hanan.keelin@aspect.com       | ["postcard", "sauciness"]          |
| kinna.lytle@epidermis.com     | ["graded", "crusader"]             |
| carolynn.sewoll@starch.com    | ["ferris", "confirm"]              |
| ola.pride@defile.com          | ["half", "coming"]                 |
| orella.acima@subwoofer.com    | ["galley", "fragment"]             |
| odilia.thorr@daredevil.com    | ["repacking", "reliant"]           |
| berrie.marybella@undertow.com | ["cornhusk", "darkening"]          |
+-------------------------------+------------------------------------+
10 rows in set (0.02 sec)
```

See [MySQL docs](https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html) for much more about JSON operations.

### INSERT

[MySQL docs.](https://dev.mysql.com/doc/refman/8.0/en/insert.html)

`INSERT` is used to insert rows into a table. There is also an `UPSERT` equivalent, with the `ON DUPLICATE KEY UPDATE` clause. With this, if an `INSERT` would cause a key collision with a `UNIQUE` index (explicit or implicit, e.g. `PRIMARY KEY`), then an `UPDATE` of that row occurs instead.
448 | 449 | ```sql 450 | INSERT INTO users 451 | (first_name, last_name, user_id) 452 | VALUES 453 | ('Leeroy', 'Jenkins', 42); 454 | ``` 455 | 456 | ```sql 457 | ERROR 1062 (23000): Duplicate entry '42' for key 'users.PRIMARY' 458 | ``` 459 | 460 | Expectedly, that failed since `user_id`, which is our primary key, already has an entry at `42`. 461 | 462 | ```sql 463 | SELECT * FROM 464 | users 465 | WHERE 466 | user_id = 42\G 467 | ``` 468 | 469 | ```sql 470 | *************************** 1. row *************************** 471 | user_id: 42 472 | first_name: Ramona 473 | last_name: Odelet 474 | full_name: Odelet, Ramona 475 | email: ramona.odelet@lucid.com 476 | city: Foligno 477 | country: Italy 478 | created_at: 2003-07-29 07:34:15 479 | last_updated_at: NULL 480 | 1 row in set (0.01 sec) 481 | ``` 482 | 483 | Now we can try again, this time with an instruction to perform an UPSERT. 484 | 485 | ```sql 486 | INSERT INTO users 487 | (first_name, last_name, user_id) 488 | VALUES 489 | ("Leeroy", "Jenkins", 42) AS vals 490 | ON DUPLICATE KEY UPDATE 491 | first_name = vals.first_name, 492 | last_name = vals.last_name; 493 | ``` 494 | 495 | ```sql 496 | Query OK, 2 rows affected (0.21 sec) 497 | ``` 498 | 499 | ```sql 500 | SELECT * FROM users WHERE user_id = 42\G 501 | ``` 502 | 503 | ```sql 504 | *************************** 1. row *************************** 505 | user_id: 42 506 | first_name: Leeroy 507 | last_name: Jenkins 508 | full_name: Jenkins, Leeroy 509 | email: ramona.odelet@lucid.com 510 | city: Foligno 511 | country: Italy 512 | created_at: 2003-07-29 07:34:15 513 | last_updated_at: 2023-02-27 13:24:26 514 | 1 row in set (0.01 sec) 515 | ``` 516 | 517 | While `full_name` updated, since it's a `GENERATED` column, `email` is now incorrect. Also, note that `last_updated_at` has changed from `NULL`, since we've modified the row. 518 | 519 | Let's put the row back to how it was before. 520 | 521 |
522 | How can this be accomplished? 523 | 524 | ```sql 525 | -- first, let's be safe with a transaction 526 | START TRANSACTION; 527 | ``` 528 | 529 | ```sql 530 | Query OK, 0 rows affected (0.01 sec) 531 | ``` 532 | 533 | ```sql 534 | -- then, use UPDATE 535 | UPDATE users SET first_name = 'Ramona', last_name = 'Odelet' WHERE user_id = 42; 536 | ``` 537 | 538 | ```sql 539 | Query OK, 1 row affected (0.01 sec) 540 | Rows matched: 1 Changed: 1 Warnings: 0 541 | ``` 542 | 543 | ```sql 544 | -- next, verify the work 545 | SELECT * FROM users WHERE user_id = 42\G 546 | ``` 547 | 548 | ```sql 549 | *************************** 1. row *************************** 550 | user_id: 42 551 | first_name: Ramona 552 | last_name: Odelet 553 | full_name: Odelet, Ramona 554 | email: ramona.odelet@lucid.com 555 | city: Foligno 556 | country: Italy 557 | created_at: 2003-07-29 07:34:15 558 | last_updated_at: 2023-02-27 13:30:10 559 | 1 row in set (0.00 sec) 560 | ``` 561 | 562 | ```sql 563 | -- finally, commit the result 564 | COMMIT; 565 | ``` 566 | 567 | ```sql 568 | Query OK, 0 rows affected (0.08 sec) 569 | ``` 570 |
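Incidentally, the `AS vals` row alias used in the upsert above was added in MySQL 8.0.19. In older code, you'll often see the same thing written with the `VALUES()` function, which is deprecated as of MySQL 8.0.20 but still very common in the wild:

```sql
-- older, deprecated spelling of the same upsert
INSERT INTO users
    (first_name, last_name, user_id)
VALUES
    ('Leeroy', 'Jenkins', 42)
ON DUPLICATE KEY UPDATE
    first_name = VALUES(first_name),
    last_name = VALUES(last_name);
```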

### TABLE

[MySQL docs.](https://dev.mysql.com/doc/refman/8.0/en/table.html)

`TABLE` is syntactic sugar for `SELECT * FROM <table>`.
Works great if you know the table is small, but be careful on large tables!

```sql
TABLE users\G
```

```sql
-- 9999 rows are above this...
*************************** 10000. row ***************************
        user_id: 10000
     first_name: Gabrila
      last_name: Lemmueu
      full_name: Lemmueu, Gabrila
          email: gabrila.lemmueu@urgent.com
           city: Itanagar
        country: India
     created_at: 2020-12-10 01:58:35
last_updated_at: NULL
10000 rows in set (0.48 sec)
```

## Joins

### Relational algebra

Not a lot of it, I promise; just what we need to discuss joins.

* Union: `R ∪ S --- R OR S`
  * Implemented in MySQL via the `UNION` keyword
* Intersection: `R ∩ S --- R AND S`
  * Implemented in MySQL via `INNER JOIN`, or, as of MySQL 8.0.31, the `INTERSECT` keyword
* Difference: `R \ S --- R - S`
  * Implemented in MySQL 8.0.31 via the `EXCEPT` keyword, and can be emulated using `UNION` and `NOT IN`

If you're interested in exploring relational algebra, [this application](https://dbis-uibk.github.io/relax/calc/local/uibk/local/3) is quite useful to convert SQL to relational algebra, and display the results.

### Types of joins

#### Cross

Before we demonstrate a cross join, you should have two small (very small, like < 10 rows) tables. You can either use what we learned earlier to create a new table from an existing one, or you can use any two of the following tables: `northwind.orders_status`, `northwind.tax_status_name`, `test.ref_users_tiny`, `test.ref_zaps_tiny`. You can cross join across schemas if you'd like, although I can't promise the information will make any sense.

Also called a Cartesian Join. This produces `n x m` rows for the two groups being joined. That said, every other join can be thought of as a cross join with a predicate. In fact, `CROSS JOIN`, `JOIN`, and `INNER JOIN` are actually syntactically equivalent in MySQL (not ANSI SQL!), but for readability, it's preferred to only use `CROSS JOIN` if you actually intend to use it.

```sql
SELECT
    z.zap_id,
    u.user_id,
    u.full_name
FROM
    ref_users_tiny u
CROSS JOIN
    ref_zaps_tiny z;
```

```sql
+--------+---------+-------------------+
| zap_id | user_id | full_name         |
+--------+---------+-------------------+
|      1 |       4 | McGrody, Cointon  |
|      1 |       3 | Gorlin, Alene     |
|      1 |       2 | Marienthal, Shirl |
|      1 |       1 | Jemena, Wyatt     |
|      2 |       4 | McGrody, Cointon  |
|      2 |       3 | Gorlin, Alene     |
|      2 |       2 | Marienthal, Shirl |
|      2 |       1 | Jemena, Wyatt     |
|      3 |       4 | McGrody, Cointon  |
|      3 |       3 | Gorlin, Alene     |
|      3 |       2 | Marienthal, Shirl |
|      3 |       1 | Jemena, Wyatt     |
|      4 |       4 | McGrody, Cointon  |
|      4 |       3 | Gorlin, Alene     |
|      4 |       2 | Marienthal, Shirl |
|      4 |       1 | Jemena, Wyatt     |
+--------+---------+-------------------+
16 rows in set (0.01 sec)
```

#### Inner Join

The default (i.e. `JOIN` == `INNER JOIN`). This is `users AND zaps` with a predicate.
658 | 659 | ```sql 660 | SELECT 661 | z.zap_id, 662 | u.full_name, 663 | u.city, 664 | u.country 665 | FROM 666 | ref_users u 667 | JOIN 668 | ref_zaps z 669 | ON 670 | u.user_id = z.owned_by 671 | LIMIT 10; 672 | ``` 673 | 674 | ```sql 675 | +--------+-------------------+-------------+----------------+ 676 | | zap_id | full_name | city | country | 677 | +--------+-------------------+-------------+----------------+ 678 | | 411 | MacPherson, Addie | Latina | Italy | 679 | | 794 | Airla, Valaree | Pribram | Czech Republic | 680 | | 830 | Kirschner, Robby | Bikaner | India | 681 | | 697 | Bilski, Lewiss | Vörderås | Sweden | 682 | | 110 | Yamauchi, Marleah | Rotterdam | Netherlands | 683 | | 942 | Yamauchi, Marleah | Rotterdam | Netherlands | 684 | | 772 | Calore, Ania | Miyakojima | Japan | 685 | | 676 | Breger, Gratiana | Valkeakoski | Finland | 686 | | 715 | Serafina, Janith | Morant Bay | Jamaica | 687 | | 405 | Beckman, Pavla | Wackersdorf | Germany | 688 | +--------+-------------------+-------------+----------------+ 689 | 10 rows in set (0.02 sec) 690 | ``` 691 | 692 | #### Left Outer Join 693 | 694 | Left and Right Joins are both a type of Outer Join, and often just called Left or Right Join. This is `users OR zaps` with a predicate and default value (`NULL`) for `zaps`. 695 | 696 | ```sql 697 | SELECT 698 | u.user_id, 699 | u.full_name, 700 | z.zap_id, 701 | z.owned_by 702 | FROM 703 | ref_users u 704 | LEFT JOIN 705 | ref_zaps_joins z 706 | ON 707 | u.user_id = z.owned_by 708 | LIMIT 10; 709 | ``` 710 | 711 | ```sql 712 | +---------+-------------------+--------+----------+ 713 | | user_id | full_name | zap_id | owned_by | 714 | +---------+-------------------+--------+----------+ 715 | | 1 | MacPherson, Addie | 411 | 1 | 716 | | 2 | Airla, Valaree | 794 | 2 | 717 | | 3 | Nett, Sheppard | NULL | NULL | 718 | | 4 | Kirschner, Robby | 830 | 4 | 719 | | 5 | Bilski, Lewiss | 697 | 5 | 720 | | 6 | Yamauchi, Marleah | 942 | 6 | 721 | | 6 | Yamauchi, Marleah | 110 | 6 | 722 | | 7 | Calore, Ania | 772 | 7 | 723 | | 8 | Breger, Gratiana | 676 | 8 | 724 | | 9 | Serafina, Janith | 715 | 9 | 725 | +---------+-------------------+--------+----------+ 726 | 10 rows in set (0.09 sec) 727 | ``` 728 | 729 | Of course, we previously put a foreign key on `zaps.owned_by`, precisely to prevent this kind of thing from happening. Still, you can see how this kind of query could be useful. 
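One common use for this is an "anti-join" - for example, finding users who don't own any Zaps, by keeping only the rows where the right side came back `NULL`. A sketch:

```sql
SELECT
    u.user_id,
    u.full_name
FROM
    ref_users u
LEFT JOIN
    ref_zaps_joins z
ON
    u.user_id = z.owned_by
WHERE
    z.zap_id IS NULL
LIMIT 10;
```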
730 | 731 | #### Right Outer Join 732 | 733 | This is the same thing, but with the tables reversed: 734 | 735 | ```sql 736 | SELECT 737 | u.user_id, 738 | u.full_name, 739 | z.zap_id, 740 | z.owned_by 741 | FROM 742 | ref_users u 743 | RIGHT JOIN 744 | ref_zaps_joins z 745 | ON 746 | u.user_id = z.owned_by 747 | LIMIT 10; 748 | ``` 749 | 750 | ```sql 751 | +---------+------------------+--------+----------+ 752 | | user_id | full_name | zap_id | owned_by | 753 | +---------+------------------+--------+----------+ 754 | | 602 | Hirz, Datha | 1 | 602 | 755 | | 593 | Meldoh, Vergil | 2 | 593 | 756 | | NULL | NULL | 3 | 0 | 757 | | 548 | Philps, Ardelia | 4 | 548 | 758 | | 957 | Joash, Electra | 5 | 957 | 759 | | 777 | Levinson, Lenore | 6 | 777 | 760 | | 648 | Vas, Tiphanie | 7 | 648 | 761 | | 959 | Brink, Kaia | 8 | 959 | 762 | | 569 | Lasser, Garrard | 9 | 569 | 763 | | 429 | Adamsen, Justen | 10 | 429 | 764 | +---------+------------------+--------+----------+ 765 | 10 rows in set (0.09 sec) 766 | ``` 767 | 768 | You can translate any `LEFT JOIN` to a `RIGHT JOIN` simply by swapping the order of the tables being joined: 769 | 770 | ```sql 771 | SELECT 772 | u.user_id, 773 | u.full_name, 774 | z.zap_id, 775 | z.owned_by 776 | FROM 777 | ref_zaps_joins z 778 | RIGHT JOIN 779 | ref_users u 780 | ON 781 | u.user_id = z.owned_by 782 | LIMIT 10; 783 | ``` 784 | 785 | ```sql 786 | +---------+-------------------+--------+----------+ 787 | | user_id | full_name | zap_id | owned_by | 788 | +---------+-------------------+--------+----------+ 789 | | 1 | MacPherson, Addie | 411 | 1 | 790 | | 2 | Airla, Valaree | 794 | 2 | 791 | | 3 | Nett, Sheppard | NULL | NULL | 792 | | 4 | Kirschner, Robby | 830 | 4 | 793 | | 5 | Bilski, Lewiss | 697 | 5 | 794 | | 6 | Yamauchi, Marleah | 942 | 6 | 795 | | 6 | Yamauchi, Marleah | 110 | 6 | 796 | | 7 | Calore, Ania | 772 | 7 | 797 | | 8 | Breger, Gratiana | 676 | 8 | 798 | | 9 | Serafina, Janith | 715 | 9 | 799 | +---------+-------------------+--------+----------+ 800 | 10 rows in set (0.15 sec) 801 | ``` 802 | 803 | #### Full Outer Join 804 | 805 | This is `users OR zaps` with a predicate and default value (`NULL`) for both tables. MySQL doesn't support `FULL JOIN` as a keyword, but it can be performed using `UNION` (or `UNION ALL` if duplicates are desired). 806 | 807 | NOTE: This query will produce 1150 rows as written. 808 | 809 | ```sql 810 | SELECT 811 | u.user_id, 812 | u.full_name, 813 | z.zap_id, 814 | z.owned_by 815 | FROM 816 | ref_users u 817 | LEFT JOIN ref_zaps_joins z ON u.user_id = z.owned_by 818 | UNION ALL 819 | SELECT 820 | u.user_id, 821 | u.full_name, 822 | z.zap_id, 823 | z.owned_by 824 | FROM 825 | ref_users u 826 | RIGHT JOIN ref_zaps_joins z ON u.user_id = z.owned_by 827 | WHERE 828 | u.user_id IS NULL; 829 | ``` 830 | 831 | To efficiently see what it's doing, you can run two queries, appending `ORDER BY -user_id DESC` and `ORDER BY user_id`, which represents the top and bottom of the result. Don't forget to add a `LIMIT` as well! 832 | 833 |
834 | What is -user_id? 835 | 836 | It's shorthand for the math expression `(0 - user_id)`, which effectively is the same thing as `ORDER BY ... ASC`, but it places `NULL` values last. Postgres avoids this weird trick and just has the `NULLS {FIRST, LAST}` option for ordering. 837 |
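If the `-user_id` trick feels too clever, the same NULLs-last ordering can be written explicitly, since boolean expressions in MySQL evaluate to 0 or 1. A sketch on a nullable column:

```sql
-- rows with a NULL last_updated_at sort last, since
-- (last_updated_at IS NULL) is 1 for them and 0 otherwise
SELECT user_id, last_updated_at
FROM ref_users
ORDER BY last_updated_at IS NULL, last_updated_at
LIMIT 10;
```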

### Specifying a column's table

You may have noticed that we've used aliases for many tables, e.g. `ref_users u`, and then notated columns with that alias as a prefix, e.g. `u.user_id`. This is not required for single tables, of course, nor is it required with joins if every column name is unique. However, it's considered a good practice when using multiple tables.

### Indices

Indices, or indexes, _may_ speed up queries. Each table **should** have a primary key (it's not required*, but please don't create a table without one), which is one index. Additional indices, on single or multiple columns, may be created. Most of them are stored in [B+ trees](https://en.wikipedia.org/wiki/B%2B_tree), which are similar to [B-trees](https://en.wikipedia.org/wiki/B-tree).

Indices aren't free, however - when you create an index on a column, that column's values are copied to the aforementioned B+ tree. While disk space is relatively cheap, creating dozens of indices for columns that are infrequently queried should be avoided. Also, since `INSERTs` must also write to the index, they'll be slowed down somewhat. Finally, InnoDB limits a given table to a maximum of 64 secondary indices (that is, other than primary keys).

Obscure facts about tables without primary keys

\* Prior to MySQL 8.0.30, if you don't create a primary key, the first `UNIQUE NOT NULL` index created is automatically promoted to become the primary key. If you don't have one of those either, the table will have no primary key†. Starting with MySQL 8.0.30, if `sql_generate_invisible_primary_key` is enabled and no primary key is declared, an invisible column will be created called `my_row_id` and set to be the primary key.

† Not entirely true. A hidden index named `GEN_CLUST_INDEX` is created on an invisible (but a special kind of invisible, that you can never view) column named `ROW_ID` containing row IDs, but it's a monotonically increasing index that's shared globally across the entire server, not just that schema. Don't make InnoDB do this.
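To see which indices a table currently has - including the primary key and any that were created implicitly - `SHOW INDEX` is handy; among other things, it displays the cardinality estimates the optimizer works from (output elided):

```sql
SHOW INDEX FROM ref_users\G
```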

#### Single indices

Here, we'll switch over to the `%_big` tables, which have 1,000,000 rows each.

```sql
SELECT
    user_id,
    full_name,
    city,
    country
FROM
    ref_users_big
WHERE
    last_name = 'Safko';
```

```sql
+---------+------------------+------------------------+----------------+
| user_id | full_name        | city                   | country        |
+---------+------------------+------------------------+----------------+
|   66826 | Safko, Elwyn     | Arad                   | Romania        |
|   68759 | Safko, Vance     | Saint-Jérôme           | Canada         |
|   81384 | Safko, Robinett  | Hornchurch             | United Kingdom |
|   92580 | Safko, Daisi     | Sherwood Park          | Canada         |
|  121219 | Safko, Karalee   | Miami Gardens          | United States  |
|  124408 | Safko, Kyrstin   | Hawick                 | United Kingdom |
|  150615 | Safko, Kleon     | Leigh                  | United Kingdom |
|  151266 | Safko, Elita     | Abag Qi                | China          |
|  155926 | Safko, Berthe    | Tullebølle             | Denmark        |
|  168897 | Safko, Hazlett   | Valletta               | Malta          |
|     ... | ...              | ...                    | ...            |
|  900935 | Safko, Tommy     | Paris                  | France         |
|  925514 | Safko, Rancell   | Nampa                  | United States  |
|  928486 | Safko, Garry     | Bardhaman              | India          |
|  932457 | Safko, Desiree   | Kherson                | Ukraine        |
|  945316 | Safko, Courtnay  | Saint Marys            | Canada         |
|  947072 | Safko, Leonie    | Durango                | Mexico         |
|  948263 | Safko, Jarred    | Las Vegas              | United States  |
|  959464 | Safko, Gordie    | Madison                | United States  |
|  972002 | Safko, Adriena   | Ubud                   | Indonesia      |
|  982089 | Safko, Gan       | Milpitas               | United States  |
+---------+------------------+------------------------+----------------+
76 rows in set (12.24 sec)
```

Let's create an index on the last name.

```sql
CREATE INDEX last_name ON ref_users_big (last_name);
```

```sql
Query OK, 0 rows affected (45.08 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

```sql
SELECT * FROM ref_users_big WHERE last_name = 'Safko';
```

```sql
-- the same results as above
76 rows in set (0.04 sec)
```

The lookup is now essentially instantaneous. If this is a frequently performed query, this may be a wise decision. There are also times when you may not need an index - for example, remember that a `UNIQUE` constraint is also an index. Since all of our users in this table have an email address which is `first.last@domain.com`, you might be tempted to add a predicate of `WHERE email LIKE '%safko%'` instead of adding an index, but alas - leading wildcards disallow the use of indexes, so it requires a full table scan.

#### Partial indices

You can also create an index on a prefix of a column for string types (`CHAR`, `VARCHAR`, etc.) - and for `TEXT` and `BLOB` columns, you must do this.

This will create an index on the first 3 characters of last_name:

```sql
ALTER TABLE ref_users_big DROP INDEX last_name;
CREATE INDEX last_name_partial ON ref_users_big (last_name(3));
```

```sql
Query OK, 0 rows affected (0.31 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (37.85 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

Speed for the new query is slower than before (0.16 seconds vs.
0.04 seconds), as expected, but 160 milliseconds for matching on just the first three characters honestly isn't that bad. If you have tremendously large tables, limited disk space, or are worried about the write performance impact, this may be a good option for you.

#### Functional indices

Starting with MySQL 8.0.13, you can also create an index that is itself an expression:

```sql
CREATE INDEX
    created_month
ON ref_users_big ((MONTH(created_at)));
```

Note the double parentheses around the expression.

```sql
Query OK, 0 rows affected (41.15 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

What this specifically allows you to do is treat the `created_at` month value as an integer:

```sql
EXPLAIN ANALYZE SELECT
    user_id, email, created_at
FROM
    ref_users_big
WHERE
    MONTH(created_at) = 6\G
```

```sql
*************************** 1. row ***************************
EXPLAIN: -> Index lookup on ref_users_big using created_month (month(created_at)=6) (cost=19952.91 rows=153858) (actual time=2.303..12051.690 rows=82815 loops=1)

1 row in set (15.49 sec)
```

Note that in this case, it's actually _slower_ with the index, likely due to the low cardinality of the month.

```sql
EXPLAIN ANALYZE SELECT
    user_id, email, created_at
FROM
    ref_users_big
USE INDEX()
WHERE
    MONTH(created_at) = 6\G
```

```sql
*************************** 1. row ***************************
EXPLAIN: -> Filter: (month(ref_users_big.created_at) = 6) (cost=100955.37 rows=994330) (actual time=1.114..11135.192 rows=82815 loops=1)
    -> Table scan on ref_users_big (cost=100955.37 rows=994330) (actual time=1.010..9733.530 rows=1000000 loops=1)

1 row in set (11.43 sec)
```

#### JSON / Longtext

JSON has its own special requirements to be indexed, mostly if you're storing strings. First, you must select a specific part of the column's rows to be the indexed key, known as a functional key part. Additionally, the key has to have a prefix length assigned to it. Depending on the version of MySQL you're using, there may also be collation differences between the return value from various JSON functions and native storage of strings. Finally, this requires the stored data to be `k:v` objects, rather than arrays.

Here, we're using a multi-valued index, which behind the scenes creates a virtual, invisible column to store the extracted JSON array as a character array.

```sql
CREATE INDEX user_json_array_key ON gensql (
    (
        CAST(
            user_json -> '$.b_key.c_key' AS CHAR(64) ARRAY
        )
    )
);
```

See [MySQL docs](https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-multi-valued) for more information on indexing JSON values, and properly using them.

#### Composite indices

An index can also be created across multiple columns - for InnoDB, up to 16.

```sql
CREATE INDEX full_name ON ref_users_big (first_name, last_name);
```

```sql
Query OK, 0 rows affected (40.09 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

First, we'll use `IGNORE INDEX` to direct MySQL to ignore the index we just created. This query counts the duplicate name tuples.
First, we'll use `IGNORE INDEX` to direct MySQL to ignore the index we just created. This query counts the duplicate name tuples. Since `id` is selected but not grouped - `GROUP`ing by it would return an empty set, as it's the primary key and thus guaranteed to be unique - `ANY_VALUE()` must be used to let MySQL know that the result is allowed to be non-deterministic. Finally, `EXPLAIN ANALYZE` is being used to run the query, and explain what it's doing. This differs from `EXPLAIN`, which estimates what would be done, but doesn't actually perform the query. Be careful using `EXPLAIN ANALYZE`, especially with destructive actions, since those queries will actually be performed!

```sql
EXPLAIN ANALYZE
SELECT
    ANY_VALUE(id),
    first_name,
    last_name,
    COUNT(*) c
FROM
    ref_users_big
IGNORE INDEX(full_name)
GROUP BY
    first_name,
    last_name
HAVING
    c > 1\G
```

```sql
*************************** 1. row ***************************
EXPLAIN: -> Filter: (c > 1) (actual time=23295.903..24686.641 rows=4318 loops=1)
    -> Table scan on <temporary> (actual time=0.005..903.621 rows=995670 loops=1)
        -> Aggregate using temporary table (actual time=23295.727..24415.358 rows=995670 loops=1)
            -> Table scan on ref_users_big (cost=104920.32 rows=995522) (actual time=2.329..10156.102 rows=1000000 loops=1)

1 row in set (25.26 sec)
```

The query took 25.26 seconds, and resulted in 4318 rows. The output is read from the bottom up - a table scan was performed on the entire table, then a temporary table with the `GROUP BY` aggregation was created, and finally a second table scan on that temporary table was performed to find the duplicated tuples.

If you're curious, `actual time` is in milliseconds, and consists of two timings - the first is the time to initiate the step and return the first row; the second is the time to initiate the step and return all rows. `cost` is an arbitrary number representing what the query cost optimizer thinks the query costs to perform; it has no units, and is only meaningful when compared against other plans for the same query.

```sql
EXPLAIN ANALYZE
SELECT
    ANY_VALUE(id),
    first_name,
    last_name,
    COUNT(*) c
FROM
    ref_users_big
GROUP BY
    first_name,
    last_name
HAVING
    c > 1\G
```

```sql
*************************** 1. row ***************************
EXPLAIN: -> Filter: (c > 1) (actual time=6.318..12202.646 rows=4318 loops=1)
    -> Group aggregate: count(0) (actual time=0.864..11447.233 rows=995670 loops=1)
        -> Index scan on ref_users_big using full_name (cost=104920.32 rows=995522) (actual time=0.815..7315.098 rows=1000000 loops=1)

1 row in set (12.32 sec)
```

With the index in place, an index scan is performed instead of two table scans, resulting in a ~2x speedup.
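With several indexes now on the table, it's easy to lose track of them. `SHOW INDEX` lists a table's indexes along with their estimated cardinality - the statistic the optimizer weighs plans with:

```sql
SHOW INDEX FROM ref_users_big;

-- If the cardinality estimates look stale, refresh the table's statistics:
ANALYZE TABLE ref_users_big;
```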
Another example, retrieving a specific duplicated tuple that I know exists:

```sql
SELECT
    user_id,
    full_name,
    email,
    city,
    country
FROM
    ref_users_big
WHERE
    first_name = 'Ashlie'
AND
    last_name = 'Godred';
```

```sql
+---------+----------------+-------------------------+----------+--------------+
| user_id | full_name      | email                   | city     | country      |
+---------+----------------+-------------------------+----------+--------------+
|  974206 | Godred, Ashlie | ashlie.godred@mushy.com | Mikkeli  | Finland      |
|  987301 | Godred, Ashlie | ashlie.godred@suave.com | Pretoria | South Africa |
+---------+----------------+-------------------------+----------+--------------+
2 rows in set (0.01 sec)
```

vs. if `USE INDEX()` is added to the query:

```sql
+---------+----------------+-------------------------+----------+--------------+
| user_id | full_name      | email                   | city     | country      |
+---------+----------------+-------------------------+----------+--------------+
|  974206 | Godred, Ashlie | ashlie.godred@mushy.com | Mikkeli  | Finland      |
|  987301 | Godred, Ashlie | ashlie.godred@suave.com | Pretoria | South Africa |
+---------+----------------+-------------------------+----------+--------------+
2 rows in set (14.60 sec)
```

Note that `USE INDEX()`, with an empty list, is valid syntax to tell MySQL to ignore all indexes.

If instead either the `full_name` or the `last_name_partial` index we made previously is ignored on its own, its complement will be used, and the two are effectively equally fast due to the filtered result set - here, using the partial index on `last_name` dropped the candidate tuples from 1,000,000 to 1,066.

```sql
EXPLAIN ANALYZE
SELECT
    user_id,
    full_name,
    email,
    city,
    country
FROM
    ref_users_big IGNORE INDEX(full_name)
WHERE
    first_name = 'Ashlie'
AND
    last_name = 'Godred'\G
```

```sql
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((ref_users_big.last_name = 'Godred') and (ref_users_big.first_name = 'Ashlie')) (cost=641.79 rows=0) (actual time=315.346..322.278 rows=2 loops=1)
    -> Index lookup on ref_users_big using last_name_partial (last_name='Godred') (cost=641.79 rows=1066) (actual time=6.602..317.360 rows=1066 loops=1)

1 row in set (0.34 sec)
```

#### Testing indices

MySQL 8 added the ability to toggle an index on and off without actually dropping it. This way, if you want to test whether an index is helpful, you can toggle it off, observe query performance, and then decide whether to keep it.

```sql
ALTER TABLE ref_users_big ALTER INDEX full_name INVISIBLE;
```

```sql
Query OK, 0 rows affected (0.28 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

```sql
EXPLAIN ANALYZE
SELECT
    user_id,
    full_name,
    email,
    city,
    country
FROM
    ref_users_big
WHERE
    first_name = 'Ashlie'
AND
    last_name = 'Godred'\G
```
```sql
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((ref_users_big.last_name = 'Godred') and (ref_users_big.first_name = 'Ashlie')) (cost=641.79 rows=0) (actual time=315.346..322.278 rows=2 loops=1)
    -> Index lookup on ref_users_big using last_name_partial (last_name='Godred') (cost=641.79 rows=1066) (actual time=6.602..317.360 rows=1066 loops=1)

1 row in set (0.34 sec)
```

With `full_name` invisible, the optimizer falls back to `last_name_partial`, exactly as if the index had been ignored.

#### Descending indices

By default, indices are sorted in ascending order. While they can still be scanned in reverse, it's not as fast (although the performance difference may be minimal - test your theory before committing to it). If you are frequently querying something with `ORDER BY ... DESC`, it may be helpful to instead create the index in descending order.

```sql
CREATE INDEX first_desc ON ref_users_big (first_name DESC);
```

```sql
Query OK, 0 rows affected (41.18 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

#### When indices aren't helpful

You may have noticed in a few of the previous `EXPLAIN ANALYZE` statements two different kinds of inner joins - `nested loop inner join`, and `inner hash join`. A nested loop join is exactly what it sounds like:

```python
# pseudocode: compare every row of one table against every row of the other
for tuple_i in table_1:
    for tuple_j in table_2:
        if join_is_satisfied(tuple_i, tuple_j):
            yield (tuple_i, tuple_j)
```

This has `O(MN)` time complexity, where `M` and `N` are the number of tuples in each table. If there's an index, the 2nd loop uses it for the lookup rather than another table scan, which makes the time complexity `O(M log N)`, but with large sizes this is still quite bad. Here is an example on two tables with one million rows each:

```sql
EXPLAIN ANALYZE
SELECT
    full_name
FROM
    ref_users_big
JOIN
    ref_zaps_big
ON
    ref_users_big.user_id = ref_zaps_big.owned_by\G
```

```sql
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join (cost=498015.60 rows=993197) (actual time=6.998..360927.896 rows=1000000 loops=1)
    -> Table scan on zaps (cost=100160.95 rows=993197) (actual time=6.685..8804.370 rows=1000000 loops=1)
    -> Single-row index lookup on ref_users using user_id (user_id=zaps.owned_by) (cost=0.30 rows=1) (actual time=0.350..0.350 rows=1 loops=1000000)

1 row in set (6 min 2.41 sec)
```

A better solution is a hash join - specifically a grace hash join, named after the GRACE database machine built in the 1980s at the University of Tokyo, which pioneered the method:

```python
# pseudocode: build a hash table on one table, keyed by the join column,
# then probe it once per row of the other table - no nested scan required
hash_table = {}
for tuple_i in table_1:
    hash_table.setdefault(join_key(tuple_i), []).append(tuple_i)
for tuple_j in table_2:
    for tuple_i in hash_table.get(join_key(tuple_j), []):
        yield (tuple_i, tuple_j)
```

There are details I've glossed over here about the partitioning method (it's recursive), but the key point is that hash lookups are (optimally) `O(1)`, which speeds things up tremendously. The total cost for the grace variant works out to roughly `3(M+N)` I/O passes: each table is read once to be partitioned, the partitions are written out once, and read back once to be joined.
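When experimenting with joins, it's handy to check which strategy the optimizer picked without waiting for a potentially slow query to finish. `EXPLAIN FORMAT=TREE` (available since MySQL 8.0.16) prints the same tree-style plan as `EXPLAIN ANALYZE`, minus the execution and timings - for example, for the join above:

```sql
EXPLAIN FORMAT=TREE
SELECT
    full_name
FROM
    ref_users_big
JOIN
    ref_zaps_big
ON
    ref_users_big.user_id = ref_zaps_big.owned_by\G
```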
MySQL [added a hash join in 8.0.18](https://dev.mysql.com/blog-archive/hash-join-in-mysql-8/), but it comes with some limitations - chiefly that to stay fast, the build table must fit into memory (the join buffer; it spills to disk otherwise), and, annoyingly, that the optimizer will often decide to use a nested loop if indexes exist. If it can be used, though, compare the difference:

```sql
EXPLAIN ANALYZE
SELECT
    full_name
FROM
    ref_users
IGNORE INDEX (user_id)
JOIN
    zaps
ON
    ref_users.user_id = zaps.owned_by\G
```

```sql
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (ref_users.user_id = zaps.owned_by) (cost=98991977261.77 rows=993197) (actual time=7814.295..21403.160 rows=1000000 loops=1)
    -> Table scan on ref_users (cost=0.03 rows=996699) (actual time=0.402..9319.650 rows=1000000 loops=1)
    -> Hash
        -> Table scan on zaps (cost=100160.95 rows=993197) (actual time=4.566..6810.026 rows=1000000 loops=1)

1 row in set (21.93 sec)
```

#### HAVING

Earlier, we used `HAVING` in a `GROUP BY` aggregation. The difference between `HAVING` and `WHERE` is that `WHERE` filters rows before they're sent to be aggregated, whereas `HAVING` filters the aggregation's output, and thus predicates relying on the aggregation result can be used. It's not limited to aggregation results, though - a common use case is to allow the use of aliases or subquery results in filtering. Be aware that it's generally more performant to use `WHERE` if possible (consider re-writing your query if it isn't), but sometimes, you need `HAVING`.

```sql
SELECT
    ref_users_big.city,
    COUNT(ref_zaps_big.zap_id) as zap_count
FROM
    ref_users_big
LEFT JOIN
    ref_zaps_big
ON
    ref_users_big.user_id = ref_zaps_big.owned_by
GROUP BY
    ref_users_big.city
HAVING
    zap_count > 250;
```

```sql
+----------+-----------+
| city     | zap_count |
+----------+-----------+
| Hsin-chu |       260 |
| Vitória  |       293 |
| Cordoba  |       290 |
| Gdańsk   |       292 |
+----------+-----------+
4 rows in set (32.86 sec)
```

## Query optimization

Finally into the fun stuff!

First, I'll spoil a lot of this - it's likely that you won't have to do much of it. MySQL's optimizer is actually pretty decent. That said, there are times when you will, and knowing what _should_ be happening - and how to compare it to what is actually happening - is a useful skill.

### SELECT *

If you're just exploring a schema, there's nothing wrong with
`SELECT * FROM <table> LIMIT 10` or some other small number (< ~1000). It will be nearly instantaneous. However, the problem arises when you're also using `ORDER BY`. Recall that we had a composite index on `(first_name, last_name)` called `full_name`. Compare these two:

```sql
EXPLAIN ANALYZE
SELECT
    *
FROM
    ref_users_big
ORDER BY
    first_name,
    last_name\G
```

```sql
*************************** 1. row ***************************
EXPLAIN: -> Sort: ref_users.first_name, ref_users.last_name (cost=100495.40 rows=996699) (actual time=12199.513..12603.379 rows=1000000 loops=1)
    -> Table scan on ref_users (cost=100495.40 rows=996699) (actual time=1.755..7039.004 rows=1000000 loops=1)

1 row in set (13.68 sec)
```

```sql
EXPLAIN ANALYZE
SELECT
    user_id,
    first_name,
    last_name
FROM
    ref_users_big
ORDER BY
    first_name,
    last_name\G
```

```sql
*************************** 1. row ***************************
EXPLAIN: -> Index scan on ref_users using full_name (cost=100495.40 rows=996699) (actual time=0.433..5413.188 rows=1000000 loops=1)

1 row in set (6.39 sec)
```

Since `SELECT *` requests columns the index doesn't cover (`email`, `city`, and so on), it would take longer to use the index and then fetch the remaining columns for every row than to just do a table scan. Observe what happens if the index is forced anyway:

```sql
EXPLAIN ANALYZE
SELECT
    *
FROM
    ref_users_big
FORCE INDEX(full_name)
ORDER BY
    first_name,
    last_name\G
```

```sql
*************************** 1. row ***************************
EXPLAIN: -> Index scan on ref_users using full_name (cost=348844.90 rows=996699) (actual time=11.273..65858.816 rows=1000000 loops=1)

1 row in set (1 min 7.13 sec)
```

In comparison, if your `ORDER BY` is covered by the index (the primary key - `user_id` here - is implicitly part of indices, and thus doesn't cause a slowdown), queries can use it, and are much faster! If you're writing software that will be accessing a database, and you don't actually need all of the columns, don't request them. Take the time to be deliberate in what you request.

### OFFSET / LIMIT

If you need to get `n` rows from the middle of a table, unless you have a really good reason to do so, please don't do this:

```sql
-- The alternate form (and, IMO, the clearer one) is LIMIT 10 OFFSET 500000
SELECT
    user_id,
    full_name
FROM
    ref_users_big
LIMIT 500000,10;
```

```sql
+---------+-------------------+
| user_id | full_name         |
+---------+-------------------+
|  500001 | Ader, Wilona      |
|  500002 | Lindsley, Angy    |
|  500003 | Scarito, Vladimir |
|  500004 | Hoenack, Rossy    |
|  500005 | Cooley, Theobald  |
|  500006 | Pineda, Gaven     |
|  500007 | Harberd, Odie     |
|  500008 | Engleman, Mendy   |
|  500009 | Michon, Dionysus  |
|  500010 | Seaden, Leigha    |
+---------+-------------------+
10 rows in set (6.29 sec)
```

Doing this causes a table scan up to the specified offset.
Far better, if you have a known monotonically increasing column (like `user_id`), is to use a `WHERE` predicate:

```sql
SELECT
    user_id,
    full_name
FROM
    ref_users_big
WHERE user_id > 500000
LIMIT 10;
```

```sql
+---------+-------------------+
| user_id | full_name         |
+---------+-------------------+
|  500001 | Ader, Wilona      |
|  500002 | Lindsley, Angy    |
|  500003 | Scarito, Vladimir |
|  500004 | Hoenack, Rossy    |
|  500005 | Cooley, Theobald  |
|  500006 | Pineda, Gaven     |
|  500007 | Harberd, Odie     |
|  500008 | Engleman, Mendy   |
|  500009 | Michon, Dionysus  |
|  500010 | Seaden, Leigha    |
+---------+-------------------+
10 rows in set (0.02 sec)
```

Using `user_id` as the filter allows an index range scan, which is nearly instant. If you were doing this programmatically to support pagination, the last `user_id` of each page could be used for the next iteration's predicate - a pattern commonly called keyset pagination.

### DISTINCT

`DISTINCT` is a very useful keyword for many operations where you don't want duplicates shown. Unfortunately, it also adds a fairly hefty load to the database. That's not to say you _can't_ use it, but when writing code that will end up using it, ask yourself whether you could instead handle de-duplication in the application. That also comes with tradeoffs, of course - you're now pulling more data over the network, and increasing load on the application. Generally speaking, databases are bound first by disk and memory, rather than CPU or network, so using compression (increased CPU load) and/or sending more data (not using `DISTINCT`) tends to increase overall performance - but you should experiment and profile your code.

This also tends to be something that works well early on with little load, but becomes unwieldy as either the database or the application grows.

```sql
EXPLAIN ANALYZE
SELECT
    first_name,
    last_name
FROM
    ref_users_big\G
```

```sql
*************************** 1. row ***************************
EXPLAIN: -> Table scan on ref_users_big (cost=101365.53 rows=995522) (actual time=1.815..7213.716 rows=1000000 loops=1)

1 row in set (8.13 sec)
```

```sql
EXPLAIN ANALYZE
SELECT DISTINCT
    first_name,
    last_name
FROM
    ref_users_big\G
```

```sql
EXPLAIN: -> Table scan on <temporary> (actual time=0.005..765.220 rows=995670 loops=1)
    -> Temporary table with deduplication (cost=101050.45 rows=995522) (actual time=15306.678..16296.289 rows=995670 loops=1)
        -> Table scan on ref_users_big (cost=101050.45 rows=995522) (actual time=0.825..8718.651 rows=1000000 loops=1)

1 row in set (17.73 sec)
```

## Cleanup

This isn't something you'll do often, if at all, so may as well do so now, eh?
```sql
DROP SCHEMA foo;
```

```sql
Query OK, 0 rows affected (0.05 sec)
```
--------------------------------------------------------------------------------
/mysql/mysql-102.md:
--------------------------------------------------------------------------------
# MySQL 102 - WIP

### WITH (Common Table Expressions)

[MySQL docs.](https://dev.mysql.com/doc/refman/8.0/en/with.html)

`WITH` can be used to create a temporary named result set - a Common Table Expression, or CTE - scoped to the statement in which it exists. CTEs can also be recursive. A demonstration that's probably not useful in reality follows, but it does show how MySQL can be made to use indexes, even when it normally couldn't. Here, we're trying to select a random row from a large table. The row ID is selected with a sub-query that multiplies the output of `RAND()` (a float between 0-1) by the largest `id` in the table.

```sql
mysql>
EXPLAIN ANALYZE SELECT
    *
FROM
    ref_users
WHERE
    id = (
        SELECT
            FLOOR(
                (
                    SELECT
                        RAND() * (
                            SELECT
                                id
                            FROM
                                ref_users
                            ORDER BY
                                id DESC
                            LIMIT
                                1
                        )
                )
            )
    );
*************************** 1. row ***************************
EXPLAIN: -> Filter: (ref_users.id = floor((rand() * (select #4)))) (cost=10799.04 rows=99735) (actual time=1545.462..8220.073 rows=3 loops=1)
    -> Table scan on ref_users (cost=10799.04 rows=997354) (actual time=0.441..6723.994 rows=1000000 loops=1)
    -> Select #4 (subquery in condition; run only once)
        -> Limit: 1 row(s) (cost=0.00 rows=1) (actual time=0.079..0.079 rows=1 loops=1)
            -> Index scan on ref_users using PRIMARY (reverse) (cost=0.00 rows=1) (actual time=0.077..0.077 rows=1 loops=1)

1 row in set, 2 warnings (8.22 sec)
```

Since `RAND()` is evaluated for every row [when used with WHERE](https://dev.mysql.com/doc/refman/8.0/en/mathematical-functions.html#function_rand), it's not constant, and thus can't be used with indices. Also, you may wind up with more than one result!

If instead the `RAND()` call is placed into a CTE, it can be optimized:

```sql
mysql>
EXPLAIN ANALYZE
WITH rand AS (
    SELECT
        FLOOR(
            (
                SELECT
                    RAND() * (
                        SELECT
                            id
                        FROM
                            ref_users
                        ORDER BY
                            id DESC
                        LIMIT
                            1
                    )
            )
        )
)
SELECT
    *
FROM
    ref_users
WHERE
    id IN (TABLE rand);
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join (cost=0.55 rows=1) (actual time=0.569..0.583 rows=1 loops=1)
    -> Filter: (``.`FLOOR((SELECT RAND() * (SELECT id FROM ref_users ORDER BY id DESC LIMIT 1)))` is not null) (cost=0.20 rows=1) (actual time=0.085..0.095 rows=1 loops=1)
        -> Table scan on (cost=0.20 rows=1) (actual time=0.005..0.012 rows=1 loops=1)
            -> Materialize with deduplication (cost=0.00 rows=1) (actual time=0.082..0.090 rows=1 loops=1)
                -> Filter: (rand.`FLOOR((SELECT RAND() * (SELECT id FROM ref_users ORDER BY id DESC LIMIT 1)))` is not null) (cost=0.00 rows=1) (actual time=0.017..0.023 rows=1 loops=1)
                    -> Table scan on rand (cost=2.61 rows=1) (actual time=0.010..0.014 rows=1 loops=1)
                        -> Materialize CTE rand (cost=0.00 rows=1) (actual time=0.013..0.018 rows=1 loops=1)
                            -> Rows fetched before execution (cost=0.00 rows=1) (never executed)
                            -> Select #5 (subquery in projection; run only once)
                                -> Limit: 1 row(s) (cost=0.00 rows=1) (actual time=0.313..0.314 rows=1 loops=1)
                                    -> Index scan on ref_users using PRIMARY (reverse) (cost=0.00 rows=1) (actual time=0.310..0.310 rows=1 loops=1)
    -> Filter: (ref_users.id = ``.`FLOOR((SELECT RAND() * (SELECT id FROM ref_users ORDER BY id DESC LIMIT 1)))`) (cost=0.35 rows=1) (actual time=0.477..0.479 rows=1 loops=1)
        -> Single-row index lookup on ref_users using PRIMARY (id=``.`FLOOR((SELECT RAND() * (SELECT id FROM ref_users ORDER BY id DESC LIMIT 1)))`) (cost=0.35 rows=1) (actual time=0.468..0.469 rows=1 loops=1)

1 row in set, 1 warning (0.00 sec)
```

## Stored Procedures

[MySQL docs.](https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html)

Stored Procedures (and Stored Functions) are a way to write SQL as functions, to be called as needed. Most normal SQL queries are accepted, as are conditionals, loops, and the ability to accept arguments and return values. The main difference between the two is that a Stored Procedure accepts arguments and writes data out to variables, whereas a Stored Function accepts arguments and returns a single value - which also means it can be called inline in a query, like any built-in function.

Their main advantage is that known, tested queries can be stored and later called from an application. Their main disadvantage is that they require people with reasonably good SQL skills to write them, else it's unlikely they'll exceed the performance of an ORM like Django's.
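Since Stored Functions can be called inline, a minimal sketch might look like this (hypothetical - `zap_count` is a made-up name, and it assumes the `zaps` table from earlier; `READS SQL DATA` declares that the function only reads data, which matters when binary logging is enabled):

```sql
DELIMITER //
CREATE FUNCTION zap_count(uid BIGINT) RETURNS BIGINT
READS SQL DATA
BEGIN
    DECLARE n BIGINT;
    SELECT COUNT(*) INTO n FROM zaps WHERE owned_by = uid;
    RETURN n;
END //
DELIMITER ;

-- Callable inline, like a built-in function:
-- SELECT full_name, zap_count(user_id) FROM ref_users LIMIT 10;
```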
As an example of a Stored Procedure, I used the following to fill `zaps` with data (NOTE: this is not an example of a well-designed stored procedure, merely one that demonstrates a variety of concepts):

```sql
DELIMITER // -- This is needed so that the individual commands don't end the stored procedure
CREATE PROCEDURE insert_zaps(IN num_rows int, IN pct_shared float) -- Two input args are needed
BEGIN
    DECLARE loop_count bigint; -- Variables are initialized with a type
    DECLARE len_table bigint;
    DECLARE rand_base float;
    DECLARE rand_offset float;
    DECLARE rand_ts timestamp;
    DECLARE rand_user bigint;
    DECLARE shared_with_user bigint;
    SELECT id INTO len_table FROM test.ref_users ORDER BY id DESC LIMIT 1; -- SELECT INTO can be used
    SET loop_count = 1; -- Or, if the value is simple, simply assigned
    WHILE loop_count <= num_rows DO
        SET rand_base = RAND();
        SET rand_offset = RAND();
        SET rand_ts = TIMESTAMP(
            FROM_UNIXTIME(
                UNIX_TIMESTAMP(NOW()) - FLOOR(
                    0 + (
                        RAND() * 86400 * 365 * 10
                    )
                )
            )
        ); -- This creates a random timestamp between now and 10 years ago
        WITH rand AS (
            SELECT
                FLOOR(
                    (
                        SELECT
                            rand_base * len_table
                    )
                )
        )
        SELECT
            id
        INTO rand_user
        FROM
            test.ref_users
        WHERE
            id IN (TABLE rand); -- The CTE trick demonstrated earlier, here used to pick a random user
        INSERT INTO zaps (zap_id, created_at, owned_by) VALUES (loop_count, rand_ts, rand_user);
        IF ROUND(rand_base, 1) > (1 - pct_shared) THEN -- Roughly determine the amount of shared Zaps
            SELECT CAST(FLOOR(rand_base * rand_offset * len_table) AS unsigned) INTO shared_with_user;
            UPDATE
                zaps
            SET
                shared_with = JSON_ARRAY_APPEND(
                    shared_with,
                    '$',
                    shared_with_user
                ) -- JSON_ARRAY_APPEND(array, key, value)
            WHERE
                zap_id = loop_count;
        END IF;
        SET loop_count = loop_count + 1;
    END WHILE;
END //
DELIMITER ;
```
--------------------------------------------------------------------------------
/terraform/tf-101.md:
--------------------------------------------------------------------------------
# Introduction

## What is Terraform?

It's an Infrastructure-As-Code tool. It allows for declaratively creating infrastructure on cloud providers, in colos, and even your homelab. There are providers for practically everything you can think of; if you're missing one (and you know Golang), you can write your own.

## What is declarative?

Computer languages generally fall into one of two types - imperative, and declarative. Most are imperative, which means that you explicitly tell the language what to do. With a declarative language, you describe what you want, and it figures out how to get there. That makes it sound fancier and easier than it is; in reality, you have to describe in very specific terms what it is you want.

Terraform is mostly declarative, with some recent nods to imperative programming such as `for` expressions - prior to version 0.12, you had to define the `count` of a resource you wanted instantiated, and it would make `n` copies of it.

# Terraform Basics

## Resources vs. Modules
Broadly speaking, a resource specifically instantiates a named thing, like an EC2 instance or a DNS record, whereas a module generically defines those things - usually with default values assigned - so you can later call them, saving typing. You _can_ define your entire infrastructure solely with resources, but you'll be missing out on a huge advantage of Terraform.

The (redacted) example module creates a Redis instance, a Postgres instance, security groups for both, and a Cloudwatch metric for Redis. Going further, looking at its `variables.tf` file, we see that there are quite a few options - the type and size of storage for the DB, the version of both Redis and Postgres, encryption and snapshot options, and more. For more information on what is required to be passed to the resource, you can consult Terraform's documentation - here is the [Elasticache (Redis) page.](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/elasticache_replication_group)

## Variables

Any variable with a `default` value passes that default into the module when it's called, unless the caller overrides it. For example, `aws_db_instance.rds.storage_type` has as its value `var.db_storage_type`, which instructs Terraform to look at the variable `db_storage_type` - it's set to `gp2`, so we don't have to change it. `db_allocated_storage` has a default of `100` (in GiB, which is helpfully displayed as the variable's description), but it's overridden in the calling module to `20`.

You may have noticed that some of the variables defined have `type` set. Terraform is dynamically typed, but much like Python with mypy, allows for static typing if desired.

Locals, seen at the top of `main.tf`, are just that - variables local to that file. They can be placed anywhere, but conventionally sit at the top of the file. They're generally used as seen here, to reduce what would otherwise be bulky ternary-laden code into something cleaner for later use. They're referenced with `local.varname` instead of `var.varname`.

Terraform underwent a large syntax change between v0.11 and v0.12. In 0.11, all variables were encased in the `"${var.foo}"` syntax you may see scattered around. That has been simplified to `var.foo` or `local.foo`, whichever is correct for the variable. The exception is string interpolation - using variables along with plaintext (or concatenating strings without the use of the `join` function) still requires the variables to be wrapped in `"${}"`. People comfortable with Bash programming will feel at home here.

Modules may also include a `terraform.tfvars` file, which has a `key=value` mapping for variable assignment. These are often used to have production and staging versions of infrastructure.

Variable definition takes the following precedence, from lowest to highest, with the last definition standing: env vars --> `terraform.tfvars` --> `-var 'foo=...'` on the command line. In general, you'll want to mimic what you see in use in the repository.

## Functions

Terraform includes many built-in functions. One of them seen here is `flatten`. [Here is Terraform's documentation](https://www.terraform.io/language/functions/flatten) on the function, but you may be able to guess that it flattens lists - or lists of lists - into a single flat list. Read through the documentation to get an idea of the rest of them.

## Plans and Applies

This is the main draw of Terraform.
When you run `terraform plan`, it looks at the existing infrastructure, compares it to its statefile, and generates a human-readable diff. It also includes things that have changed outside of its scope (for example, if someone manually creates a database using the AWS console), and at the bottom, a summary saying how many entities will be created, changed, and destroyed. You can then save this plan and apply it later - this is what Atlantis does. Additionally, during this time the statefile is locked, so no other changes can be made. This ensures that your expected output is applied with no surprises due to someone else making a change at the same time. 42 | 43 | To destroy infrastructure, in general you'll delete the resource/module from the code, and then run a plan. Terraform will detect that it exists in infrastructure but not in code, and generate a plan to destroy it which you can apply. In practice, some resources have protection enabled that prevents destroys. To destroy them, you either have to do two plan/apply cycles (one to remove the deletion protection, and another to destroy the resource), or manually delete it from the AWS console or command line, and then run the plan/apply. You can see this in `main.tf` on L144, with an explanation comment block above it. 44 | 45 | Targeted applies (where you specifically instruct Terraform to only affect a specific resource) also exist, but these are rarely needed and shouldn't be relied upon. Similarly, you can import pre-existing resources into the statefile, although the syntax can be a bit confusing, and there are also occasional bizarre gotchas such as needing a region to be hard-coded in the infrastructure code. --------------------------------------------------------------------------------