├── .gitignore
├── LICENSE
├── k8s
    ├── deployment.yaml
    ├── echo
    │   ├── Dockerfile
    │   ├── echo.py
    │   ├── requirements.txt
    │   └── templates
    │   │   └── index.html
    ├── helm
    │   ├── Chart.yaml
    │   └── templates
    │   │   ├── NOTES.txt
    │   │   ├── deployment.yaml
    │   │   ├── ingress.yaml
    │   │   ├── namespace.yaml
    │   │   ├── rbac.yaml
    │   │   └── service.yaml
    ├── k8s-101.md
    └── k8s-102.md
├── mysql
    ├── mysql-101-0.md
    ├── mysql-101-1.md
    └── mysql-102.md
└── terraform
    └── tf-101.md


/.gitignore:
--------------------------------------------------------------------------------
1 | k8s/cert/*


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 | Mozilla Public License Version 2.0
  2 | ==================================
  3 | 
  4 | 1. Definitions
  5 | --------------
  6 | 
  7 | 1.1. "Contributor"
  8 |     means each individual or legal entity that creates, contributes to
  9 |     the creation of, or owns Covered Software.
 10 | 
 11 | 1.2. "Contributor Version"
 12 |     means the combination of the Contributions of others (if any) used
 13 |     by a Contributor and that particular Contributor's Contribution.
 14 | 
 15 | 1.3. "Contribution"
 16 |     means Covered Software of a particular Contributor.
 17 | 
 18 | 1.4. "Covered Software"
 19 |     means Source Code Form to which the initial Contributor has attached
 20 |     the notice in Exhibit A, the Executable Form of such Source Code
 21 |     Form, and Modifications of such Source Code Form, in each case
 22 |     including portions thereof.
 23 | 
 24 | 1.5. "Incompatible With Secondary Licenses"
 25 |     means
 26 | 
 27 |     (a) that the initial Contributor has attached the notice described
 28 |         in Exhibit B to the Covered Software; or
 29 | 
 30 |     (b) that the Covered Software was made available under the terms of
 31 |         version 1.1 or earlier of the License, but not also under the
 32 |         terms of a Secondary License.
 33 | 
 34 | 1.6. "Executable Form"
 35 |     means any form of the work other than Source Code Form.
 36 | 
 37 | 1.7. "Larger Work"
 38 |     means a work that combines Covered Software with other material, in
 39 |     a separate file or files, that is not Covered Software.
 40 | 
 41 | 1.8. "License"
 42 |     means this document.
 43 | 
 44 | 1.9. "Licensable"
 45 |     means having the right to grant, to the maximum extent possible,
 46 |     whether at the time of the initial grant or subsequently, any and
 47 |     all of the rights conveyed by this License.
 48 | 
 49 | 1.10. "Modifications"
 50 |     means any of the following:
 51 | 
 52 |     (a) any file in Source Code Form that results from an addition to,
 53 |         deletion from, or modification of the contents of Covered
 54 |         Software; or
 55 | 
 56 |     (b) any new file in Source Code Form that contains any Covered
 57 |         Software.
 58 | 
 59 | 1.11. "Patent Claims" of a Contributor
 60 |     means any patent claim(s), including without limitation, method,
 61 |     process, and apparatus claims, in any patent Licensable by such
 62 |     Contributor that would be infringed, but for the grant of the
 63 |     License, by the making, using, selling, offering for sale, having
 64 |     made, import, or transfer of either its Contributions or its
 65 |     Contributor Version.
 66 | 
 67 | 1.12. "Secondary License"
 68 |     means either the GNU General Public License, Version 2.0, the GNU
 69 |     Lesser General Public License, Version 2.1, the GNU Affero General
 70 |     Public License, Version 3.0, or any later versions of those
 71 |     licenses.
 72 | 
 73 | 1.13. "Source Code Form"
 74 |     means the form of the work preferred for making modifications.
 75 | 
 76 | 1.14. "You" (or "Your")
 77 |     means an individual or a legal entity exercising rights under this
 78 |     License. For legal entities, "You" includes any entity that
 79 |     controls, is controlled by, or is under common control with You. For
 80 |     purposes of this definition, "control" means (a) the power, direct
 81 |     or indirect, to cause the direction or management of such entity,
 82 |     whether by contract or otherwise, or (b) ownership of more than
 83 |     fifty percent (50%) of the outstanding shares or beneficial
 84 |     ownership of such entity.
 85 | 
 86 | 2. License Grants and Conditions
 87 | --------------------------------
 88 | 
 89 | 2.1. Grants
 90 | 
 91 | Each Contributor hereby grants You a world-wide, royalty-free,
 92 | non-exclusive license:
 93 | 
 94 | (a) under intellectual property rights (other than patent or trademark)
 95 |     Licensable by such Contributor to use, reproduce, make available,
 96 |     modify, display, perform, distribute, and otherwise exploit its
 97 |     Contributions, either on an unmodified basis, with Modifications, or
 98 |     as part of a Larger Work; and
 99 | 
100 | (b) under Patent Claims of such Contributor to make, use, sell, offer
101 |     for sale, have made, import, and otherwise transfer either its
102 |     Contributions or its Contributor Version.
103 | 
104 | 2.2. Effective Date
105 | 
106 | The licenses granted in Section 2.1 with respect to any Contribution
107 | become effective for each Contribution on the date the Contributor first
108 | distributes such Contribution.
109 | 
110 | 2.3. Limitations on Grant Scope
111 | 
112 | The licenses granted in this Section 2 are the only rights granted under
113 | this License. No additional rights or licenses will be implied from the
114 | distribution or licensing of Covered Software under this License.
115 | Notwithstanding Section 2.1(b) above, no patent license is granted by a
116 | Contributor:
117 | 
118 | (a) for any code that a Contributor has removed from Covered Software;
119 |     or
120 | 
121 | (b) for infringements caused by: (i) Your and any other third party's
122 |     modifications of Covered Software, or (ii) the combination of its
123 |     Contributions with other software (except as part of its Contributor
124 |     Version); or
125 | 
126 | (c) under Patent Claims infringed by Covered Software in the absence of
127 |     its Contributions.
128 | 
129 | This License does not grant any rights in the trademarks, service marks,
130 | or logos of any Contributor (except as may be necessary to comply with
131 | the notice requirements in Section 3.4).
132 | 
133 | 2.4. Subsequent Licenses
134 | 
135 | No Contributor makes additional grants as a result of Your choice to
136 | distribute the Covered Software under a subsequent version of this
137 | License (see Section 10.2) or under the terms of a Secondary License (if
138 | permitted under the terms of Section 3.3).
139 | 
140 | 2.5. Representation
141 | 
142 | Each Contributor represents that the Contributor believes its
143 | Contributions are its original creation(s) or it has sufficient rights
144 | to grant the rights to its Contributions conveyed by this License.
145 | 
146 | 2.6. Fair Use
147 | 
148 | This License is not intended to limit any rights You have under
149 | applicable copyright doctrines of fair use, fair dealing, or other
150 | equivalents.
151 | 
152 | 2.7. Conditions
153 | 
154 | Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted
155 | in Section 2.1.
156 | 
157 | 3. Responsibilities
158 | -------------------
159 | 
160 | 3.1. Distribution of Source Form
161 | 
162 | All distribution of Covered Software in Source Code Form, including any
163 | Modifications that You create or to which You contribute, must be under
164 | the terms of this License. You must inform recipients that the Source
165 | Code Form of the Covered Software is governed by the terms of this
166 | License, and how they can obtain a copy of this License. You may not
167 | attempt to alter or restrict the recipients' rights in the Source Code
168 | Form.
169 | 
170 | 3.2. Distribution of Executable Form
171 | 
172 | If You distribute Covered Software in Executable Form then:
173 | 
174 | (a) such Covered Software must also be made available in Source Code
175 |     Form, as described in Section 3.1, and You must inform recipients of
176 |     the Executable Form how they can obtain a copy of such Source Code
177 |     Form by reasonable means in a timely manner, at a charge no more
178 |     than the cost of distribution to the recipient; and
179 | 
180 | (b) You may distribute such Executable Form under the terms of this
181 |     License, or sublicense it under different terms, provided that the
182 |     license for the Executable Form does not attempt to limit or alter
183 |     the recipients' rights in the Source Code Form under this License.
184 | 
185 | 3.3. Distribution of a Larger Work
186 | 
187 | You may create and distribute a Larger Work under terms of Your choice,
188 | provided that You also comply with the requirements of this License for
189 | the Covered Software. If the Larger Work is a combination of Covered
190 | Software with a work governed by one or more Secondary Licenses, and the
191 | Covered Software is not Incompatible With Secondary Licenses, this
192 | License permits You to additionally distribute such Covered Software
193 | under the terms of such Secondary License(s), so that the recipient of
194 | the Larger Work may, at their option, further distribute the Covered
195 | Software under the terms of either this License or such Secondary
196 | License(s).
197 | 
198 | 3.4. Notices
199 | 
200 | You may not remove or alter the substance of any license notices
201 | (including copyright notices, patent notices, disclaimers of warranty,
202 | or limitations of liability) contained within the Source Code Form of
203 | the Covered Software, except that You may alter any license notices to
204 | the extent required to remedy known factual inaccuracies.
205 | 
206 | 3.5. Application of Additional Terms
207 | 
208 | You may choose to offer, and to charge a fee for, warranty, support,
209 | indemnity or liability obligations to one or more recipients of Covered
210 | Software. However, You may do so only on Your own behalf, and not on
211 | behalf of any Contributor. You must make it absolutely clear that any
212 | such warranty, support, indemnity, or liability obligation is offered by
213 | You alone, and You hereby agree to indemnify every Contributor for any
214 | liability incurred by such Contributor as a result of warranty, support,
215 | indemnity or liability terms You offer. You may include additional
216 | disclaimers of warranty and limitations of liability specific to any
217 | jurisdiction.
218 | 
219 | 4. Inability to Comply Due to Statute or Regulation
220 | ---------------------------------------------------
221 | 
222 | If it is impossible for You to comply with any of the terms of this
223 | License with respect to some or all of the Covered Software due to
224 | statute, judicial order, or regulation then You must: (a) comply with
225 | the terms of this License to the maximum extent possible; and (b)
226 | describe the limitations and the code they affect. Such description must
227 | be placed in a text file included with all distributions of the Covered
228 | Software under this License. Except to the extent prohibited by statute
229 | or regulation, such description must be sufficiently detailed for a
230 | recipient of ordinary skill to be able to understand it.
231 | 
232 | 5. Termination
233 | --------------
234 | 
235 | 5.1. The rights granted under this License will terminate automatically
236 | if You fail to comply with any of its terms. However, if You become
237 | compliant, then the rights granted under this License from a particular
238 | Contributor are reinstated (a) provisionally, unless and until such
239 | Contributor explicitly and finally terminates Your grants, and (b) on an
240 | ongoing basis, if such Contributor fails to notify You of the
241 | non-compliance by some reasonable means prior to 60 days after You have
242 | come back into compliance. Moreover, Your grants from a particular
243 | Contributor are reinstated on an ongoing basis if such Contributor
244 | notifies You of the non-compliance by some reasonable means, this is the
245 | first time You have received notice of non-compliance with this License
246 | from such Contributor, and You become compliant prior to 30 days after
247 | Your receipt of the notice.
248 | 
249 | 5.2. If You initiate litigation against any entity by asserting a patent
250 | infringement claim (excluding declaratory judgment actions,
251 | counter-claims, and cross-claims) alleging that a Contributor Version
252 | directly or indirectly infringes any patent, then the rights granted to
253 | You by any and all Contributors for the Covered Software under Section
254 | 2.1 of this License shall terminate.
255 | 
256 | 5.3. In the event of termination under Sections 5.1 or 5.2 above, all
257 | end user license agreements (excluding distributors and resellers) which
258 | have been validly granted by You or Your distributors under this License
259 | prior to termination shall survive termination.
260 | 
261 | ************************************************************************
262 | *                                                                      *
263 | *  6. Disclaimer of Warranty                                           *
264 | *  -------------------------                                           *
265 | *                                                                      *
266 | *  Covered Software is provided under this License on an "as is"       *
267 | *  basis, without warranty of any kind, either expressed, implied, or  *
268 | *  statutory, including, without limitation, warranties that the       *
269 | *  Covered Software is free of defects, merchantable, fit for a        *
270 | *  particular purpose or non-infringing. The entire risk as to the     *
271 | *  quality and performance of the Covered Software is with You.        *
272 | *  Should any Covered Software prove defective in any respect, You     *
273 | *  (not any Contributor) assume the cost of any necessary servicing,   *
274 | *  repair, or correction. This disclaimer of warranty constitutes an   *
275 | *  essential part of this License. No use of any Covered Software is   *
276 | *  authorized under this License except under this disclaimer.         *
277 | *                                                                      *
278 | ************************************************************************
279 | 
280 | ************************************************************************
281 | *                                                                      *
282 | *  7. Limitation of Liability                                          *
283 | *  --------------------------                                          *
284 | *                                                                      *
285 | *  Under no circumstances and under no legal theory, whether tort      *
286 | *  (including negligence), contract, or otherwise, shall any           *
287 | *  Contributor, or anyone who distributes Covered Software as          *
288 | *  permitted above, be liable to You for any direct, indirect,         *
289 | *  special, incidental, or consequential damages of any character      *
290 | *  including, without limitation, damages for lost profits, loss of    *
291 | *  goodwill, work stoppage, computer failure or malfunction, or any    *
292 | *  and all other commercial damages or losses, even if such party      *
293 | *  shall have been informed of the possibility of such damages. This   *
294 | *  limitation of liability shall not apply to liability for death or   *
295 | *  personal injury resulting from such party's negligence to the       *
296 | *  extent applicable law prohibits such limitation. Some               *
297 | *  jurisdictions do not allow the exclusion or limitation of           *
298 | *  incidental or consequential damages, so this exclusion and          *
299 | *  limitation may not apply to You.                                    *
300 | *                                                                      *
301 | ************************************************************************
302 | 
303 | 8. Litigation
304 | -------------
305 | 
306 | Any litigation relating to this License may be brought only in the
307 | courts of a jurisdiction where the defendant maintains its principal
308 | place of business and such litigation shall be governed by laws of that
309 | jurisdiction, without reference to its conflict-of-law provisions.
310 | Nothing in this Section shall prevent a party's ability to bring
311 | cross-claims or counter-claims.
312 | 
313 | 9. Miscellaneous
314 | ----------------
315 | 
316 | This License represents the complete agreement concerning the subject
317 | matter hereof. If any provision of this License is held to be
318 | unenforceable, such provision shall be reformed only to the extent
319 | necessary to make it enforceable. Any law or regulation which provides
320 | that the language of a contract shall be construed against the drafter
321 | shall not be used to construe this License against a Contributor.
322 | 
323 | 10. Versions of the License
324 | ---------------------------
325 | 
326 | 10.1. New Versions
327 | 
328 | Mozilla Foundation is the license steward. Except as provided in Section
329 | 10.3, no one other than the license steward has the right to modify or
330 | publish new versions of this License. Each version will be given a
331 | distinguishing version number.
332 | 
333 | 10.2. Effect of New Versions
334 | 
335 | You may distribute the Covered Software under the terms of the version
336 | of the License under which You originally received the Covered Software,
337 | or under the terms of any subsequent version published by the license
338 | steward.
339 | 
340 | 10.3. Modified Versions
341 | 
342 | If you create software not governed by this License, and you want to
343 | create a new license for such software, you may create and use a
344 | modified version of this License if you rename the license and remove
345 | any references to the name of the license steward (except to note that
346 | such modified license differs from this License).
347 | 
348 | 10.4. Distributing Source Code Form that is Incompatible With Secondary
349 | Licenses
350 | 
351 | If You choose to distribute Source Code Form that is Incompatible With
352 | Secondary Licenses under the terms of this version of the License, the
353 | notice described in Exhibit B of this License must be attached.
354 | 
355 | Exhibit A - Source Code Form License Notice
356 | -------------------------------------------
357 | 
358 |   This Source Code Form is subject to the terms of the Mozilla Public
359 |   License, v. 2.0. If a copy of the MPL was not distributed with this
360 |   file, You can obtain one at http://mozilla.org/MPL/2.0/.
361 | 
362 | If it is not possible or desirable to put the notice in a particular
363 | file, then You may include the notice in a location (such as a LICENSE
364 | file in a relevant directory) where a recipient would be likely to look
365 | for such a notice.
366 | 
367 | You may add additional accurate notices of copyright ownership.
368 | 
369 | Exhibit B - "Incompatible With Secondary Licenses" Notice
370 | ---------------------------------------------------------
371 | 
372 |   This Source Code Form is "Incompatible With Secondary Licenses", as
373 |   defined by the Mozilla Public License, v. 2.0.
374 | 


--------------------------------------------------------------------------------
/k8s/deployment.yaml:
--------------------------------------------------------------------------------
 1 | apiVersion: v1
 2 | kind: Namespace
 3 | metadata:
 4 |   name: echo
 5 | ---
 6 | apiVersion: apps/v1
 7 | kind: Deployment
 8 | metadata:
 9 |   name: echo
10 |   namespace: echo
11 | spec:
12 |   selector:
13 |     matchLabels:
14 |       app: echo
15 |   replicas: 1
16 |   template:
17 |     metadata:
18 |       labels:
19 |         app: echo
20 |     spec:
21 |       containers:
22 |       - name: echo
23 |         image: localhost:5000/echo:latest
24 |         imagePullPolicy: Always
25 |         stdin: true
26 |         tty: true
27 | 


--------------------------------------------------------------------------------
/k8s/echo/Dockerfile:
--------------------------------------------------------------------------------
 1 | FROM python:3.10-alpine
 2 | 
 3 | WORKDIR /app
 4 | 
 5 | COPY . /app
 6 | 
 7 | RUN pip install -r requirements.txt
 8 | 
 9 | EXPOSE 8080
10 | 
11 | CMD ["python", "/app/echo.py"]
12 | 


--------------------------------------------------------------------------------
/k8s/echo/echo.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | 
 3 | from flask import Flask, request, render_template
 4 | 
 5 | app = Flask(__name__)
 6 | 
 7 | @app.route("/")
 8 | def form():
 9 |     return render_template("index.html")
10 | 
11 | @app.route("/", methods=["POST"])
12 | def form_post():
13 |     return request.form["echo_input"]
14 | 
15 | if __name__ == "__main__":
16 |     app.run(host="0.0.0.0", port=8080, debug=True)
17 | 


--------------------------------------------------------------------------------
/k8s/echo/requirements.txt:
--------------------------------------------------------------------------------
1 | flask
2 | 


--------------------------------------------------------------------------------
/k8s/echo/templates/index.html:
--------------------------------------------------------------------------------
 1 | <html>
 2 | <head>
 3 |   <title>Echo (echo...)</title>
 4 | </head>
 5 | <body>
 6 |     <h1>Echo (echo...)</h1>
 7 |     <form method="POST">
 8 |         <input type="text" name="echo_input" placeholder="Say something!">
 9 |         <input type="submit" value="Echo!">
10 |     </form>
11 | </body>


--------------------------------------------------------------------------------
/k8s/helm/Chart.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: v2
2 | name: echo
3 | description: A Helm chart for a simple echo app
4 | version: 0.1.0
5 | appVersion: 0.1.0
6 | 


--------------------------------------------------------------------------------
/k8s/helm/templates/NOTES.txt:
--------------------------------------------------------------------------------
 1 | To access, please run the following command:
 2 | 
 3 |     sudo echo "$(minikube ip) echo.internal" >> /etc/hosts
 4 | 
 5 | Then go to http://echo.internal inyour browser.
 6 | 
 7 | To clean up, run the following command:
 8 | 
 9 |     sudo sed -i'' '$d' /etc/hosts
10 | 


--------------------------------------------------------------------------------
/k8s/helm/templates/deployment.yaml:
--------------------------------------------------------------------------------
 1 | apiVersion: apps/v1
 2 | kind: Deployment
 3 | metadata:
 4 |   name: echo
 5 |   namespace: echo
 6 | spec:
 7 |   selector:
 8 |     matchLabels:
 9 |       app: echo
10 |   replicas: 1
11 |   template:
12 |     metadata:
13 |       labels:
14 |         app: echo
15 |     spec:
16 |       containers:
17 |       - name: echo
18 |         image: localhost:5000/echo:latest
19 |         imagePullPolicy: Always
20 |         ports:
21 |           - containerPort: 8080
22 |         stdin: true
23 |         tty: true
24 |         resources:
25 |           limits:
26 |             cpu: 100m
27 |             memory: 128Mi
28 |           requests:
29 |             cpu: 50m
30 |             memory: 50Mi
31 | 


--------------------------------------------------------------------------------
/k8s/helm/templates/ingress.yaml:
--------------------------------------------------------------------------------
 1 | apiVersion: networking.k8s.io/v1
 2 | kind: Ingress
 3 | metadata:
 4 |   name: echo
 5 |   namespace: echo
 6 |   annotations:
 7 |     nginx.ingress.kubernetes.io/rewrite-target: /
 8 | spec:
 9 |   rules:
10 |   - host: echo.internal
11 |     http:
12 |       paths:
13 |       - path: /
14 |         pathType: Prefix
15 |         backend:
16 |           service:
17 |             name: echo
18 |             port:
19 |               number: 8080
20 | 


--------------------------------------------------------------------------------
/k8s/helm/templates/namespace.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: v1
2 | kind: Namespace
3 | metadata:
4 |   name: echo
5 | 


--------------------------------------------------------------------------------
/k8s/helm/templates/rbac.yaml:
--------------------------------------------------------------------------------
 1 | apiVersion: rbac.authorization.k8s.io/v1
 2 | kind: Role
 3 | metadata:
 4 |   name: echo-rw
 5 |   namespace: echo
 6 | rules:
 7 |   - apiGroups: [""]
 8 |     resources: ["pods"]
 9 |     verbs: ["get", "list", "watch"]
10 |   - apiGroups: [""]
11 |     resources: ["pods/exec"]
12 |     verbs: ["create"]
13 | ---
14 | apiVersion: rbac.authorization.k8s.io/v1
15 | kind: RoleBinding
16 | metadata:
17 |   name: echo-rw
18 |   namespace: echo
19 | subjects:
20 | - kind: User
21 |   name: echo-user
22 |   apiGroup: rbac.authorization.k8s.io
23 | roleRef:
24 |   kind: Role
25 |   name: echo-rw
26 |   apiGroup: rbac.authorization.k8s.io
27 | 


--------------------------------------------------------------------------------
/k8s/helm/templates/service.yaml:
--------------------------------------------------------------------------------
 1 | apiVersion: v1
 2 | kind: Service
 3 | metadata:
 4 |   name: echo
 5 |   namespace: echo
 6 | spec:
 7 |   ports:
 8 |     - protocol: TCP
 9 |       port: 8080
10 |       targetPort: 8080
11 |   selector:
12 |     app: echo
13 |   type: NodePort
14 | 


--------------------------------------------------------------------------------
/k8s/k8s-101.md:
--------------------------------------------------------------------------------
   1 | # Introduction
   2 | 
   3 | ## What is Kubernetes?
   4 | 
   5 | Kubernetes is a container orchestration system. It manages the scheduling and execution of containers. Similar platforms exist like Docker Swarm, Apache Mesos, and Hashicorp Nomad. Kubernetes by far has the dominant market share, and is also the most complex out of those listed.
   6 | 
   7 | ## Kubernetes components
   8 | 
   9 | ### High level
  10 | * Cluster: a logical grouping of one or more nodes.
  11 | * Node: a server running Kubernetes - can be bare metal, a VM, or a container.
  12 | * Pod: a logical grouping of one or more containers.
  13 | * Container: the same as Docker.
  14 | 
  15 | #### Node roles
  16 | * Control plane: schedules pods, detects and responds to cluster events, maintains cluster state in a database.
  17 | * Worker: runs user-defined workloads via DaemonSets, StatefulSets, or Deployments.
  18 | 	* Note that while not recommended in production, for development purposes, a single-node cluster can serve as both of these roles.
  19 | 
  20 | ### Low[er] level
  21 | 
  22 | For a more thorough examination of Kubernetes components, [the official documentation](https://kubernetes.io/docs/concepts/overview/components/) is recommended. A brief overview of some components follows:
  23 | 
  24 | * kube-apiserver: handles requests to the API, typically via kubectl.
  25 | * etcd: a key/value store utilizing the Raft algorithm for consensus; frequently used as the store for Kubernetes cluster data.
  26 | 	* K3s (and thus K3d) uses an embedded SQLite database as its backing store by default; in general any database may be used, but in production etcd is the standard.
  27 | * kube-scheduler: assigns workloads to a node, constrained by resource limits, affinity/anti-affinity rules, etc.
  28 | * kubelet: an agent running on every node, ensuring that a Pod's containers are running.
  29 | 
  30 | ## Kubernetes distributions
  31 | 
  32 | Each cloud provider has their own - Amazon has EKS, Azure has AKS, Google has GKE, DigitalOcean has DOKS, etc. Vanilla Kubernetes can be installed [either manually](https://github.com/kelseyhightower/kubernetes-the-hard-way), or with a tool like [kubeadm](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/). Various distributions also exist, much like Linux distributions. [Minikube](https://minikube.sigs.k8s.io/docs/) is a popular way to bootstrap a single-node cluster for development in an existing operating system. K3d is based on [k3s](https://k3s.io/), which is a lightweight single-binary distribution of Kubernetes. Rancher Labs (owned by SuSE) also make a full single-purpose OS called [k3os](https://k3os.io/) which is designed to run k3s, and only k3s. A similar (albeit running vanilla Kubernetes) but even more extreme example is [Talos](https://www.talos.dev/), which is completely immutable, has no shell access, and allows access only via its API. Amazon has a similar offering called [Bottlerocket](https://aws.amazon.com/bottlerocket/).
  33 | 
  34 | In general, any Kubernetes distribution will be perfectly adequate for learning, and it comes down to personal preference. For production, there are arguments to be made for managed services like EKS, but that's beyond the scope of this document.
  35 | 
  36 | # Getting started
  37 | 
  38 | ## Install
  39 | 
  40 | ### Prerequisites:
  41 | 
  42 |  - Docker
  43 | 	 - There are many ways to do this, pick your favorite
  44 | 	 - If you're using Minikube, you can just `eval` its daemon
  45 |  - helm
  46 |    - `brew install helm`
  47 |  - kubectl
  48 | 	 - `brew install kubectl`
  49 |   - Optional:
  50 |      - `brew install hidetatz/tap/kubecolor`
  51 |   - Optional but please do it:
  52 |      - `brew install gnu-sed`
  53 | -	Add the following to your shell rc file:
  54 |     - `alias kubectl=kubecolor` (if kubecolor was installed)
  55 |     - `alias k=kubectl`
  56 |     - Install your shell's plugin for kubectl
  57 | 
  58 | ### Install and verification
  59 | 
  60 | #### M1 Macs (ARM)
  61 | 
  62 | Install Docker Desktop, and launch minikube as below, but with `--driver docker` instead. Additionally, skip all steps regarding using a registry, and whenever the image path is referenced, use these instead:
  63 | 
  64 |     # For the first part, which has no networking functionality
  65 |     stephangarland/echo:local
  66 |     # For the second part, which exposes a container port
  67 |     stephangarland/echo:web
  68 | 
  69 | #### Intel Macs (x86-64)
  70 | 
  71 | Install the `hyperkit` driver with `brew install hyperkit`, and then minikube with `brew install minikube`.
  72 | 
  73 | First, run `brew install minikube`. Then, run `minikube start` with a few options: `minikube start --memory 8GB --cpus 4 --driver hyperkit`. Assuming you have the memory and CPU to spare, this ensures we won't run into any backing hardware issues. `hyperkit` as the driver means we don't have to download anything additional to spin up the VM that runs the cluster. 
  74 | 
  75 |     ❯ minikube start --memory 8GB --cpus 4 --driver hyperkit
  76 |     😄  minikube v1.25.2 on Darwin 11.6.5
  77 |         ▪ KUBECONFIG=/Users/sgarland/.kube/.switch_tmp/config.1541775917.tmp
  78 |         ▪ MINIKUBE_ACTIVE_DOCKERD=minikube
  79 |     ✨  Using the hyperkit driver based on user configuration
  80 |     👍  Starting control plane node minikube in cluster minikube
  81 |     🔥  Creating hyperkit VM (CPUs=4, Memory=8192MB, Disk=20000MB) ...
  82 |     🐳  Preparing Kubernetes v1.23.3 on Docker 20.10.12 ...
  83 |         ▪ kubelet.housekeeping-interval=5m
  84 |         ▪ Generating certificates and keys ...
  85 |         ▪ Booting up control plane ...
  86 |         ▪ Configuring RBAC rules ...
  87 |     🔎  Verifying Kubernetes components...
  88 |         ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
  89 |     🌟  Enabled addons: storage-provisioner, default-storageclass
  90 |     🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
  91 | 
  92 | Next, we'll enable the registry addon:
  93 | 
  94 |     ❯ minikube addons enable registry
  95 |         ▪ Using image registry:2.7.1
  96 |         ▪ Using image gcr.io/google_containers/kube-registry-proxy:0.4
  97 |     🔎  Verifying registry addon...
  98 |     🌟  The 'registry' addon is enabled
  99 | 
 100 | Also, since we'll need it later, let's enable the ingress addon now:
 101 | 
 102 |     ❯ minikube addons enable ingress
 103 |         ▪ Using image k8s.gcr.io/ingress-nginx/controller:v1.1.1
 104 |         ▪ Using image k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.1.1
 105 |         ▪ Using image k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.1.1
 106 |     🔎  Verifying ingress addon...
 107 |     🌟  The 'ingress' addon is enabled
 108 | 
 109 | Next, if you don't already have the docker daemon (hint: does `docker version` return anything?), we'll hook into Minikube's:
 110 | 
 111 |     eval $(minikube -p minikube docker-env)
 112 | 
 113 | Finally, we need to modify networking a little bit using `socat` to get the registry to listen to our local docker daemon:
 114 | 
 115 |     ❯ docker run --rm -it -d --network=host alpine ash -c "apk add socat && socat TCP-LISTEN:5000,reuseaddr,fork TCP:$(minikube ip):5000"
 116 |     Unable to find image 'alpine:latest' locally
 117 |     latest: Pulling from library/alpine
 118 |     df9b9388f04a: Already exists
 119 |     Digest: sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454
 120 |     Status: Downloaded newer image for alpine:latest
 121 |     fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/main/x86_64/APKINDEX.tar.gz
 122 |     fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/community/x86_64/APKINDEX.tar.gz
 123 |     (1/4) Installing ncurses-terminfo-base (6.3_p20211120-r0)
 124 |     (2/4) Installing ncurses-libs (6.3_p20211120-r0)
 125 |     (3/4) Installing readline (8.1.1-r0)
 126 |     (4/4) Installing socat (1.7.4.2-r0)
 127 |     Executing busybox-1.34.1-r5.trigger
 128 |     OK: 7 MiB in 18 packages
 129 | 
 130 | Let's verify the cluster:
 131 | 
 132 | 	❯ kubectl get nodes
 133 |     NAME       STATUS   ROLES                  AGE     VERSION
 134 |     minikube   Ready    control-plane,master   6m26s   v1.23.3
 135 | 
 136 | For more detail, use `describe`. There's a lot here, but I'll highlight some pertinent information.
 137 | 
 138 | 	❯ kubectl describe nodes
 139 | 	Name:               minikube
 140 | 	Roles:              control-plane,master
 141 | 	Labels:             beta.kubernetes.io/arch=amd64
 142 | 	...
 143 | 	Capacity:
 144 |     cpu:                4
 145 |     ephemeral-storage:  17784752Ki
 146 |     hugepages-2Mi:      0
 147 |     memory:             8161900Ki
 148 |     pods:               110
 149 | 	...
 150 | 	Events:
 151 | 	  Type    Reason                   Age                    From        Message
 152 |     ----    ------                   ----                   ----        -------
 153 |     Normal  Starting                 6m28s                  kube-proxy
 154 |     Normal  NodeHasSufficientMemory  6m52s (x5 over 6m52s)  kubelet     Node minikube status is now: NodeHasSufficientMemory
 155 |     Normal  NodeHasNoDiskPressure    6m52s (x5 over 6m52s)  kubelet     Node minikube status is now: NodeHasNoDiskPressure
 156 |     Normal  NodeHasSufficientPID     6m52s (x4 over 6m52s)  kubelet     Node minikube status is now: NodeHasSufficientPID
 157 |     Normal  Starting                 6m42s                  kubelet     Starting kubelet.
 158 |     Normal  NodeHasNoDiskPressure    6m42s                  kubelet     Node minikube status is now:   NodeHasNoDiskPressure
 159 |     Normal  NodeHasSufficientPID     6m42s                  kubelet     Node minikube status is now: NodeHasSufficientPID
 160 |     Normal  NodeNotReady             6m42s                  kubelet     Node minikube status is now: NodeNotReady
 161 |     Normal  NodeAllocatableEnforced  6m42s                  kubelet     Updated Node Allocatable limit across pods
 162 |     Normal  NodeHasSufficientMemory  6m42s                  kubelet     Node minikube status is now: NodeHasSufficientMemory
 163 |     Normal  NodeReady                6m31s                  kubelet     Node minikube status is now: NodeReady
 164 |   
 165 | At the top, we can see the name, role, labels, and annotations. The name is self-explanatory. The role here is showing two - control-plane, and master. These are the same thing, and are in parallel since Kubernetes v1.20. `master` is being deprecated in favor of `control-plane` and will be fully removed in a future release. The purpose and limitations of this, along with taints, will be discussed later. Labels are key/value pairs that can be arbitrarily applied, but usually carry semantic meaning for either the user or an application. For example, `kubernetes.io/arch=amd64` tells us that this node has `amd64` architecture. Clusters can be of mixed architecture, so it's good to be able to easily tell apart `x86` and `arm` nodes for scheduling purposes.
 166 | 
 167 | Let's label the node, for fun:
 168 | 
 169 | 	  ❯ kubectl label node --all "my.name.is=$(whoami)"
 170 | 	  node/minikube labeled
 171 | Using the `--all` flag applies it to all nodes; without it, you'd need to add the node's name (`k3d-sgarland-cluster-server-0` for me).
 172 | 
 173 | We can then see the new label with the `--show-labels `flag:
 174 | 
 175 | 	  ❯ kubectl get nodes --show-labels
 176 | 	  NAME       STATUS   ROLES                  AGE     VERSION   LABELS
 177 |     minikube   Ready    control-plane,master   9m32s   v1.23.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=minikube,kubernetes.io/os=linux,minikube.k8s.io/commit=362d5fdc0a3dbee389b3d3f1034e8023e72bd3a7,minikube.k8s.io/name=minikube,minikube.k8s.io/primary=true,minikube.k8s.io/updated_at=2022_05_06T14_14_05_0700,minikube.k8s.io/version=v1.25.2,my.name.is=sgarland,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
 178 | 
 179 | Unfortunately it's a bit messy in the default comma-separated form, but should be able to spot your change in there. To delete the label, the syntax is somewhat confusing; you use the key and a `-` sign to indicate that it should be removed:
 180 | 
 181 |     ❯ kubectl label node --all "my.name.is-"
 182 |     node/minikube labeled
 183 | 
 184 | Next is Capacity. We can see things like the amount of ephemeral storage and memory available (16 GB and 8 GB, respectively) to the cluster, as well as allocatable pods. 110 pods is not actually resource-based, but networking - with a `/24` block being assigned to each node, there are 256 addresses available. With slightly over double the amount of addresses than the maximum number of pods, this reduces IP address reuse as pods come and go from the node.
 185 | 
 186 | Finally, Events. In this section, the kubelet reports the status of the node, here showing that it has sufficient memory, disk, PID, and is ready.
 187 | 
 188 | # Exploration
 189 | 
 190 | ## Imperative vs Declarative
 191 | 
 192 | Ideally, everything is maintained in code, and changes are made with some form of state management, be it ArgoCD, Flux, or others. Less optimally, you can issue commands with `kubectl apply`, which reads your input file and compares it to existing, then makes changes. Even less optimally, you can directly issue `kubectl` commands.
 193 | 
 194 | ## Kubectl verbs
 195 | 
 196 | So far we've used a few - `get`, `describe`, and `label`. Kubernetes [loosely follows HTTP verbs](https://kubernetes.io/docs/reference/access-authn-authz/authorization/#determine-the-request-verb), with some extras thrown in. One important note is that since you are directly communicating with the API, there are no warnings for destructive actions. If you tell it to delete a Persistent Volume, it will do so (with some exceptions for finalizers).
 197 | 
 198 | ## Create a deployment
 199 | 
 200 | Let's deploy a simple application. If you have a small Dockerized app you'd like to run you're welcome to use it here, but otherwise, we'll focus on this simple echo app that echoes the user's input.
 201 | 
 202 | ### Building the application
 203 | Use your own, or copy/paste this into a shell to write them to `echo.py` and `Dockerfile`, respectively.
 204 | 
 205 |     cat << EOF > echo.py
 206 |     #!/usr/bin/env python
 207 | 
 208 | 	def main():
 209 | 	    while True:
 210 | 	        user_input = input("Hi, say something, or type 'quit' to quit: ")
 211 | 	        if user_input == "quit":
 212 | 	            break
 213 | 	        else:
 214 | 	            print(user_input)
 215 | 
 216 | 	if __name__ == "__main__":
 217 | 	    main()
 218 | 	
 219 | 	EOF
 220 | 
 221 | 
 222 | ---
 223 |     cat << EOF > Dockerfile
 224 | 	FROM python:3.10-alpine
 225 | 	
 226 | 	WORKDIR /app
 227 | 	
 228 | 	COPY ./echo.py /app/echo.py
 229 | 	
 230 | 	CMD ["python", "/app/echo.py"]
 231 | 	
 232 | 	EOF
 233 | 
 234 | Then, build it:
 235 | 
 236 |     docker build -t echo .
 237 | 
 238 | To test that it works, you can use:
 239 | 
 240 |     docker run --rm -i --name echo echo
 241 |     
 242 | Bonus question: what does the `-i` flag do, and what happens if you neglect to include it here?
 243 | 
 244 | ### Writing a Deployment
 245 | A Deployment is a basic Kubernetes structure, which allows you to define a workload in the form of a Pod, which should run with n replicas. Via the kubelet, a Deployment can ensure that an app is restarted if it fails, is reachable (assuming you've set up liveness and readiness probes), and more.
 246 | 
 247 | #### YAML
 248 |     cat << EOF > deployment.yaml
 249 |     apiVersion: apps/v1
 250 | 	kind: Deployment
 251 | 	metadata:
 252 | 	  name: echo
 253 | 	spec:
 254 | 	  selector:
 255 | 	    matchLabels:
 256 | 	      app: echo
 257 | 	  replicas: 1
 258 | 	  template:
 259 | 	    metadata:
 260 | 	      labels:
 261 | 	        app: echo
 262 | 	    spec:
 263 | 	      containers:
 264 | 	      - name: echo
 265 | 	        image: localhost:5000/echo:latest
 266 | 	        imagePullPolicy: Always
 267 | 	        stdin: true
 268 | 	        tty: true
 269 | 	
 270 | 	EOF
 271 | 
 272 | Let's break down what's going on here, line by excruciating line:
 273 | 
 274 |     # This refers to a specific API version for the code
 275 |     # that follows - these are regularly updated and
 276 |     # deprecated, but you're warned well in advance
 277 |     apiVersion: apps/v1
 278 |     
 279 |     # This specifies what it is you're defining - could
 280 |     # also be a StatefulSet, an Ingress, a Service, etc.
 281 |     kind: Deployment
 282 |     
 283 |     # You can put multiple things here; the two most
 284 |     # common are the name of the application, and
 285 |     # a namespace in which to install it
 286 |     metadata:
 287 |       name: echo
 288 |     
 289 |     # This tells the Deployment what application
 290 |     # it should manage - in this case, it's looking
 291 |     # for those with the label `app: echo`
 292 |     spec:
 293 |       selector:
 294 |         matchlabels:
 295 |           app: echo
 296 |       # The number of replicas to deploy - note that
 297 |       # this is even with `spec.selector`, and like Python,
 298 |       # whitespace is extremely important
 299 |       replicas: 1
 300 | 
 301 |       # This gives the Pods a template to apply
 302 |       # In this case, the label `app: echo`
 303 |       template:
 304 |         metadata:
 305 |           labels:
 306 |             app: echo
 307 |         # Now we define the Pod's containers - note that this
 308 |         # is even with `template.metadata`, as it is part
 309 |         # of the template
 310 |         spec:
 311 |           containers:
 312 |           # The name of your application
 313 |           - name: echo
 314 |             # The image, optionally as a FQDN
 315 |             # If not specified as a FQDN, it will first 
 316 |             # be searched for locally, and then on Dockerhub
 317 |             image: localhost:5000/echo:latest
 318 |             # When to pull - can also use Never or IfNotPresent
 319 |             imagePullPolicy: IfNotPresent
 320 |             # Technically only stdin is needed, but
 321 |             # if you don't also give it a psuedo-TTY
 322 |             # it will complain (but still run) when
 323 |             # you attach to the container
 324 |             stdin: true
 325 |             tty: true
 326 | 
 327 | ### Applying the Deployment
 328 | 
 329 | #### Pushing the build
 330 | But first, we have to tag and push to our registry. Note that if you're using an M1 Mac, you will not do this, but can substitute in `docker pull` commands to verify that the images are available for you, e.g. `docker pull stephangarland/echo:web`
 331 | 
 332 |     docker tag echo:latest localhost:5000/echo:latest
 333 | ---
 334 |     docker push localhost:5000/echo:latest
 335 | ---
 336 |     The push refers to repository [localhost:5000/echo]
 337 |     43358167f05b: Layer already exists
 338 |     96568c21d3ac: Layer already exists
 339 |     b02dd59d34c0: Layer already exists
 340 |     0b800261971d: Layer already exists
 341 |     16e3ab2d4dee: Layer already exists
 342 |     fbd7d5451c69: Layer already exists
 343 |     4fc242d58285: Layer already exists
 344 |     latest: digest: sha256:36450f0ec0febf8daf800f24ab81363211dc52dd6bfc3e50d5d54c508f8d89ed size: 1782
 345 | 
 346 | #### Deploy!
 347 | As stated, there are far better ways to deploy applications, but this is the most basic, and gives the most insight into what Kubernetes is doing to get your app running.
 348 | 
 349 | If you run all of these in quick succession, you should see the following:
 350 | 
 351 |     ❯ kubectl apply -f deployment.yaml
 352 |     deployment.apps/echo created
 353 | 
 354 |     ❯ kubectl get deployments
 355 |     NAME   READY   UP-TO-DATE   AVAILABLE   AGE
 356 |     echo   0/1     1            0           1s
 357 | 
 358 |     ❯ kubectl get pods
 359 |     NAME                    READY   STATUS              RESTARTS   AGE
 360 |     echo-746cdbd89c-hrzds   0/1     ContainerCreating   0          2s
 361 | 
 362 | Once the pod creates and deploys (which for this, takes a very short amount of time), the latter two commands should show this:
 363 | 
 364 |     ❯ kubectl get deployments
 365 |     NAME   READY   UP-TO-DATE   AVAILABLE   AGE
 366 |     echo   1/1     1            1           2m6s
 367 |     
 368 |     ❯ kubectl get pods
 369 |     NAME                    READY   STATUS    RESTARTS   AGE
 370 |     echo-746cdbd89c-hrzds   1/1     Running   0          2m35s
 371 | 
 372 | ### Exploring the Deployment
 373 | Let's apply some of the verbs available to us.
 374 | 
 375 | #### Attach
 376 |     ❯ kubectl attach -i echo-746cdbd89c-hrzds
 377 |     If you don't see a command prompt, try pressing enter.
 378 | 
 379 | 
 380 |     Hi, say something, or type 'quit' to quit: Hello!
 381 |     Hello!
 382 |     Hi, say something, or type 'quit' to quit: quit
 383 |     Session ended, resume using 'kubectl attach echo-746cdbd89c-hrzds -c echo -i -t' command when the pod is running
 384 | 
 385 | `attach` lets us attach to a container's default process, which in this case, is our app.
 386 | 
 387 | #### Exec
 388 | 
 389 | *Note: This early termination may not occur.*
 390 | 
 391 | You could also use `exec` to get a shell into the pod, like this:
 392 | 
 393 |     ❯ kubectl exec -it echo-74bf7cdf5c-9rhxd -- sh
 394 |     Error from server (NotFound): pods "echo-74bf7cdf5c-9rhxd" not found
 395 | 
 396 | #### Describe
 397 | What's this? Our pod went away already? Let's `describe` the new pod to see why.
 398 | 
 399 |     ❯ kubectl describe pod echo-746cdbd89c-hrzds
 400 |     Name:         echo-746cdbd89c-hrzds
 401 |     Namespace:    default
 402 |     ... (not shown for conciseness)
 403 |     Containers:
 404 |       echo:
 405 |         ...
 406 |          Last State:     Terminated
 407 |          Reason:       Completed
 408 |          Exit Code:    0
 409 |          Started:      Fri, 25 Mar 2022 15:03:57 -0500
 410 |          Finished:     Fri, 25 Mar 2022 15:05:24 -0500
 411 | 
 412 | Ah, there we are - since our program runs in a loop until it receives `quit` as input, once that was passed, the program exited. The kubelet noticed that the deployment no longer had a running pod, and spawned a new one.
 413 | 
 414 | #### Get
 415 | 
 416 | We can see this if we `get` pods:
 417 | 
 418 |     ❯ kubectl get pods
 419 |     NAME                    READY   STATUS    RESTARTS      AGE
 420 |     echo-746cdbd89c-hrzds   1/1     Running   1 (62s ago)   8m
 421 | 
 422 | #### Exec (again)
 423 | Now let's exec into the pod.
 424 | 
 425 |     ❯ kubectl exec -it echo-746cdbd89c-hrzds -- sh
 426 |     /app # ls
 427 |     echo.py
 428 |     /app # python echo.py
 429 |     Hi, say something, or type 'quit' to quit: Hello
 430 |     Hello
 431 |     Hi, say something, or type 'quit' to quit: quit
 432 |     /app #
 433 | Note that here, quitting the app didn't kill the pod - that's because we spawned a new shell to exec into, and created a new instance of the app. Look at what's running:
 434 | 
 435 |     /app # ps
 436 |     PID   USER     TIME  COMMAND
 437 |     1 root      0:00 python /app/echo.py
 438 |     27 root      0:00 sh
 439 |     41 root      0:00 ps
 440 | 
 441 | Our app is running as the `init` process, PID 1. Kill it and watch what happens. Just kidding - `init` traps most `kill` signals in reasonable *nix distributions for good reason; but you can send it `INT` aka `2` if you'd like to see what happens (you could also kill the shell, if you'd like).
 442 | 
 443 | #### Delete
 444 | 
 445 | This is how you canonically restart a pod, in case you weren't aware.
 446 | 
 447 |     ❯ kubectl delete pod -l app=echo
 448 |     pod "echo-746cdbd89c-hrzds" deleted
 449 |   
 450 | What's this `-l` flag? Why didn't we have to specify the entire name? Welcome to selectors - also available with their longhand flag, `--selector`. Remember the `template.metadata.labels.app` we assigned to the Deployment? That's how this is finding it.
 451 | And we can see that we now have a new pod, thanks to the Deployment: 
 452 | 
 453 |     ❯ kubectl get pods
 454 |     NAME                    READY   STATUS    RESTARTS   AGE
 455 |     echo-746cdbd89c-x9k2m   1/1     Running   0          36s
 456 | 
 457 | ### Scaling workloads
 458 | 
 459 | If you have a given workload, be it a Deployment or StatefulSet, you can horizontally scale it using the command `kubectl scale`, and the flag `--replicas`. Go ahead and scale ours up to, say, 3 replicas:
 460 | 
 461 |     ❯ kubectl scale deployment echo --replicas=3
 462 |     deployment.apps/echo scaled
 463 | 
 464 | Now let's look at our deployment (if you aren't quick, you might just see 3/3 ready, but that's OK):
 465 | 
 466 |     ❯ kubectl get deployments
 467 |     NAME   READY   UP-TO-DATE   AVAILABLE   AGE
 468 |     echo   1/3     3            1           68m
 469 | 
 470 | Once the pods are all up, this will change to 3/3 ready.
 471 | 
 472 |     ❯ kubectl get pods
 473 |     NAME                    READY   STATUS    RESTARTS   AGE
 474 |     echo-746cdbd89c-8v5qb   1/1     Running   0          3s
 475 |     echo-746cdbd89c-ns9kk   1/1     Running   0          3s
 476 |     echo-746cdbd89c-x9k2m   1/1     Running   0          26m
 477 | 
 478 | Of note, all this time we haven't been specifying a deployment (or pod) for `get`, which is fine since we're only running the one. If this were a real cluster, though, there would likely be many deployments and pods, and we'd want to be more specific:
 479 | 
 480 |     ❯ kubectl get deployment echo
 481 |     NAME   READY   UP-TO-DATE   AVAILABLE   AGE
 482 |     echo   3/3     3            3           71m
 483 | 
 484 | ### Scaling (down) workloads
 485 | 
 486 | To horizontally scale to zero, AKA delete the pods and prevent them from coming back, use `--replicas` again, but specify 0 pods: `--replicas=0`. Alternately, if you want to completely get rid of the deployment, use either `kubectl delete deployment/echo` (imperative) or `kubectl delete -f deployment.yaml` (declarative). With the latter, kubectl is reading the deployment manifest we wrote, and removing it.
 487 | 
 488 | Either way, once done, we can verify that it's gone:
 489 | 
 490 |     ❯ kubectl get deployment
 491 |     No resources found in default namespace.
 492 | 
 493 | ## Namespaces
 494 | 
 495 | We've briefly mentioned namespaces so far, but all the work has been done in the default namespace. This is generally a bad idea - namespaces are a way of organizing and restricting resources. We can limit a given namespace to X CPUs and Y memory, we can restrict the rights of workloads inside that namespace, and it makes it easier to keep track of things when running various `kubectl` commands if it's scoped to a namespace.
 496 | 
 497 | Let's create one imperatively and use it, and then create another declaratively.
 498 | 
 499 |     ❯ kubectl create namespace echo
 500 |     namespace/echo created
 501 | 
 502 | Now let's deploy our echo app in the new namespace:
 503 | 
 504 |     ❯ kubectl apply -f deployment.yaml -n echo
 505 |     deployment.apps/echo created
 506 | 
 507 | Note that the pod isn't in the default namespace anymore:
 508 | 
 509 |     ❯ kubectl get pods
 510 |     No resources found in default namespace.
 511 | 
 512 |     ❯ kubectl get pods -n echo
 513 |     NAME                   READY   STATUS    RESTARTS   AGE
 514 |     echo-d97d96459-s2bvk   1/1     Running   0          79s
 515 | 
 516 | Now, let's delete it and then do it again declaratively (delete the deployment however you'd like, as described earlier).
 517 | 
 518 |     sed -i '/^spec:/i \ \ namespace: echo' deployment.yaml
 519 | 
 520 | This adds a properly spaced `.metadata.namespace` line to the deployment manifest, looking for the target `^spec:` line and then going immediately before that.
 521 | 
 522 |     ❯ kubectl get pods -n echo
 523 |     NAME                   READY   STATUS    RESTARTS   AGE
 524 |     echo-d97d96459-9l2jw   1/1     Running   0          4s
 525 | 
 526 | There's our pod! What if we wanted to declaratively create the namespace, as well? Let's delete the namespace, which will also delete the deployment (not recommended in prod due to finalizers, but for this example it's fine):
 527 | 
 528 |     ❯ kubectl delete namespace echo
 529 |     namespace/echo deleted
 530 | 
 531 |   ---
 532 |     ex deployment.yaml <<EOF
 533 |     1 insert
 534 |     apiVersion: v1
 535 |     kind: Namespace
 536 |     metadata:
 537 |       name: echo
 538 |     ---
 539 |     .
 540 |     xit
 541 |     EOF
 542 | 
 543 | You can simply insert everything from `apiVersion` to `EOF` into the file if you'd like, but this is a mildly interesting way to insert arbitrary text into the beginning of a file without [explicitly] using a temporary file. It has to be inserted at the beginning because otherwise when we apply it, the deployment will fail since the namespace doesn't exist yet (but it would succeed with a second `apply` command). If you think this is annoying, you're right, and there are ways around it.
 544 | 
 545 | Applying the deployment now creates both the namespace, and deploys to it:
 546 | 
 547 |     ❯ kubectl apply -f deployment.yaml
 548 |     namespace/echo created
 549 |     deployment.apps/echo created
 550 | 
 551 | ## Helm
 552 | 
 553 | This is all terrifically annoying, though - no one wants to be issuing `kubectl` commands directly against clusters. Luckily, there's a better way - [Helm](https://helm.sh/).
 554 | 
 555 | ### Components
 556 | 
 557 | I'll defer to Helm's excellent docs, but in short, a Helm chart consists at its core of a simple file structure:
 558 | 
 559 |     ❯ tree
 560 |     .
 561 |     ├── Chart.yaml
 562 |     └── templates
 563 |         ├── NOTES.txt
 564 |         ├── deployment.yaml
 565 |         ├── ingress.yaml
 566 |         ├── namespace.yaml
 567 |         └── service.yaml
 568 | 
 569 |     1 directory, 6 files
 570 | 
 571 | `Chart.yaml` is required, but doesn't have to contain a lot:
 572 | 
 573 |     mkdir -p helm/templates && \
 574 |     cat << EOF > helm/Chart.yaml
 575 |     apiVersion: v2
 576 |     name: echo
 577 |     description: A Helm chart for a simple echo app
 578 |     version: 0.1.0
 579 |     appVersion: 0.1.0
 580 |     EOF
 581 | 
 582 | `version` is the version of the Helm chart, whereas `appVersion` is the version of the application. They should both use semantic versioning. `apiVersion` would be `v1` if you needed Helm v2 compatibility, but no one should be using Helm v2 these days, so stick with `apiVersion: v2`.
 583 | 
 584 |     cat << EOF > helm/templates/deployment.yaml
 585 |     apiVersion: apps/v1
 586 |     kind: Deployment
 587 |     metadata:
 588 |       name: echo
 589 |       namespace: echo
 590 |     spec:
 591 |       selector:
 592 |         matchLabels:
 593 |           app: echo
 594 |       replicas: 1
 595 |       template:
 596 |         metadata:
 597 |           labels:
 598 |             app: echo
 599 |         spec:
 600 |           containers:
 601 |           - name: echo
 602 |             image: localhost:5000/echo:latest
 603 |             imagePullPolicy: Always
 604 |             ports:
 605 |               - containerPort: 8080
 606 |             stdin: true
 607 |             tty: true
 608 |     EOF
 609 | 
 610 | The eagle-eyed among you will note that this is largely the same, except that we've added a `containerPort` that we'll be talking to.
 611 | 
 612 |     cat << EOF > helm/templates/service.yaml
 613 |     apiVersion: v1
 614 |     kind: Service
 615 |     metadata:
 616 |       name: echo
 617 |       namespace: echo
 618 |     spec:
 619 |       ports:
 620 |         - protocol: TCP
 621 |           port: 8080
 622 |           targetPort: 8080
 623 |       selector:
 624 |         app: echo
 625 |       type: NodePort
 626 |     EOF
 627 | 
 628 | The Service will connect our app out to the cluster - in this case, via a NodePort, which means that a random port will be opened on every node. `targetPort` is actually redundant here, as it defaults to the same port as `port`, but it's shown for education. In production, you would typically use a `LoadBalancer` service.
 629 | 
 630 |     cat << EOF > helm/templates/ingress.yaml
 631 |     apiVersion: networking.k8s.io/v1
 632 |     kind: Ingress
 633 |     metadata:
 634 |       name: echo
 635 |       namespace: echo
 636 |       annotations:
 637 |         nginx.ingress.kubernetes.io/rewrite-target: /$1
 638 |     spec:
 639 |       rules:
 640 |       - host: echo.internal
 641 |         http:
 642 |           paths:
 643 |           - path: /
 644 |             pathType: Prefix
 645 |             backend:
 646 |               service:
 647 |                 name: echo
 648 |                 port:
 649 |                   number: 8080
 650 |     EOF
 651 | 
 652 | We're using an Ingress here to route traffic to the service, and ultimately, to the pod.
 653 | 
 654 |     cat << EOF > helm/templates/namespace.yaml
 655 |     apiVersion: v1
 656 |     kind: Namespace
 657 |     metadata:
 658 |       name: echo
 659 |     EOF
 660 | 
 661 | The Namespace definition hasn't changed. We could also rely on Helm to do this for us, with its `--create-namespace` flag.
 662 | 
 663 |     cat << EOF > helm/templates/NOTES.txt
 664 |     To access, please run the following command:
 665 | 
 666 |         sudo echo "$(minikube ip) echo.internal" >> /etc/hosts
 667 | 
 668 |     Then go to http://echo.internal inyour browser.
 669 | 
 670 |     To clean up, run the following command:
 671 | 
 672 |         sudo sed -i '$d' /etc/hosts
 673 |     EOF
 674 | 
 675 | `NOTES.txt` is a special file for Helm, which it will render when you run `helm install` as helpful tips to the user. In this case, we're explaining how to edit the `/etc/hosts` file so that the URI resolves.
 676 | 
 677 | ### App
 678 | 
 679 | But wait, I hear you saying, the app didn't have any web server! You're correct, so let's remedy that quickly:
 680 | 
 681 |     mkdir -p echo/templates && \
 682 |     cat << EOF > echo/echo.py
 683 |     #!/usr/bin/env python
 684 | 
 685 |     from flask import Flask, request, render_template
 686 | 
 687 |     app = Flask(__name__)
 688 | 
 689 |     @app.route("/")
 690 |     def form():
 691 |         return render_template("index.html")
 692 | 
 693 |     @app.route("/", methods=["POST"])
 694 |     def form_post():
 695 |         return request.form["echo_input"]
 696 | 
 697 |     if __name__ == "__main__":
 698 |         app.run(host="0.0.0.0", port=8080, debug=True)
 699 |     EOF
 700 | 
 701 |   ---
 702 | 
 703 |     cat << EOF > echo/templates/index.html
 704 |     <html>
 705 |     <head>
 706 |       <title>Echo (echo...)</title>
 707 |     </head>
 708 |     <body>
 709 |         <h1>Echo (echo...)</h1>
 710 |         <form method="POST">
 711 |             <input type="text" name="echo_input" placeholder="Say something!">
 712 |             <input type="submit" value="Echo!">
 713 |         </form>
 714 |     </body>
 715 |     </html>
 716 |     EOF
 717 | 
 718 | (No one will ever accuse me of being a frontend dev. I regret nothing.)
 719 | 
 720 | We need to make sure Docker can install Flask (ideally this would be pinned to a specific version): `echo "flask" > echo/requirements.txt`
 721 | 
 722 | Next, we need to update the `Dockerfile`.
 723 | 
 724 |     cat << EOF > echo/Dockerfile
 725 |     FROM python:3.10-alpine
 726 | 
 727 |     WORKDIR /app
 728 | 
 729 |     COPY . /app
 730 | 
 731 |     RUN pip install -r requirements.txt
 732 | 
 733 |     EXPOSE 8080
 734 | 
 735 |     CMD ["python", "/app/echo.py"]
 736 |     EOF
 737 | 
 738 | If you're using a local registry, you'll also need build, tag, and push this new image:
 739 | 
 740 |     docker build -t echo echo && \
 741 |     docker tag echo:latest localhost:5000/echo:latest && \
 742 |     docker push localhost:5000/echo:latest
 743 | 
 744 | ### Installation
 745 | 
 746 | To install the Chart, let's first see what it would do:
 747 | 
 748 |     ❯ helm install --dry-run --debug echo helm/
 749 |     install.go:178: [debug] Original chart version: ""
 750 |     install.go:195: [debug] CHART PATH: /Users/sgarland/git/zapier/intro-to-x/k8s/helm
 751 | 
 752 |     NAME: echo
 753 |     LAST DEPLOYED: Fri May  6 17:02:50 2022
 754 |     NAMESPACE: default
 755 |     STATUS: pending-install
 756 |     REVISION: 1
 757 |     TEST SUITE: None
 758 |     USER-SUPPLIED VALUES:
 759 |     {}
 760 | 
 761 |     COMPUTED VALUES:
 762 |     {}
 763 | 
 764 |     HOOKS:
 765 |     MANIFEST:
 766 |     ---
 767 |     # Source: echo/templates/namespace.yaml
 768 |     apiVersion: v1
 769 |     kind: Namespace
 770 |     metadata:
 771 |       name: echo
 772 |     ---
 773 |     # Source: echo/templates/service.yaml
 774 |     apiVersion: v1
 775 |     kind: Service
 776 |     metadata:
 777 |       name: echo
 778 |       namespace: echo
 779 |     spec:
 780 |       ports:
 781 |         - protocol: TCP
 782 |           port: 8080
 783 |           targetPort: 8080
 784 |       selector:
 785 |         app: echo
 786 |       type: NodePort
 787 |     ---
 788 |     # Source: echo/templates/deployment.yaml
 789 |     apiVersion: apps/v1
 790 |     kind: Deployment
 791 |     metadata:
 792 |       name: echo
 793 |       namespace: echo
 794 |     spec:
 795 |       selector:
 796 |         matchLabels:
 797 |           app: echo
 798 |       replicas: 1
 799 |       template:
 800 |         metadata:
 801 |           labels:
 802 |             app: echo
 803 |         spec:
 804 |           containers:
 805 |           - name: echo
 806 |             image: localhost:5000/echo:latest
 807 |             imagePullPolicy: Always
 808 |             ports:
 809 |               - containerPort: 8080
 810 |             stdin: true
 811 |             tty: true
 812 |     ---
 813 |     # Source: echo/templates/ingress.yaml
 814 |     apiVersion: networking.k8s.io/v1
 815 |     kind: Ingress
 816 |     metadata:
 817 |       name: echo
 818 |       namespace: echo
 819 |       annotations:
 820 |         nginx.ingress.kubernetes.io/rewrite-target: /
 821 |     spec:
 822 |       rules:
 823 |       - host: echo.internal
 824 |         http:
 825 |           paths:
 826 |           - path: /
 827 |             pathType: Prefix
 828 |             backend:
 829 |               service:
 830 |                 name: echo
 831 |                 port:
 832 |                   number: 8080
 833 | 
 834 |     NOTES:
 835 |     To access, please run the following command:
 836 | 
 837 |         sudo echo "$(minikube ip) echo.internal" >> /etc/hosts
 838 | 
 839 |     Then go to http://echo.internal inyour browser.
 840 | 
 841 |     To clean up, run the following command:
 842 | 
 843 |         sudo sed -i '$d' /etc/hosts
 844 | 
 845 | Looks good! To install it, we can use the `ugprade` command with the `--install` flag - this way, if we need to make any changes, we don't have to type out a new command.
 846 | 
 847 |     ❯ helm upgrade --install echo helm
 848 |     Release "echo" does not exist. Installing it now.
 849 |     NAME: echo
 850 |     LAST DEPLOYED: Fri May  6 17:04:43 2022
 851 |     NAMESPACE: default
 852 |     STATUS: deployed
 853 |     REVISION: 1
 854 |     TEST SUITE: None
 855 |     NOTES:
 856 |     To access, please run the following command:
 857 | 
 858 |         sudo echo "$(minikube ip) echo.internal" >> /etc/hosts
 859 | 
 860 |     Then go to http://echo.internal inyour browser.
 861 | 
 862 |     To clean up, run the following command:
 863 | 
 864 |         sudo sed -i '$d' /etc/hosts
 865 | 
 866 | Let's add the `/etc/hosts` entry, then we can test it out! Note that if you're using the `docker` driver, you'll need to first run `minikube service echo -n echo --url` in a separate terminal, and keep it open for the next step. Also, replace the address you cURL to with the one minikube gives you (the ingress is largely useless here, although you could add it to `/etc/hosts` if you'd like).
 867 | 
 868 |     sudo echo "$(minikube ip) echo.internal" >> /etc/hosts
 869 | ---
 870 |     ❯ curl -d 'echo_input=Hello, world!' -X POST http://echo.internal
 871 |     Hello, world!
 872 | 
 873 | # RBAC
 874 | 
 875 | RBAC is Role-based Access Control. It's a way to control access to resources based on a user (or group's) role, rather than their identity. The assumption is that you have something else (like Okta) to authenticate the user, and then, RBAC will control that user's ability to access or modify resources.
 876 | 
 877 | ## Example
 878 | 
 879 | ### Generating a certificate
 880 | 
 881 | We're going to create a certificate and user to demonstrate how RBAC works.
 882 | 
 883 |     mkdir cert && openssl genrsa -out cert/echo-user.key 4096 && openssl req -new \
 884 |     -key cert/echo-user.key -out cert/echo-user.csr -subj "/CN=echo-user/O=echo-group" \
 885 |     && openssl x509 -req -in cert/echo-user.csr -CA ~/.minikube/ca.crt \
 886 |     -CAkey ~/.minikube/ca.key -CAcreateserial -out cert/echo-user.crt -days 365 \
 887 |     || echo "Failed to create cert! Please check that ~/.minikube/ca.{crt,key} exist."
 888 | 
 889 | Should result in the following:
 890 | 
 891 |     Generating RSA private key, 4096 bit long modulus
 892 |     ...........................++
 893 |     ...........................++
 894 |     e is 65537 (0x10001)
 895 |     Signature ok
 896 |     subject=/CN=echo-user/O=echo-group
 897 |     Getting CA Private Key
 898 | 
 899 | This one-liner uses the `openssl` tool to first create a 4096-bit RSA private key, then requests a Certificate Request using that key, and finally creates a certificate signed by the Minikube Certificate Authority, with an expiry of 365 days. The ending part, if you're not familiar with shell, is an `OR` that only executes if the previous command fails - since that command is relying on two files existing in `~/.minikube`, there's a pretty good chance that they're the reason for the failure, hence the message.
 900 | 
 901 | ### Creating a user
 902 | 
 903 | Now, we're going to create a user entry in our kubeconfig, then create a context using it.
 904 | 
 905 |     ❯ kubectl config set-credentials echo-user --client-certificate=cert/echo-user.crt \
 906 |       --client-key=cert/echo-user.key
 907 |     User "echo-user" set.
 908 | 
 909 |     ❯ kubectl config set-context echo-user-context --cluster=minikube --user=echo-user
 910 |     Context "echo-user-context" created.
 911 | 
 912 |     ❯ kubectl config use-context echo-user-context
 913 |     Switched to context "echo-user-context".
 914 | 
 915 | ### Testing out the user
 916 | 
 917 | Let's create a namespace again:
 918 | 
 919 |     ❯ kubectl create ns foobar
 920 |     Error from server (Forbidden): namespaces is forbidden: User "echo-user" cannot create resource "namespaces" in API group "" at the cluster scope
 921 | 
 922 | Since Minikube is installed with its default context of `minikube`, this additional user we've added has no permissions to do, well, anything. Try `kubectl get pods` or some other read-only action, and check the result.
 923 | 
 924 | ### Adding RBAC
 925 | 
 926 | RBAC definitions consist of two parts - Role, and RoleBinding - and are are either scoped to a cluster, or to a namespace. Helpfully, cluster-scoped RBAC objects are named ClusterRoles and ClusterRoleBindings.
 927 | 
 928 |     cat << EOF > helm/templates/rbac.yaml
 929 |     apiVersion: rbac.authorization.k8s.io/v1
 930 |     kind: Role
 931 |     metadata:
 932 |       name: echo-ro
 933 |       namespace: echo
 934 |     rules:
 935 |       - apiGroups: [""]
 936 |         resources: ["pods"]
 937 |         verbs: ["get", "list", "watch"]
 938 |     ---
 939 |     apiVersion: rbac.authorization.k8s.io/v1
 940 |     kind: RoleBinding
 941 |     metadata:
 942 |       name: echo-ro
 943 |       namespace: echo
 944 |     subjects:
 945 |     - kind: User
 946 |       name: echo-user
 947 |       apiGroup: rbac.authorization.k8s.io
 948 |     roleRef:
 949 |       kind: Role
 950 |       name: echo-ro
 951 |       apiGroup: rbac.authorization.k8s.io
 952 |     EOF
 953 | 
 954 | This is two RBAC objects in one file - a Role, and a RoleBinding. They're both scoped to the `echo` namespace, and as the name implies, they create a read-only role for the `echo-user` user we previously created. Note that you'll need to switch back to the `minikube` context to apply this (do you remember how?).
 955 | 
 956 |     ❯ helm upgrade --install echo helm
 957 |     Release "echo" has been upgraded. Happy Helming!
 958 |     NAME: echo
 959 |     LAST DEPLOYED: Wed May 18 10:51:41 2022
 960 |     NAMESPACE: default
 961 |     STATUS: deployed
 962 |     REVISION: 2
 963 |     TEST SUITE: None
 964 |     NOTES:
 965 |     To access, please run the following command:
 966 | 
 967 |         sudo echo "$(minikube ip) echo.internal" >> /etc/hosts
 968 | 
 969 |     Then go to http://echo.internal inyour browser.
 970 | 
 971 |     To clean up, run the following command:
 972 | 
 973 |         sudo sed -i '$d' /etc/hosts
 974 | 
 975 | ### Verifying RBAC
 976 | 
 977 | We can of course use `kubectl get role -n echo` and `kubectl get rolebinding -n echo` to view our newly-available RBAC, but `kubectl` includes a very useful feature called `kubectl auth can-i` which allows you to check if you have the ability to do a given action, as a given user and/or group. Cluster administrators can impersonate another user (this is very useful for SREs) with the `--as user.name` flag, but anyone can use it to check their current ability.
 978 | 
 979 |     ❯ kubectl config use-context echo-user-context
 980 |     Switched to context "echo-user-context".
 981 | 
 982 |     ❯ kubectl auth can-i get pods -n echo
 983 |     yes
 984 | 
 985 |     ❯ kubectl auth can-i get pods -n echo --as foobar
 986 |     Error from server (Forbidden): users "foobar" is forbidden: User "echo-user" cannot impersonate resource "users" in API group "" at the cluster scope
 987 | 
 988 |     ❯ kubectl auth can-i create pods -n echo
 989 |     no
 990 | 
 991 |     ❯ kubectl auth can-i get pods -n kube-system
 992 |     no
 993 | 
 994 |     ❯ kubectl auth can-i create pods --subresource exec -n echo
 995 |     no
 996 | 
 997 | This last one can be problematic, and indeed, is/was the source of much pain in SRE land as developers were unable to exec into bastion pods. `exec` is a subset of `create`, and specific permission must be granted to do so. If you try without having the requisite permission, you'll see this:
 998 | 
 999 |     ❯ kubectl exec -it -n echo echo-75897c68fd-nhn64 -- sh
1000 |     Error from server (Forbidden): pods "echo-75897c68fd-nhn64" is forbidden: User "echo-user" cannot create resource "pods/exec" in API group "" in the namespace "echo"
1001 | 
1002 | ### Modifying RBAC
1003 | 
1004 |     ex helm/templates/rbac.yaml <<EOF
1005 |     ?---? insert
1006 |       - apiGroups: [""]
1007 |         resources: ["pods/exec"]
1008 |         verbs: ["create"]
1009 |     .
1010 |     :%s/echo-ro/echo-rw/g
1011 |     xit
1012 |     EOF
1013 | 
1014 | Or, you know, use an actual editor. For the curious, `?---? insert` instructs `ex` to find the first match of `---`, and then insert what follows immediately before it. Since we only have one match in that file, which is the bottom of our Role definition, this works nicely. The `.` terminates the `insert` command, after which we do a global search-and-replace for `echo-ro` with `echo-rw`, to clarify the role.
1015 | 
1016 | ### Verifying RBAC (again)
1017 | 
1018 | First, run a `helm upgrade` as before to apply the new role. Then, let's test it out:
1019 | 
1020 |     ❯ kubectl exec -it -n echo echo-75897c68fd-nhn64 -- sh
1021 |     /app #
1022 | 
1023 | Excellent! Not that we have logs in this simple example, but as an exercise, first try getting logs, and if it doesn't work, edit the RBAC as necessary to enable the `echo-user` to read logs.
1024 | 
1025 | # Resource Limits and Requests
1026 | 
1027 | ## Cgroups
1028 | 
1029 | First, a quick background.
1030 | 
1031 | Docker (and other container runtime) build on Linux Control Groups, or cgroups. Cgroups are a way to split up a system's resources into groups with arbitrary limits, and to control the processes allowed to use each group. This is generally actually a sub-nest in that most Kubernetes nodes are themselves usually run on a VM, which is itself a way of divvying up a system's resources.
1032 | 
1033 | If you exec into a container, and you try to get its resources the normal way (`/proc/meminfo` and `/proc/cpuinfo`), you'll get the node's resources, not the container's. This can be confusing if, for instance, you wrote a NodeJS program which calls `totalmem()` to determine available memory. Instead, you need to dive deep into Linux internals.
1034 | 
1035 | ## Requests vs. Limits
1036 | 
1037 | The Kubernetes scheduler will not schedule a pod onto a node that can't fulfill all of its resource requests. If it has 8 GiB of allocatable memory, with 7 GiB used, and you try to request 2 GiB more for a new pod, it will fail to schedule; the same for CPU. You can, however, set a low request with a high limit, and the pod will be scheduled - you just might not like the performance of your pods later.
1038 | 
1039 | Limits, on the other hand, are monitored by the Kubelet. If a container's memory usage exceeds its limit, Linux's OOM killer will kill the container. A container *may* be allowed to exceed its CPU limit, but this is not a guarantee. Generally, you should expect a container to get throttled if it exceeds its CPU limit. It is for this reason that [some state](https://news.ycombinator.com/item?id=24381813) that you should never set CPU limits, only requests, and let spiky workloads be spiky. [Others disagree](https://news.ycombinator.com/item?id=24357992).
1040 | 
1041 | ## Viewing resources
1042 | 
1043 | You can, of course, use `kubectl get pods -n echo $pod_name -o yaml` to get a pod's (and therefore container's) resources, but if you really want to see what Linux has assigned, you need to dig deeper. This will be covered as part of Kubernetes 102.
1044 | 
1045 | ## Setting resource limits and requests
1046 | 
1047 |     cat << EOF >> helm/templates/deployment.yaml
1048 |             resources:
1049 |               limits:
1050 |                 cpu: 100m
1051 |                 memory: 128Mi
1052 |               requests:
1053 |                 cpu: 50m
1054 |                 memory: 50Mi
1055 |     EOF
1056 | 
1057 | Now run a `helm upgrade` cycle (make sure you're back to the `minikube` context), then examine the deployment, and finally, the pod's YAML manifest to view changes.
1058 | 
1059 | ## Exploring limits and requests
1060 | 
1061 | Play around (you can use `kubectl edit deployment` to speed things up) with limits and requests, and see how the scheduler and kubelet respond to combinations.
1062 | 
1063 | # More to explore
1064 | 
1065 | * This application could be put behind a load balancer (you could set up [MetalLB](https://metallb.universe.tf/) locally if you'd like), with additional replicas.
1066 | * This application runs as root, which is not recommended. How could you fix that?
1067 | * User entries could be captured and sent to a database stored in a dynamically generated Persistent Volume, with additional routes enabling historical views.
1068 | * HPA (Horizontal Pod Autoscaler) could be set up, along with some load testing mechanism, to demonstrate how Kubernetes will scale the application in response to demand.
1069 | * KEDA (Kubernetes Event-driven Autoscaling) could be set up to automatically scale on metrics other than CPU or Memory.
1070 | 


--------------------------------------------------------------------------------
/k8s/k8s-102.md:
--------------------------------------------------------------------------------
  1 | # WIP DRAFT
  2 | 
  3 | ### Viewing resources in a container
  4 | 
  5 |     # Memory limits are in /sys/fs/cgroup/memory/memory.limit_in_bytes
  6 |     # We can use the bash-ism `(())` to do math, converting it to MiB
  7 |     # Alternately if you have `bc`, you can use that, as well as `awk`
  8 |     ❯ echo $(($(< /sys/fs/cgroup/memory/memory.limit_in_bytes) / 1048576))
  9 |     2048
 10 | 
 11 |     # Memory requests would be in /sys/fs/cgroup/memory/memory.soft_limit_in_bytes if
 12 |     # Kubernetes followed normal Linux memory accounting practices, but it doesn't
 13 | 
 14 |     ❯ cat /sys/fs/cgroup/memory/memory.soft_limit_in_bytes
 15 |     9223372036854771712
 16 | 
 17 | 
 18 |     Wondering what on earth 9223372036854771712 bytes is? Is this a hint?
 19 |     ❯ printf "%x\n" $(< /sys/fs/cgroup/memory/memory.soft_limit_in_bytes)
 20 |     7ffffffffffff000
 21 | 
 22 |     # CPU requests are in /sys/fs/cgroup/cpu/cpu.shares, with a single core/vCPU being equal to 1024
 23 |     # This is thus 256 / 1024 == 0.25
 24 |     ❯ cat /sys/fs/cgroup/cpu/cpu.shares
 25 |     256
 26 | 
 27 |     # CPU limits have to be calculated, as it's a combination of quota and period
 28 |     ❯ cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
 29 |     150000
 30 | 
 31 |     ❯ cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
 32 |     100000
 33 | 
 34 |     # So, CPU limits are:
 35 |     ❯ echo $(($(< /sys/fs/cgroup/cpu/cpu.cfs_quota_us) / $(< /sys/fs/cgroup/cpu/cpu.cfs_period_us)))
 36 |     1 # ???
 37 | 
 38 |     # Bash doesn't handle floats, as it turns out - the answer is 1.5 vCPUs
 39 |     ❯ awk -v quota="$(< /sys/fs/cgroup/cpu/cpu.cfs_quota_us)" \
 40 |     -v period="$(< /sys/fs/cgroup/cpu/cpu.cfs_period_us)" \
 41 |     '{print quota/period}' <(echo)
 42 |     1.5
 43 | 
 44 | ### Viewing resources on the host
 45 | 
 46 | So what if you want to view a given container's resources from the host? More Linux internals, I'm afraid.
 47 | 
 48 | This specific example comes from my homelab (so does the above), but once we have requests and limits set for our application, we can circle back and view them on the `minikube` node.
 49 | 
 50 |     # I'm going to look for an app called `radarr` that I know is running on this node
 51 | 
 52 |     dell01-k3s-worker-01 [~]$ ps -ax | grep radarr
 53 |      3028 ?        S      0:00 s6-supervise radarr
 54 |      3030 ?        Ssl  1159:40 /app/radarr/bin/Radarr -nobrowser -data=/config
 55 |     10484 pts/0    R+     0:00 grep radarr
 56 | 
 57 |     # Then, I'll look at its `/proc` filesystem entry
 58 |     dell01-k3s-worker-01 [~]$ cat /proc/3030/cgroup
 59 |     15:name=openrc:/k3s-service
 60 |     14:name=systemd:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
 61 |     13:rdma:/
 62 |     12:pids:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
 63 |     11:hugetlb:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
 64 |     10:net_prio:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
 65 |     9:perf_event:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
 66 |     8:net_cls:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
 67 |     7:freezer:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
 68 |     6:devices:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
 69 |     5:memory:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
 70 |     4:blkio:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
 71 |     3:cpuacct:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
 72 |     2:cpu:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
 73 |     1:cpuset:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
 74 |     0::/k3s-service
 75 | 
 76 |     # cgroups inherit from their parents, incidentally, so everything here is inheriting
 77 |     # from both the `burstable` and `kubepods` cgroups
 78 | 
 79 |     # We'll use `awk` to grab what we want from that list, then command substitution
 80 |     dell01-k3s-worker-01 [~]$ ls -l /sys/fs/cgroup/memory/$(awk -F: '/memory/ {print $NF}' /proc/3030/cgroup)
 81 |     total 0
 82 |     -rw-r--r-- 1 root root 0 May 18 15:39 cgroup.clone_children
 83 |     --w--w--w- 1 root root 0 May  5 17:50 cgroup.event_control
 84 |     -rw-r--r-- 1 root root 0 May 18 15:51 cgroup.procs
 85 |     -rw-r--r-- 1 root root 0 May 18 15:39 memory.failcnt
 86 |     --w------- 1 root root 0 May 18 15:51 memory.force_empty
 87 |     -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.failcnt
 88 |     -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.limit_in_bytes
 89 |     -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.max_usage_in_bytes
 90 |     -r--r--r-- 1 root root 0 May 18 15:51 memory.kmem.slabinfo
 91 |     -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.tcp.failcnt
 92 |     -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.tcp.limit_in_bytes
 93 |     -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.tcp.max_usage_in_bytes
 94 |     -r--r--r-- 1 root root 0 May 18 15:39 memory.kmem.tcp.usage_in_bytes
 95 |     -r--r--r-- 1 root root 0 May 18 15:39 memory.kmem.usage_in_bytes
 96 |     -rw-r--r-- 1 root root 0 May 18 15:39 memory.limit_in_bytes
 97 |     -rw-r--r-- 1 root root 0 May 18 15:39 memory.max_usage_in_bytes
 98 |     -rw-r--r-- 1 root root 0 May 18 15:51 memory.move_charge_at_immigrate
 99 |     -r--r--r-- 1 root root 0 May 18 15:39 memory.numa_stat
100 |     -rw-r--r-- 1 root root 0 May 18 15:39 memory.oom_control
101 |     ---------- 1 root root 0 May 18 15:51 memory.pressure_level
102 |     -rw-r--r-- 1 root root 0 May 18 15:39 memory.soft_limit_in_bytes
103 |     -r--r--r-- 1 root root 0 May 18 15:39 memory.stat
104 |     -rw-r--r-- 1 root root 0 May 18 15:51 memory.swappiness
105 |     -r--r--r-- 1 root root 0 May 18 15:39 memory.usage_in_bytes
106 |     -rw-r--r-- 1 root root 0 May 18 15:39 memory.use_hierarchy
107 |     -rw-r--r-- 1 root root 0 May 18 15:51 notify_on_release
108 |     -rw-r--r-- 1 root root 0 May 18 15:51 tasks
109 | 
110 |     # Looks familiar, right?
111 | 
112 |     dell01-k3s-worker-01 [~]$ echo $(($(< /sys/fs/cgroup/memory/$(awk -F: '/memory/ {print $NF}' /proc/3030/cgroup)/memory.limit_in_bytes) / 1048576))
113 |     2048
114 | 
115 |     # Finding the CPU information from the host is left as an exercise for the reader.
116 | 
117 | ## Setting resource limits and requests
118 | 
119 |     cat << EOF >> helm/templates/deployment.yaml
120 |             resources:
121 |               limits:
122 |                 cpu: 100m
123 |                 memory: 128Mi
124 |               requests:
125 |                 cpu: 50m
126 |                 memory: 50Mi
127 |     EOF
128 | 
129 | Now run a `helm upgrade` cycle (make sure you're back to the `minikube` context), then exec back into the pod to examine it.
130 | 
131 |     /app # echo $(($(< /sys/fs/cgroup/memory/memory.usage_in_bytes) / 1048576))
132 |     sh: arithmetic syntax error
133 | 
134 |     # As it turns out, the $(< ) command is a bash-ism for `cat`, and this is `sh`, not `bash`
135 | 
136 |     /app # echo $(($(cat /sys/fs/cgroup/memory/memory.usage_in_bytes) / 1048576))
137 |     37
138 | 
139 |     # So, our app is using about 37 MiB of memory.
140 | 
141 |     /app # echo $(($(cat /sys/fs/cgroup/memory/memory.limit_in_bytes) / 1048576))
142 |     128
143 | 
144 |     And we can see that our 128 MiB limit has been set.


--------------------------------------------------------------------------------
/mysql/mysql-101-0.md:
--------------------------------------------------------------------------------
   1 | # MySQL 101 Part I
   2 | 
   3 | - [MySQL 101 Part I](#mysql-101-part-i)
   4 |   - [Prerequisites](#prerequisites)
   5 |     - [MySQL Client](#mysql-client)
   6 |       - [GUI](#gui)
   7 |       - [TUI](#tui)
   8 |   - [Introduction](#introduction)
   9 |   - [What is SQL?](#what-is-sql)
  10 |   - [What is a relational database?](#what-is-a-relational-database)
  11 |   - [What is ACID?](#what-is-acid)
  12 |     - [What is MySQL?](#what-is-mysql)
  13 |       - [How is it pronounced?](#how-is-it-pronounced)
  14 |   - [Basic definitions](#basic-definitions)
  15 |     - [SQL sub-languages](#sql-sub-languages)
  16 |     - [Other definitions](#other-definitions)
  17 | - [MySQL Components](#mysql-components)
  18 | - [MySQL Operations](#mysql-operations)
  19 |   - [Assumptions](#assumptions)
  20 |   - [Notes](#notes)
  21 |   - [Schemata](#schemata)
  22 |   - [Schema spelunking](#schema-spelunking)
  23 |     - [String literals](#string-literals)
  24 |       - [SQL\_MODE](#sql_mode)
  25 |     - [Create a schema](#create-a-schema)
  26 |   - [Table operations](#table-operations)
  27 |     - [Create tables](#create-tables)
  28 |       - [Data types](#data-types)
  29 |     - [Foreign keys](#foreign-keys)
  30 |       - [Why you might want foreign keys](#why-you-might-want-foreign-keys)
  31 |       - [Creating a foreign key](#creating-a-foreign-key)
  32 |       - [Demonstrating a foreign key](#demonstrating-a-foreign-key)
  33 |     - [Determining table size](#determining-table-size)
  34 |   - [Column operations](#column-operations)
  35 |     - [Adding columns](#adding-columns)
  36 |     - [Modfying columns](#modfying-columns)
  37 |     - [Dropping tables with foreign keys](#dropping-tables-with-foreign-keys)
  38 |     - [Copied table definitions](#copied-table-definitions)
  39 |       - [Copied table data and truncating](#copied-table-data-and-truncating)
  40 |     - [Transactions](#transactions)
  41 |     - [Generated columns](#generated-columns)
  42 |     - [Invisible columns](#invisible-columns)
  43 | 
  44 | ## Prerequisites
  45 | 
  46 | ### MySQL Client
  47 | 
  48 | You'll need to have a MySQL client. In order of preference, some options for GUI (graphical) and TUI (terminal) are:
  49 | 
  50 | #### GUI
  51 | 
  52 | - [Sequel Ace](https://sequel-ace.com/)
  53 |   - Install from App Store, or with [Homebrew](https://brew.sh/): `HOMEBREW_NO_AUTO_UPDATE=1 brew install --cask sequel-ace`
  54 | - [MySQL Workbench](https://www.mysql.com/products/workbench/)
  55 | - [DBeaver](https://dbeaver.io/)
  56 | 
  57 | #### TUI
  58 | 
  59 | - [mysql-client](https://dev.mysql.com/doc/refman/8.0/en/mysql.html)
  60 |   - Install with [Homebrew](https://brew.sh/): `HOMEBREW_NO_AUTO_UPDATE=1 brew install mysql-client`
  61 | 
  62 | 
  63 | Note that the server is currently using a self-signed TLS certificate, which some clients may complain about. Sequel Ace, MySQL Workbench, and msyql-client are proven to work without issue. Also note that mysql-client is available via [Homebrew](https://formulae.brew.sh/formula/mysql-client), but it won't symlink by default, so you'll need to do something like `brew link --force mysql-client`.
  64 | 
  65 | WARNING: MySQL Workbench may not work with M1/M2 (ARM) Macs.
  66 | 
  67 | ## Introduction
  68 | 
  69 | ## What is SQL?
  70 | 
  71 | Structured Query Language. It's a domain-specific language designed to manage data in a Relational Database Management System (RDBMS). It's been extended and updated many times, both in its official ANSI definition, and in implementations of it like MySQL and PostgreSQL.
  72 | 
  73 | ## What is a relational database?
  74 | 
  75 | It's what most people probably think of when they think of a database. Broadly speaking, data is related to other data in some manner. For example, observe these two tables (tl;dr a logical grouping of data):
  76 | 
  77 | ```sql
  78 | SHOW COLUMNS FROM users;
  79 | ```
  80 | 
  81 | ```sql
  82 | +------------+----------+------+-----+---------+----------------+
  83 | | Field      | Type     | Null | Key | Default | Extra          |
  84 | +------------+----------+------+-----+---------+----------------+
  85 | | id         | bigint   | NO   | PRI | NULL    | auto_increment |
  86 | | first_name | char(64) | YES  |     | NULL    |                |
  87 | | last_name  | char(64) | YES  |     | NULL    |                |
  88 | | user_id    | bigint   | NO   | UNI | NULL    |                |
  89 | +------------+----------+------+-----+---------+----------------+
  90 | 4 rows in set (0.09 sec)
  91 | ```
  92 | 
  93 | ```sql
  94 | SHOW COLUMNS FROM zaps;
  95 | ```
  96 | 
  97 | ```sql
  98 | +-----------------+-----------------+------+-----+-------------------+-----------------------------+
  99 | | Field           | Type            | Null | Key | Default           | Extra                       |
 100 | +-----------------+-----------------+------+-----+-------------------+-----------------------------+
 101 | | id              | bigint unsigned | NO   | PRI | NULL              | auto_increment              |
 102 | | zap_id          | bigint unsigned | NO   | UNI | NULL              |                             |
 103 | | created_at      | timestamp       | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED           |
 104 | | last_updated_at | timestamp       | YES  |     | NULL              | on update CURRENT_TIMESTAMP |
 105 | | owned_by        | bigint unsigned | NO   | MUL | NULL              |                             |
 106 | | shared_with     | json            | YES  |     | json_array()      | DEFAULT_GENERATED           |
 107 | +-----------------+-----------------+------+-----+-------------------+-----------------------------+
 108 | 6 rows in set (0.01 sec)
 109 | ```
 110 | 
 111 | Table `users` has four columns - `id`, `first_name`, `last_name`, and `user_id`. Table `zaps` has six columns - `id`, `zap_id`, `created_at`, `last_updated_at`, `owned_by`, and `shared_with`.
 112 | 
 113 | Although it isn't explicitly defined or enforced, there is an implicit relationship between these two tables via `users.user_id` and `zaps.owned_by`. Thus, a query like `SELECT zap_id, owned_by FROM zaps JOIN users ON user_id = owned_by;` could use that relationship. Ideally, there would be additional constraints like foreign keys established to ensure referential integrity, but this example suffices for now.
 114 | 
 115 | Also, generally speaking, RDBMS are ACID-compliant (but not always).
 116 | 
 117 | ## What is ACID?
 118 | 
 119 | ACID is a set of four properties that, if implemented correctly, guarantee data validity:
 120 | 
 121 | - Atomicity
 122 |   - In a given transaction, each statement must either completely succeed, or fail. If any statement in a transaction fails, the entire transaction must fail.
 123 | - Consistency
 124 |   - A given transaction can only move a database from one valid and consistent state to another.
 125 | - Isolation
 126 |   - Even with concurrent transactions executing, the database must end up in the same state as if each transaction were executed sequentially.
 127 | - Durability
 128 |   - Once a transaction is committed, it must remain committed in the event of a system failure.
 129 | 
 130 | Note that the lack of one or more of these properties does not necessarily mean that data committed is invalid, only that the guarantees granted by that particular property must be accounted for elsewhere. A common counter-example of this is Eventual Consistency with distributed systems.
 131 | 
 132 | ### What is MySQL?
 133 | 
 134 | It's an extremely popular row-based relational database implementing and extending ANSI SQL. It's unfortunately owned by Oracle, but if you'd prefer, the MariaDB fork is essentially the same thing.
 135 | 
 136 | #### How is it pronounced?
 137 | 
 138 | Officially, "My Ess Que Ell," but since the SQL language was originally called SEQUEL ("Structured English Query Language"), and only changed due to trademark issues, I feel at ease saying "My Sequel." However, this tends to bring out pedants who love to haughtily correct your pronunciation, so do what you will. For what it's worth, I also pronounce kubectl (the Kubernetes CLI tool) as "kube cuddle," so I may not be the greatest influence.
 139 | 
 140 | ## Basic definitions
 141 | 
 142 | ### SQL sub-languages
 143 | 
 144 | All of these can be grouped as SQL, and some of them can also be combined - `DQL` is often merged with `DML`, for example. Knowing that `DML` is generally operating on a single record at a time (but may be batched), and that `DDL` is generally operating on an entire table or schema at a time suffices for now.
 145 | 
 146 | - DCL
 147 |   - Data Control Language. `GRANT`, `REVOKE`.
 148 | - DDL
 149 |   - Data Definition Language. `ALTER`, `CREATE`, `DROP`, `TRUNCATE`.
 150 | - DML
 151 |   - Data Manipulation Language. `CALL`, `DELETE`, `INSERT`, `LOCK`, `SELECT (with FROM or WHERE)`, `UPDATE`.
 152 | - DQL
 153 |   - Data Query Language. `SELECT`.
 154 | - TCL
 155 |   - Transaction Control Language. `COMMIT`, `ROLLBACK`, `SAVEPOINT`.
 156 | 
 157 | ### Other definitions
 158 | 
 159 | - B+ tree
 160 |   - An _m_`-ary tree` data structure that is self-balancing, with a variable number of children per node. It differs from the `B-tree` in that an individual data node can have either keys or children, but not both. It has `O(log(n))` time complexity for insertion, search, and deletion. It is frequently used both for filesystems and for RDBMS.
 161 | - Block
 162 |   - The lowest reasonable level of data storage (above individual bits). Historically sized at 512 bytes due to hard drive sector sizes, but generally sized at 4 KiB in modern drives, and SSDs. Enterprise drives sometimes have 520 byte block sizes (or 4160 bytes for the 4 KiB-adjacent size), with the extra 8 bytes being used for data integrity calculations.
 163 | - Filesystem
 164 |   - A method for the operating system to store data. May include features like copy-on-write, encryption, journaling, pre-allocation, SSD management, volume management, and more. Modern examples include APFS (default for Apple products), ext4 (default for most Linux distributions), NTFS (default for Windows), XFS (default for Red Hat and its downstream), and ZFS (default for FreeBSD).
 165 | - Schema
 166 |   - A logical grouping of database objects, e.g. tables, indices, etc. Often called a database, but technically, the database may contain any number of schemas, each with its own unique (or shared!) set of data, access policies, etc.
 167 | - Table
 168 |   - A logical grouping of data, of varying or similar types. May contain constraints, indices, etc.
 169 | - Tablespace
 170 |   - The link between the logical storage layer (tables, indices) and the physical storage layer (the disk's filesystem). This is an actual file that exists on the disk, contained in `$MYSQL_DATA_DIR`, nominally `/var/lib/mysql`.
 171 |     - As an aside, this fact, combined with [RDS MySQL file size limits](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.KnownIssuesAndLimitations.html#MySQL.Concepts.Limits.FileSize) yields some interesting information about RDS. Since they used to (anything created before April 2014) limit a table to 2 TiB*, that means that they were using ext3, as that is its maximum file size. Instances created after April 2014 are limited to 16 TiB* files, indicating that they are probably now using ext4, as that is generally its maximum file size. 16 TB is also the limit for InnoDB with 4 KB InnoDB page sizes, so it's possible the underlying disk's filesystem is XFS or something else, but since that value defaults to 16 KB, it seems unlikely.
 172 | 
 173 | <details>
 174 | <summary>What's a TiB?</summary>
 175 | 
 176 |   A TiB (or MiB, or GiB..) is how data is actually sized, in base-2. Written out, instead of Terabytes, it's Tebibytes, and is _2^40 bytes_ instead of _10^12 bytes_ (Terabytes are base-10). Base-10 caught on for storage marketing since the number is larger and thus sounds better, but in reality you're getting less. This is why a 1 TB hard drive shows up on your computer as having 931 GB - because it's actually 931 GiB, but it gets displayed as GB since GiB as a term never caught on.
 177 | 
 178 |   In specific relation to this point, AWS' docs state that the limits are in TB (terabytes) instead of TiB (tebibytes). It's possible that their VM subsystem limits the size to n TB, but the actual filesystem is capable of n TiB.
 179 | 
 180 | </details>
 181 | 
 182 | # MySQL Components
 183 | 
 184 | As of MySQL 8.0, this is the official architecture drawing:
 185 | 
 186 | ![MySQL 8.0 architecture](https://cdn.zappy.app/a92561fb248524eb0927cc0ed618de52.png)
 187 | 
 188 | * Connector
 189 |   * Also known as the Client, this is how you interact with the database, be it manually via a CLI client tool, or via a program using the DB.
 190 | * Server
 191 |   * Parser
 192 |     * This component receives a human-readable query, and translates it into machine-readable commands, via a lexical scanner and a grammar rule module.
 193 |   * Optimizer
 194 |     * This component attempts to optimize a given query using its knowledge of the stored data, such that the relative compute time of the query is minimized.
 195 |   * Caches/Buffers
 196 |     * This component has various caches to store frequently-accessed data, temporary tables created for use by other queries, etc.
 197 |   * SQL Interface
 198 |     * This component is the link between the Connector and the rest of the Server.
 199 | * Storage Engine
 200 |   * This component stores and manages the actual databases. Historically MySQL used the MyISAM engine, but switched to InnoDB with version 5.6. Both (and others) remain available if desired, but unless you have an extremely specific use case, you should use InnoDB.
 201 | 
 202 | # MySQL Operations
 203 | 
 204 | ## Assumptions
 205 | 
 206 | - All examples here are using MySQL 8.0.23, with the InnoDB engine.
 207 | - All examples here are using the mysql-client TUI program, but others may work as well.
 208 | 
 209 | ## Notes
 210 | 
 211 | - MySQL is case-insenitive for most, but not all operations. I'll use `UPPERCASE` to designate commands, and `lowercase` to desginate arguments and schema, table, and column names, but you're welcome to use all lowercase.
 212 | - The `;` suffix to commands serves as both the command terminator, and specifies that the output should be in an ASCII table.
 213 | - The `\G` suffix to commands is an alternative terminator, and specifies that the output should be in a vertical, non-tabular format.
 214 |   - Not all clients support this. If you're using a GUI client like Sequel Ace, you can simply scroll the output window horizontally, or expand it to make it bigger.
 215 | - I'm formatting my queries with statements and clauses on the left, their arguments indented by two spaces, and any qualifiers on the same line, where possible.
 216 | - This was developed on a Debian VM with 16 cores of a Xeon E5-2650 v2, 64 GiB of DDR3 RAM, and a working directory which is an NFS export over a 1GBe network, consisting of a ZFS RAIDZ2 array of spinning disks; ashift=12, blocksize=128K. Your times will vary, based mostly on the disk and RAM speed.
 217 | 
 218 | ## Schemata
 219 | 
 220 | A brand-new installation of MySQL will typically have four schemata - `information_schema`, `mysql`, `performance_schema`, and `sys`.
 221 | 
 222 | - `information_schema` contains information about the schema in the database. This includes columns, column types, indices, foreign keys, and tables.
 223 | - `mysql` generally contains configuration and logs.
 224 | - `sys` generally contains information about the SQL engine (InnoDB here), including currently executing processes, and query metrics.
 225 | - `performance_schema` contains some specific performance information about the schema in the database, such as deadlocks, locks, memory consumption, mutexes, and threads.
 226 | 
 227 | ## Schema spelunking
 228 | 
 229 | As mentioned, `databases` is often used to mean `schema`, and in fact in MySQL they're synonyms for this statement - `SHOW schemas` results in the exact same output. You won't have the `test` database yet, but you should see the other four shown below. NOTE: I'll demonstrate both output formats here, and will switch as needed to easily display the information.
 230 | 
 231 | ```sql
 232 | SHOW schemas;
 233 | ```
 234 | 
 235 | ```sql
 236 | +--------------------+
 237 | | Database           |
 238 | +--------------------+
 239 | | information_schema |
 240 | | mysql              |
 241 | | northwind          |
 242 | | performance_schema |
 243 | | sys                |
 244 | | test               |
 245 | +--------------------+
 246 | 6 rows in set (0.01 sec)
 247 | ```
 248 | 
 249 | ```sql
 250 | SHOW schemas\G
 251 | ```
 252 | 
 253 | ```sql
 254 | *************************** 1. row ***************************
 255 | Database: information_schema
 256 | *************************** 2. row ***************************
 257 | Database: mysql
 258 | *************************** 3. row ***************************
 259 | Database: northwind
 260 | *************************** 4. row ***************************
 261 | Database: performance_schema
 262 | *************************** 5. row ***************************
 263 | Database: sys
 264 | *************************** 6. row ***************************
 265 | Database: test
 266 | 6 rows in set (0.01 sec)
 267 | ```
 268 | 
 269 | The `SHOW` statement behind the scenes is gathering and formatting data in a way that's easy for humans to see and understand. Often, it comes from the `information_schema` or `performance_schema` schema, as seen below. This query also demonstrates the use of the `AS` statement, which allows you to alias a column or sub-query.
 270 | 
 271 | ```sql
 272 | SELECT
 273 |   schema_name AS 'Database'
 274 | FROM
 275 |   information_schema.schemata;
 276 | ```
 277 | 
 278 | ```sql
 279 | +--------------------+
 280 | | Database           |
 281 | +--------------------+
 282 | | mysql              |
 283 | | information_schema |
 284 | | performance_schema |
 285 | | sys                |
 286 | | test               |
 287 | | northwind          |
 288 | +--------------------+
 289 | 6 rows in set (0.01 sec)
 290 | ```
 291 | 
 292 | ### String literals
 293 | 
 294 | You may have noticed that in the above examples, sometimes a column or table name was enclosed with a single quote (`'`), sometimes a backtick ( \` ), and other times nothing at all. This is deliberate.
 295 | 
 296 | In ANSI SQL, string literals are represented with single quotation marks, e.g. 'test.' This mode is disabled by default in MySQL, so you're free to use double quotation marks if you'd prefer; however if you were trying to pass in a command to the client from a shell (e.g. `mysql -e 'SELECT foo FROM bar'`), you might run into shell expansion issues depending on your query. Also, since you'll probably be working with other SQL implementations like Postgres, it's best to try to stay as neutral as possible.
 297 | 
 298 | Backticks may be used at any time, and are called quoted identifiers. They tell the SQL parser to consider anything enclosed in them as a string literal. This may be useful if, for example, you created a table named `table` (please don't), had a column named `count`, etc. The full list of keywords / reserved words [is here](https://dev.mysql.com/doc/refman/8.0/en/keywords.html) if you want to see what to avoid.
 299 | 
 300 | ```sql
 301 | CREATE TABLE table (id INT);
 302 | ```
 303 | 
 304 | ```sql
 305 | ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'table (id INT)' at line 1
 306 | ```
 307 | 
 308 | vs.
 309 | 
 310 | ```sql
 311 | CREATE TABLE `table` (id INT);
 312 | ```
 313 | 
 314 | ```sql
 315 | Query OK, 0 rows affected (0.15 sec)
 316 | ```
 317 | 
 318 | #### SQL_MODE
 319 | 
 320 | As it turns out, you can alter this behavior. First, let's check the current `SQL_MODE`. System variables can be viewed with either `SHOW VARIABLES` or `SELECT @@<[GLOBAL, SESSION]>`.
 321 | 
 322 | ```sql
 323 | SHOW VARIABLES LIKE 'sql_mode'\G
 324 | ```
 325 | 
 326 | ```sql
 327 | *************************** 1. row ***************************
 328 | Variable_name: sql_mode
 329 |         Value: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
 330 | 1 row in set (0.01 sec)
 331 | ```
 332 | 
 333 | If neither `GLOBAL` or `SESSION` are specified when using the `@@` method, the session value is returned if it exists, otherwise the global value is returned.
 334 | 
 335 | ```sql
 336 | SELECT @@sql_mode\G
 337 | ```
 338 | 
 339 | ```sql
 340 | *************************** 1. row ***************************
 341 | @@sql_mode: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
 342 | 1 row in set (0.00 sec)
 343 | ```
 344 | 
 345 | We'll use the `mysql.user` table for this example. First, no quotes of any kind. As expected, we get the rows from those two columns.
 346 | 
 347 | ```sql
 348 | SELECT host, user FROM mysql.user;
 349 | ```
 350 | 
 351 | ```sql
 352 | +-------------+------------------+
 353 | | host        | user             |
 354 | +-------------+------------------+
 355 | | %           | zapier           |
 356 | | %           | zapier_training  |
 357 | | 192.168.1.% | sgarland         |
 358 | | localhost   | mysql.infoschema |
 359 | | localhost   | mysql.session    |
 360 | | localhost   | mysql.sys        |
 361 | | localhost   | root             |
 362 | +-------------+------------------+
 363 | 7 rows in set (0.01 sec)
 364 | ```
 365 | 
 366 | Now, we'll mix single and double quotes.
 367 | 
 368 | ```sql
 369 | SELECT 'host', "user" FROM mysql.user;
 370 | ```
 371 | 
 372 | ```sql
 373 | +------+------+
 374 | | host | user |
 375 | +------+------+
 376 | | host | user |
 377 | | host | user |
 378 | | host | user |
 379 | | host | user |
 380 | | host | user |
 381 | | host | user |
 382 | | host | user |
 383 | +------+------+
 384 | 7 rows in set (0.00 sec)
 385 | ```
 386 | 
 387 | In MySQL's default mode, these two are treated the same, and you get the respective string literals printed as rows for the selected columns.
 388 | 
 389 | If single (or double) quotes are combined with backticks, you get partial results.
 390 | 
 391 | ```sql
 392 | SELECT 'host', `user` FROM mysql.user;
 393 | ```
 394 | 
 395 | ```sql
 396 | +------+------------------+
 397 | | host | user             |
 398 | +------+------------------+
 399 | | host | zapier           |
 400 | | host | zapier_training  |
 401 | | host | sgarland         |
 402 | | host | mysql.infoschema |
 403 | | host | mysql.session    |
 404 | | host | mysql.sys        |
 405 | | host | root             |
 406 | +------+------------------+
 407 | 7 rows in set (0.00 sec)
 408 | ```
 409 | 
 410 | Now, we'll modify the session's `sql_mode`. You don't have permission to set any global variables, but you can set most session variables. Unlike for the selection, if you don't specify `GLOBAL` or `SESSION`, the `SET` will always assume `SESSION`.
 411 | 
 412 | ```sql
 413 | SET @@sql_mode = ANSI_QUOTES;
 414 | ```
 415 | 
 416 | ```sql
 417 | Query OK, 0 rows affected (0.00 sec)
 418 | 
 419 | mysql> SELECT @@sql_mode\G
 420 | *************************** 1. row ***************************
 421 | @@sql_mode: ANSI_QUOTES
 422 | 1 row in set (0.00 sec)
 423 | ```
 424 | 
 425 | Oh no, we've overridden all of the other settings! Luckily, the global variable hasn't been modified, so we can use it to build the correct setting. To do so, we'll use the `CONCAT_WS` function, which as the name implies, concatenates things with a separator. It takes the form `CONCAT_WS(sep, <expressions>)`. We'll also run a `SELECT` of the global variable, nesting it as a sub-query.
 426 | 
 427 | ```sql
 428 | SET @@sql_mode = (SELECT CONCAT_WS(',', 'ANSI_QUOTES', (SELECT @@GLOBAL.sql_mode)));
 429 | ```
 430 | 
 431 | ```sql
 432 | Query OK, 0 rows affected (0.01 sec)
 433 | ```
 434 | 
 435 | ```
 436 | SELECT @@sql_mode\G
 437 | ```
 438 | 
 439 | ```sql
 440 | *************************** 1. row ***************************
 441 | @@sql_mode: ANSI_QUOTES,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
 442 | 1 row in set (0.00 sec)
 443 | ```
 444 | 
 445 | Whew. Now we can try out the quoting differences again.
 446 | 
 447 | ```sql
 448 | SELECT 'host', "user" FROM mysql.user;
 449 | ```
 450 | 
 451 | ```sql
 452 | +------+------------------+
 453 | | host | USER             |
 454 | +------+------------------+
 455 | | host | zapier           |
 456 | | host | zapier_training  |
 457 | | host | sgarland         |
 458 | | host | mysql.infoschema |
 459 | | host | mysql.session    |
 460 | | host | mysql.sys        |
 461 | | host | root             |
 462 | +------+------------------+
 463 | 7 rows in set (0.00 sec)
 464 | ```
 465 | 
 466 | This time, only single quotes are treated as string literals, with double quotes being treated as identifiers.
 467 | 
 468 | Now, set the `SESSION.sql_mode` back to its original value, using a sub-query like before.
 469 | 
 470 | ```sql
 471 | SET @@sql_mode = (SELECT @@GLOBAL.sql_mode);
 472 | ```
 473 | 
 474 | ```sql
 475 | Query OK, 0 rows affected (0.00 sec)
 476 | ```
 477 | 
 478 | ### Create a schema
 479 | 
 480 | Let's create some tables! First, we need a schema. There aren't a lot of options here to be covered, so we can just create one. I'll be using `foo`, but you should substitute any name you'd like that's not already in use. Ideally, we would also enable encryption at rest. This can be globally set, or specified at schema creation - any tables in the schema inherit its setting. If you're curious, InnoDB uses AES, with ECB mode for tablespaces, and CBC mode for data. Also notably, [undo logs](https://dev.mysql.com/doc/refman/8.0/en/innodb-undo-logs.html) and [redo logs](https://dev.mysql.com/doc/refman/8.0/en/innodb-redo-log.html) have their encryption handled by separate variables. However, since this requires some additional work (all of the easy options are only available with MySQL Enterprise; MySQL Community requires you to generate and store the key yourself), we'll skip it.
 481 | 
 482 | ```sql
 483 | CREATE SCHEMA foo;
 484 | ```
 485 | 
 486 | ```sql
 487 | Query OK, 1 row affected (0.02 sec)
 488 | ```
 489 | 
 490 | ## Table operations
 491 | 
 492 | ### Create tables
 493 | 
 494 | First, we'll select our new schema so we don't have to constantly specify it. I'll be using `foo` here, but you should substitute whatever you created in the last step.
 495 | 
 496 | ```sql
 497 | USE foo;
 498 | ```
 499 | 
 500 | Now, we'll create the `users` table.
 501 | 
 502 | ```sql
 503 | CREATE TABLE users (
 504 |   id BIGINT PRIMARY KEY,
 505 |   first_name CHAR(64),
 506 |   last_name CHAR(64),
 507 |   uid BIGINT
 508 | );
 509 | ```
 510 | 
 511 | ```sql
 512 | Query OK, 0 rows affected (0.17 sec)
 513 | ```
 514 | 
 515 | ```sql
 516 | SHOW COLUMNS FROM users;
 517 | ```
 518 | 
 519 | ```sql
 520 | +------------+----------+------+-----+---------+-------+
 521 | | Field      | Type     | Null | Key | Default | Extra |
 522 | +------------+----------+------+-----+---------+-------+
 523 | | id         | bigint   | NO   | PRI | NULL    |       |
 524 | | first_name | char(64) | YES  |     | NULL    |       |
 525 | | last_name  | char(64) | YES  |     | NULL    |       |
 526 | | uid        | bigint   | YES  |     | NULL    |       |
 527 | +------------+----------+------+-----+---------+-------+
 528 | 4 rows in set (0.02 sec)
 529 | ```
 530 | 
 531 | Hmm, something's not quite right as compared to the original example - we're missing `AUTO_INCREMENT`! Without it, you'd have to manually specify the `id` value (which is this table's `PRIMARY KEY`), which is annoying. Additionally, while `id` was automatically made to be `NOT NULL` since it's the primary key, `uid` was not, so we need to change those (if you don't specify `NOT NULL`, MySQL defaults to `NULL`). Finally, `uid` should actually be named `user_id`, and it should have a `UNIQUE` constraint.
 532 | 
 533 | NOTE: when redefining a column, it's like a `POST`, not a `PUT` - if you only specify what you want to be changed, the pre-existing definitions will be deleted.
 534 | 
 535 | ```sql
 536 | ALTER TABLE users MODIFY uid BIGINT NOT NULL UNIQUE;
 537 | ```
 538 | 
 539 | ```sql
 540 | Query OK, 0 rows affected (0.27 sec)
 541 | Records: 0  Duplicates: 0  Warnings: 0
 542 | ```
 543 | 
 544 | ```sql
 545 | ALTER TABLE users MODIFY id BIGINT AUTO_INCREMENT;
 546 | ```
 547 | 
 548 | ```sql
 549 | Query OK, 0 rows affected (0.34 sec)
 550 | Records: 0  Duplicates: 0  Warnings: 0
 551 | ```
 552 | 
 553 | ```sql
 554 | SHOW COLUMNS FROM users;
 555 | ```
 556 | 
 557 | ```sql
 558 | +------------+----------+------+-----+---------+----------------+
 559 | | Field      | Type     | Null | Key | Default | Extra          |
 560 | +------------+----------+------+-----+---------+----------------+
 561 | | id         | bigint   | NO   | PRI | NULL    | auto_increment |
 562 | | first_name | char(64) | YES  |     | NULL    |                |
 563 | | last_name  | char(64) | YES  |     | NULL    |                |
 564 | | uid        | bigint   | NO   | UNI | NULL    |                |
 565 | +------------+----------+------+-----+---------+----------------+
 566 | 4 rows in set (0.02 sec)
 567 | ```
 568 | 
 569 | If you wanted to rename a column without specifying its definition, you can use `RENAME COLUMN`.
 570 | 
 571 | ```sql
 572 | ALTER TABLE users RENAME COLUMN uid TO user_id;
 573 | ```
 574 | 
 575 | ```sql
 576 | Query OK, 0 rows affected (0.12 sec)
 577 | Records: 0  Duplicates: 0  Warnings: 0
 578 | ```
 579 | 
 580 | Now, we'll make the `zaps` table. You have noticed by now that the primary key column `id` has been the first column in all of these definitions. While nothing stops you from placing it last, or in the middle, this is a bad idea for a variety of reasons, not least of which it's confusing for anyone used to normal ordering. There may be some small binpacking gains to be made by carefully matching column widths to page sizes (the default pagesize for InnoDB is 16 KB, and the default pagesize for most disks today is 4 KB), which can also impact performance on spinning disks. Also, prior to MySQL 8.0.13, temporary tables (usually, tables that InnoDB creates as part of a query) would silently cast `VARCHAR` and `VARBINARY` columns to their respective `CHAR` or `BINARY`. If you had some `VARCHAR` columns with a large maximum size, this could cause the required space to store them to rapidly balloon, filling up the disk.
 581 | 
 582 | In general, column ordering in a table doesn't tremendously matter for MySQL (but it does for queries, as we'll see later), so stick to convention.
 583 | 
 584 | ```sql
 585 | CREATE TABLE zaps (
 586 |   `id` BIGINT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
 587 |   `zap_id` BIGINT UNSIGNED NOT NULL,
 588 |   `created_at` TIMESTAMP NOT NULL DEFAULT NOW(),
 589 |   `last_updated_at` TIMESTAMP NULL ON UPDATE NOW(),
 590 |   `owned_by` BIGINT UNSIGNED NOT NULL,
 591 |   UNIQUE(zap_id)
 592 | );
 593 | ```
 594 | 
 595 | ```sql
 596 | SHOW COLUMNS FROM zaps;
 597 | ```
 598 | 
 599 | ```sql
 600 | +-----------------+-----------------+------+-----+-------------------+-----------------------------+
 601 | | Field           | Type            | Null | Key | Default           | Extra                       |
 602 | +-----------------+-----------------+------+-----+-------------------+-----------------------------+
 603 | | id              | bigint unsigned | NO   | PRI | NULL              | auto_increment              |
 604 | | zap_id          | bigint unsigned | NO   | UNI | NULL              |                             |
 605 | | created_at      | timestamp       | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED           |
 606 | | last_updated_at | timestamp       | YES  |     | NULL              | on update CURRENT_TIMESTAMP |
 607 | | owned_by        | bigint unsigned | NO   |     | NULL              |                             |
 608 | +-----------------+-----------------+------+-----+-------------------+-----------------------------+
 609 | 5 rows in set (0.00 sec)
 610 | ```
 611 | 
 612 | We're introducing some new defaults here:
 613 | * DEFAULT NOW()
 614 |   * With this, much like an `AUTO INCREMENTING` column, the current timestamp will be added to the `created_at` column when a new row is created. NOTE: This doesn't make the column immutable, and nothing stops someone from altering this value manually later.
 615 | * ON UPDATE NOW()
 616 |   * For `last_updated_at`, while the default is `NULL`, whenever the row is updated, the current timestamp is added.
 617 | 
 618 | `NOW()` is an alias for `CURRENT_TIMESTAMP`, and no, I didn't forget the function call on the right. For historical reasons, `CURRENT_TIMESTAMP` may be called with or without parentheses, but `NOW()` requires them. Similarly, generally any default value being declared that isn't a literal (e.g. `0`, `NULL`, etc.) is required to be wrapped in parentheses - see `(JSON_ARRAY())`. Again, for historical reasons, `TIMESTAMP` and `DATETIME` columns don't require this. Also, `JSON` _requires_ its default value to be wrapped in parentheses, even if the default is a literal (as do `BLOB`, `GEOMETRY`, and `TEXT`). See [MySQL docs on defaults](https://dev.mysql.com/doc/refman/8.0/en/data-type-defaults.html) for more information on this behavior, and [MySQL docs on timestamp initialization](https://dev.mysql.com/doc/refman/8.0/en/timestamp-initialization.html) for more information on timestamp column defaults.
 619 | 
 620 | #### Data types
 621 | 
 622 | What is the difference between a `VARCHAR` and a `CHAR`, and what is the integer after it? `CHAR` allocates precisely the amount of space required. If you specify that a column is 64 bytes wide, then you can store 64 bytes in it, and no matter if you're storing 1 byte or 64 bytes, the actual column usage will take 64 bytes - this is because the value is right-padded with spaces, and the trailing spaces are them removed when retrieved (by default - the trimming behavior can be modified, if desired).
 623 | 
 624 | Let's try adding a 65-byte string to a column with a strict 64-byte limit - this can be done with the `LPAD` function, which takes the form `LPAD(<string>, <padding>, <padding_character>).`
 625 | 
 626 | ```sql
 627 | INSERT INTO users
 628 |   (first_name, last_name, user_id)
 629 | VALUES
 630 |   ("Stephan",
 631 |   (SELECT LPAD("Garland", 65, " ")),
 632 |   1
 633 | );
 634 | ```
 635 | 
 636 | ```sql
 637 | ERROR 1406 (22001): Data too long for column 'last_name' at row 1
 638 | ```
 639 | 
 640 | Since people in different cultures may have longer names than I'm used to, making this column allowed to be wider than 64 bytes is probably a good idea, especially if there isn't a storage penalty for doing so. While a `VARCHAR` can technically be up to `2^16 - 1` bytes - the same as the row width limit - it's still a good idea to have some kind of reasonable limits in place, lest someone exploit a security hole and starting using your DB for Chia mining or something. 255 bytes was the historic maximum length allowed in older SQL implementations, and it's the maximum value that a `VARCHAR` can be stored with while having a 1-byte length prefix. Thus, we'll modify our columns to this standard.
 641 | 
 642 | ```sql
 643 | ALTER TABLE users
 644 |   MODIFY first_name VARCHAR(255),
 645 |   MODIFY last_name VARCHAR(255);
 646 | ```
 647 | 
 648 | ```sql
 649 | Query OK, 0 rows affected (0.13 sec)
 650 | Records: 0  Duplicates: 0  Warnings: 0
 651 | ```
 652 | 
 653 | ```sql
 654 | SHOW COLUMNS FROM users;
 655 | ```
 656 | 
 657 | ```sql
 658 | +------------+--------------+------+-----+---------+----------------+
 659 | | Field      | Type         | Null | Key | Default | Extra          |
 660 | +------------+--------------+------+-----+---------+----------------+
 661 | | id         | bigint       | NO   | PRI | NULL    | auto_increment |
 662 | | first_name | varchar(255) | YES  |     | NULL    |                |
 663 | | last_name  | varchar(255) | YES  |     | NULL    |                |
 664 | | user_id    | bigint       | NO   | UNI | NULL    |                |
 665 | +------------+--------------+------+-----+---------+----------------+
 666 | 4 rows in set (0.01 sec)
 667 | ```
 668 | 
 669 | What about ints? You may sometimes see an integer following an integer-type column definition, like `int(4)`. Confusingly, this has nothing to do with the maximum amount of data that can be stored in that column, and is only used for display. Even more confusingly, the MySQL client itself will ignore it, and show the entire stored number. Applications can choose whether or not to use the display width. In general, there's little reason to use this feature, and if you want to constrain display width, do so in your application.
 670 | 
 671 | For floating points, MySQL supports `FLOAT` and `DOUBLE`, with the former being 4 bytes, and the latter 8 bytes.
 672 | 
 673 | For exact precision numbers, MySQL supports `DECIMAL` and `NUMERIC`, and they are identical.
 674 | 
 675 | There are also sub-types of `INT`, such as `SMALLINT` (2 bytes, storing a maximum value of `2^16 - 1` if unsigned), and `BIGINT`, as seen previously - it's 8 bytes, and stores a maximum value of `2^63 - 1` if signed, and `2^64 - 1` if unsigned. Since there's not much reason to have negative IDs, let's alter those definitions as well:
 676 | 
 677 | ```sql
 678 | ALTER TABLE users
 679 |   MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
 680 |   MODIFY user_id BIGINT UNSIGNED NOT NULL UNIQUE;
 681 | ```
 682 | 
 683 | ```sql
 684 | Query OK, 0 rows affected, 1 warning (0.10 sec)
 685 | Records: 0  Duplicates: 0  Warnings: 1
 686 | ```
 687 | 
 688 | A warning? Huh?
 689 | 
 690 | <details>
 691 |   <summary>I don't see any warnings!</summary>
 692 | 
 693 |   Your client may not display warnings, in which case you can just follow along in this document.
 694 | </details>
 695 | 
 696 | ```sql
 697 | SHOW WARNINGS\G
 698 | ```
 699 | 
 700 | ```sql
 701 | *************************** 1. row ***************************
 702 |   Level: Warning
 703 |    Code: 1831
 704 | Message: Duplicate index 'user_id' defined on the table 'test.users'. This is deprecated and will be disallowed in a future release.
 705 | 1 row in set (0.00 sec)
 706 | ```
 707 | 
 708 | Let's look at the table definition.
 709 | 
 710 | ```sql
 711 | SHOW CREATE TABLE users\G
 712 | ```
 713 | 
 714 | ```sql
 715 | *************************** 1. row ***************************
 716 |        Table: users
 717 | Create Table: CREATE TABLE `users` (
 718 |   `id` bigint unsigned NOT NULL AUTO_INCREMENT,
 719 |   `first_name` varchar(255) DEFAULT NULL,
 720 |   `last_name` varchar(255) DEFAULT NULL,
 721 |   `user_id` bigint unsigned NOT NULL,
 722 |   PRIMARY KEY (`id`),
 723 |   UNIQUE KEY `uid` (`user_id`),
 724 |   UNIQUE KEY `user_id` (`user_id`)
 725 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
 726 | 1 row in set (0.01 sec)
 727 | ```
 728 | 
 729 | <details>
 730 |   <summary>What is SHOW CREATE TABLE?</summary>
 731 | 
 732 |   `SHOW CREATE TABLE` is a command that lets you view the query that would be used to create the table in its current state. It's safe to do, and is a good way to view columns, their types, indexes, foreign keys, etc. for a given table.
 733 |   </summary>
 734 | </details>
 735 | 
 736 | Ah - constraints like `UNIQUE` don't have to be redefined along with the rest of the column definition, and in doing so, we've duplicated a constraint. While allowed for now, it's not a good practice, so we'll get rid of it.
 737 | 
 738 | ```sql
 739 | ALTER TABLE users DROP CONSTRAINT uid;
 740 | ```
 741 | 
 742 | ```sql
 743 | Query OK, 0 rows affected (0.16 sec)
 744 | Records: 0  Duplicates: 0  Warnings: 0
 745 | ```
 746 | 
 747 | ```sql
 748 | SHOW COLUMNS FROM users;
 749 | ```
 750 | 
 751 | ```sql
 752 | +------------+-----------------+------+-----+---------+----------------+
 753 | | Field      | Type            | Null | Key | Default | Extra          |
 754 | +------------+-----------------+------+-----+---------+----------------+
 755 | | id         | bigint unsigned | NO   | PRI | NULL    | auto_increment |
 756 | | first_name | varchar(255)    | YES  |     | NULL    |                |
 757 | | last_name  | varchar(255)    | YES  |     | NULL    |                |
 758 | | user_id    | bigint unsigned | NO   | UNI | NULL    |                |
 759 | +------------+-----------------+------+-----+---------+----------------+
 760 | 4 rows in set (0.01 sec)
 761 | ```
 762 | 
 763 | ### Foreign keys
 764 | 
 765 | These tables seem fine to start with, but the columns that we are implicitly designing to have relationships don't have any method of enforcement. While this is a valid design - placing all referential integrity requirements onto the application - SQL was designed to handle this for us, so let's make use of it. NOTE: foreign keys bring with them a huge array of problems that will likely not be seen until your scale is large, so keep that in mind, and have a plan to migrate off of them if necessary.
 766 | 
 767 | #### Why you might want foreign keys
 768 | 
 769 | Let's create a user, and give them a Zap.
 770 | 
 771 | ```sql
 772 | INSERT INTO users
 773 |   (first_name, last_name, user_id)
 774 | VALUES
 775 |   ('Stephan', 'Garland', 1);
 776 | ```
 777 | 
 778 | ```sql
 779 | Query OK, 1 row affected (0.02 sec)
 780 | ```
 781 | 
 782 | ```sql
 783 | INSERT INTO zaps (zap_id, owned_by) VALUES (1, 1);
 784 | ```
 785 | 
 786 | ```sql
 787 | Query OK, 1 row affected (0.03 sec)
 788 | ```
 789 | 
 790 | ```sql
 791 | TABLE zaps;
 792 | ```
 793 | 
 794 | <details>
 795 |   <summary>What is `TABLE`?</summary>
 796 | 
 797 |   Syntactic sugar (a shortcut) for `SELECT * FROM <table>`.
 798 | 
 799 | ```sql
 800 | +----+--------+---------------------+-----------------+----------+
 801 | | id | zap_id | created_at          | last_updated_at | owned_by |
 802 | +----+--------+---------------------+-----------------+----------+
 803 | |  1 |      1 | 2023-02-27 10:25:01 | NULL            |        1 |
 804 | +----+--------+---------------------+-----------------+----------+
 805 | 1 row in set (0.00 sec)
 806 | ```
 807 | 
 808 | </details>
 809 | 
 810 | We can `JOIN` on this if we want.
 811 | 
 812 | ```sql
 813 | SELECT *
 814 | FROM
 815 |   users
 816 | JOIN zaps ON
 817 |   users.user_id = zaps.owned_by\G
 818 | ```
 819 | 
 820 | ```sql
 821 | *************************** 1. row ***************************
 822 |              id: 1
 823 |      first_name: Stephan
 824 |       last_name: Garland
 825 |         user_id: 1
 826 |           email: NULL
 827 |              id: 1
 828 |          zap_id: 1
 829 |      created_at: 2023-02-27 10:25:01
 830 | last_updated_at: NULL
 831 |        owned_by: 1
 832 | 1 row in set (0.01 sec)
 833 | ```
 834 | 
 835 | That's all well and good, but what if I want to delete my account? Wouldn't it be nice if devs didn't have to worry about deleting every trace of my existence? Or what if everyone's user ID has to change for a migration? Enter foreign keys.
 836 | 
 837 | #### Creating a foreign key
 838 | 
 839 | ```sql
 840 | ALTER TABLE
 841 |   zaps
 842 | ADD FOREIGN KEY
 843 |   (owned_by)
 844 | REFERENCES users
 845 |   (user_id)
 846 | ON UPDATE CASCADE
 847 | ON DELETE CASCADE;
 848 | ```
 849 | 
 850 | ```sql
 851 | Query OK, 1 row affected (0.50 sec)
 852 | Records: 1  Duplicates: 0  Warnings: 0
 853 | ```
 854 | 
 855 | ```sql
 856 | SHOW CREATE TABLE zaps\G
 857 | ```
 858 | 
 859 | ```sql
 860 | *************************** 1. row ***************************
 861 |        Table: zaps
 862 | Create Table: CREATE TABLE `zaps` (
 863 |   `id` bigint unsigned NOT NULL AUTO_INCREMENT,
 864 |   `zap_id` bigint unsigned NOT NULL,
 865 |   `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
 866 |   `last_updated_at` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
 867 |   `owned_by` bigint unsigned NOT NULL,
 868 |   PRIMARY KEY (`id`),
 869 |   UNIQUE KEY `zap_id` (`zap_id`),
 870 |   KEY `owned_by` (`owned_by`),
 871 |   CONSTRAINT `zaps_ibfk_1` FOREIGN KEY (`owned_by`) REFERENCES `users` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
 872 | ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
 873 | 1 row in set (0.00 sec)
 874 | ```
 875 | 
 876 | Note that not only do we now have a `FOREIGN KEY` linking `zaps.owned_by` to `users.user_id`, but InnoDB has added an index on `zaps.owned_by` - this is required, and despite the documentation informing you that you must do this before adding the foreign key, it actually does it for you if you don't.
 877 | 
 878 | #### Demonstrating a foreign key
 879 | 
 880 | ```sql
 881 | UPDATE users SET user_id = 9 WHERE id = 1;
 882 | ```
 883 | 
 884 | Note the `WHERE` predicate - we'll go more into that later, but the most important thing to take away here is that there are very few instances where you should issue DML like `UPDATE`without a `WHERE`.
 885 | 
 886 | <details>
 887 |   <summary> Why not?</summary>
 888 | 
 889 |   If there was no predicate, the query would apply to everything in the table, e.g. every user would be modified.
 890 | 
 891 | </details>
 892 | 
 893 | ```sql
 894 | Query OK, 1 row affected (0.02 sec)
 895 | Rows matched: 1  Changed: 1  Warnings: 0
 896 | ```
 897 | 
 898 | ```sql
 899 | SELECT *
 900 | FROM
 901 |   users
 902 | JOIN zaps ON
 903 |   users.user_id = zaps.owned_by\G
 904 | ```
 905 | 
 906 | ```sql
 907 | *************************** 1. row ***************************
 908 |              id: 1
 909 |      first_name: Stephan
 910 |       last_name: Garland
 911 |         user_id: 9
 912 |           email: NULL
 913 |              id: 1
 914 |          zap_id: 1
 915 |      created_at: 2023-02-27 10:25:01
 916 | last_updated_at: NULL
 917 |        owned_by: 9
 918 | 1 row in set (0.01 sec)
 919 | ```
 920 | 
 921 | And just like that, `zaps` has updated its `owned_by` value for that Zap to equal the new value in `users`. And if we delete the `users` entry, the same `CASCADE` action will follow.
 922 | 
 923 | ```sql
 924 | DELETE FROM users WHERE id = 1;
 925 | ```
 926 | 
 927 | ```sql
 928 | Query OK, 1 row affected (0.02 sec)
 929 | ```
 930 | 
 931 | ```sql
 932 | SELECT * FROM zaps;
 933 | ```
 934 | 
 935 | ```sql
 936 | Empty set (0.00 sec)
 937 | ```
 938 | 
 939 | ### Determining table size
 940 | 
 941 | In order to find out how many rows are in a table, there are a few ways of doing so. InnoDB maintains information about tables in the `INFORMATION_SCHEMA.TABLES` table, including an estimate of row count. However, it's just that - an estimate. It can be made to be accurate if you use `ANALYZE TABLE`, but in production, you shouldn't do this (to be clear, it should be done, but carefully), since it places a table-wide read lock during the process. You can also use the query `SELECT COUNT(*)`, but that will perform a table scan (where the entire table is read sequentially, without indices), so it may have a performance impact on the database, as it's consuming a lot of available IOPS. Finally, assuming you have an auto-incrementing `id` field in the table, you can use `SELECT id FROM <table> ORDER BY id DESC LIMIT 1` to get the last incremented value. This is also an estimate, since it doesn't take any deletions into account (auto-increment is monotonic), but it's extremely fast.
 942 | 
 943 | ```sql
 944 | SELECT table_name, table_rows
 945 | FROM
 946 |   information_schema.tables
 947 | WHERE
 948 |   table_schema = 'test';
 949 | ```
 950 | 
 951 | ```sql
 952 | +---------------+------------+
 953 | | TABLE_NAME    | TABLE_ROWS |
 954 | +---------------+------------+
 955 | | gensql        |       1000 |
 956 | | ref_users     |       1000 |
 957 | | ref_users_big |     992839 |
 958 | | ref_zaps      |          0 |
 959 | | ref_zaps_big  |          0 |
 960 | | users         |       1000 |
 961 | | zaps          |          0 |
 962 | +---------------+------------+
 963 | 7 rows in set (0.01 sec)
 964 | ```
 965 | 
 966 | ```sql
 967 | ANALYZE TABLE ref_zaps; ANALYZE TABLE ref_zaps_big;
 968 | ```
 969 | 
 970 | ```sql
 971 | +---------------+---------+----------+----------+
 972 | | Table         | Op      | Msg_type | Msg_text |
 973 | +---------------+---------+----------+----------+
 974 | | test.ref_zaps | analyze | status   | OK       |
 975 | +---------------+---------+----------+----------+
 976 | 1 row in set (0.03 sec)
 977 | 
 978 | +-------------------+---------+----------+----------+
 979 | | Table             | Op      | Msg_type | Msg_text |
 980 | +-------------------+---------+----------+----------+
 981 | | test.ref_zaps_big | analyze | status   | OK       |
 982 | +-------------------+---------+----------+----------+
 983 | 1 row in set (0.05 sec)
 984 | ```
 985 | 
 986 | ```sql
 987 | SELECT table_name, table_rows
 988 | FROM
 989 |   information_schema.tables
 990 | WHERE table_schema = 'test';
 991 | ```
 992 | 
 993 | ```sql
 994 | +---------------+------------+
 995 | | TABLE_NAME    | TABLE_ROWS |
 996 | +---------------+------------+
 997 | | gensql        |       1000 |
 998 | | ref_users     |       1000 |
 999 | | ref_users_big |     992839 |
1000 | | ref_zaps      |       1000 |
1001 | | ref_zaps_big  |     997211 |
1002 | | users         |       1000 |
1003 | | zaps          |          0 |
1004 | +---------------+------------+
1005 | 7 rows in set (0.02 sec)
1006 | ```
1007 | 
1008 | Actual row count:
1009 | 
1010 | ```sql
1011 | SELECT
1012 |   'ref_users_big' AS 'table_name',
1013 |   COUNT(*) AS 'row_count'
1014 | FROM
1015 |   ref_users_big
1016 | UNION
1017 | SELECT
1018 |   'ref_zaps_big',
1019 |   COUNT(*)
1020 | FROM
1021 |   ref_zaps_big;
1022 | ```
1023 | 
1024 | ```sql
1025 | +---------------+-----------+
1026 | | table_name    | row_count |
1027 | +---------------+-----------+
1028 | | ref_users_big |   1000000 |
1029 | | ref_zaps_big  |   1000000 |
1030 | +---------------+-----------+
1031 | 2 rows in set (2.42 sec)
1032 | ```
1033 | 
1034 | <details>
1035 |   <summary>What's a UNION?</summary>
1036 | 
1037 |   A way to combine query results, regardless of any relation between tables or queries.
1038 | </details>
1039 | 
1040 | ## Column operations
1041 | 
1042 | ### Adding columns
1043 | 
1044 | Adding columns is done with `ALTER TABLE`:
1045 | 
1046 | ```sql
1047 | ALTER TABLE
1048 |   zaps
1049 | ADD COLUMN
1050 |   shared_with
1051 | JSON;
1052 | ```
1053 | 
1054 | ```sql
1055 | Query OK, 0 rows affected (0.18 sec)
1056 | Records: 0  Duplicates: 0  Warnings: 0
1057 | ```
1058 | 
1059 | Just as with a table definition, the column's name (`shared_with`) and type (`JSON`) are required; additonal qualifiers like `DEFAULT`, `UNIQUE`, etc. may be appended. To add some types of default values, like a JSON array, you must call the function.
1060 | 
1061 |   * [MySQL supports JSON](https://dev.mysql.com/doc/refman/8.0/en/json.html) as a data type! While you can of course simply store JSON strings in a text column, there are some benefits to using the native JSON datatype; among them that you can index scalars from the JSON objects, and that you can extract specific keys/values from the objects instead of the entire string.
1062 |   * Please don't use this as an excuse to treat MySQL as a Document DB, though. If you want NoSQL, you should use NoSQL. RDBMS are optimized for relations. Storing some information in JSON is fine, but it shouldn't be the default.
1063 | 
1064 | ### Modfying columns
1065 | 
1066 | This was covered earlier during [table operations](#table-operations), but as a refresher, we'll again use `ALTER TABLE` to add a `DEFAULT` value of an empty JSON array, which must be called as its function:
1067 | 
1068 | ```sql
1069 | ALTER TABLE
1070 |   zaps
1071 | MODIFY COLUMN
1072 | shared_with
1073 |   JSON
1074 |   DEFAULT (
1075 |     JSON_ARRAY()
1076 |   );
1077 | ```
1078 | 
1079 | ```sql
1080 | Query OK, 0 rows affected (0.09 sec)
1081 | Records: 0  Duplicates: 0  Warnings: 0
1082 | ```
1083 | 
1084 | ### Dropping tables with foreign keys
1085 | 
1086 | If there are foreign keys relying on the column you're trying to drop, you will first need to either disable foreign key checks, or remove those checks before you can drop the column.
1087 | 
1088 | ```sql
1089 | DROP TABLE users;
1090 | ```
1091 | 
1092 | ```sql
1093 | ERROR 3730 (HY000): Cannot drop table 'users' referenced by a foreign key constraint 'zaps_ibfk_1' on table 'zaps'.
1094 | ```
1095 | 
1096 | ```sql
1097 | SET foreign_key_checks = 0;
1098 | ```
1099 | 
1100 | ```sql
1101 | Query OK, 0 rows affected (0.01 sec)
1102 | ```
1103 | 
1104 | ```sql
1105 | DROP TABLE users;
1106 | Query OK, 0 rows affected (0.30 sec)
1107 | ```
1108 | 
1109 | ```sql
1110 | SHOW CREATE TABLE zaps\G
1111 | ```
1112 | 
1113 | ```sql
1114 | *************************** 1. row ***************************
1115 |        Table: zaps
1116 | Create Table: CREATE TABLE `zaps` (
1117 |   `id` bigint unsigned NOT NULL AUTO_INCREMENT,
1118 |   `zap_id` bigint unsigned NOT NULL,
1119 |   `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
1120 |   `last_updated_at` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
1121 |   `owned_by` bigint unsigned NOT NULL,
1122 |   `shared_with` json DEFAULT (json_array()),
1123 |   PRIMARY KEY (`id`),
1124 |   UNIQUE KEY `zap_id` (`zap_id`),
1125 |   KEY `owned_by` (`owned_by`),
1126 |   CONSTRAINT `zaps_ibfk_1` FOREIGN KEY (`owned_by`) REFERENCES `users` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
1127 | ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1128 | 1 row in set (0.00 sec)
1129 | ```
1130 | 
1131 | Just because MySQL let us drop the table, it doesn't mean it cleaned up after us.
1132 | <details>
1133 |   <summary>How can we remove the FK?</summary>
1134 | 
1135 |   ```sql
1136 |   ALTER TABLE zaps DROP CONSTRAINT `zaps_ibfk_1`;
1137 |   ```
1138 | 
1139 |   ```sql
1140 |   Query OK, 0 rows affected (0.20 sec)
1141 |   Records: 0  Duplicates: 0  Warnings: 0
1142 |   ```
1143 | </details>
1144 | 
1145 | Also, don't forget to re-enable `foreign_key_checks` for your session.
1146 | 
1147 | ```sql
1148 | SET foreign_key_checks = 1;
1149 | ```
1150 | 
1151 | ```sql
1152 | Query OK, 0 rows affected (0.00 sec)
1153 | ```
1154 | 
1155 | But wait, how are we going to get back the `users` table? We could scroll back up and find the definition, but wouldn't it be nice if we could copy the definition from somewhere else?
1156 | 
1157 | ### Copied table definitions
1158 | 
1159 | Luckily, this exists in the form of `CREATE TABLE LIKE`. [MySQL docs](https://dev.mysql.com/doc/refman/8.0/en/create-table-like.html). You do need `SELECT` privileges from the schema/table you're copying from, which is enabled for `test.ref_%` with this user. You'll also need to specify the schema the table exists in, since it's outside of the currently selected schema.
1160 | 
1161 | NOTE: This schema is somewhat different from what we created before; most of it is additional, but one big change is that there is no longer an explicit `id` column, instead, the `user_id` column takes its place.
1162 | 
1163 | ```sql
1164 | CREATE TABLE users LIKE test.ref_users;
1165 | ```
1166 | 
1167 | ```sql
1168 | Query OK, 0 rows affected (0.34 sec)
1169 | ```
1170 | 
1171 | There are some restrictions. The documentation lists all of them, but the biggest one is that any foreign keys aren't copied. We deleted ours so it doesn't really matter, but this could catch you by surprise if you expected them to come over with the schema definition. Also, depending on the version of MySQL you're using, a bug may exist where tables copied in this manner will logically reside (that is, within a given tablespace file) in the original table's tablespace. A way around this is with this alternative query:
1172 | 
1173 | ```sql
1174 | CREATE TABLE users SELECT * FROM test.ref_users LIMIT 0;
1175 | ```
1176 | 
1177 | **Warning**
1178 | 
1179 | The second form shown has a [large list of things](https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html) it does not do:
1180 | 
1181 | - Copy any indexes, including primary keys
1182 | - Maintain the `AUTO_INCREMENT` attribute
1183 | - Maintain data types - `VARCHAR` may become `CHAR`
1184 | - Maintain default values for columns that are expressions
1185 | 
1186 | Finally, note that both of these _only_ copy the schema definition, not the data. The table you're copying from actually has thousands of rows in it, but none of those will be in your table.
1187 | 
1188 | <details>
1189 |   <summary>What if you wanted to copy data as well?</summary>
1190 | 
1191 |   The above alternative query hopefully hinted at it! Just take heed of the warning.
1192 | 
1193 |   ```sql
1194 |   DROP TABLE users; CREATE TABLE users SELECT * FROM test.ref_users LIMIT 1000;
1195 |   ```
1196 | 
1197 |   ```sql
1198 |   Query OK, 0 rows affected (0.30 sec)
1199 | 
1200 |   Query OK, 1000 rows affected (1.14 sec)
1201 |   Records: 1000  Duplicates: 0  Warnings: 0
1202 |   ```
1203 | </details>
1204 | 
1205 | #### Copied table data and truncating
1206 | 
1207 | Now that we have `users` back, let's actually fill it with more than just 1000 rows. `test.ref_users_big` has 1,000,000 rows. That would take a while to fill for everyone (my poor spinning disks), but 10,000 is reasonable.
1208 | 
1209 | First, let's dump the existing values, but leave the table definition. While there are a few ways to do this, the fastest is `TRUNCATE` ([MySQL docs](https://dev.mysql.com/doc/refman/8.0/en/truncate-table.html)). This is a `DDL` operation vs. `DML`, as instead of iterating through the table and deleting each row, it stores the table definition, drops the table, then re-creates it. This does have several limitations, especially with foreign keys, but it works fine here.
1210 | 
1211 | ```sql
1212 | TRUNCATE TABLE users;
1213 | ```
1214 | 
1215 | ```sql
1216 | Query OK, 0 rows affected (0.42 sec)
1217 | ```
1218 | 
1219 | `0 rows affected` may be confusing, as we in fact just affected 1000 rows, but remember that this is the same as a `DROP TABLE`, which similarly doesn't report on the number of rows removed.
1220 | 
1221 | Now, we can copy into the table; but first, we're going to `DROP` the table and create it properly with `CREATE LIKE` so we don't have any issues with missing primary keys.
1222 | 
1223 | ```sql
1224 | DROP TABLE users;
1225 | CREATE TABLE users LIKE test.ref_users;
1226 | INSERT INTO users SELECT * FROM test.ref_users_big LIMIT 10000;
1227 | ```
1228 | 
1229 | ```sql
1230 | Query OK, 10000 rows affected (5.33 sec)
1231 | Records: 10000  Duplicates: 0  Warnings: 0
1232 | ```
1233 | 
1234 | ### Transactions
1235 | 
1236 | Remember the discussion about doing `DML` without a predicate? There's a fix for that.
1237 | 
1238 | ```sql
1239 | START TRANSACTION;
1240 | ```
1241 | 
1242 | ```sql
1243 | Query OK, 0 rows affected (0.00 sec)
1244 | ```
1245 | 
1246 | ```sql
1247 | UPDATE users SET city = "Asheville";
1248 | ```
1249 | 
1250 | ```sql
1251 | Query OK, 9999 rows affected (6.96 sec)
1252 | Rows matched: 10000  Changed: 9999  Warnings: 0
1253 | ```
1254 | 
1255 | Uh-oh. Looks like everyone has moved to Western North Carolina.
1256 | 
1257 | ```sql
1258 | ROLLBACK;
1259 | ```
1260 | 
1261 | ```sql
1262 | Query OK, 0 rows affected (5.45 sec)
1263 | ```
1264 | 
1265 | Whew, not fired.
1266 | 
1267 | NOTE: Canceling a query (`Ctrl-C`), _regardless of whether or not you're in a transaction_, has the same effect, assuming the InnoDB storage engine is being used. This is the `A` in `ACID` at work - either the entire query succeeds, or none of it does. However, the rollback may take some time depending on how many rows have been affected. Also, if you don't manage to cancel the query before it completes, you're out of luck.
1268 | 
1269 | ### Generated columns
1270 | 
1271 | What if you wanted a column that automatically created data for you based on other columns?
1272 | 
1273 | ```sql
1274 | ALTER TABLE
1275 |   users
1276 | ADD COLUMN
1277 |   full_name VARCHAR(510) GENERATED ALWAYS AS (
1278 |     CONCAT_WS(', ', last_name, first_name)
1279 |   );
1280 | ```
1281 | 
1282 | ```sql
1283 | Query OK, 0 rows affected (0.34 sec)
1284 | Records: 0  Duplicates: 0  Warnings: 0
1285 | ```
1286 | 
1287 | ```sql
1288 | SELECT user_id, full_name, city, country
1289 |   FROM users
1290 | LIMIT 10;
1291 | ```
1292 | 
1293 | ```sql
1294 | +---------+-------------------+-------------+----------------+
1295 | | user_id | full_name         | city        | country        |
1296 | +---------+-------------------+-------------+----------------+
1297 | |       1 | MacPherson, Addie | Latina      | Italy          |
1298 | |       2 | Airla, Valaree    | Pribram     | Czech Republic |
1299 | |       3 | Nett, Sheppard    | Hamada      | Japan          |
1300 | |       4 | Kirschner, Robby  | Bikaner     | India          |
1301 | |       5 | Bilski, Lewiss    | Vörderås    | Sweden         |
1302 | |       6 | Yamauchi, Marleah | Rotterdam   | Netherlands    |
1303 | |       7 | Calore, Ania      | Miyakojima  | Japan          |
1304 | |       8 | Breger, Gratiana  | Valkeakoski | Finland        |
1305 | |       9 | Serafina, Janith  | Morant Bay  | Jamaica        |
1306 | |      10 | Beckman, Pavla    | Wackersdorf | Germany        |
1307 | +---------+-------------------+-------------+----------------+
1308 | 10 rows in set (0.01 sec)
1309 | ```
1310 | 
1311 | Note that by default, this will create a `VIRTUAL` column (you can specify `STORED` after `AS` if you'd rather have a normal column), which is not actually stored, but instead calculated at query time. While this takes no storage space, it does add some amount of computational load, and more importantly comes with a [huge list](https://dev.mysql.com/doc/refman/8.0/en/create-table-generated-columns.html) of limitations. One large benefit, however, is that since the columns aren't actually created when the query is ran, the operation takes as long as a normal `ALTER TABLE` operation. If stored, the data must be written to the table, which will necessitate taking write locks. Also, since the column isn't actually being written anywhere, you can actually place the columns in any table position (by default, adding a column just appends to the end of the table) while still using the `INSTANT` algorithm, despite what the docs imply.
1312 | 
1313 | The creation and deletion time in particular is markedly better when compared to `STORED` columns:
1314 | 
1315 | ```sql
1316 | ALTER TABLE
1317 |   users
1318 | ADD COLUMN
1319 |   full_name VARCHAR(510) GENERATED ALWAYS AS (
1320 |     CONCAT_WS(', ', last_name, first_name)
1321 |   ) STORED;
1322 | ```
1323 | 
1324 | ```sql
1325 | Query OK, 10000 rows affected (7.23 sec)
1326 | Records: 10000  Duplicates: 0  Warnings: 0
1327 | ```
1328 | 
1329 | ```sql
1330 | ALTER TABLE
1331 |   users
1332 | DROP COLUMN full_name;
1333 | ```
1334 | 
1335 | ```sql
1336 | Query OK, 0 rows affected (2.24 sec)
1337 | Records: 0  Duplicates: 0  Warnings: 0
1338 | ```
1339 | 
1340 | Demonstrating column positioning:
1341 | 
1342 | ```sql
1343 | ALTER TABLE
1344 |   users
1345 | ADD COLUMN
1346 |   full_name VARCHAR(510) GENERATED ALWAYS AS (
1347 |     CONCAT_WS(', ', last_name, first_name)
1348 |   )
1349 | AFTER
1350 |   last_name;
1351 | ```
1352 | 
1353 | ```sql
1354 | Query OK, 0 rows affected (0.27 sec)
1355 | Records: 0  Duplicates: 0  Warnings: 0
1356 | ```
1357 | 
1358 | ```sql
1359 | SELECT * FROM users LIMIT 1\G
1360 | ```
1361 | 
1362 | ```sql
1363 | *************************** 1. row ***************************
1364 |         user_id: 1
1365 |      first_name: Addie
1366 |       last_name: MacPherson
1367 |       full_name: MacPherson, Addie
1368 |           email: addie.macpherson@lizard.com
1369 |            city: Latina
1370 |         country: Italy
1371 |      created_at: 2001-05-27 19:47:17
1372 | last_updated_at: NULL
1373 | 1 row in set (0.01 sec)
1374 | ```
1375 | 
1376 | ### Invisible columns
1377 | 
1378 | You can make columns `INVISIBLE` if you'd rather they not show up unless specifically queried for. This is done with the `INVISIBLE` keyword after the type (`VARCHAR(510)` here) if being created, or modified later with `ALTER COLUMN`:
1379 | 
1380 | ```sql
1381 | ALTER TABLE users ALTER COLUMN full_name SET INVISIBLE;
1382 | ```
1383 | 
1384 | ```sql
1385 | Query OK, 0 rows affected (0.19 sec)
1386 | Records: 0  Duplicates: 0  Warnings: 0
1387 | ```
1388 | 
1389 | ```sql
1390 | SELECT * FROM users LIMIT 1\G
1391 | ```
1392 | 
1393 | ```sql
1394 | *************************** 1. row ***************************
1395 |         user_id: 1
1396 |      first_name: Addie
1397 |       last_name: MacPherson
1398 |           email: addie.macpherson@lizard.com
1399 |            city: Latina
1400 |         country: Italy
1401 |      created_at: 2001-05-27 19:47:17
1402 | last_updated_at: NULL
1403 | 1 row in set (0.00 sec)
1404 | Query OK, 0 rows affected (0.19 sec)
1405 | Records: 0  Duplicates: 0  Warnings: 0
1406 | 1 row in set (0.00 sec)
1407 | ```
1408 | 
1409 | To set them back to visible, use `SET VISIBLE`:
1410 | 
1411 | ```sql
1412 | ALTER TABLE users ALTER COLUMN full_name SET VISIBLE;
1413 | ```
1414 | 
1415 | ```sql
1416 | Query OK, 0 rows affected (0.08 sec)
1417 | Records: 0  Duplicates: 0  Warnings: 0
1418 | ```
1419 | 
1420 | 


--------------------------------------------------------------------------------
/mysql/mysql-101-1.md:
--------------------------------------------------------------------------------
   1 | # MySQL 101 Part II
   2 | 
   3 | - [MySQL 101 Part II](#mysql-101-part-ii)
   4 |   - [Queries](#queries)
   5 |     - [Predicates](#predicates)
   6 |       - [WHERE](#where)
   7 |     - [SELECT](#select)
   8 |       - [Working with JSON](#working-with-json)
   9 |         - [Finding non-null arrays](#finding-non-null-arrays)
  10 |         - [Checking for a value inside an array](#checking-for-a-value-inside-an-array)
  11 |         - [Extracting scalars from an object](#extracting-scalars-from-an-object)
  12 |     - [INSERT](#insert)
  13 |     - [TABLE](#table)
  14 |   - [Joins](#joins)
  15 |     - [Relational alegbra](#relational-alegbra)
  16 |     - [Types of joins](#types-of-joins)
  17 |       - [Cross](#cross)
  18 |       - [Inner Join](#inner-join)
  19 |       - [Left Outer Join](#left-outer-join)
  20 |       - [Right Outer Join](#right-outer-join)
  21 |       - [Full Outer Join](#full-outer-join)
  22 |     - [Specifying a column's table](#specifying-a-columns-table)
  23 |     - [Indices](#indices)
  24 |       - [Single indices](#single-indices)
  25 |       - [Partial indices](#partial-indices)
  26 |       - [Functional indices](#functional-indices)
  27 |       - [JSON / Longtext](#json--longtext)
  28 |       - [Composite indices](#composite-indices)
  29 |       - [Testing indices](#testing-indices)
  30 |       - [Descending indices](#descending-indices)
  31 |       - [When indicies aren't helpful](#when-indicies-arent-helpful)
  32 |       - [HAVING](#having)
  33 |   - [Query optimization](#query-optimization)
  34 |     - [SELECT \*](#select-)
  35 |     - [OFFSET / LIMIT](#offset--limit)
  36 |     - [DISTINCT](#distinct)
  37 |   - [Cleanup](#cleanup)
  38 | 
  39 | ## Queries
  40 | 
  41 | ### Predicates
  42 | 
  43 | A predicate is a function which asserts that something is true or false. You can think of it like a filter.
  44 | 
  45 | #### WHERE
  46 | 
  47 | `WHERE` is the easiest to understand and apply, and will cover most of your needs.
  48 | 
  49 | ```sql
  50 | SELECT
  51 |   user_id, first_name, last_name
  52 | FROM
  53 |   users
  54 | WHERE
  55 |   country = 'Zimbabwe';
  56 | ```
  57 | 
  58 | ```sql
  59 | +---------+------------+-----------+
  60 | | user_id | first_name | last_name |
  61 | +---------+------------+-----------+
  62 | |     106 | Ivonne     | Barmen    |
  63 | |    1149 | Myca       | Flieger   |
  64 | |    2143 | Dallas     | Nimesh    |
  65 | |    4401 | Jeana      | Naga      |
  66 | |    4623 | Godiva     | Adal      |
  67 | |    5582 | Lexie      | Fenwick   |
  68 | |    5586 | Carrie     | Nich      |
  69 | |    5793 | Marten     | Casady    |
  70 | |    6072 | Feliza     | Culhert   |
  71 | |    6467 | Wood       | O'Connor  |
  72 | |    7093 | Miriam     | Galliett  |
  73 | |    7669 | Cele       | Belden    |
  74 | |    7675 | Araldo     | Hoes      |
  75 | |    8106 | Imojean    | Beaudoin  |
  76 | |    9438 | Sibby      | Luedtke   |
  77 | |    9566 | Eb         | Cattima   |
  78 | |    9606 | Alard      | Frodina   |
  79 | +---------+------------+-----------+
  80 | 17 rows in set (0.22 sec)
  81 | ```
  82 | 
  83 | Note that we filtered the results with a predicate that wasn't even in the result set (`country`).
  84 | 
  85 | You may also have seen or used the wildcard `%` with `LIKE` and `NOT LIKE`.
  86 | 
  87 | ```sql
  88 | SELECT
  89 |   user_id, first_name, last_name
  90 | FROM
  91 |   users
  92 | WHERE
  93 |   country
  94 | LIKE 'Zim%';
  95 | ```
  96 | 
  97 | ```sql
  98 | +---------+------------+-----------+
  99 | | user_id | first_name | last_name |
 100 | +---------+------------+-----------+
 101 | |     106 | Ivonne     | Barmen    |
 102 | |    1149 | Myca       | Flieger   |
 103 | |    2143 | Dallas     | Nimesh    |
 104 | |    4401 | Jeana      | Naga      |
 105 | |    4623 | Godiva     | Adal      |
 106 | |    5582 | Lexie      | Fenwick   |
 107 | |    5586 | Carrie     | Nich      |
 108 | |    5793 | Marten     | Casady    |
 109 | |    6072 | Feliza     | Culhert   |
 110 | |    6467 | Wood       | O'Connor  |
 111 | |    7093 | Miriam     | Galliett  |
 112 | |    7669 | Cele       | Belden    |
 113 | |    7675 | Araldo     | Hoes      |
 114 | |    8106 | Imojean    | Beaudoin  |
 115 | |    9438 | Sibby      | Luedtke   |
 116 | |    9566 | Eb         | Cattima   |
 117 | |    9606 | Alard      | Frodina   |
 118 | +---------+------------+-----------+
 119 | 17 rows in set (0.22 sec)
 120 | ```
 121 | 
 122 | These two are functionally equivalent queries. However, if there is an index on the predicate column, and you use a leading wildcard (e.g. `LIKE '%babwe'`), MySQL cannot use the index, and will instead perform a table scan. If you can avoid using leading wildcards on large tables, do so. It's also worth noting that there are many times when the query optimizer determines that the table scan would be faster than using an index, and so will do so anyway. [Index usage can be hinted](https://dev.mysql.com/doc/refman/8.0/en/index-hints.html), forced, and ignored, although as of MySQL 8.0.20, the old syntax (which included hints) [is deprecated](https://dev.mysql.com/doc/refman/8.0/en/optimizer-hints.html#optimizer-hints-index-level). Examples of both are below with an `EXPLAIN SELECT`. They're from a different schema and table, as I've already set up the index.
 123 | 
 124 | ```sql
 125 | EXPLAIN SELECT
 126 |   user_id, first_name, last_name
 127 | FROM
 128 |   test.ref_users
 129 | USE INDEX (country)
 130 | WHERE
 131 |   country
 132 | LIKE 'Zim%'\G
 133 | ```
 134 | 
 135 | ```sql
 136 | *************************** 1. row ***************************
 137 |            id: 1
 138 |   select_type: SIMPLE
 139 |         table: ref_users
 140 |    partitions: NULL
 141 |          type: range
 142 | possible_keys: country
 143 |           key: country
 144 |       key_len: 1023
 145 |           ref: NULL
 146 |          rows: 3
 147 |      filtered: 100.00
 148 |         Extra: Using index condition
 149 | 1 row in set, 1 warning (0.01 sec)
 150 | ```
 151 | 
 152 | ```sql
 153 | EXPLAIN SELECT
 154 |   user_id, first_name, last_name
 155 | FROM
 156 |   test.ref_users
 157 | FORCE INDEX (country)
 158 | WHERE
 159 |   country
 160 | LIKE '%babwe'\G
 161 | 
 162 | ```sql
 163 | *************************** 1. row ***************************
 164 |            id: 1
 165 |   select_type: SIMPLE
 166 |         table: ref_users
 167 |    partitions: NULL
 168 |          type: ALL
 169 | possible_keys: NULL
 170 |           key: NULL
 171 |       key_len: NULL
 172 |           ref: NULL
 173 |          rows: 1000
 174 |      filtered: 11.11
 175 |         Extra: Using where
 176 | 1 row in set, 1 warning (0.00 sec)
 177 | ```
 178 | 
 179 | Even when using `FORCE INDEX`, it's not being used, because it can't.
 180 | 
 181 | ```sql
 182 | EXPLAIN SELECT
 183 |   user_id, first_name, last_name
 184 | FROM
 185 |   test.ref_users
 186 | /*+ INDEX(ref_users country) */
 187 | WHERE
 188 |   country
 189 | LIKE 'Zim%'\G
 190 | ```
 191 | 
 192 | The new syntax, which looks like a C-style comment, requires both the table and column to be listed.
 193 | 
 194 | ```sql
 195 | *************************** 1. row ***************************
 196 |            id: 1
 197 |   select_type: SIMPLE
 198 |         table: customers
 199 |    partitions: NULL
 200 |          type: range
 201 | possible_keys: city
 202 |           key: city
 203 |       key_len: 153
 204 |           ref: NULL
 205 |          rows: 2
 206 |      filtered: 100.00
 207 |         Extra: Using index condition
 208 | 1 row in set, 1 warning (0.00 sec)
 209 | ```
 210 | 
 211 | ### SELECT
 212 | 
 213 | [MySQL docs.](https://dev.mysql.com/doc/refman/8.0/en/select.html)
 214 | 
 215 | You use it to select data from tables (or `/dev/stdin`). Any questions?
 216 | 
 217 | ```sql
 218 | SELECT * FROM ref_zaps LIMIT 10 OFFSET 15;
 219 | ```
 220 | 
 221 | ```sql
 222 | +--------+----------+----------------------+---------------------+-----------------+
 223 | | zap_id | owned_by | shared_with          | created_at          | last_updated_at |
 224 | +--------+----------+----------------------+---------------------+-----------------+
 225 | |     16 |      788 | []                   | 2013-10-16 21:25:30 | NULL            |
 226 | |     17 |      689 | []                   | 2016-07-21 03:05:33 | NULL            |
 227 | |     18 |      735 | []                   | 2020-12-16 13:51:04 | NULL            |
 228 | |     19 |      802 | []                   | 2009-11-22 03:33:19 | NULL            |
 229 | |     20 |      297 | [529, 805, 541, 498] | 1997-07-11 15:05:07 | NULL            |
 230 | |     21 |      649 | []                   | 2015-05-18 20:08:31 | NULL            |
 231 | |     22 |      438 | []                   | 2006-12-14 15:28:30 | NULL            |
 232 | |     23 |      607 | []                   | 2013-04-15 17:57:19 | NULL            |
 233 | |     24 |      460 | []                   | 2018-01-28 02:05:59 | NULL            |
 234 | |     25 |      677 | []                   | 1995-06-07 21:46:30 | NULL            |
 235 | +--------+----------+----------------------+---------------------+-----------------+
 236 | 10 rows in set (0.01 sec)
 237 | ```
 238 | 
 239 | <details>
 240 |   <summary>Can you think of anything missing from this table? (HINT: SHOW CREATE TABLE)</summary>
 241 | 
 242 |   There's no foreign key linking `owned_by` to a given user! In fact, they're just randomly generated numbers, but there are pairings. Let's create a foreign key now:
 243 |   ```sql
 244 |   ALTER TABLE ref_zaps ADD CONSTRAINT zap_owner_id FOREIGN KEY (owned_by) REFERENCES ref_users (user_id);
 245 |   ```
 246 | 
 247 |   ```sql
 248 |   Query OK, 1000 rows affected (0.90 sec)
 249 |   Records: 1000  Duplicates: 0  Warnings: 0
 250 |   ```
 251 | </details>
 252 | 
 253 | #### Working with JSON
 254 | 
 255 | Both JSON arrays and objects can be stored in JSON columns. Using them in queries isn't as straight-forward as other column types.
 256 | 
 257 | ##### Finding non-null arrays
 258 | 
 259 | ```sql
 260 | SELECT *
 261 | FROM
 262 |   ref_zaps
 263 | WHERE JSON_LENGTH(shared_with) > 0
 264 | LIMIT 10;
 265 | ```
 266 | 
 267 | ```sql
 268 | +--------+----------+----------------------+---------------------+-----------------+
 269 | | zap_id | owned_by | shared_with          | created_at          | last_updated_at |
 270 | +--------+----------+----------------------+---------------------+-----------------+
 271 | |     20 |      297 | [529, 805, 541, 498] | 1997-07-11 15:05:07 | NULL            |
 272 | |     40 |      312 | [395, 721, 397, 930] | 2016-11-15 03:42:41 | NULL            |
 273 | |     60 |      469 | [261, 565, 326, 637] | 2011-09-21 11:40:22 | NULL            |
 274 | |     80 |      505 | [753, 766, 812, 521] | 2001-07-04 15:28:08 | NULL            |
 275 | |    100 |      459 | [884, 23, 163, 654]  | 2008-08-30 12:53:32 | NULL            |
 276 | |    120 |      411 | [730, 484, 530, 449] | 2012-09-02 00:42:20 | NULL            |
 277 | |    140 |      191 | [611, 798, 984, 583] | 2004-12-14 04:08:09 | NULL            |
 278 | |    160 |      310 | [941, 353, 499, 668] | 2003-01-22 01:05:04 | NULL            |
 279 | |    180 |      463 | [679, 639, 760, 784] | 2022-01-22 04:31:00 | NULL            |
 280 | |    200 |       36 | [308, 955, 485, 298] | 2015-10-17 21:42:16 | NULL            |
 281 | +--------+----------+----------------------+---------------------+-----------------+
 282 | 10 rows in set (0.02 sec)
 283 | ```
 284 | 
 285 | ##### Checking for a value inside an array
 286 | 
 287 | ```sql
 288 | SELECT
 289 |   zap_id,
 290 |   owned_by,
 291 |   shared_with,
 292 |   user_id,
 293 |   full_name
 294 | FROM ref_zaps
 295 | JOIN
 296 |   ref_users ON
 297 | JSON_CONTAINS(shared_with, JSON_ARRAY(ref_users.user_id))
 298 | LIMIT 10;
 299 | ```
 300 | 
 301 | ```sql
 302 | +--------+----------+---------------------+---------+--------------------+
 303 | | zap_id | owned_by | shared_with         | user_id | full_name          |
 304 | +--------+----------+---------------------+---------+--------------------+
 305 | |    240 |      697 | [3, 854, 486, 907]  |       3 | Gorlin, Alene      |
 306 | |    100 |      459 | [884, 23, 163, 654] |      23 | Schnurr, Sissie    |
 307 | |    700 |      947 | [28, 173, 33, 899]  |      28 | Russi, Bab         |
 308 | |    560 |      869 | [258, 197, 724, 31] |      31 | Quince, Caryl      |
 309 | |    700 |      947 | [28, 173, 33, 899]  |      33 | Langille, Tonya    |
 310 | |    740 |      888 | [41, 221, 402, 301] |      41 | Kruter, Bonni      |
 311 | |    460 |      566 | [45, 793, 553, 162] |      45 | Schuh, Gasparo     |
 312 | |    940 |      211 | [497, 973, 323, 48] |      48 | Aylsworth, Steffen |
 313 | |    260 |      861 | [313, 52, 334, 457] |      52 | Delwyn, Karoline   |
 314 | |    420 |      667 | [524, 527, 948, 60] |      60 | Magen, Sherill     |
 315 | +--------+----------+---------------------+---------+--------------------+
 316 | 10 rows in set (0.88 sec)
 317 | ```
 318 | 
 319 | ##### Extracting scalars from an object
 320 | 
 321 | You can select a JSON column mixed in with non-JSON as you'd expect, and the entire contents will be displayed.
 322 | 
 323 | ```sql
 324 | SELECT
 325 |   user_id,
 326 |   email,
 327 |   user_json
 328 | FROM
 329 |   gensql
 330 | LIMIT 10;
 331 | ```
 332 | 
 333 | ```sql
 334 | +---------+-------------------------------+-----------------------------------------------------------------------------------------------+
 335 | | user_id | email                         | user_json                                                                                     |
 336 | +---------+-------------------------------+-----------------------------------------------------------------------------------------------+
 337 | |       1 | abba.wilder@bodacious.com     | {"a_key": "playable", "b_key": {"c_key": ["unscathed", "humongous", "surplus", "mousiness"]}} |
 338 | |       2 | antonetta.bosson@chaplain.com | {"a_key": "obedience", "b_key": {"c_key": ["depletion", "carve", "driveway", "primate"]}}     |
 339 | |       3 | cobb.fondea@contusion.com     | {"a_key": "activity", "b_key": {"c_key": ["famine", "huskiness", "unleash", "unknotted"]}}    |
 340 | |       4 | hanan.keelin@aspect.com       | {"a_key": "iron", "b_key": {"c_key": ["exact", "postcard", "sauciness", "dispatch"]}}         |
 341 | |       5 | kinna.lytle@epidermis.com     | {"a_key": "flannels", "b_key": {"c_key": ["sherry", "graded", "crusader", "rumble"]}}         |
 342 | |       6 | carolynn.sewoll@starch.com    | {"a_key": "extrude", "b_key": {"c_key": ["harmony", "ferris", "confirm", "elevate"]}}         |
 343 | |       7 | ola.pride@defile.com          | {"a_key": "blurt", "b_key": {"c_key": ["expectant", "half", "coming", "remover"]}}            |
 344 | |       8 | orella.acima@subwoofer.com    | {"a_key": "grape", "b_key": {"c_key": ["wrist", "galley", "fragment", "scurvy"]}}             |
 345 | |       9 | odilia.thorr@daredevil.com    | {"a_key": "numbing", "b_key": {"c_key": ["glutinous", "repacking", "reliant", "polygon"]}}    |
 346 | |      10 | berrie.marybella@undertow.com | {"a_key": "unadvised", "b_key": {"c_key": ["grove", "cornhusk", "darkening", "grazing"]}}     |
 347 | +---------+-------------------------------+-----------------------------------------------------------------------------------------------+
 348 | 10 rows in set (0.01 sec)
 349 | ```
 350 | 
 351 | You can also extract specific keys:
 352 | 
 353 | ```sql
 354 | -- the ->> operator is shorthand for JSON_UNQUOTE(JSON_EXTRACT())
 355 | SELECT
 356 |   email,
 357 |   user_json->>'$.b_key'
 358 | FROM
 359 |   gensql
 360 | LIMIT 10;
 361 | ```
 362 | 
 363 | ```sql
 364 | +---------+-------------------------------+---------------------------------------------------------------+
 365 | | user_id | email                         | user_json->>'$.b_key'                                         |
 366 | +---------+-------------------------------+---------------------------------------------------------------+
 367 | |       1 | abba.wilder@bodacious.com     | {"c_key": ["unscathed", "humongous", "surplus", "mousiness"]} |
 368 | |       2 | antonetta.bosson@chaplain.com | {"c_key": ["depletion", "carve", "driveway", "primate"]}      |
 369 | |       3 | cobb.fondea@contusion.com     | {"c_key": ["famine", "huskiness", "unleash", "unknotted"]}    |
 370 | |       4 | hanan.keelin@aspect.com       | {"c_key": ["exact", "postcard", "sauciness", "dispatch"]}     |
 371 | |       5 | kinna.lytle@epidermis.com     | {"c_key": ["sherry", "graded", "crusader", "rumble"]}         |
 372 | |       6 | carolynn.sewoll@starch.com    | {"c_key": ["harmony", "ferris", "confirm", "elevate"]}        |
 373 | |       7 | ola.pride@defile.com          | {"c_key": ["expectant", "half", "coming", "remover"]}         |
 374 | |       8 | orella.acima@subwoofer.com    | {"c_key": ["wrist", "galley", "fragment", "scurvy"]}          |
 375 | |       9 | odilia.thorr@daredevil.com    | {"c_key": ["glutinous", "repacking", "reliant", "polygon"]}   |
 376 | |      10 | berrie.marybella@undertow.com | {"c_key": ["grove", "cornhusk", "darkening", "grazing"]}      |
 377 | +---------+-------------------------------+---------------------------------------------------------------+
 378 | 10 rows in set (0.00 sec)
 379 | ```
 380 | 
 381 | 
 382 | Or nest extractions:
 383 | 
 384 | ```sql
 385 | SELECT
 386 |   user_id,
 387 |   email,
 388 |   user_json->>'$.b_key.c_key'
 389 | FROM
 390 |   gensql
 391 | LIMIT 10;
 392 | ```
 393 | 
 394 | ```sql
 395 | +---------+-------------------------------+----------------------------------------------------+
 396 | | user_id | email                         | user_json->>'$.b_key.c_key'                        |
 397 | +---------+-------------------------------+----------------------------------------------------+
 398 | |       1 | abba.wilder@bodacious.com     | ["unscathed", "humongous", "surplus", "mousiness"] |
 399 | |       2 | antonetta.bosson@chaplain.com | ["depletion", "carve", "driveway", "primate"]      |
 400 | |       3 | cobb.fondea@contusion.com     | ["famine", "huskiness", "unleash", "unknotted"]    |
 401 | |       4 | hanan.keelin@aspect.com       | ["exact", "postcard", "sauciness", "dispatch"]     |
 402 | |       5 | kinna.lytle@epidermis.com     | ["sherry", "graded", "crusader", "rumble"]         |
 403 | |       6 | carolynn.sewoll@starch.com    | ["harmony", "ferris", "confirm", "elevate"]        |
 404 | |       7 | ola.pride@defile.com          | ["expectant", "half", "coming", "remover"]         |
 405 | |       8 | orella.acima@subwoofer.com    | ["wrist", "galley", "fragment", "scurvy"]          |
 406 | |       9 | odilia.thorr@daredevil.com    | ["glutinous", "repacking", "reliant", "polygon"]   |
 407 | |      10 | berrie.marybella@undertow.com | ["grove", "cornhusk", "darkening", "grazing"]      |
 408 | +---------+-------------------------------+----------------------------------------------------+
 409 | 10 rows in set (0.01 sec)
 410 | ```
 411 | 
 412 | ```sql
 413 | -- the -> operator is shorthand for JSON_EXTRACT()
 414 | -- arrays are 0-indexed, so this is a slice, like lst[1:3]
 415 | SELECT
 416 |   email,
 417 |   user_json->'$.b_key.c_key[1 to 2]'
 418 | FROM
 419 |   gensql
 420 | LIMIT 10;
 421 | ```
 422 | 
 423 | ```sql
 424 | +--------------------------------+------------------------------------+
 425 | | email                          | user_json->'$.e_key.d_key[1 to 2]' |
 426 | +--------------------------------+------------------------------------+
 427 | | donelle.labors@amused.com      | ["idealness", "unplug"]            |
 428 | | mackenzie.youngran@abridge.com | ["waffle", "scion"]                |
 429 | | elset.kramer@tiny.com          | ["dimple", "manpower"]             |
 430 | | theresita.faxen@plentiful.com  | ["appetizer", "huskiness"]         |
 431 | | salomi.pasco@each.com          | ["tiptop", "unsnap"]               |
 432 | | ashia.garate@varied.com        | ["bauble", "mayflower"]            |
 433 | | jonathan.aulea@chastise.com    | ["senior", "silicon"]              |
 434 | | gillan.mcnalley@slain.com      | ["provider", "gradient"]           |
 435 | | madelon.harleigh@defiling.com  | ["evoke", "tidy"]                  |
 436 | | dagny.iverson@entryway.com     | ["baton", "skillful"]              |
 437 | +--------------------------------+------------------------------------+
 438 | 10 rows in set (0.02 sec)
 439 | ```
 440 | 
 441 | See [MySQL docs](https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html) for much more about JSON operations.
 442 | 
 443 | ### INSERT
 444 | 
 445 | [MySQL docs.](https://dev.mysql.com/doc/refman/8.0/en/insert.html)
 446 | 
 447 | `INSERT` is used to insert rows into a table. There is also an `UPSERT` equivalent, with the `ON DUPLICATE KEY UPDATE` clause. With this, if an `INSERT` would cause a key collision with a `UNIQUE` index (explicit or implicit, e.g. `PRIMARY KEY`), then an `UPDATE` of that row occurs instead.
 448 | 
 449 | ```sql
 450 | INSERT INTO users
 451 |   (first_name, last_name, user_id)
 452 | VALUES
 453 |   ('Leeroy', 'Jenkins', 42);
 454 | ```
 455 | 
 456 | ```sql
 457 | ERROR 1062 (23000): Duplicate entry '42' for key 'users.PRIMARY'
 458 | ```
 459 | 
 460 | Expectedly, that failed since `user_id`, which is our primary key, already has an entry at `42`.
 461 | 
 462 | ```sql
 463 | SELECT * FROM
 464 |   users
 465 | WHERE
 466 |   user_id = 42\G
 467 | ```
 468 | 
 469 | ```sql
 470 | *************************** 1. row ***************************
 471 |         user_id: 42
 472 |      first_name: Ramona
 473 |       last_name: Odelet
 474 |       full_name: Odelet, Ramona
 475 |           email: ramona.odelet@lucid.com
 476 |            city: Foligno
 477 |         country: Italy
 478 |      created_at: 2003-07-29 07:34:15
 479 | last_updated_at: NULL
 480 | 1 row in set (0.01 sec)
 481 | ```
 482 | 
 483 | Now we can try again, this time with an instruction to perform an UPSERT.
 484 | 
 485 | ```sql
 486 | INSERT INTO users
 487 |   (first_name, last_name, user_id)
 488 | VALUES
 489 |   ("Leeroy", "Jenkins", 42) AS vals
 490 | ON DUPLICATE KEY UPDATE
 491 |   first_name = vals.first_name,
 492 |   last_name = vals.last_name;
 493 | ```
 494 | 
 495 | ```sql
 496 | Query OK, 2 rows affected (0.21 sec)
 497 | ```
 498 | 
 499 | ```sql
 500 | SELECT * FROM users WHERE user_id = 42\G
 501 | ```
 502 | 
 503 | ```sql
 504 | *************************** 1. row ***************************
 505 |         user_id: 42
 506 |      first_name: Leeroy
 507 |       last_name: Jenkins
 508 |       full_name: Jenkins, Leeroy
 509 |           email: ramona.odelet@lucid.com
 510 |            city: Foligno
 511 |         country: Italy
 512 |      created_at: 2003-07-29 07:34:15
 513 | last_updated_at: 2023-02-27 13:24:26
 514 | 1 row in set (0.01 sec)
 515 | ```
 516 | 
 517 | While `full_name` updated, since it's a `GENERATED` column, `email` is now incorrect. Also, note that `last_updated_at` has changed from `NULL`, since we've modified the row.
 518 | 
 519 | Let's put the row back to how it was before.
 520 | 
 521 | <details>
 522 |   <summary>How can this be accomplished?</summary>
 523 | 
 524 |   ```sql
 525 |   -- first, let's be safe with a transaction
 526 |   START TRANSACTION;
 527 |   ```
 528 | 
 529 |   ```sql
 530 |   Query OK, 0 rows affected (0.01 sec)
 531 |   ```
 532 | 
 533 |   ```sql
 534 |   -- then, use UPDATE
 535 |   UPDATE users SET first_name = 'Ramona', last_name = 'Odelet' WHERE user_id = 42;
 536 |   ```
 537 | 
 538 |   ```sql
 539 |   Query OK, 1 row affected (0.01 sec)
 540 |   Rows matched: 1  Changed: 1  Warnings: 0
 541 |   ```
 542 | 
 543 |   ```sql
 544 |   -- next, verify the work
 545 |   SELECT * FROM users WHERE user_id = 42\G
 546 |   ```
 547 | 
 548 |   ```sql
 549 |   *************************** 1. row ***************************
 550 |           user_id: 42
 551 |        first_name: Ramona
 552 |         last_name: Odelet
 553 |         full_name: Odelet, Ramona
 554 |             email: ramona.odelet@lucid.com
 555 |              city: Foligno
 556 |           country: Italy
 557 |        created_at: 2003-07-29 07:34:15
 558 |   last_updated_at: 2023-02-27 13:30:10
 559 |   1 row in set (0.00 sec)
 560 |   ```
 561 | 
 562 |   ```sql
 563 |   -- finally, commit the result
 564 |   COMMIT;
 565 |   ```
 566 | 
 567 |   ```sql
 568 |   Query OK, 0 rows affected (0.08 sec)
 569 |   ```
 570 | </details>
 571 | 
 572 | ### TABLE
 573 | 
 574 | [MySQL docs.](https://dev.mysql.com/doc/refman/8.0/en/table.html)
 575 | 
 576 | `TABLE` is syntactic sugar for `SELECT * FROM <table>`. Works great if you know the table is small, but be careful on large tables!
 577 | 
 578 | ```sql
 579 | TABLE users\G
 580 | ```
 581 | 
 582 | ```sql
 583 | -- 9999 rows are above this...
 584 | *************************** 10000. row ***************************
 585 |         user_id: 10000
 586 |      first_name: Gabrila
 587 |       last_name: Lemmueu
 588 |       full_name: Lemmueu, Gabrila
 589 |           email: gabrila.lemmueu@urgent.com
 590 |            city: Itanagar
 591 |         country: India
 592 |      created_at: 2020-12-10 01:58:35
 593 | last_updated_at: NULL
 594 | 10000 rows in set (0.48 sec)
 595 | ```
 596 | 
 597 | ## Joins
 598 | 
 599 | ### Relational alegbra
 600 | 
 601 | Not a lot of it, I promise; just what we need to discuss joins.
 602 | 
 603 | * Union: `R ∪ S --- R OR S`
 604 |   * Implemented in MySQL via the `UNION` keyword
 605 | * Intersection: `R ∩ S --- R AND S`
 606 |   * Implemented in MySQL via `INNER JOIN`, or in MySQL 8.0.31, the `INTERSECT` keyword
 607 | * Difference: `R ≏ S --- R - S`
 608 |   * Implemented in MySQL 8.0.31 via the `EXCEPT` keyword, and can be emulated using `UNION` and `NOT IN`
 609 | 
 610 | If you're intersted in exploring relational alegbra, [this application](https://dbis-uibk.github.io/relax/calc/local/uibk/local/3) is quite useful to convert SQL to relational alegbra, and display the results.
 611 | 
 612 | ### Types of joins
 613 | 
 614 | #### Cross
 615 | 
 616 | Before we demonstrate a cross join, you should have two small (very small, like < 10 rows) tables. You can either use what we learned earlier to create a new table from an existing one, or you can use any two of the following two tables: `northwind.orders_status`, `northwind.tax_status_name`, `test.ref_users_tiny`, `test.ref_users_zaps`. You can cross join across schemas if you'd like, although I can't promise the information will make any sense.
 617 | 
 618 | Also called a Cartesian Join. This produces `n x m` rows for the two groups being joined. That said, every other join can be thought of as a cross join with a predicate. In fact, `CROSS JOIN`, `JOIN`, and `INNER JOIN` are actually syntactically equivalent in MySQL (not ANSI SQL!), but for readability, it's preferred to only use `CROSS JOIN` if you actually intend to use it.
 619 | 
 620 | ```sql
 621 | SELECT
 622 |   z.zap_id,
 623 |   u.user_id,
 624 |   u.full_name
 625 | FROM
 626 |   ref_users_tiny u
 627 | CROSS JOIN
 628 |   ref_zaps_tiny z;
 629 | ```
 630 | 
 631 | ```sql
 632 | +--------+---------+-------------------+
 633 | | zap_id | user_id | full_name         |
 634 | +--------+---------+-------------------+
 635 | |      1 |       4 | McGrody, Cointon  |
 636 | |      1 |       3 | Gorlin, Alene     |
 637 | |      1 |       2 | Marienthal, Shirl |
 638 | |      1 |       1 | Jemena, Wyatt     |
 639 | |      2 |       4 | McGrody, Cointon  |
 640 | |      2 |       3 | Gorlin, Alene     |
 641 | |      2 |       2 | Marienthal, Shirl |
 642 | |      2 |       1 | Jemena, Wyatt     |
 643 | |      3 |       4 | McGrody, Cointon  |
 644 | |      3 |       3 | Gorlin, Alene     |
 645 | |      3 |       2 | Marienthal, Shirl |
 646 | |      3 |       1 | Jemena, Wyatt     |
 647 | |      4 |       4 | McGrody, Cointon  |
 648 | |      4 |       3 | Gorlin, Alene     |
 649 | |      4 |       2 | Marienthal, Shirl |
 650 | |      4 |       1 | Jemena, Wyatt     |
 651 | +--------+---------+-------------------+
 652 | 16 rows in set (0.01 sec)
 653 | ```
 654 | 
 655 | #### Inner Join
 656 | 
 657 | The default (i.e. `JOIN` == `INNER JOIN`). This is `users AND zaps` with a predicate.
 658 | 
 659 | ```sql
 660 | SELECT
 661 |   z.zap_id,
 662 |   u.full_name,
 663 |   u.city,
 664 |   u.country
 665 | FROM
 666 |   ref_users u
 667 | JOIN
 668 |   ref_zaps z
 669 | ON
 670 |   u.user_id = z.owned_by
 671 | LIMIT 10;
 672 | ```
 673 | 
 674 | ```sql
 675 | +--------+-------------------+-------------+----------------+
 676 | | zap_id | full_name         | city        | country        |
 677 | +--------+-------------------+-------------+----------------+
 678 | |    411 | MacPherson, Addie | Latina      | Italy          |
 679 | |    794 | Airla, Valaree    | Pribram     | Czech Republic |
 680 | |    830 | Kirschner, Robby  | Bikaner     | India          |
 681 | |    697 | Bilski, Lewiss    | Vörderås    | Sweden         |
 682 | |    110 | Yamauchi, Marleah | Rotterdam   | Netherlands    |
 683 | |    942 | Yamauchi, Marleah | Rotterdam   | Netherlands    |
 684 | |    772 | Calore, Ania      | Miyakojima  | Japan          |
 685 | |    676 | Breger, Gratiana  | Valkeakoski | Finland        |
 686 | |    715 | Serafina, Janith  | Morant Bay  | Jamaica        |
 687 | |    405 | Beckman, Pavla    | Wackersdorf | Germany        |
 688 | +--------+-------------------+-------------+----------------+
 689 | 10 rows in set (0.02 sec)
 690 | ```
 691 | 
 692 | #### Left Outer Join
 693 | 
 694 | Left and Right Joins are both a type of Outer Join, and often just called Left or Right Join. This is `users OR zaps` with a predicate and default value (`NULL`) for `zaps`.
 695 | 
 696 | ```sql
 697 | SELECT
 698 |   u.user_id,
 699 |   u.full_name,
 700 |   z.zap_id,
 701 |   z.owned_by
 702 | FROM
 703 |   ref_users u
 704 | LEFT JOIN
 705 |   ref_zaps_joins z
 706 | ON
 707 |   u.user_id = z.owned_by
 708 | LIMIT 10;
 709 | ```
 710 | 
 711 | ```sql
 712 | +---------+-------------------+--------+----------+
 713 | | user_id | full_name         | zap_id | owned_by |
 714 | +---------+-------------------+--------+----------+
 715 | |       1 | MacPherson, Addie |    411 |        1 |
 716 | |       2 | Airla, Valaree    |    794 |        2 |
 717 | |       3 | Nett, Sheppard    |   NULL |     NULL |
 718 | |       4 | Kirschner, Robby  |    830 |        4 |
 719 | |       5 | Bilski, Lewiss    |    697 |        5 |
 720 | |       6 | Yamauchi, Marleah |    942 |        6 |
 721 | |       6 | Yamauchi, Marleah |    110 |        6 |
 722 | |       7 | Calore, Ania      |    772 |        7 |
 723 | |       8 | Breger, Gratiana  |    676 |        8 |
 724 | |       9 | Serafina, Janith  |    715 |        9 |
 725 | +---------+-------------------+--------+----------+
 726 | 10 rows in set (0.09 sec)
 727 | ```
 728 | 
 729 | Of course, we previously put a foreign key on `zaps.owned_by`, precisely to prevent this kind of thing from happening. Still, you can see how this kind of query could be useful.
 730 | 
 731 | #### Right Outer Join
 732 | 
 733 | This is the same thing, but with the tables reversed:
 734 | 
 735 | ```sql
 736 | SELECT
 737 |   u.user_id,
 738 |   u.full_name,
 739 |   z.zap_id,
 740 |   z.owned_by
 741 | FROM
 742 |   ref_users u
 743 | RIGHT JOIN
 744 |   ref_zaps_joins z
 745 | ON
 746 |   u.user_id = z.owned_by
 747 | LIMIT 10;
 748 | ```
 749 | 
 750 | ```sql
 751 | +---------+------------------+--------+----------+
 752 | | user_id | full_name        | zap_id | owned_by |
 753 | +---------+------------------+--------+----------+
 754 | |     602 | Hirz, Datha      |      1 |      602 |
 755 | |     593 | Meldoh, Vergil   |      2 |      593 |
 756 | |    NULL | NULL             |      3 |        0 |
 757 | |     548 | Philps, Ardelia  |      4 |      548 |
 758 | |     957 | Joash, Electra   |      5 |      957 |
 759 | |     777 | Levinson, Lenore |      6 |      777 |
 760 | |     648 | Vas, Tiphanie    |      7 |      648 |
 761 | |     959 | Brink, Kaia      |      8 |      959 |
 762 | |     569 | Lasser, Garrard  |      9 |      569 |
 763 | |     429 | Adamsen, Justen  |     10 |      429 |
 764 | +---------+------------------+--------+----------+
 765 | 10 rows in set (0.09 sec)
 766 | ```
 767 | 
 768 | You can translate any `LEFT JOIN` to a `RIGHT JOIN` simply by swapping the order of the tables being joined:
 769 | 
 770 | ```sql
 771 | SELECT
 772 |   u.user_id,
 773 |   u.full_name,
 774 |   z.zap_id,
 775 |   z.owned_by
 776 | FROM
 777 |   ref_zaps_joins z
 778 | RIGHT JOIN
 779 |   ref_users u
 780 | ON
 781 |   u.user_id = z.owned_by
 782 | LIMIT 10;
 783 | ```
 784 | 
 785 | ```sql
 786 | +---------+-------------------+--------+----------+
 787 | | user_id | full_name         | zap_id | owned_by |
 788 | +---------+-------------------+--------+----------+
 789 | |       1 | MacPherson, Addie |    411 |        1 |
 790 | |       2 | Airla, Valaree    |    794 |        2 |
 791 | |       3 | Nett, Sheppard    |   NULL |     NULL |
 792 | |       4 | Kirschner, Robby  |    830 |        4 |
 793 | |       5 | Bilski, Lewiss    |    697 |        5 |
 794 | |       6 | Yamauchi, Marleah |    942 |        6 |
 795 | |       6 | Yamauchi, Marleah |    110 |        6 |
 796 | |       7 | Calore, Ania      |    772 |        7 |
 797 | |       8 | Breger, Gratiana  |    676 |        8 |
 798 | |       9 | Serafina, Janith  |    715 |        9 |
 799 | +---------+-------------------+--------+----------+
 800 | 10 rows in set (0.15 sec)
 801 | ```
 802 | 
 803 | #### Full Outer Join
 804 | 
 805 | This is `users OR zaps` with a predicate and default value (`NULL`) for both tables. MySQL doesn't support `FULL JOIN` as a keyword, but it can be performed using `UNION` (or `UNION ALL` if duplicates are desired).
 806 | 
 807 | NOTE: This query will produce 1150 rows as written.
 808 | 
 809 | ```sql
 810 | SELECT
 811 |   u.user_id,
 812 |   u.full_name,
 813 |   z.zap_id,
 814 |   z.owned_by
 815 | FROM
 816 |   ref_users u
 817 |   LEFT JOIN ref_zaps_joins z ON u.user_id = z.owned_by
 818 | UNION ALL
 819 | SELECT
 820 |   u.user_id,
 821 |   u.full_name,
 822 |   z.zap_id,
 823 |   z.owned_by
 824 | FROM
 825 |   ref_users u
 826 |   RIGHT JOIN ref_zaps_joins z ON u.user_id = z.owned_by
 827 | WHERE
 828 |   u.user_id IS NULL;
 829 | ```
 830 | 
 831 | To efficiently see what it's doing, you can run two queries, appending `ORDER BY -user_id DESC` and `ORDER BY user_id`, which represents the top and bottom of the result. Don't forget to add a `LIMIT` as well!
 832 | 
 833 | <details>
 834 |   <summary>What is -user_id?</summary>
 835 | 
 836 |   It's shorthand for the math expression `(0 - user_id)`, which effectively is the same thing as `ORDER BY ... ASC`, but it places `NULL` values last. Postgres avoids this weird trick and just has the `NULLS {FIRST, LAST}` option for ordering.
 837 | </details>
 838 | 
 839 | ### Specifying a column's table
 840 | 
 841 | You may have noticed that we've used aliases for many tables, e.g. `ref_users u`, and then notating columns with that alias as a prefix, e.g. `u.user_id`. This is not required for single tables, of course, nor is it requires with joins if every column name is unique. However, it's considered a good practice when using multiple tables.
 842 | 
 843 | ### Indices
 844 | 
 845 | Indices, or indexes, _may_ speed up queries. Each table **should** have a primary key (it's not required*, but, please don't do this), which is one index. Additional indices, on single or multiple columns, may be created. Most of them are stored in [B+ trees](https://en.wikipedia.org/wiki/B%2B_tree), which are similar to [B-trees](https://en.wikipedia.org/wiki/B-tree).
 846 | 
 847 | Indices aren't free, however - when you create an index on a column, that column's values are copied to the aforementioned B+ tree. While disk space is relatively cheap, creating dozens of indices for columns that are infrequently queried should be avoided. Also, since `INSERTs` must also write to the index, they'll be slowed down somewhat. Finally, InnoDB limits a given table to a maximum of 64 secondary indices (that is, other than primary keys).
 848 | 
 849 | <details>
 850 | <summary>Obscure facts about tables without primary keys</summary>
 851 | 
 852 | \* Prior to MySQL 8.0.30, if you don't create a primary key, the first `UNIQUE NOT NULL` index created is automatically promoted to become the primary key. If you don't have one of those either, the table will have no primary key†. Starting with MySQL 8.0.30, if no primary key is declared, an invisible column will be created called `my_row_id` and set to be the primary key.
 853 | 
 854 | † Not entirely true. A hidden index named `GEN_CLUST_INDEX` is created on an invisible (but a special kind of invisible, that you can never view) column named `ROW_ID` containing row IDs, but it's a monotonically increasing index that's shared globally across the entire database, not just that schema. Don't make InnoDB do this.
 855 | </details>
 856 | 
 857 | #### Single indices
 858 | 
 859 | Here, we'll switch over to `%_big` tables, which have 1,000,000 rows each.
 860 | 
 861 | ```sql
 862 | SELECT
 863 |   user_id,
 864 |   full_name,
 865 |   city,
 866 |   country
 867 | FROM
 868 |   ref_users_big
 869 | WHERE
 870 |   last_name = 'Safko';
 871 | ```
 872 | 
 873 | ```sql
 874 | +---------+------------------+------------------------+----------------+
 875 | | user_id | full_name        | city                   | country        |
 876 | +---------+------------------+------------------------+----------------+
 877 | |   66826 | Safko, Elwyn     | Arad                   | Romania        |
 878 | |   68759 | Safko, Vance     | Saint-Jérôme           | Canada         |
 879 | |   81384 | Safko, Robinett  | Hornchurch             | United Kingdom |
 880 | |   92580 | Safko, Daisi     | Sherwood Park          | Canada         |
 881 | |  121219 | Safko, Karalee   | Miami Gardens          | United States  |
 882 | |  124408 | Safko, Kyrstin   | Hawick                 | United Kingdom |
 883 | |  150615 | Safko, Kleon     | Leigh                  | United Kingdom |
 884 | |  151266 | Safko, Elita     | Abag Qi                | China          |
 885 | |  155926 | Safko, Berthe    | Tullebølle             | Denmark        |
 886 | |  168897 | Safko, Hazlett   | Valletta               | Malta          |
 887 | |   ...   |     ...          |         ...            |      ...       |
 888 | |  900935 | Safko, Tommy     | Paris                  | France         |
 889 | |  925514 | Safko, Rancell   | Nampa                  | United States  |
 890 | |  928486 | Safko, Garry     | Bardhaman              | India          |
 891 | |  932457 | Safko, Desiree   | Kherson                | Ukraine        |
 892 | |  945316 | Safko, Courtnay  | Saint Marys            | Canada         |
 893 | |  947072 | Safko, Leonie    | Durango                | Mexico         |
 894 | |  948263 | Safko, Jarred    | Las Vegas              | United States  |
 895 | |  959464 | Safko, Gordie    | Madison                | United States  |
 896 | |  972002 | Safko, Adriena   | Ubud                   | Indonesia      |
 897 | |  982089 | Safko, Gan       | Milpitas               | United States  |
 898 | +---------+------------------+------------------------+----------------+
 899 | 76 rows in set (12.24 sec)
 900 | ```
 901 | 
 902 | Let's create an index on the last name.
 903 | 
 904 | ```sql
 905 | CREATE INDEX last_name ON ref_users_big (last_name);
 906 | ```
 907 | 
 908 | ```sql
 909 | Query OK, 0 rows affected (45.08 sec)
 910 | Records: 0  Duplicates: 0  Warnings: 0
 911 | ```
 912 | 
 913 | ```sql
 914 | SELECT * FROM ref_users_big WHERE last_name = 'Safko';
 915 | ```
 916 | 
 917 | ```sql
 918 | -- the same results as above
 919 | 76 rows in set (0.04 sec)
 920 | ```
 921 | 
 922 | The lookup is now essentially instantaneous. If this is a frequently performed query, this may be a wise decision. There are also times when you may not need an index - for example, remember that a `UNIQUE` constraint is also an index. Since all of our users in this table have an email address which is `first.last@domain.com`, you might be tempted to add a predicate of `WHERE email LIKE '%safko%'` instead of adding an index, but alas - leading wildcards disallow the use of indexes, so it requires a full table scan.
 923 | 
 924 | #### Partial indices
 925 | 
 926 | Starting with MySQL 8.0.13, you can also create an index on a prefix of a column for string types (`CHAR`, `VARCHAR`, etc.), and for `TEXT` and `BLOB` columns you must do this.
 927 | 
 928 | This will create an index on the first 3 characters of last_name:
 929 | 
 930 | ```sql
 931 | ALTER TABLE ref_users_big DROP INDEX user_name;
 932 | CREATE INDEX last_name_partial ON ref_users_big (last_name(3));
 933 | ```
 934 | 
 935 | ```sql
 936 | Query OK, 0 rows affected (0.31 sec)
 937 | Records: 0  Duplicates: 0  Warnings: 0
 938 | 
 939 | Query OK, 0 rows affected (37.85 sec)
 940 | Records: 0  Duplicates: 0  Warnings: 0
 941 | ```
 942 | 
 943 | Speed for the new query is slower than before (0.16 seconds vs. 0.04 seconds), as expected, but 160 milliseconds for hashing three characters honestly isn't that bad. If you have tremendously large tables, limited disk space, or are worried about the write performance impact, this may be a good option for you.
 944 | 
 945 | #### Functional indices
 946 | 
 947 | You can also create an index that is itself an expression:
 948 | 
 949 | ```sql
 950 | CREATE INDEX
 951 |   created_month
 952 | ON ref_users_big ((MONTH(created_at)));
 953 | ```
 954 | 
 955 | Note the double parentheses around the expression.
 956 | 
 957 | ```sql
 958 | Query OK, 0 rows affected (41.15 sec)
 959 | Records: 0  Duplicates: 0  Warnings: 0
 960 | ```
 961 | 
 962 | What this specifically allows you to do is treat the `created_at` month value as an integer:
 963 | 
 964 | ```sql
 965 | EXPLAIN ANALYZE SELECT
 966 |   user_id, email, created_at
 967 | FROM
 968 |   ref_users_big
 969 | WHERE
 970 |   MONTH(created_at) = 6\G
 971 | ```
 972 | 
 973 | ```sql
 974 | *************************** 1. row ***************************
 975 | EXPLAIN: -> Index lookup on ref_users_big using created_month (month(created_at)=6)  (cost=19952.91 rows=153858) (actual time=2.303..12051.690 rows=82815 loops=1)
 976 | 
 977 | 1 row in set (15.49 sec)
 978 | ```
 979 | 
 980 | Note that in this case, it's actually _slower_ with the index, likely due to the cardinality of the month.
 981 | 
 982 | ```sql
 983 | EXPLAIN ANALYZE SELECT
 984 |   user_id, email, created_at
 985 | FROM
 986 |   ref_users_big
 987 | USE INDEX()
 988 | WHERE
 989 |   MONTH(created_at) = 6\G
 990 | ```
 991 | 
 992 | ```sql
 993 | *************************** 1. row ***************************
 994 | EXPLAIN: -> Filter: (month(ref_users_big.created_at) = 6)  (cost=100955.37 rows=994330) (actual time=1.114..11135.192 rows=82815 loops=1)
 995 |     -> Table scan on ref_users_big  (cost=100955.37 rows=994330) (actual time=1.010..9733.530 rows=1000000 loops=1)
 996 | 
 997 | 1 row in set (11.43 sec)
 998 | ```
 999 | 
1000 | #### JSON / Longtext
1001 | 
1002 | JSON has its own special requirements to be indexed, mostly if you're storing strings. First, you must select a specific part of the column's rows to be the indexed key, known as a functional key part. Additionally, the key has to have a prefix length assigned to it. Depending on the version of MySQL you're using, there may also be collation differences between the return value from various JSON functions and native storage of strings. Finally, this requires the stored data to be `k:v` objects, rather than arrays.
1003 | 
1004 | Here, we're using a multi-valued index, which behind the scenes is creating a virtual, invisible column to store the extracted JSON array as a character array.
1005 | 
1006 | ```sql
1007 | CREATE INDEX user_json_array_key ON gensql (
1008 |   (
1009 |     CAST(
1010 |       user_json -> '$.b_key.c_key' AS CHAR(64) ARRAY
1011 |     )
1012 |   )
1013 | );
1014 | ```
1015 | 
1016 | See [MySQL docs](https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-multi-valued) for more information on indexing JSON values, and properly using them.
1017 | 
1018 | #### Composite indices
1019 | 
1020 | An index can also be created across multiple columns - for InnoDB, up to 16.
1021 | 
1022 | ```sql
1023 | CREATE INDEX full_name ON ref_users (first_name, last_name);
1024 | ```
1025 | 
1026 | ```sql
1027 | Query OK, 0 rows affected (40.09 sec)
1028 | Records: 0  Duplicates: 0  Warnings: 0
1029 | ```
1030 | 
1031 | First, we'll use `IGNORE INDEX` to direct SQL to ignore the index we just created. This query counts the duplicate name tuples. Since the `id` is being included, and `GROUPing` it would result in an empty set (as it's the primary key, and thus guaranteed to be unique), `ANY VALUE` must be specified to let MySQL know that the result can be non-deterministic. Finally, `EXPLAIN ANALYZE` is being used to run the query, and explain what it's doing. This differs from `EXPLAIN`, which guesses at what would be done, but doesn't actually perform the query. Be careful using `EXPLAIN ANALYZE`, especially with destructive actions, since those queries will actually be performed!
1032 | 
1033 | ```sql
1034 | EXPLAIN ANALYZE
1035 | SELECT
1036 |   ANY_VALUE(id),
1037 |   first_name,
1038 |   last_name,
1039 |   COUNT(*) c
1040 | FROM
1041 |   ref_users_big
1042 | IGNORE INDEX(full_name)
1043 | GROUP BY
1044 |   first_name,
1045 |   last_name
1046 | HAVING
1047 |   c > 1\G
1048 | ```
1049 | 
1050 | ```sql
1051 | *************************** 1. row ***************************
1052 | EXPLAIN: -> Filter: (c > 1)  (actual time=23295.903..24686.641 rows=4318 loops=1)
1053 |     -> Table scan on <temporary>  (actual time=0.005..903.621 rows=995670 loops=1)
1054 |         -> Aggregate using temporary table  (actual time=23295.727..24415.358 rows=995670 loops=1)
1055 |             -> Table scan on ref_users_big  (cost=104920.32 rows=995522) (actual time=2.329..10156.102 rows=1000000 loops=1)
1056 | 
1057 | 1 row in set (25.26 sec)
1058 | ```
1059 | 
1060 | The query took 25.26 seconds, and resulted in 4318 rows. The output is read from the bottom up - a table scan was performed on the entire table, then a temporary table with the `GROUP BY` aggregation was created, and finally a second table scan on that temporary table was performed to find the duplicated tuples.
1061 | 
1062 | If you're curious, `actual time` is in milliseconds, and consists of two timings - the first is the time to initiate the step and return the first row; the second is the time to initiate the step and return all rows. `cost` is an arbitrary number indicating what the query cost optimizer thinks the query costs to perform, and is meaningless.
1063 | 
1064 | ```sql
1065 | EXPLAIN ANALYZE
1066 | SELECT
1067 |   ANY_VALUE(id),
1068 |   first_name,
1069 |   last_name,
1070 |   COUNT(*) c
1071 | FROM
1072 |   ref_users_big
1073 | GROUP BY
1074 |   first_name,
1075 |   last_name
1076 | HAVING
1077 |   c > 1\G
1078 | ```
1079 | 
1080 | ```sql
1081 | *************************** 1. row ***************************
1082 | EXPLAIN: -> Filter: (c > 1)  (actual time=6.318..12202.646 rows=4318 loops=1)
1083 |     -> Group aggregate: count(0)  (actual time=0.864..11447.233 rows=995670 loops=1)
1084 |         -> Index scan on ref_users_big using full_name  (cost=104920.32 rows=995522) (actual time=0.815..7315.098 rows=1000000 loops=1)
1085 | 
1086 | 1 row in set (12.32 sec)
1087 | ```
1088 | 
1089 | With the index in place, an index scan is performed instead of two table scans, resulting in a ~2x speedup.
1090 | 
1091 | Another example, retreiving a specific doubled tuple that I know exists:
1092 | 
1093 | ```sql
1094 | SELECT
1095 |   user_id,
1096 |   full_name,
1097 |   email,
1098 |   city,
1099 |   country
1100 | FROM
1101 |   ref_users_big
1102 | WHERE
1103 |   first_name = 'Ashlie'
1104 | AND
1105 |   last_name = 'Godred';
1106 | ```
1107 | 
1108 | ```sql
1109 | +---------+----------------+-------------------------+----------+--------------+
1110 | | user_id | full_name      | email                   | city     | country      |
1111 | +---------+----------------+-------------------------+----------+--------------+
1112 | |  974206 | Godred, Ashlie | ashlie.godred@mushy.com | Mikkeli  | Finland      |
1113 | |  987301 | Godred, Ashlie | ashlie.godred@suave.com | Pretoria | South Africa |
1114 | +---------+----------------+-------------------------+----------+--------------+
1115 | 2 rows in set (0.01 sec)
1116 | ```
1117 | 
1118 | vs. if `USE INDEX()` is added to the query:
1119 | 
1120 | ```sql
1121 | +---------+----------------+-------------------------+----------+--------------+
1122 | | user_id | full_name      | email                   | city     | country      |
1123 | +---------+----------------+-------------------------+----------+--------------+
1124 | |  974206 | Godred, Ashlie | ashlie.godred@mushy.com | Mikkeli  | Finland      |
1125 | |  987301 | Godred, Ashlie | ashlie.godred@suave.com | Pretoria | South Africa |
1126 | +---------+----------------+-------------------------+----------+--------------+
1127 | 2 rows in set (14.60 sec)
1128 | ```
1129 | 
1130 | Note that `USE INDEX()` is valid syntax to tell MySQL to ignore all indexes.
1131 | 
1132 | If instead, either the `full_name` or `last_name_partial` index we made perviously is ignored on its own, its complement will be used, and they're effectively equally fast due to the filtered result set - here, using the partial index on `last_name` dropped the candidate tuples from 1,000,000 to 1,066.
1133 | 
1134 | ```sql
1135 | EXPLAIN ANALYZE
1136 | SELECT
1137 |   user_id,
1138 |   full_name,
1139 |   email,
1140 |   city,
1141 |   country
1142 | FROM
1143 |   ref_users_big IGNORE INDEX(full_name)
1144 | WHERE
1145 |   first_name = 'Ashlie'
1146 |   AND
1147 |   last_name = 'Godred'\G
1148 | ```
1149 | 
1150 | ```sql
1151 | *************************** 1. row ***************************
1152 | EXPLAIN: -> Filter: ((ref_users_big.last_name = 'Godred') and (ref_users_big.first_name = 'Ashlie'))  (cost=641.79 rows=0) (actual time=315.346..322.278 rows=2 loops=1)
1153 |     -> Index lookup on ref_users_big using last_name_partial (last_name='Godred')  (cost=641.79 rows=1066) (actual time=6.602..317.360 rows=1066 loops=1)
1154 | 
1155 | 1 row in set (0.34 sec)
1156 | ```
1157 | #### Testing indices
1158 | 
1159 | MySQL 8 added the ability to toggle an index on and off, without actually dropping it. This way, if you want to test whether or not an index is helpful, you can toggle it off, observe query performance, and then decide whether or not to leave it.
1160 | 
1161 | ```sql
1162 | ALTER TABLE ref_users ALTER INDEX first_name INVISIBLE;
1163 | ```
1164 | 
1165 | ```sql
1166 | Query OK, 0 rows affected (0.28 sec)
1167 | Records: 0  Duplicates: 0  Warnings: 0
1168 | ```
1169 | 
1170 | ```sql
1171 | EXPLAIN ANALYZE
1172 | SELECT
1173 |   user_id,
1174 |   full_name,
1175 |   email,
1176 |   city,
1177 |   country
1178 | FROM
1179 |   ref_users_big
1180 | WHERE
1181 |   first_name = 'Ashlie'
1182 |   AND
1183 |   last_name = 'Godred'\G
1184 | ```
1185 | 
1186 | ```sql
1187 | *************************** 1. row ***************************
1188 | EXPLAIN: -> Filter: ((ref_users_big.last_name = 'Godred') and (ref_users_big.first_name = 'Ashlie'))  (cost=641.79 rows=0) (actual time=315.346..322.278 rows=2 loops=1)
1189 |     -> Index lookup on ref_users_big using last_name_partial (last_name='Godred')  (cost=641.79 rows=1066) (actual time=6.602..317.360 rows=1066 loops=1)
1190 | 
1191 | 1 row in set (0.34 sec)
1192 | ```
1193 | 
1194 | #### Descending indices
1195 | 
1196 | By default, indices are sorted in ascending order. While they can still be used when reversed, it's not as fast (although the performance difference may be minimal - test your theory before committing to it). If you are frequently querying something with `ORDER BY <row> DESC`, it may be helpful to instead create an index in descending order.
1197 | 
1198 | ```sql
1199 | CREATE INDEX first_desc ON ref_users_big (first_name DESC);
1200 | ```
1201 | 
1202 | ```sql
1203 | Query OK, 0 rows affected (41.18 sec)
1204 | Records: 0  Duplicates: 0  Warnings: 0
1205 | ```
1206 | 
1207 | #### When indicies aren't helpful
1208 | 
1209 | You may have noticed in a few of the previous `EXPLAIN ANALYZE` statements two different kinds of inner joins - `nested loop inner join`, and `inner hash join`. A nested loop join is exactly what it sounds like:
1210 | 
1211 | ```python
1212 | for tuple_i in table_1:
1213 |     for tuple_j in table_2
1214 |         if join_is_satisfied(tuple_i, tuple_j):
1215 |             yield (tuple_i, tuple_j)
1216 | ```
1217 | 
1218 | This has `O(MN)` time complexity, where `M` and `N` are the number of tuples in each table. If there's an index, the 2nd loop is using it for the lookup rather than another table scan, which makes the time complexity `O(Mlog(N))`, but with large sizes this is still quite bad. Here is an example on two tables with one million rows each:
1219 | 
1220 | ```sql
1221 | EXPLAIN ANALYZE
1222 | SELECT
1223 |   full_name
1224 | FROM
1225 |   ref_users_big
1226 | JOIN
1227 |   ref_zaps_big
1228 | ON
1229 |   ref_users_big.user_id = ref_zaps_big.owned_by\G
1230 | ```
1231 | 
1232 | ```sql
1233 | *************************** 1. row ***************************
1234 | EXPLAIN: -> Nested loop inner join  (cost=498015.60 rows=993197) (actual time=6.998..360927.896 rows=1000000 loops=1)
1235 |     -> Table scan on zaps  (cost=100160.95 rows=993197) (actual time=6.685..8804.370 rows=1000000 loops=1)
1236 |     -> Single-row index lookup on ref_users using user_id (user_id=zaps.owned_by)  (cost=0.30 rows=1) (actual time=0.350..0.350 rows=1 loops=1000000)
1237 | 
1238 | 1 row in set (6 min 2.41 sec)
1239 | ```
1240 | 
1241 | A better solution is a hash join, specifically a grace hash join, named after the GRACE database created in the 1980s at the University of Tokyo, which pioneered this method.
1242 | 
1243 | ```python
1244 | dict_table_1 = {id: row for id, row in table_1}
1245 | dict_table_2 = {id: row for id, row in table_2}
1246 | for tuple_i in dict_table_1.items():
1247 |     for tuple_j in dict_table_2.items():
1248 |         if join_is_satisfied(tuple_i, tuple_j):
1249 |             yield (tuple_i, tuple_j)
1250 | ```
1251 | 
1252 | While this looks very similar, there are details I've glossed over about the partioning method (it's recursive), and of course hash lookups are (optimally) `O(1)`, which speeds things up tremendously. The total time complexity for this method is `3(M+N)`.
1253 | 
1254 | MySQL [added a hash join in 8.0.18](https://dev.mysql.com/blog-archive/hash-join-in-mysql-8/), but it comes with some limitations; chiefly that a table must fit into memory, and annoyingly, that the optimizer will often decide to use a nested loop if indexes exist. If it can be used, though, compare the difference:
1255 | 
1256 | ```sql
1257 | EXPLAIN ANALYZE
1258 | SELECT
1259 |   full_name
1260 | FROM
1261 |   ref_users
1262 | IGNORE INDEX (user_id)
1263 | JOIN
1264 |   zaps
1265 | ON
1266 |   ref_users.user_id = zaps.owned_by\G
1267 | ```
1268 | 
1269 | ```sql
1270 | *************************** 1. row ***************************
1271 | EXPLAIN: -> Inner hash join (ref_users.user_id = zaps.owned_by)  (cost=98991977261.77 rows=993197) (actual time=7814.295..21403.160 rows=1000000 loops=1)
1272 |     -> Table scan on ref_users  (cost=0.03 rows=996699) (actual time=0.402..9319.650 rows=1000000 loops=1)
1273 |     -> Hash
1274 |         -> Table scan on zaps  (cost=100160.95 rows=993197) (actual time=4.566..6810.026 rows=1000000 loops=1)
1275 | 
1276 | 1 row in set (21.93 sec)
1277 | ```
1278 | 
1279 | #### HAVING
1280 | 
1281 | Earlier, we used `HAVING` in a `GROUP BY` aggregation. The difference between the two is that `WHERE` filters the results before they're sent to be aggregated, whereas `HAVING` filters the aggregation, and thus predicates relying on the aggregation result can be used. It's not limited to only aggregation results, though - a common use case is to allow the use of aliases or subquery results in filtering. Be aware that it's generally more performant to use `WHERE` if possible (consider re-writing your query if it isn't), but sometimes, you need it.
1282 | 
1283 | ```sql
1284 | SELECT
1285 |   ref_users_big.city,
1286 |   COUNT(ref_zaps_big.zap_id) as zap_count
1287 | FROM
1288 |   ref_users_big
1289 | LEFT JOIN
1290 |   ref_zaps_big
1291 | ON
1292 |   ref_users_big.user_id = ref_zaps_big.owned_by
1293 | GROUP BY
1294 |   ref_users_big.city
1295 | HAVING
1296 |   zap_count > 250;
1297 | ```
1298 | 
1299 | ```sql
1300 | +----------+-----------+
1301 | | city     | zap_count |
1302 | +----------+-----------+
1303 | | Hsin-chu |       260 |
1304 | | Vitória  |       293 |
1305 | | Cordoba  |       290 |
1306 | | Gdañsk   |       292 |
1307 | +----------+-----------+
1308 | 4 rows in set (32.86 sec)
1309 | ```
1310 | 
1311 | ## Query optimization
1312 | 
1313 | Finally into the fun stuff!
1314 | 
1315 | First, I'll spoil a lot of this - it's likely that you won't have to do much of this. MySQL's optimizer is actually pretty decent. That said, there are times when you will, and knowing what _should_ be happening, and how to compare it to what is actually happening is a useful skill.
1316 | 
1317 | ### SELECT *
1318 | 
1319 | If you're just exploring a schema, there's nothing wrong with `SELECT * FROM <table> LIMIT 10` or some other small number (< ~1000). It will be nearly instantaneous. However, the problem arises when you're also using `ORDER BY`. Recall that we had a composite index on `(first_name, last_name)` called `full_name`. Compare these two:
1320 | 
1321 | ```sql
1322 | EXPLAIN ANALYZE
1323 | SELECT
1324 |   *
1325 | FROM
1326 |   ref_users_big
1327 | ORDER BY
1328 |   first_name,
1329 |   last_name\G
1330 | ```
1331 | 
1332 | ```sql
1333 | *************************** 1. row ***************************
1334 | EXPLAIN: -> Sort: ref_users.first_name, ref_users.last_name  (cost=100495.40 rows=996699) (actual time=12199.513..12603.379 rows=1000000 loops=1)
1335 |     -> Table scan on ref_users  (cost=100495.40 rows=996699) (actual time=1.755..7039.004 rows=1000000 loops=1)
1336 | 
1337 | 1 row in set (13.68 sec)
1338 | ```
1339 | 
1340 | ```sql
1341 | EXPLAIN ANALYZE
1342 | SELECT
1343 |   user_id,
1344 |   first_name,
1345 |   last_name
1346 | FROM
1347 |   ref_users_big
1348 | ORDER BY
1349 |   first_name,
1350 |   last_name\G
1351 | ```
1352 | 
1353 | ```sql
1354 | *************************** 1. row ***************************
1355 | EXPLAIN: -> Index scan on ref_users using full_name  (cost=100495.40 rows=996699) (actual time=0.433..5413.188 rows=1000000 loops=1)
1356 | 
1357 | 1 row in set (6.39 sec)
1358 | ```
1359 | 
1360 | Since the the table includes columns not covered by the index (`user_id`), it would take longer to use the index and then find columns not in the index than to just do a table scan. Observe:
1361 | 
1362 | ```sql
1363 | EXPLAIN ANALYZE
1364 | SELECT
1365 |   *
1366 | FROM
1367 |   ref_users
1368 | FORCE INDEX(full_name)
1369 | ORDER BY
1370 |   first_name,
1371 |   last_name\G
1372 | ```
1373 | 
1374 | ```sql
1375 | *************************** 1. row ***************************
1376 | EXPLAIN: -> Index scan on ref_users using full_name  (cost=348844.90 rows=996699) (actual time=11.273..65858.816 rows=1000000 loops=1)
1377 | 
1378 | 1 row in set (1 min 7.13 sec)
1379 | ```
1380 | 
1381 | In comparison, if your `ORDER BY` is covered by the index (the primary key - `user_id` here - is implicitly part of indices, and thus doesn't cause a slowdown), queries can use it, and are much faster! If you're writing software that will be accessing a database, and you don't actually need all of the columns, don't request them. Take the time to be deliberate in what you request.
1382 | 
1383 | ### OFFSET / LIMIT
1384 | 
1385 | If you need to get `n` rows from the middle of a table, unless you have a really good reason to do so, please don't do this:
1386 | 
1387 | ```sql
1388 | -- The alternate form (and, IMO, the clearer one) is LIMIT 10 OFFSET 500000
1389 | SELECT
1390 |   user_id,
1391 |   full_name
1392 | FROM
1393 |   ref_users_big
1394 | LIMIT 500000,10;
1395 | ```
1396 | 
1397 | ```sql
1398 | +---------+-------------------+
1399 | | user_id | full_name         |
1400 | +---------+-------------------+
1401 | |  500001 | Ader, Wilona      |
1402 | |  500002 | Lindsley, Angy    |
1403 | |  500003 | Scarito, Vladimir |
1404 | |  500004 | Hoenack, Rossy    |
1405 | |  500005 | Cooley, Theobald  |
1406 | |  500006 | Pineda, Gaven     |
1407 | |  500007 | Harberd, Odie     |
1408 | |  500008 | Engleman, Mendy   |
1409 | |  500009 | Michon, Dionysus  |
1410 | |  500010 | Seaden, Leigha    |
1411 | +---------+-------------------+
1412 | 10 rows in set (6.29 sec)
1413 | ```
1414 | 
1415 | Doing this causes a table scan up to the specified offset. Far better, if you have a known monotonic number (like `id`), is to use a `WHERE` predicate:
1416 | 
1417 | ```sql
1418 | SELECT
1419 |   user_id,
1420 |   full_name
1421 | FROM
1422 |   ref_users_big
1423 | WHERE user_id > 500000
1424 | LIMIT 10;
1425 | ```
1426 | 
1427 | ```sql
1428 | +---------+-------------------+
1429 | | user_id | full_name         |
1430 | +---------+-------------------+
1431 | |  500001 | Ader, Wilona      |
1432 | |  500002 | Lindsley, Angy    |
1433 | |  500003 | Scarito, Vladimir |
1434 | |  500004 | Hoenack, Rossy    |
1435 | |  500005 | Cooley, Theobald  |
1436 | |  500006 | Pineda, Gaven     |
1437 | |  500007 | Harberd, Odie     |
1438 | |  500008 | Engleman, Mendy   |
1439 | |  500009 | Michon, Dionysus  |
1440 | |  500010 | Seaden, Leigha    |
1441 | +---------+-------------------+
1442 | 10 rows in set (0.02 sec)
1443 | ```
1444 | 
1445 | Using `user_id` as the filter allows it to be used for an index range scan, which is nearly instant. If you were doing this programmatically to support pagination, the last value of `id` could be used for the next iteration's predicate.
1446 | 
1447 | ### DISTINCT
1448 | 
1449 | `DISTINCT` is a very useful keyword for many operations when you want to not show duplicates. Unfortunately, it also adds a fairly hefty load to the database. That's not to say you _can't_ use it, but when writing code that will end up using this, ask yourself if you could intead handle de-duplication in the application. This also comes with tradeoffs, of course - you're now pulling more data over the network, and increasing load on the application. Generally speaking, databases are bound first by disk and memory, rather than CPU or network, so using compression (increased CPU load) and/or sending more data (not using `DISTINCT`) tends to increase overall performance, but you should experiment and profile your code.
1450 | 
1451 | This also tends to be something that works well early on with little load, but as either the database or application grows, it becomes unwieldy.
1452 | 
1453 | ```sql
1454 | EXPLAIN ANALYZE
1455 | SELECT
1456 |   first_name,
1457 |   last_name
1458 | FROM
1459 |   ref_users_big\G
1460 | ```
1461 | 
1462 | ```sql
1463 | *************************** 1. row ***************************
1464 | EXPLAIN: -> Table scan on ref_users_big  (cost=101365.53 rows=995522) (actual time=1.815..7213.716 rows=1000000 loops=1)
1465 | 
1466 | 1 row in set (8.13 sec)
1467 | ```
1468 | 
1469 | ```sql
1470 | EXPLAIN ANALYZE
1471 | SELECT DISTINCT
1472 |   first_name,
1473 |   last_name
1474 | FROM
1475 |   ref_users_big\G
1476 | ```
1477 | 
1478 | ```sql
1479 | EXPLAIN: -> Table scan on <temporary>  (actual time=0.005..765.220 rows=995670 loops=1)
1480 |     -> Temporary table with deduplication  (cost=101050.45 rows=995522) (actual time=15306.678..16296.289 rows=995670 loops=1)
1481 |         -> Table scan on ref_users_big  (cost=101050.45 rows=995522) (actual time=0.825..8718.651 rows=1000000 loops=1)
1482 | 
1483 | 1 row in set (17.73 sec)
1484 | ```
1485 | ## Cleanup
1486 | 
1487 | This isn't something you'll do often, if at all, so may as well do so now, eh?
1488 | 
1489 | ```sql
1490 | DROP SCHEMA foo;
1491 | ```
1492 | 
1493 | ```sql
1494 | Query OK, 0 rows affected (0.05 sec)
1495 | ```
1496 | 


--------------------------------------------------------------------------------
/mysql/mysql-102.md:
--------------------------------------------------------------------------------
  1 | # MySQL 102 - WIP
  2 | 
  3 | ### WITH (Common Table Expressions)
  4 | 
  5 | [MySQL docs.](https://dev.mysql.com/doc/refman/8.0/en/with.html)
  6 | 
  7 | `WITH` can be used to create a temporary named result set, scoped to the statement in which it exists. They can also be recursive. A demonstration that's probably not useful in reality follows, but it does demonstrate how MySQL can be made to use indexes, even when it normally couldn't. Here, we're trying to select a random row from a large table. The row ID is selected with a sub-query that multiplies the output of `RAND()` (a float between 0-1) by the last `id` row in the table.
  8 | 
  9 | ```sql
 10 | mysql>
 11 | EXPLAIN ANALYZE SELECT
 12 |   *
 13 | FROM
 14 |   ref_users
 15 | WHERE
 16 |   id = (
 17 |     SELECT
 18 |       FLOOR(
 19 |         (
 20 |           SELECT
 21 |             RAND() * (
 22 |               SELECT
 23 |                 id
 24 |               FROM
 25 |                 ref_users
 26 |               ORDER BY
 27 |                 id DESC
 28 |               LIMIT
 29 |                 1
 30 |             )
 31 |         )
 32 |       )
 33 |   );
 34 | *************************** 1. row ***************************
 35 | EXPLAIN: -> Filter: (ref_users.id = floor((rand() * (select #4))))  (cost=10799.04 rows=99735) (actual time=1545.462..8220.073 rows=3 loops=1)
 36 |     -> Table scan on ref_users  (cost=10799.04 rows=997354) (actual time=0.441..6723.994 rows=1000000 loops=1)
 37 |     -> Select #4 (subquery in condition; run only once)
 38 |         -> Limit: 1 row(s)  (cost=0.00 rows=1) (actual time=0.079..0.079 rows=1 loops=1)
 39 |             -> Index scan on ref_users using PRIMARY (reverse)  (cost=0.00 rows=1) (actual time=0.077..0.077 rows=1 loops=1)
 40 | 
 41 | 1 row in set, 2 warnings (8.22 sec)
 42 | ```
 43 | 
 44 | Since `RAND()` is evaluated for every row [when used with WHERE](https://dev.mysql.com/doc/refman/8.0/en/mathematical-functions.html#function_rand), it's not constant, and thus can't be used with indices. Also, you may wind up with more than one result!
 45 | 
 46 | If instead the `RAND()` call is placed into a CTE, it can be optimized:
 47 | 
 48 | ```sql
 49 | mysql>
 50 | EXPLAIN ANALYZE
 51 | WITH rand AS (
 52 |   SELECT
 53 |     FLOOR(
 54 |       (
 55 |         SELECT
 56 |           RAND() * (
 57 |             SELECT
 58 |               id
 59 |             FROM
 60 |               ref_users
 61 |             ORDER BY
 62 |               id DESC
 63 |             LIMIT
 64 |               1
 65 |           )
 66 |       )
 67 |     )
 68 | )
 69 | SELECT
 70 |   *
 71 | FROM
 72 |   ref_users
 73 | WHERE
 74 |   id IN (TABLE rand);
 75 | *************************** 1. row ***************************
 76 | EXPLAIN: -> Nested loop inner join  (cost=0.55 rows=1) (actual time=0.569..0.583 rows=1 loops=1)
 77 |     -> Filter: (`<subquery2>`.`FLOOR((SELECT RAND() * (SELECT id FROM ref_users ORDER BY id DESC LIMIT 1)))` is not null)  (cost=0.20 rows=1) (actual time=0.085..0.095 rows=1 loops=1)
 78 |         -> Table scan on <subquery2>  (cost=0.20 rows=1) (actual time=0.005..0.012 rows=1 loops=1)
 79 |             -> Materialize with deduplication  (cost=0.00 rows=1) (actual time=0.082..0.090 rows=1 loops=1)
 80 |                 -> Filter: (rand.`FLOOR((SELECT RAND() * (SELECT id FROM ref_users ORDER BY id DESC LIMIT 1)))` is not null)  (cost=0.00 rows=1) (actual time=0.017..0.023 rows=1 loops=1)
 81 |                     -> Table scan on rand  (cost=2.61 rows=1) (actual time=0.010..0.014 rows=1 loops=1)
 82 |                         -> Materialize CTE rand  (cost=0.00 rows=1) (actual time=0.013..0.018 rows=1 loops=1)
 83 |                             -> Rows fetched before execution  (cost=0.00 rows=1) (never executed)
 84 |                             -> Select #5 (subquery in projection; run only once)
 85 |                                 -> Limit: 1 row(s)  (cost=0.00 rows=1) (actual time=0.313..0.314 rows=1 loops=1)
 86 |                                     -> Index scan on ref_users using PRIMARY (reverse)  (cost=0.00 rows=1) (actual time=0.310..0.310 rows=1 loops=1)
 87 |     -> Filter: (ref_users.id = `<subquery2>`.`FLOOR((SELECT RAND() * (SELECT id FROM ref_users ORDER BY id DESC LIMIT 1)))`)  (cost=0.35 rows=1) (actual time=0.477..0.479 rows=1 loops=1)
 88 |         -> Single-row index lookup on ref_users using PRIMARY (id=`<subquery2>`.`FLOOR((SELECT RAND() * (SELECT id FROM ref_users ORDER BY id DESC LIMIT 1)))`)  (cost=0.35 rows=1) (actual time=0.468..0.469 rows=1 loops=1)
 89 | 
 90 | 1 row in set, 1 warning (0.00 sec)
 91 | ```
 92 | 
 93 | ## Stored Procedures
 94 | 
 95 | [MySQL docs.](https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html)
 96 | 
 97 | Stored Procedures (and Stored Functions) are a way to write SQL as functions, to be called as needed. Most normal SQL queries are accepted, as well as conditionals, loops, and the ability to accept arguments and return values. The main difference between the two is that Stored Procedures may accept arguments and write out data to variables, whereas Stored Functions may accept arguments, and return a value.
 98 | 
 99 | Their main advantage is that known, tested queries can be stored and later called from an application. Their main disadvantage is that they require people with reasonably good SQL skills to write them, else it's unlikely they'll exceed the performance of an ORM like Django.
100 | 
101 | As an example, I used this to fill `zaps` with data (NOTE: this is not an example of a well-designed stored procedure, merely one that demonstrates a variety of concepts):
102 | 
103 | ```sql
104 | DELIMITER // -- This is needed so that the individual commands don't end the stored procedure
105 | CREATE PROCEDURE insert_zaps(IN num_rows int, IN pct_shared float) -- Two input args are needed
106 | BEGIN
107 |   DECLARE loop_count bigint; -- Variables are initialized with a type
108 |   DECLARE len_table bigint;
109 |   DECLARE rand_base float;
110 |   DECLARE rand_offset float;
111 |   DECLARE rand_ts timestamp;
112 |   DECLARE rand_user bigint;
113 |   DECLARE shared_with_user bigint;
114 |   SELECT id INTO len_table FROM test.ref_users ORDER BY id DESC LIMIT 1; -- SELECT INTO can be used
115 |   SET loop_count = 1; -- Or, if the value is simple, simply assigned
116 |   WHILE loop_count <= num_rows DO
117 |     SET rand_base = RAND();
118 |     SET rand_offset = RAND();
119 |     SET rand_ts = TIMESTAMP(
120 |                     FROM_UNIXTIME(
121 |                       UNIX_TIMESTAMP(NOW()) - FLOOR(
122 |                         0 + (
123 |                           RAND() * 86400 * 365 * 10
124 |                         )
125 |                       )
126 |                     )
127 |                   ); -- This creates a random timestamp between now and 10 years ago
128 |     WITH rand AS (
129 |         SELECT
130 |           FLOOR(
131 |             (
132 |               SELECT
133 |                 rand_base * len_table
134 |             )
135 |           )
136 |       )
137 |       SELECT
138 |         id
139 |       INTO rand_user
140 |       FROM
141 |         test.ref_users
142 |       WHERE
143 |         id IN (TABLE rand); -- This is the CTE demonstrated earlier to determine the table length
144 |     INSERT INTO zaps (zap_id, created_at, owned_by) VALUES (loop_count, rand_ts, rand_user);
145 |     IF ROUND(rand_base, 1) > (1 - pct_shared) THEN -- Roughly determine the amount of shared Zaps
146 |       SELECT CAST(FLOOR(rand_base * rand_offset * len_table) AS unsigned) INTO shared_with_user;
147 |       UPDATE
148 |         zaps
149 |       SET
150 |         shared_with = JSON_ARRAY_APPEND(
151 |           shared_with,
152 |           '$',
153 |           shared_with_user
154 |         ) -- JSON_ARRAY_APPEND(array, key, value)
155 |       WHERE
156 |         id = loop_count;
157 |     END IF;
158 |     SET loop_count = loop_count + 1;
159 |   END WHILE;
160 |   END //
161 |   DELIMITER ;
162 | ```
163 | 


--------------------------------------------------------------------------------
/terraform/tf-101.md:
--------------------------------------------------------------------------------
 1 | # Introduction
 2 | 
 3 | ## What is Terraform?
 4 | 
 5 | It's an Infrastructure-As-Code tool. It allows for declaratively creating infrastructure on cloud providers, in colos, and even your homelab. There are extensions for practically everything you can think of; if you're missing one (and you know Golang), you can write it.
 6 | 
 7 | ## What is declarative?
 8 | 
 9 | Computer languages generally fall into one of two types - imperative, and declarative. Most are imperative, which means that you explicitly tell the language what to do. With a declarative language, you describe what you want, and it figures out how to get there. That makes it sound fancier and easier than it is; in reality, you have to describe in very specific terms what it is you want.
10 | 
11 | Terraform is mostly declarative, with some recent nods to imperative programming such as for loops - prior to version 0.12, you had to define the `count` of a resource you wanted instantiated, and it would make `n` copies of it.
12 | 
13 | # Terraform Basics
14 | 
15 | ## Resources vs. Modules
16 | 
17 | Broadly speaking, resources specifically instantiate a named resource, like an EC2 instance, or a DNS record, whereas a module generically defines those things - usually with default values assigned - and you can later call them, saving typing. You _can_ define your entire infrastructure solely with resources, but you'll be missing out on a huge advantage of Terraform.
18 | 
19 | The (redacted) example module creates a Redis instance, a Postgres instance, security groups for both, and a Cloudwatch metric for Redis. Going further, looking at its `variables.tf` file, we see that there are quite a few options - the type and size of storage for the DB, the version of both Redis and Postgres, encryption and snapshot options, and more. For more information on what is required to be passed to the resource, you can consult Terraform's documentation - here is the [Elasticache (Redis) page.](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/elasticache_replication_group)
20 | 
21 | ## Variables
22 | 
23 | Anything with a `default` value will pass this into the Terraform module when it's called if it isn't overridden. For example, `aws_db_instance.rds.storage_type` has as its value `var.db_storage_type`, which instructs Terraform to look at the variable `db_storage_type` - it's set to `gp2`, so we don't have to change it. `db_allocated_storage` has as a default `100` (in GiB, which is helpfully displayed as the variable's description), but it's overridden in the calling module to `20`.
24 | 
25 | You may have noticed that some of the variables defined have `type` set. Terraform is dynamically typed, but much like Python with mypy, allows for static typing if desired.
26 | 
27 | Locals, seen at the top of `main.tf`, are just that - local variables to that file. They can be anywhere, but historically are placed at the top of the file. They're generally used as seen here, to write out what would otherwise be bulky code with ternaries into something cleaner for later use. They're referenced with `local.varname` instead of `var.varname`.
28 | 
29 | Terraform underwent a large syntax change between v0.11 and v0.12. In 0.11, all variables were encased in the `"${var.foo}"` syntax you may see scattered around. That has been simplified to `var.foo` or `local.foo`, whichever is correct for the variable. The exception is for string interpolation - using variables along with plaintext (or concatenating strings without the use of the `join` function) requires all variables to be wrapped in `"${}"`. People comfortable with Bash programming will feel at home here.
30 | 
31 | Modules may also include a `terraform.tfvars` file, which has a `key=value` mapping for variable assignment. These are often used to have production and staging versions of infrastructure.
32 | 
33 | Variable definition precedence takes the following order, from first to last, with the latest definition standing: env vars --> `terraform.tfvars` --> `-var $foo` on the command line. In general, you'll want to mimic what you see in use in the repository.
34 | 
35 | ## Functions
36 | 
37 | Terraform includes many built-in functions. One of them seen here is `flatten`. [Here is Terraform's](https://www.terraform.io/language/functions/flatten) documentation on the function, but you may be able to guess that it's flattening lists or lists of lists into a single list. Read through the documentation to get an idea of the rest of them.
38 | 
39 | ## Plans and Applies
40 | 
41 | This is the main draw of Terraform. When you run `terraform plan`, it looks at the existing infrastructure, compares it to its statefile, and generates a human-readable diff. It also includes things that have changed outside of its scope (for example, if someone manually creates a database using the AWS console), and at the bottom, a summary saying how many entities will be created, changed, and destroyed. You can then save this plan and apply it later - this is what Atlantis does. Additionally, during this time the statefile is locked, so no other changes can be made. This ensures that your expected output is applied with no surprises due to someone else making a change at the same time.
42 | 
43 | To destroy infrastructure, in general you'll delete the resource/module from the code, and then run a plan. Terraform will detect that it exists in infrastructure but not in code, and generate a plan to destroy it which you can apply. In practice, some resources have protection enabled that prevents destroys. To destroy them, you either have to do two plan/apply cycles (one to remove the deletion protection, and another to destroy the resource), or manually delete it from the AWS console or command line, and then run the plan/apply. You can see this in `main.tf` on L144, with an explanation comment block above it.
44 | 
45 | Targeted applies (where you specifically instruct Terraform to only affect a specific resource) also exist, but these are rarely needed and shouldn't be relied upon. Similarly, you can import pre-existing resources into the statefile, although the syntax can be a bit confusing, and there are also occasional bizarre gotchas such as needing a region to be hard-coded in the infrastructure code.


--------------------------------------------------------------------------------