├── .gitignore
├── LICENSE
├── k8s
│   ├── deployment.yaml
│   ├── echo
│   │   ├── Dockerfile
│   │   ├── echo.py
│   │   ├── requirements.txt
│   │   └── templates
│   │       └── index.html
│   ├── helm
│   │   ├── Chart.yaml
│   │   └── templates
│   │       ├── NOTES.txt
│   │       ├── deployment.yaml
│   │       ├── ingress.yaml
│   │       ├── namespace.yaml
│   │       ├── rbac.yaml
│   │       └── service.yaml
│   ├── k8s-101.md
│   └── k8s-102.md
├── mysql
│   ├── mysql-101-0.md
│   ├── mysql-101-1.md
│   └── mysql-102.md
└── terraform
    └── tf-101.md
/.gitignore:
--------------------------------------------------------------------------------
1 | k8s/cert/*
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Mozilla Public License Version 2.0
2 | ==================================
3 |
4 | 1. Definitions
5 | --------------
6 |
7 | 1.1. "Contributor"
8 | means each individual or legal entity that creates, contributes to
9 | the creation of, or owns Covered Software.
10 |
11 | 1.2. "Contributor Version"
12 | means the combination of the Contributions of others (if any) used
13 | by a Contributor and that particular Contributor's Contribution.
14 |
15 | 1.3. "Contribution"
16 | means Covered Software of a particular Contributor.
17 |
18 | 1.4. "Covered Software"
19 | means Source Code Form to which the initial Contributor has attached
20 | the notice in Exhibit A, the Executable Form of such Source Code
21 | Form, and Modifications of such Source Code Form, in each case
22 | including portions thereof.
23 |
24 | 1.5. "Incompatible With Secondary Licenses"
25 | means
26 |
27 | (a) that the initial Contributor has attached the notice described
28 | in Exhibit B to the Covered Software; or
29 |
30 | (b) that the Covered Software was made available under the terms of
31 | version 1.1 or earlier of the License, but not also under the
32 | terms of a Secondary License.
33 |
34 | 1.6. "Executable Form"
35 | means any form of the work other than Source Code Form.
36 |
37 | 1.7. "Larger Work"
38 | means a work that combines Covered Software with other material, in
39 | a separate file or files, that is not Covered Software.
40 |
41 | 1.8. "License"
42 | means this document.
43 |
44 | 1.9. "Licensable"
45 | means having the right to grant, to the maximum extent possible,
46 | whether at the time of the initial grant or subsequently, any and
47 | all of the rights conveyed by this License.
48 |
49 | 1.10. "Modifications"
50 | means any of the following:
51 |
52 | (a) any file in Source Code Form that results from an addition to,
53 | deletion from, or modification of the contents of Covered
54 | Software; or
55 |
56 | (b) any new file in Source Code Form that contains any Covered
57 | Software.
58 |
59 | 1.11. "Patent Claims" of a Contributor
60 | means any patent claim(s), including without limitation, method,
61 | process, and apparatus claims, in any patent Licensable by such
62 | Contributor that would be infringed, but for the grant of the
63 | License, by the making, using, selling, offering for sale, having
64 | made, import, or transfer of either its Contributions or its
65 | Contributor Version.
66 |
67 | 1.12. "Secondary License"
68 | means either the GNU General Public License, Version 2.0, the GNU
69 | Lesser General Public License, Version 2.1, the GNU Affero General
70 | Public License, Version 3.0, or any later versions of those
71 | licenses.
72 |
73 | 1.13. "Source Code Form"
74 | means the form of the work preferred for making modifications.
75 |
76 | 1.14. "You" (or "Your")
77 | means an individual or a legal entity exercising rights under this
78 | License. For legal entities, "You" includes any entity that
79 | controls, is controlled by, or is under common control with You. For
80 | purposes of this definition, "control" means (a) the power, direct
81 | or indirect, to cause the direction or management of such entity,
82 | whether by contract or otherwise, or (b) ownership of more than
83 | fifty percent (50%) of the outstanding shares or beneficial
84 | ownership of such entity.
85 |
86 | 2. License Grants and Conditions
87 | --------------------------------
88 |
89 | 2.1. Grants
90 |
91 | Each Contributor hereby grants You a world-wide, royalty-free,
92 | non-exclusive license:
93 |
94 | (a) under intellectual property rights (other than patent or trademark)
95 | Licensable by such Contributor to use, reproduce, make available,
96 | modify, display, perform, distribute, and otherwise exploit its
97 | Contributions, either on an unmodified basis, with Modifications, or
98 | as part of a Larger Work; and
99 |
100 | (b) under Patent Claims of such Contributor to make, use, sell, offer
101 | for sale, have made, import, and otherwise transfer either its
102 | Contributions or its Contributor Version.
103 |
104 | 2.2. Effective Date
105 |
106 | The licenses granted in Section 2.1 with respect to any Contribution
107 | become effective for each Contribution on the date the Contributor first
108 | distributes such Contribution.
109 |
110 | 2.3. Limitations on Grant Scope
111 |
112 | The licenses granted in this Section 2 are the only rights granted under
113 | this License. No additional rights or licenses will be implied from the
114 | distribution or licensing of Covered Software under this License.
115 | Notwithstanding Section 2.1(b) above, no patent license is granted by a
116 | Contributor:
117 |
118 | (a) for any code that a Contributor has removed from Covered Software;
119 | or
120 |
121 | (b) for infringements caused by: (i) Your and any other third party's
122 | modifications of Covered Software, or (ii) the combination of its
123 | Contributions with other software (except as part of its Contributor
124 | Version); or
125 |
126 | (c) under Patent Claims infringed by Covered Software in the absence of
127 | its Contributions.
128 |
129 | This License does not grant any rights in the trademarks, service marks,
130 | or logos of any Contributor (except as may be necessary to comply with
131 | the notice requirements in Section 3.4).
132 |
133 | 2.4. Subsequent Licenses
134 |
135 | No Contributor makes additional grants as a result of Your choice to
136 | distribute the Covered Software under a subsequent version of this
137 | License (see Section 10.2) or under the terms of a Secondary License (if
138 | permitted under the terms of Section 3.3).
139 |
140 | 2.5. Representation
141 |
142 | Each Contributor represents that the Contributor believes its
143 | Contributions are its original creation(s) or it has sufficient rights
144 | to grant the rights to its Contributions conveyed by this License.
145 |
146 | 2.6. Fair Use
147 |
148 | This License is not intended to limit any rights You have under
149 | applicable copyright doctrines of fair use, fair dealing, or other
150 | equivalents.
151 |
152 | 2.7. Conditions
153 |
154 | Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted
155 | in Section 2.1.
156 |
157 | 3. Responsibilities
158 | -------------------
159 |
160 | 3.1. Distribution of Source Form
161 |
162 | All distribution of Covered Software in Source Code Form, including any
163 | Modifications that You create or to which You contribute, must be under
164 | the terms of this License. You must inform recipients that the Source
165 | Code Form of the Covered Software is governed by the terms of this
166 | License, and how they can obtain a copy of this License. You may not
167 | attempt to alter or restrict the recipients' rights in the Source Code
168 | Form.
169 |
170 | 3.2. Distribution of Executable Form
171 |
172 | If You distribute Covered Software in Executable Form then:
173 |
174 | (a) such Covered Software must also be made available in Source Code
175 | Form, as described in Section 3.1, and You must inform recipients of
176 | the Executable Form how they can obtain a copy of such Source Code
177 | Form by reasonable means in a timely manner, at a charge no more
178 | than the cost of distribution to the recipient; and
179 |
180 | (b) You may distribute such Executable Form under the terms of this
181 | License, or sublicense it under different terms, provided that the
182 | license for the Executable Form does not attempt to limit or alter
183 | the recipients' rights in the Source Code Form under this License.
184 |
185 | 3.3. Distribution of a Larger Work
186 |
187 | You may create and distribute a Larger Work under terms of Your choice,
188 | provided that You also comply with the requirements of this License for
189 | the Covered Software. If the Larger Work is a combination of Covered
190 | Software with a work governed by one or more Secondary Licenses, and the
191 | Covered Software is not Incompatible With Secondary Licenses, this
192 | License permits You to additionally distribute such Covered Software
193 | under the terms of such Secondary License(s), so that the recipient of
194 | the Larger Work may, at their option, further distribute the Covered
195 | Software under the terms of either this License or such Secondary
196 | License(s).
197 |
198 | 3.4. Notices
199 |
200 | You may not remove or alter the substance of any license notices
201 | (including copyright notices, patent notices, disclaimers of warranty,
202 | or limitations of liability) contained within the Source Code Form of
203 | the Covered Software, except that You may alter any license notices to
204 | the extent required to remedy known factual inaccuracies.
205 |
206 | 3.5. Application of Additional Terms
207 |
208 | You may choose to offer, and to charge a fee for, warranty, support,
209 | indemnity or liability obligations to one or more recipients of Covered
210 | Software. However, You may do so only on Your own behalf, and not on
211 | behalf of any Contributor. You must make it absolutely clear that any
212 | such warranty, support, indemnity, or liability obligation is offered by
213 | You alone, and You hereby agree to indemnify every Contributor for any
214 | liability incurred by such Contributor as a result of warranty, support,
215 | indemnity or liability terms You offer. You may include additional
216 | disclaimers of warranty and limitations of liability specific to any
217 | jurisdiction.
218 |
219 | 4. Inability to Comply Due to Statute or Regulation
220 | ---------------------------------------------------
221 |
222 | If it is impossible for You to comply with any of the terms of this
223 | License with respect to some or all of the Covered Software due to
224 | statute, judicial order, or regulation then You must: (a) comply with
225 | the terms of this License to the maximum extent possible; and (b)
226 | describe the limitations and the code they affect. Such description must
227 | be placed in a text file included with all distributions of the Covered
228 | Software under this License. Except to the extent prohibited by statute
229 | or regulation, such description must be sufficiently detailed for a
230 | recipient of ordinary skill to be able to understand it.
231 |
232 | 5. Termination
233 | --------------
234 |
235 | 5.1. The rights granted under this License will terminate automatically
236 | if You fail to comply with any of its terms. However, if You become
237 | compliant, then the rights granted under this License from a particular
238 | Contributor are reinstated (a) provisionally, unless and until such
239 | Contributor explicitly and finally terminates Your grants, and (b) on an
240 | ongoing basis, if such Contributor fails to notify You of the
241 | non-compliance by some reasonable means prior to 60 days after You have
242 | come back into compliance. Moreover, Your grants from a particular
243 | Contributor are reinstated on an ongoing basis if such Contributor
244 | notifies You of the non-compliance by some reasonable means, this is the
245 | first time You have received notice of non-compliance with this License
246 | from such Contributor, and You become compliant prior to 30 days after
247 | Your receipt of the notice.
248 |
249 | 5.2. If You initiate litigation against any entity by asserting a patent
250 | infringement claim (excluding declaratory judgment actions,
251 | counter-claims, and cross-claims) alleging that a Contributor Version
252 | directly or indirectly infringes any patent, then the rights granted to
253 | You by any and all Contributors for the Covered Software under Section
254 | 2.1 of this License shall terminate.
255 |
256 | 5.3. In the event of termination under Sections 5.1 or 5.2 above, all
257 | end user license agreements (excluding distributors and resellers) which
258 | have been validly granted by You or Your distributors under this License
259 | prior to termination shall survive termination.
260 |
261 | ************************************************************************
262 | * *
263 | * 6. Disclaimer of Warranty *
264 | * ------------------------- *
265 | * *
266 | * Covered Software is provided under this License on an "as is" *
267 | * basis, without warranty of any kind, either expressed, implied, or *
268 | * statutory, including, without limitation, warranties that the *
269 | * Covered Software is free of defects, merchantable, fit for a *
270 | * particular purpose or non-infringing. The entire risk as to the *
271 | * quality and performance of the Covered Software is with You. *
272 | * Should any Covered Software prove defective in any respect, You *
273 | * (not any Contributor) assume the cost of any necessary servicing, *
274 | * repair, or correction. This disclaimer of warranty constitutes an *
275 | * essential part of this License. No use of any Covered Software is *
276 | * authorized under this License except under this disclaimer. *
277 | * *
278 | ************************************************************************
279 |
280 | ************************************************************************
281 | * *
282 | * 7. Limitation of Liability *
283 | * -------------------------- *
284 | * *
285 | * Under no circumstances and under no legal theory, whether tort *
286 | * (including negligence), contract, or otherwise, shall any *
287 | * Contributor, or anyone who distributes Covered Software as *
288 | * permitted above, be liable to You for any direct, indirect, *
289 | * special, incidental, or consequential damages of any character *
290 | * including, without limitation, damages for lost profits, loss of *
291 | * goodwill, work stoppage, computer failure or malfunction, or any *
292 | * and all other commercial damages or losses, even if such party *
293 | * shall have been informed of the possibility of such damages. This *
294 | * limitation of liability shall not apply to liability for death or *
295 | * personal injury resulting from such party's negligence to the *
296 | * extent applicable law prohibits such limitation. Some *
297 | * jurisdictions do not allow the exclusion or limitation of *
298 | * incidental or consequential damages, so this exclusion and *
299 | * limitation may not apply to You. *
300 | * *
301 | ************************************************************************
302 |
303 | 8. Litigation
304 | -------------
305 |
306 | Any litigation relating to this License may be brought only in the
307 | courts of a jurisdiction where the defendant maintains its principal
308 | place of business and such litigation shall be governed by laws of that
309 | jurisdiction, without reference to its conflict-of-law provisions.
310 | Nothing in this Section shall prevent a party's ability to bring
311 | cross-claims or counter-claims.
312 |
313 | 9. Miscellaneous
314 | ----------------
315 |
316 | This License represents the complete agreement concerning the subject
317 | matter hereof. If any provision of this License is held to be
318 | unenforceable, such provision shall be reformed only to the extent
319 | necessary to make it enforceable. Any law or regulation which provides
320 | that the language of a contract shall be construed against the drafter
321 | shall not be used to construe this License against a Contributor.
322 |
323 | 10. Versions of the License
324 | ---------------------------
325 |
326 | 10.1. New Versions
327 |
328 | Mozilla Foundation is the license steward. Except as provided in Section
329 | 10.3, no one other than the license steward has the right to modify or
330 | publish new versions of this License. Each version will be given a
331 | distinguishing version number.
332 |
333 | 10.2. Effect of New Versions
334 |
335 | You may distribute the Covered Software under the terms of the version
336 | of the License under which You originally received the Covered Software,
337 | or under the terms of any subsequent version published by the license
338 | steward.
339 |
340 | 10.3. Modified Versions
341 |
342 | If you create software not governed by this License, and you want to
343 | create a new license for such software, you may create and use a
344 | modified version of this License if you rename the license and remove
345 | any references to the name of the license steward (except to note that
346 | such modified license differs from this License).
347 |
348 | 10.4. Distributing Source Code Form that is Incompatible With Secondary
349 | Licenses
350 |
351 | If You choose to distribute Source Code Form that is Incompatible With
352 | Secondary Licenses under the terms of this version of the License, the
353 | notice described in Exhibit B of this License must be attached.
354 |
355 | Exhibit A - Source Code Form License Notice
356 | -------------------------------------------
357 |
358 | This Source Code Form is subject to the terms of the Mozilla Public
359 | License, v. 2.0. If a copy of the MPL was not distributed with this
360 | file, You can obtain one at http://mozilla.org/MPL/2.0/.
361 |
362 | If it is not possible or desirable to put the notice in a particular
363 | file, then You may include the notice in a location (such as a LICENSE
364 | file in a relevant directory) where a recipient would be likely to look
365 | for such a notice.
366 |
367 | You may add additional accurate notices of copyright ownership.
368 |
369 | Exhibit B - "Incompatible With Secondary Licenses" Notice
370 | ---------------------------------------------------------
371 |
372 | This Source Code Form is "Incompatible With Secondary Licenses", as
373 | defined by the Mozilla Public License, v. 2.0.
374 |
--------------------------------------------------------------------------------
/k8s/deployment.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: v1
2 | kind: Namespace
3 | metadata:
4 | name: echo
5 | ---
6 | apiVersion: apps/v1
7 | kind: Deployment
8 | metadata:
9 | name: echo
10 | namespace: echo
11 | spec:
12 | selector:
13 | matchLabels:
14 | app: echo
15 | replicas: 1
16 | template:
17 | metadata:
18 | labels:
19 | app: echo
20 | spec:
21 | containers:
22 | - name: echo
23 | image: localhost:5000/echo:latest
24 | imagePullPolicy: Always
25 | stdin: true
26 | tty: true
27 |
--------------------------------------------------------------------------------
/k8s/echo/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM python:3.10-alpine
2 |
3 | WORKDIR /app
4 |
5 | COPY . /app
6 |
7 | RUN pip install -r requirements.txt
8 |
9 | EXPOSE 8080
10 |
11 | CMD ["python", "/app/echo.py"]
12 |
--------------------------------------------------------------------------------
/k8s/echo/echo.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | from flask import Flask, request, render_template
4 |
5 | app = Flask(__name__)
6 |
7 | @app.route("/")
8 | def form():
9 | return render_template("index.html")
10 |
11 | @app.route("/", methods=["POST"])
12 | def form_post():
13 | return request.form["echo_input"]
14 |
15 | if __name__ == "__main__":
16 | app.run(host="0.0.0.0", port=8080, debug=True)
17 |
--------------------------------------------------------------------------------
/k8s/echo/requirements.txt:
--------------------------------------------------------------------------------
1 | flask
2 |
--------------------------------------------------------------------------------
/k8s/echo/templates/index.html:
--------------------------------------------------------------------------------
1 | <html>
2 | <head>
3 | <title>Echo (echo...)</title>
4 | </head>
5 | <body>
6 | <h1>Echo (echo...)</h1>
7 | <form method="POST">
8 | <input type="text" name="echo_input">
9 | <input type="submit">
10 | </form>
11 | </body>
12 | </html>
--------------------------------------------------------------------------------
/k8s/helm/Chart.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: v2
2 | name: echo
3 | description: A Helm chart for a simple echo app
4 | version: 0.1.0
5 | appVersion: 0.1.0
6 |
--------------------------------------------------------------------------------
/k8s/helm/templates/NOTES.txt:
--------------------------------------------------------------------------------
1 | To access, please run the following command:
2 |
3 | echo "$(minikube ip) echo.internal" | sudo tee -a /etc/hosts
4 |
5 | Then go to http://echo.internal in your browser.
6 |
7 | To clean up, run the following command:
8 |
9 | sudo sed -i'' '$d' /etc/hosts
10 |
--------------------------------------------------------------------------------
/k8s/helm/templates/deployment.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: apps/v1
2 | kind: Deployment
3 | metadata:
4 | name: echo
5 | namespace: echo
6 | spec:
7 | selector:
8 | matchLabels:
9 | app: echo
10 | replicas: 1
11 | template:
12 | metadata:
13 | labels:
14 | app: echo
15 | spec:
16 | containers:
17 | - name: echo
18 | image: localhost:5000/echo:latest
19 | imagePullPolicy: Always
20 | ports:
21 | - containerPort: 8080
22 | stdin: true
23 | tty: true
24 | resources:
25 | limits:
26 | cpu: 100m
27 | memory: 128Mi
28 | requests:
29 | cpu: 50m
30 | memory: 50Mi
31 |
--------------------------------------------------------------------------------
/k8s/helm/templates/ingress.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: networking.k8s.io/v1
2 | kind: Ingress
3 | metadata:
4 | name: echo
5 | namespace: echo
6 | annotations:
7 | nginx.ingress.kubernetes.io/rewrite-target: /
8 | spec:
9 | rules:
10 | - host: echo.internal
11 | http:
12 | paths:
13 | - path: /
14 | pathType: Prefix
15 | backend:
16 | service:
17 | name: echo
18 | port:
19 | number: 8080
20 |
--------------------------------------------------------------------------------
/k8s/helm/templates/namespace.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: v1
2 | kind: Namespace
3 | metadata:
4 | name: echo
5 |
--------------------------------------------------------------------------------
/k8s/helm/templates/rbac.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: rbac.authorization.k8s.io/v1
2 | kind: Role
3 | metadata:
4 | name: echo-rw
5 | namespace: echo
6 | rules:
7 | - apiGroups: [""]
8 | resources: ["pods"]
9 | verbs: ["get", "list", "watch"]
10 | - apiGroups: [""]
11 | resources: ["pods/exec"]
12 | verbs: ["create"]
13 | ---
14 | apiVersion: rbac.authorization.k8s.io/v1
15 | kind: RoleBinding
16 | metadata:
17 | name: echo-rw
18 | namespace: echo
19 | subjects:
20 | - kind: User
21 | name: echo-user
22 | apiGroup: rbac.authorization.k8s.io
23 | roleRef:
24 | kind: Role
25 | name: echo-rw
26 | apiGroup: rbac.authorization.k8s.io
27 |
--------------------------------------------------------------------------------
/k8s/helm/templates/service.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: v1
2 | kind: Service
3 | metadata:
4 | name: echo
5 | namespace: echo
6 | spec:
7 | ports:
8 | - protocol: TCP
9 | port: 8080
10 | targetPort: 8080
11 | selector:
12 | app: echo
13 | type: NodePort
14 |
--------------------------------------------------------------------------------
/k8s/k8s-101.md:
--------------------------------------------------------------------------------
1 | # Introduction
2 |
3 | ## What is Kubernetes?
4 |
5 | Kubernetes is a container orchestration system: it manages the scheduling and execution of containers. Similar platforms exist, such as Docker Swarm, Apache Mesos, and HashiCorp Nomad. Kubernetes has by far the dominant market share, and is also the most complex of those listed.
6 |
7 | ## Kubernetes components
8 |
9 | ### High level
10 | * Cluster: a logical grouping of one or more nodes.
11 | * Node: a server running Kubernetes - can be bare metal, a VM, or a container.
12 | * Pod: a logical grouping of one or more containers.
13 | * Container: the same concept as in Docker.
14 |
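The bullet points above can be made concrete with a manifest. Below is a minimal, hypothetical Pod wrapping two containers (the names and images are illustrative, not from this repo):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:                         # one Pod, two containers; they share
    - name: app                       # a network namespace and can reach
      image: nginx:1.25-alpine        # each other on localhost
    - name: sidecar
      image: busybox:1.36
      command: ["sleep", "infinity"]  # keep the sidecar running
```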
15 | #### Node roles
16 | * Control plane: schedules pods, detects and responds to cluster events, maintains cluster state in a database.
17 | * Worker: runs user-defined workloads via DaemonSets, StatefulSets, or Deployments.
18 | * Note that, while not recommended in production, a single-node development cluster can serve in both of these roles.
19 |
20 | ### Low[er] level
21 |
22 | For a more thorough examination of Kubernetes components, [the official documentation](https://kubernetes.io/docs/concepts/overview/components/) is recommended. A brief overview of some components follows:
23 |
24 | * kube-apiserver: handles requests to the API, typically via kubectl.
25 | * etcd: a key/value store utilizing the Raft algorithm for consensus; frequently used as the store for Kubernetes cluster data.
26 | * K3s (and thus K3d) uses an embedded SQLite database as its backing store by default; in general any database may be used, but in production etcd is the standard.
27 | * kube-scheduler: assigns workloads to a node, constrained by resource limits, affinity/anti-affinity rules, etc.
28 | * kubelet: an agent running on every node, ensuring that a Pod's containers are running.
29 |
30 | ## Kubernetes distributions
31 |
32 | Each cloud provider has its own - Amazon has EKS, Azure has AKS, Google has GKE, DigitalOcean has DOKS, etc. Vanilla Kubernetes can be installed [either manually](https://github.com/kelseyhightower/kubernetes-the-hard-way), or with a tool like [kubeadm](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/). Various distributions also exist, much like Linux distributions. [Minikube](https://minikube.sigs.k8s.io/docs/) is a popular way to bootstrap a single-node cluster for development in an existing operating system. K3d is based on [k3s](https://k3s.io/), which is a lightweight single-binary distribution of Kubernetes. Rancher Labs (owned by SUSE) also makes a full single-purpose OS called [k3os](https://k3os.io/) which is designed to run k3s, and only k3s. A similar (albeit running vanilla Kubernetes) but even more extreme example is [Talos](https://www.talos.dev/), which is completely immutable, has no shell access, and allows access only via its API. Amazon has a similar offering called [Bottlerocket](https://aws.amazon.com/bottlerocket/).
33 |
34 | In general, any Kubernetes distribution will be perfectly adequate for learning, and it comes down to personal preference. For production, there are arguments to be made for managed services like EKS, but that's beyond the scope of this document.
35 |
36 | # Getting started
37 |
38 | ## Install
39 |
40 | ### Prerequisites:
41 |
42 | - Docker
43 | - There are many ways to do this; pick your favorite
44 | - If you're using Minikube, you can just `eval` its `docker-env` to reuse its daemon
45 | - helm
46 | - `brew install helm`
47 | - kubectl
48 | - `brew install kubectl`
49 | - Optional:
50 | - `brew install hidetatz/tap/kubecolor`
51 | - Optional but please do it:
52 | - `brew install gnu-sed`
53 | - Add the following to your shell rc file:
54 | - `alias kubectl=kubecolor` (if kubecolor was installed)
55 | - `alias k=kubectl`
56 | - Install your shell's plugin for kubectl
57 |
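Taken together, those rc-file additions look like the following (a sketch assuming bash or zsh; the `kubecolor` alias only makes sense if you installed it):

```shell
# Additions to ~/.bashrc or ~/.zshrc:
alias kubectl=kubecolor   # colorized kubectl output (optional)
alias k=kubectl           # common shorthand; aliases chain, so `k` resolves
                          # through kubectl to kubecolor when both are defined
```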
58 | ### Install and verification
59 |
60 | #### M1 Macs (ARM)
61 |
62 | Install Docker Desktop, and launch minikube as below, but with `--driver docker` instead. Additionally, skip all steps regarding using a registry, and whenever the image path is referenced, use these instead:
63 |
64 | # For the first part, which has no networking functionality
65 | stephangarland/echo:local
66 | # For the second part, which exposes a container port
67 | stephangarland/echo:web
68 |
69 | #### Intel Macs (x86-64)
70 |
71 | Install the `hyperkit` driver with `brew install hyperkit`, and then minikube with `brew install minikube`.
72 |
73 | Then, run `minikube start` with a few options: `minikube start --memory 8GB --cpus 4 --driver hyperkit`. Assuming you have the memory and CPU to spare, this ensures we won't run into resource constraints on the backing VM. Using `hyperkit` as the driver means we don't have to download anything additional to spin up the VM that runs the cluster.
74 |
75 | ❯ minikube start --memory 8GB --cpus 4 --driver hyperkit
76 | 😄 minikube v1.25.2 on Darwin 11.6.5
77 | ▪ KUBECONFIG=/Users/sgarland/.kube/.switch_tmp/config.1541775917.tmp
78 | ▪ MINIKUBE_ACTIVE_DOCKERD=minikube
79 | ✨ Using the hyperkit driver based on user configuration
80 | 👍 Starting control plane node minikube in cluster minikube
81 | 🔥 Creating hyperkit VM (CPUs=4, Memory=8192MB, Disk=20000MB) ...
82 | 🐳 Preparing Kubernetes v1.23.3 on Docker 20.10.12 ...
83 | ▪ kubelet.housekeeping-interval=5m
84 | ▪ Generating certificates and keys ...
85 | ▪ Booting up control plane ...
86 | ▪ Configuring RBAC rules ...
87 | 🔎 Verifying Kubernetes components...
88 | ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
89 | 🌟 Enabled addons: storage-provisioner, default-storageclass
90 | 🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
91 |
92 | Next, we'll enable the registry addon:
93 |
94 | ❯ minikube addons enable registry
95 | ▪ Using image registry:2.7.1
96 | ▪ Using image gcr.io/google_containers/kube-registry-proxy:0.4
97 | 🔎 Verifying registry addon...
98 | 🌟 The 'registry' addon is enabled
99 |
100 | Also, since we'll need it later, let's enable the ingress addon now:
101 |
102 | ❯ minikube addons enable ingress
103 | ▪ Using image k8s.gcr.io/ingress-nginx/controller:v1.1.1
104 | ▪ Using image k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.1.1
105 | ▪ Using image k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.1.1
106 | 🔎 Verifying ingress addon...
107 | 🌟 The 'ingress' addon is enabled
108 |
109 | Next, if you don't already have a local docker daemon running (hint: does `docker version` return anything?), we'll hook into Minikube's:
110 |
111 | eval $(minikube -p minikube docker-env)
112 |
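It's worth spelling out what that `eval` does: `minikube docker-env` only prints `export` statements, and `eval` executes them in the current shell, pointing the local `docker` CLI at the daemon inside the VM. The same pattern, sketched locally with a stand-in function (the function name and value below are made up for illustration; no minikube required):

```shell
# fake_docker_env mimics `minikube docker-env`: it merely emits export lines.
fake_docker_env() {
    echo 'export DOCKER_HOST="tcp://192.168.64.2:2376"'  # illustrative value
}

# Without eval, the exports are just text on stdout; with eval, they take
# effect in the current shell:
eval "$(fake_docker_env)"
echo "$DOCKER_HOST"  # prints tcp://192.168.64.2:2376
```

To reverse it later, `minikube docker-env --unset` prints the corresponding `unset` commands, which you `eval` the same way.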
113 | Finally, we need to modify networking a little bit using `socat` to get the registry to listen to our local docker daemon:
114 |
115 | ❯ docker run --rm -it -d --network=host alpine ash -c "apk add socat && socat TCP-LISTEN:5000,reuseaddr,fork TCP:$(minikube ip):5000"
116 | Unable to find image 'alpine:latest' locally
117 | latest: Pulling from library/alpine
118 | df9b9388f04a: Already exists
119 | Digest: sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454
120 | Status: Downloaded newer image for alpine:latest
121 | fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/main/x86_64/APKINDEX.tar.gz
122 | fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/community/x86_64/APKINDEX.tar.gz
123 | (1/4) Installing ncurses-terminfo-base (6.3_p20211120-r0)
124 | (2/4) Installing ncurses-libs (6.3_p20211120-r0)
125 | (3/4) Installing readline (8.1.1-r0)
126 | (4/4) Installing socat (1.7.4.2-r0)
127 | Executing busybox-1.34.1-r5.trigger
128 | OK: 7 MiB in 18 packages
129 |
130 | Let's verify the cluster:
131 |
132 | ❯ kubectl get nodes
133 | NAME STATUS ROLES AGE VERSION
134 | minikube Ready control-plane,master 6m26s v1.23.3
135 |
136 | For more detail, use `describe`. There's a lot here, but I'll highlight some pertinent information.
137 |
138 | ❯ kubectl describe nodes
139 | Name: minikube
140 | Roles: control-plane,master
141 | Labels: beta.kubernetes.io/arch=amd64
142 | ...
143 | Capacity:
144 | cpu: 4
145 | ephemeral-storage: 17784752Ki
146 | hugepages-2Mi: 0
147 | memory: 8161900Ki
148 | pods: 110
149 | ...
150 | Events:
151 | Type Reason Age From Message
152 | ---- ------ ---- ---- -------
153 | Normal Starting 6m28s kube-proxy
154 | Normal NodeHasSufficientMemory 6m52s (x5 over 6m52s) kubelet Node minikube status is now: NodeHasSufficientMemory
155 | Normal NodeHasNoDiskPressure 6m52s (x5 over 6m52s) kubelet Node minikube status is now: NodeHasNoDiskPressure
156 | Normal NodeHasSufficientPID 6m52s (x4 over 6m52s) kubelet Node minikube status is now: NodeHasSufficientPID
157 | Normal Starting 6m42s kubelet Starting kubelet.
158 | Normal NodeHasNoDiskPressure 6m42s kubelet Node minikube status is now: NodeHasNoDiskPressure
159 | Normal NodeHasSufficientPID 6m42s kubelet Node minikube status is now: NodeHasSufficientPID
160 | Normal NodeNotReady 6m42s kubelet Node minikube status is now: NodeNotReady
161 | Normal NodeAllocatableEnforced 6m42s kubelet Updated Node Allocatable limit across pods
162 | Normal NodeHasSufficientMemory 6m42s kubelet Node minikube status is now: NodeHasSufficientMemory
163 | Normal NodeReady 6m31s kubelet Node minikube status is now: NodeReady
164 |
165 | At the top, we can see the name, role, labels, and annotations. The name is self-explanatory. The role shows two values - `control-plane` and `master`. These are the same thing, and have appeared side by side since Kubernetes v1.20: `master` is deprecated in favor of `control-plane` and will be fully removed in a future release. The purpose and limitations of this role, along with taints, will be discussed later. Labels are key/value pairs that can be applied arbitrarily, but usually carry semantic meaning for either the user or an application. For example, `kubernetes.io/arch=amd64` tells us that this node has the `amd64` architecture. Clusters can be of mixed architecture, so it's good to be able to easily tell apart `x86` and `arm` nodes for scheduling purposes.
166 |
167 | Let's label the node, for fun:
168 |
169 | ❯ kubectl label node --all "my.name.is=$(whoami)"
170 | node/minikube labeled
171 | Using the `--all` flag applies it to all nodes; without it, you'd need to add the node's name (`minikube` here).
172 |
173 | We can then see the new label with the `--show-labels` flag:
174 |
175 | ❯ kubectl get nodes --show-labels
176 | NAME STATUS ROLES AGE VERSION LABELS
177 | minikube Ready control-plane,master 9m32s v1.23.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=minikube,kubernetes.io/os=linux,minikube.k8s.io/commit=362d5fdc0a3dbee389b3d3f1034e8023e72bd3a7,minikube.k8s.io/name=minikube,minikube.k8s.io/primary=true,minikube.k8s.io/updated_at=2022_05_06T14_14_05_0700,minikube.k8s.io/version=v1.25.2,my.name.is=sgarland,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
178 |
179 | Unfortunately it's a bit messy in the default comma-separated form, but you should be able to spot your change in there. To delete the label, the syntax is somewhat confusing: you append a `-` to the key to indicate that it should be removed:
180 |
181 | ❯ kubectl label node --all "my.name.is-"
182 | node/minikube labeled
183 |
184 | Next is Capacity. We can see things like the amount of ephemeral storage and memory available to the node (16 GB and 8 GB, respectively), as well as allocatable pods. The 110-pod limit is not actually resource-based, but a networking default - with a `/24` block assigned to each node, there are 256 addresses available. Having slightly over double the addresses of the maximum pod count reduces IP address reuse as pods come and go from the node.
185 |
186 | Finally, Events. In this section, the kubelet reports the status of the node, here showing that it has sufficient memory, disk, PID, and is ready.
187 |
188 | # Exploration
189 |
190 | ## Imperative vs Declarative
191 |
192 | Ideally, everything is maintained in code, and changes are made with some form of state management, be it ArgoCD, Flux, or others. Less optimally, you can issue commands with `kubectl apply`, which reads your input file and compares it to existing, then makes changes. Even less optimally, you can directly issue `kubectl` commands.
193 |
194 | ## Kubectl verbs
195 |
196 | So far we've used a few - `get`, `describe`, and `label`. Kubernetes [loosely follows HTTP verbs](https://kubernetes.io/docs/reference/access-authn-authz/authorization/#determine-the-request-verb), with some extras thrown in. One important note is that since you are directly communicating with the API, there are no warnings for destructive actions. If you tell it to delete a Persistent Volume, it will do so (with some exceptions for finalizers).
197 |
198 | ## Create a deployment
199 |
200 | Let's deploy a simple application. If you have a small Dockerized app you'd like to run you're welcome to use it here, but otherwise, we'll focus on this simple echo app that echoes the user's input.
201 |
202 | ### Building the application
203 | Use your own, or copy/paste these into a shell to write `echo.py` and `Dockerfile`, respectively.
204 |
205 | cat << EOF > echo.py
206 | #!/usr/bin/env python
207 |
208 | def main():
209 | while True:
210 | user_input = input("Hi, say something, or type 'quit' to quit: ")
211 | if user_input == "quit":
212 | break
213 | else:
214 | print(user_input)
215 |
216 | if __name__ == "__main__":
217 | main()
218 |
219 | EOF
220 |
221 |
222 | ---
223 | cat << EOF > Dockerfile
224 | FROM python:3.10-alpine
225 |
226 | WORKDIR /app
227 |
228 | COPY ./echo.py /app/echo.py
229 |
230 | CMD ["python", "/app/echo.py"]
231 |
232 | EOF
233 |
234 | Then, build it:
235 |
236 | docker build -t echo .
237 |
238 | To test that it works, you can use:
239 |
240 | docker run --rm -i --name echo echo
241 |
242 | Bonus question: what does the `-i` flag do, and what happens if you neglect to include it here?
243 |
244 | ### Writing a Deployment
245 | A Deployment is a basic Kubernetes structure that defines a workload as a Pod template to be run with n replicas. Through its ReplicaSet and the kubelet, a Deployment ensures that an app is restarted if it fails, is reachable (assuming you've set up liveness and readiness probes), and more.
246 |
247 | #### YAML
248 | cat << EOF > deployment.yaml
249 | apiVersion: apps/v1
250 | kind: Deployment
251 | metadata:
252 | name: echo
253 | spec:
254 | selector:
255 | matchLabels:
256 | app: echo
257 | replicas: 1
258 | template:
259 | metadata:
260 | labels:
261 | app: echo
262 | spec:
263 | containers:
264 | - name: echo
265 | image: localhost:5000/echo:latest
266 | imagePullPolicy: Always
267 | stdin: true
268 | tty: true
269 |
270 | EOF
271 |
272 | Let's break down what's going on here, line by excruciating line:
273 |
274 | # This refers to a specific API version for the code
275 | # that follows - these are regularly updated and
276 | # deprecated, but you're warned well in advance
277 | apiVersion: apps/v1
278 |
279 | # This specifies what it is you're defining - could
280 | # also be a StatefulSet, an Ingress, a Service, etc.
281 | kind: Deployment
282 |
283 | # You can put multiple things here; the two most
284 | # common are the name of the application, and
285 | # a namespace in which to install it
286 | metadata:
287 | name: echo
288 |
289 | # This tells the Deployment what application
290 | # it should manage - in this case, it's looking
291 | # for those with the label `app: echo`
292 | spec:
293 | selector:
294 |     matchLabels:
295 | app: echo
296 | # The number of replicas to deploy - note that
297 | # this is even with `spec.selector`, and like Python,
298 | # whitespace is extremely important
299 | replicas: 1
300 |
301 | # This gives the Pods a template to apply
302 | # In this case, the label `app: echo`
303 | template:
304 | metadata:
305 | labels:
306 | app: echo
307 | # Now we define the Pod's containers - note that this
308 | # is even with `template.metadata`, as it is part
309 | # of the template
310 | spec:
311 | containers:
312 | # The name of your application
313 | - name: echo
314 | # The image, optionally as a FQDN
315 | # If not specified as a FQDN, it will first
316 | # be searched for locally, and then on Dockerhub
317 | image: localhost:5000/echo:latest
318 | # When to pull - can also use Never or IfNotPresent
319 |     imagePullPolicy: Always
320 | # Technically only stdin is needed, but
321 |     # if you don't also give it a pseudo-TTY
322 | # it will complain (but still run) when
323 | # you attach to the container
324 | stdin: true
325 | tty: true
326 |
327 | ### Applying the Deployment
328 |
329 | #### Pushing the build
330 | But first, we have to tag and push to our registry. Note that if you're using an M1 Mac, you won't do this; instead, substitute `docker pull` commands to verify that the images are available for you, e.g. `docker pull stephangarland/echo:web`
331 |
332 | docker tag echo:latest localhost:5000/echo:latest
333 | ---
334 | docker push localhost:5000/echo:latest
335 | ---
336 | The push refers to repository [localhost:5000/echo]
337 | 43358167f05b: Layer already exists
338 | 96568c21d3ac: Layer already exists
339 | b02dd59d34c0: Layer already exists
340 | 0b800261971d: Layer already exists
341 | 16e3ab2d4dee: Layer already exists
342 | fbd7d5451c69: Layer already exists
343 | 4fc242d58285: Layer already exists
344 | latest: digest: sha256:36450f0ec0febf8daf800f24ab81363211dc52dd6bfc3e50d5d54c508f8d89ed size: 1782
345 |
346 | #### Deploy!
347 | As stated, there are far better ways to deploy applications, but this is the most basic, and gives the most insight into what Kubernetes is doing to get your app running.
348 |
349 | If you run all of these in quick succession, you should see the following:
350 |
351 | ❯ kubectl apply -f deployment.yaml
352 | deployment.apps/echo created
353 |
354 | ❯ kubectl get deployments
355 | NAME READY UP-TO-DATE AVAILABLE AGE
356 | echo 0/1 1 0 1s
357 |
358 | ❯ kubectl get pods
359 | NAME READY STATUS RESTARTS AGE
360 | echo-746cdbd89c-hrzds 0/1 ContainerCreating 0 2s
361 |
362 | Once the pod is created and running (which for this app takes a very short time), the latter two commands should show this:
363 |
364 | ❯ kubectl get deployments
365 | NAME READY UP-TO-DATE AVAILABLE AGE
366 | echo 1/1 1 1 2m6s
367 |
368 | ❯ kubectl get pods
369 | NAME READY STATUS RESTARTS AGE
370 | echo-746cdbd89c-hrzds 1/1 Running 0 2m35s
371 |
372 | ### Exploring the Deployment
373 | Let's apply some of the verbs available to us.
374 |
375 | #### Attach
376 | ❯ kubectl attach -i echo-746cdbd89c-hrzds
377 | If you don't see a command prompt, try pressing enter.
378 |
379 |
380 | Hi, say something, or type 'quit' to quit: Hello!
381 | Hello!
382 | Hi, say something, or type 'quit' to quit: quit
383 | Session ended, resume using 'kubectl attach echo-746cdbd89c-hrzds -c echo -i -t' command when the pod is running
384 |
385 | `attach` lets us attach to a container's default process, which in this case, is our app.
386 |
387 | #### Exec
388 |
389 | *Note: Depending on timing, this early termination may not occur for you.*
390 |
391 | You could also use `exec` to get a shell into the pod, like this:
392 |
393 | ❯ kubectl exec -it echo-74bf7cdf5c-9rhxd -- sh
394 | Error from server (NotFound): pods "echo-74bf7cdf5c-9rhxd" not found
395 |
396 | #### Describe
397 | What's this? Our pod went away already? Let's `describe` the new pod to see why.
398 |
399 | ❯ kubectl describe pod echo-746cdbd89c-hrzds
400 | Name: echo-746cdbd89c-hrzds
401 | Namespace: default
402 | ... (not shown for conciseness)
403 | Containers:
404 | echo:
405 | ...
406 | Last State: Terminated
407 | Reason: Completed
408 | Exit Code: 0
409 | Started: Fri, 25 Mar 2022 15:03:57 -0500
410 | Finished: Fri, 25 Mar 2022 15:05:24 -0500
411 |
412 | Ah, there we are - since our program loops until it receives `quit` as input, once that was passed, the program exited. The kubelet noticed the container had stopped and restarted it in place - same pod, incremented restart count.
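The restart behavior is governed by the pod's `restartPolicy`, which we never set and which defaults to `Always` - the implicit equivalent of adding this to the pod template's spec:

```yaml
spec:
  restartPolicy: Always  # the default; Deployments only permit Always
```

`OnFailure` and `Never` exist for other workload types (e.g. Jobs), but a Deployment's pod template must use `Always`.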
413 |
414 | #### Get
415 |
416 | We can see this if we `get` pods:
417 |
418 | ❯ kubectl get pods
419 | NAME READY STATUS RESTARTS AGE
420 | echo-746cdbd89c-hrzds 1/1 Running 1 (62s ago) 8m
421 |
422 | #### Exec (again)
423 | Now let's exec into the pod.
424 |
425 | ❯ kubectl exec -it echo-746cdbd89c-hrzds -- sh
426 | /app # ls
427 | echo.py
428 | /app # python echo.py
429 | Hi, say something, or type 'quit' to quit: Hello
430 | Hello
431 | Hi, say something, or type 'quit' to quit: quit
432 | /app #
433 | Note that here, quitting the app didn't kill the pod - that's because we spawned a new shell to exec into, and created a new instance of the app. Look at what's running:
434 |
435 | /app # ps
436 | PID USER TIME COMMAND
437 | 1 root 0:00 python /app/echo.py
438 | 27 root 0:00 sh
439 | 41 root 0:00 ps
440 |
441 | Our app is running as the `init` process, PID 1. Kill it and watch what happens. Just kidding - PID 1 ignores any signal it hasn't installed a handler for, which traps most `kill` attempts, and for good reason; but you can send it `INT` aka `2` if you'd like to see what happens (you could also kill the shell, if you'd like).
442 |
443 | #### Delete
444 |
445 | This is how you canonically restart a pod, in case you weren't aware.
446 |
447 | ❯ kubectl delete pod -l app=echo
448 | pod "echo-746cdbd89c-hrzds" deleted
449 |
450 | What's this `-l` flag? Why didn't we have to specify the entire name? Welcome to selectors - also available with their longhand flag, `--selector`. Remember the `template.metadata.labels.app` we assigned to the Deployment? That's how this is finding it.
451 | And we can see that we now have a new pod, thanks to the Deployment:
452 |
453 | ❯ kubectl get pods
454 | NAME READY STATUS RESTARTS AGE
455 | echo-746cdbd89c-x9k2m 1/1 Running 0 36s
456 |
457 | ### Scaling workloads
458 |
459 | If you have a given workload, be it a Deployment or StatefulSet, you can horizontally scale it using the command `kubectl scale`, and the flag `--replicas`. Go ahead and scale ours up to, say, 3 replicas:
460 |
461 | ❯ kubectl scale deployment echo --replicas=3
462 | deployment.apps/echo scaled
463 |
464 | Now let's look at our deployment (if you aren't quick, you might just see 3/3 ready, but that's OK):
465 |
466 | ❯ kubectl get deployments
467 | NAME READY UP-TO-DATE AVAILABLE AGE
468 | echo 1/3 3 1 68m
469 |
470 | Once the pods are all up, this will change to 3/3 ready.
471 |
472 | ❯ kubectl get pods
473 | NAME READY STATUS RESTARTS AGE
474 | echo-746cdbd89c-8v5qb 1/1 Running 0 3s
475 | echo-746cdbd89c-ns9kk 1/1 Running 0 3s
476 | echo-746cdbd89c-x9k2m 1/1 Running 0 26m
477 |
478 | Of note, all this time we haven't been specifying a deployment (or pod) for `get`, which is fine since we're only running the one. If this were a real cluster, though, there would likely be many deployments and pods, and we'd want to be more specific:
479 |
480 | ❯ kubectl get deployment echo
481 | NAME READY UP-TO-DATE AVAILABLE AGE
482 | echo 3/3 3 3 71m
483 |
484 | ### Scaling (down) workloads
485 |
486 | To horizontally scale to zero, AKA delete the pods and prevent them from coming back, use `--replicas` again, but specify 0 pods: `--replicas=0`. Alternately, if you want to completely get rid of the deployment, use either `kubectl delete deployment/echo` (imperative) or `kubectl delete -f deployment.yaml` (declarative). With the latter, kubectl is reading the deployment manifest we wrote, and removing it.
487 |
488 | Either way, once done, we can verify that it's gone:
489 |
490 | ❯ kubectl get deployment
491 | No resources found in default namespace.
492 |
493 | ## Namespaces
494 |
495 | We've briefly mentioned namespaces so far, but all the work has been done in the default namespace. This is generally a bad idea - namespaces are a way of organizing and restricting resources. We can limit a given namespace to X CPUs and Y memory, restrict the rights of workloads inside it, and keep track of things more easily by scoping `kubectl` commands to a namespace.
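As a sketch of that resource-limiting capability, a ResourceQuota caps the aggregate requests and limits of everything in a namespace (the name and numbers here are arbitrary, just for illustration):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: echo-quota
  namespace: echo
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 2Gi
    limits.cpu: "4"
    limits.memory: 4Gi
```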
496 |
497 | Let's create one imperatively and use it, and then create another declaratively.
498 |
499 | ❯ kubectl create namespace echo
500 | namespace/echo created
501 |
502 | Now let's deploy our echo app in the new namespace:
503 |
504 | ❯ kubectl apply -f deployment.yaml -n echo
505 | deployment.apps/echo created
506 |
507 | Note that the pod isn't in the default namespace anymore:
508 |
509 | ❯ kubectl get pods
510 | No resources found in default namespace.
511 |
512 | ❯ kubectl get pods -n echo
513 | NAME READY STATUS RESTARTS AGE
514 | echo-d97d96459-s2bvk 1/1 Running 0 79s
515 |
516 | Now, let's delete it and then do it again declaratively (delete the deployment however you'd like, as described earlier).
517 |
518 | sed -i '/^spec:/i \ \ namespace: echo' deployment.yaml
519 |
520 | This adds a properly spaced `.metadata.namespace` line to the deployment manifest, looking for the target `^spec:` line and inserting immediately before it. Apply the file again with `kubectl apply -f deployment.yaml`, and:
521 |
522 | ❯ kubectl get pods -n echo
523 | NAME READY STATUS RESTARTS AGE
524 | echo-d97d96459-9l2jw 1/1 Running 0 4s
525 |
526 | There's our pod! What if we wanted to declaratively create the namespace, as well? Let's delete the namespace, which will also delete the deployment (not recommended in prod due to finalizers, but for this example it's fine):
527 |
528 | ❯ kubectl delete namespace echo
529 | namespace/echo deleted
530 |
531 | ---
532 | cat << EOF > helm/Chart.yaml
575 | apiVersion: v2
576 | name: echo
577 | description: A Helm chart for a simple echo app
578 | version: 0.1.0
579 | appVersion: 0.1.0
580 | EOF
581 |
582 | `version` is the version of the Helm chart, whereas `appVersion` is the version of the application. They should both use semantic versioning. `apiVersion` would be `v1` if you needed Helm v2 compatibility, but no one should be using Helm v2 these days, so stick with `apiVersion: v2`.
583 |
584 | cat << EOF > helm/templates/deployment.yaml
585 | apiVersion: apps/v1
586 | kind: Deployment
587 | metadata:
588 | name: echo
589 | namespace: echo
590 | spec:
591 | selector:
592 | matchLabels:
593 | app: echo
594 | replicas: 1
595 | template:
596 | metadata:
597 | labels:
598 | app: echo
599 | spec:
600 | containers:
601 | - name: echo
602 | image: localhost:5000/echo:latest
603 | imagePullPolicy: Always
604 | ports:
605 | - containerPort: 8080
606 | stdin: true
607 | tty: true
608 | EOF
609 |
610 | The eagle-eyed among you will note that this is largely the same, except that we've added a `containerPort` that we'll be talking to.
611 |
612 | cat << EOF > helm/templates/service.yaml
613 | apiVersion: v1
614 | kind: Service
615 | metadata:
616 | name: echo
617 | namespace: echo
618 | spec:
619 | ports:
620 | - protocol: TCP
621 | port: 8080
622 | targetPort: 8080
623 | selector:
624 | app: echo
625 | type: NodePort
626 | EOF
627 |
628 | The Service exposes our app to the rest of the cluster - in this case, as a NodePort, which means a high-numbered port will be opened on every node. `targetPort` is actually redundant here, as it defaults to the same value as `port`, but it's shown for education. In production, you would typically use a `LoadBalancer` Service.
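For reference, the LoadBalancer variant is a one-word change - a sketch only, since on bare Minikube you'd also need `minikube tunnel` or MetalLB for it to actually receive an external IP:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: echo
  namespace: echo
spec:
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
  selector:
    app: echo
  type: LoadBalancer
```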
629 |
630 | cat << EOF > helm/templates/ingress.yaml
631 | apiVersion: networking.k8s.io/v1
632 | kind: Ingress
633 | metadata:
634 | name: echo
635 | namespace: echo
636 | annotations:
637 |     nginx.ingress.kubernetes.io/rewrite-target: /
638 | spec:
639 | rules:
640 | - host: echo.internal
641 | http:
642 | paths:
643 | - path: /
644 | pathType: Prefix
645 | backend:
646 | service:
647 | name: echo
648 | port:
649 | number: 8080
650 | EOF
651 |
652 | We're using an Ingress here to route traffic to the service, and ultimately, to the pod.
653 |
654 | cat << EOF > helm/templates/namespace.yaml
655 | apiVersion: v1
656 | kind: Namespace
657 | metadata:
658 | name: echo
659 | EOF
660 |
661 | The Namespace definition hasn't changed. We could also rely on Helm to do this for us, with its `--create-namespace` flag.
662 |
663 | cat << 'EOF' > helm/templates/NOTES.txt
664 | To access, please run the following command:
665 | 
666 | echo "$(minikube ip) echo.internal" | sudo tee -a /etc/hosts
667 | 
668 | Then go to http://echo.internal in your browser.
669 | 
670 | To clean up, run the following command:
671 | 
672 | sudo sed -i '$d' /etc/hosts
673 | EOF
674 |
675 | `NOTES.txt` is a special file for Helm, which it will render when you run `helm install` as helpful tips to the user. In this case, we're explaining how to edit the `/etc/hosts` file so that the URI resolves.
676 |
677 | ### App
678 |
679 | But wait, I hear you saying, the app didn't have any web server! You're correct, so let's remedy that quickly:
680 |
681 | mkdir -p echo/templates && \
682 | cat << EOF > echo/echo.py
683 | #!/usr/bin/env python
684 |
685 | from flask import Flask, request, render_template
686 |
687 | app = Flask(__name__)
688 |
689 | @app.route("/")
690 | def form():
691 | return render_template("index.html")
692 |
693 | @app.route("/", methods=["POST"])
694 | def form_post():
695 | return request.form["echo_input"]
696 |
697 | if __name__ == "__main__":
698 | app.run(host="0.0.0.0", port=8080, debug=True)
699 | EOF
700 |
701 | ---
702 |
703 | cat << EOF > echo/templates/index.html
704 | <html>
705 |   <head>
706 |     <title>Echo (echo...)</title>
707 |   </head>
708 |   <body>
709 |     <h1>Echo (echo...)</h1>
710 |     <form method="POST">
711 |       <input type="text" name="echo_input">
712 |       <input type="submit" value="Echo">
713 |     </form>
714 |   </body>
715 | </html>
716 | EOF
717 |
718 | (No one will ever accuse me of being a frontend dev. I regret nothing.)
719 |
720 | We need to make sure Docker can install Flask (ideally this would be pinned to a specific version): `echo "flask" > echo/requirements.txt`
721 |
722 | Next, we need to update the `Dockerfile`.
723 |
724 | cat << EOF > echo/Dockerfile
725 | FROM python:3.10-alpine
726 |
727 | WORKDIR /app
728 |
729 | COPY . /app
730 |
731 | RUN pip install -r requirements.txt
732 |
733 | EXPOSE 8080
734 |
735 | CMD ["python", "/app/echo.py"]
736 | EOF
737 |
738 | If you're using a local registry, you'll also need to build, tag, and push this new image:
739 |
740 | docker build -t echo echo && \
741 | docker tag echo:latest localhost:5000/echo:latest && \
742 | docker push localhost:5000/echo:latest
743 |
744 | ### Installation
745 |
746 | To install the Chart, let's first see what it would do:
747 |
748 | ❯ helm install --dry-run --debug echo helm/
749 | install.go:178: [debug] Original chart version: ""
750 | install.go:195: [debug] CHART PATH: /Users/sgarland/git/zapier/intro-to-x/k8s/helm
751 |
752 | NAME: echo
753 | LAST DEPLOYED: Fri May 6 17:02:50 2022
754 | NAMESPACE: default
755 | STATUS: pending-install
756 | REVISION: 1
757 | TEST SUITE: None
758 | USER-SUPPLIED VALUES:
759 | {}
760 |
761 | COMPUTED VALUES:
762 | {}
763 |
764 | HOOKS:
765 | MANIFEST:
766 | ---
767 | # Source: echo/templates/namespace.yaml
768 | apiVersion: v1
769 | kind: Namespace
770 | metadata:
771 | name: echo
772 | ---
773 | # Source: echo/templates/service.yaml
774 | apiVersion: v1
775 | kind: Service
776 | metadata:
777 | name: echo
778 | namespace: echo
779 | spec:
780 | ports:
781 | - protocol: TCP
782 | port: 8080
783 | targetPort: 8080
784 | selector:
785 | app: echo
786 | type: NodePort
787 | ---
788 | # Source: echo/templates/deployment.yaml
789 | apiVersion: apps/v1
790 | kind: Deployment
791 | metadata:
792 | name: echo
793 | namespace: echo
794 | spec:
795 | selector:
796 | matchLabels:
797 | app: echo
798 | replicas: 1
799 | template:
800 | metadata:
801 | labels:
802 | app: echo
803 | spec:
804 | containers:
805 | - name: echo
806 | image: localhost:5000/echo:latest
807 | imagePullPolicy: Always
808 | ports:
809 | - containerPort: 8080
810 | stdin: true
811 | tty: true
812 | ---
813 | # Source: echo/templates/ingress.yaml
814 | apiVersion: networking.k8s.io/v1
815 | kind: Ingress
816 | metadata:
817 | name: echo
818 | namespace: echo
819 | annotations:
820 | nginx.ingress.kubernetes.io/rewrite-target: /
821 | spec:
822 | rules:
823 | - host: echo.internal
824 | http:
825 | paths:
826 | - path: /
827 | pathType: Prefix
828 | backend:
829 | service:
830 | name: echo
831 | port:
832 | number: 8080
833 |
834 | NOTES:
835 | To access, please run the following command:
836 |
837 | echo "$(minikube ip) echo.internal" | sudo tee -a /etc/hosts
838 | 
839 | Then go to http://echo.internal in your browser.
840 |
841 | To clean up, run the following command:
842 |
843 | sudo sed -i '$d' /etc/hosts
844 |
845 | Looks good! To install it, we can use the `upgrade` command with the `--install` flag - this way, if we need to make any changes, we don't have to type out a new command.
846 |
847 | ❯ helm upgrade --install echo helm
848 | Release "echo" does not exist. Installing it now.
849 | NAME: echo
850 | LAST DEPLOYED: Fri May 6 17:04:43 2022
851 | NAMESPACE: default
852 | STATUS: deployed
853 | REVISION: 1
854 | TEST SUITE: None
855 | NOTES:
856 | To access, please run the following command:
857 |
858 | echo "$(minikube ip) echo.internal" | sudo tee -a /etc/hosts
859 | 
860 | Then go to http://echo.internal in your browser.
861 |
862 | To clean up, run the following command:
863 |
864 | sudo sed -i '$d' /etc/hosts
865 |
866 | Let's add the `/etc/hosts` entry, then we can test it out! Note that if you're using the `docker` driver, you'll need to first run `minikube service echo -n echo --url` in a separate terminal, and keep it open for the next step. Also, replace the address you cURL to with the one minikube gives you (the ingress is largely useless here, although you could add it to `/etc/hosts` if you'd like).
867 |
868 | echo "$(minikube ip) echo.internal" | sudo tee -a /etc/hosts
869 | ---
870 | ❯ curl -d 'echo_input=Hello, world!' -X POST http://echo.internal
871 | Hello, world!
872 |
873 | # RBAC
874 |
875 | RBAC is Role-Based Access Control. It's a way to control access to resources based on a user's (or group's) role, rather than their identity. The assumption is that you have something else (like Okta) to authenticate the user, and then RBAC controls that user's ability to access or modify resources.
876 |
877 | ## Example
878 |
879 | ### Generating a certificate
880 |
881 | We're going to create a certificate and user to demonstrate how RBAC works.
882 |
883 | mkdir cert && openssl genrsa -out cert/echo-user.key 4096 && openssl req -new \
884 | -key cert/echo-user.key -out cert/echo-user.csr -subj "/CN=echo-user/O=echo-group" \
885 | && openssl x509 -req -in cert/echo-user.csr -CA ~/.minikube/ca.crt \
886 | -CAkey ~/.minikube/ca.key -CAcreateserial -out cert/echo-user.crt -days 365 \
887 | || echo "Failed to create cert! Please check that ~/.minikube/ca.{crt,key} exist."
888 |
889 | This should result in the following:
890 |
891 | Generating RSA private key, 4096 bit long modulus
892 | ...........................++
893 | ...........................++
894 | e is 65537 (0x10001)
895 | Signature ok
896 | subject=/CN=echo-user/O=echo-group
897 | Getting CA Private Key
898 |
899 | This one-liner uses the `openssl` tool to first create a 4096-bit RSA private key, then creates a Certificate Signing Request (CSR) using that key, and finally creates a certificate signed by the Minikube Certificate Authority, with an expiry of 365 days. The ending part, if you're not familiar with shell, is an `OR` that only executes if the previous command fails - since that command relies on two files existing in `~/.minikube`, there's a pretty good chance that they're the reason for the failure, hence the message.
900 |
901 | ### Creating a user
902 |
903 | Now, we're going to create a user entry in our kubeconfig, then create a context using it.
904 |
905 | ❯ kubectl config set-credentials echo-user --client-certificate=cert/echo-user.crt \
906 | --client-key=cert/echo-user.key
907 | User "echo-user" set.
908 |
909 | ❯ kubectl config set-context echo-user-context --cluster=minikube --user=echo-user
910 | Context "echo-user-context" created.
911 |
912 | ❯ kubectl config use-context echo-user-context
913 | Switched to context "echo-user-context".
914 |
915 | ### Testing out the user
916 |
917 | Let's create a namespace again:
918 |
919 | ❯ kubectl create ns foobar
920 | Error from server (Forbidden): namespaces is forbidden: User "echo-user" cannot create resource "namespaces" in API group "" at the cluster scope
921 |
922 | Since Minikube is installed with its default context of `minikube`, this additional user we've added has no permissions to do, well, anything. Try `kubectl get pods` or some other read-only action, and check the result.
923 |
924 | ### Adding RBAC
925 |
926 | RBAC definitions consist of two parts - Role and RoleBinding - and are either scoped to the cluster or to a namespace. Helpfully, the cluster-scoped variants are named ClusterRole and ClusterRoleBinding.
927 |
928 | cat << EOF > helm/templates/rbac.yaml
929 | apiVersion: rbac.authorization.k8s.io/v1
930 | kind: Role
931 | metadata:
932 | name: echo-ro
933 | namespace: echo
934 | rules:
935 | - apiGroups: [""]
936 | resources: ["pods"]
937 | verbs: ["get", "list", "watch"]
938 | ---
939 | apiVersion: rbac.authorization.k8s.io/v1
940 | kind: RoleBinding
941 | metadata:
942 | name: echo-ro
943 | namespace: echo
944 | subjects:
945 | - kind: User
946 | name: echo-user
947 | apiGroup: rbac.authorization.k8s.io
948 | roleRef:
949 | kind: Role
950 | name: echo-ro
951 | apiGroup: rbac.authorization.k8s.io
952 | EOF
953 |
954 | This is two RBAC objects in one file - a Role, and a RoleBinding. They're both scoped to the `echo` namespace, and as the name implies, they create a read-only role for the `echo-user` user we previously created. Note that you'll need to switch back to the `minikube` context to apply this (do you remember how?).
955 |
956 | ❯ helm upgrade --install echo helm
957 | Release "echo" has been upgraded. Happy Helming!
958 | NAME: echo
959 | LAST DEPLOYED: Wed May 18 10:51:41 2022
960 | NAMESPACE: default
961 | STATUS: deployed
962 | REVISION: 2
963 | TEST SUITE: None
964 | NOTES:
965 | To access, please run the following command:
966 |
967 | echo "$(minikube ip) echo.internal" | sudo tee -a /etc/hosts
968 | 
969 | Then go to http://echo.internal in your browser.
970 |
971 | To clean up, run the following command:
972 |
973 | sudo sed -i '$d' /etc/hosts
974 |
975 | ### Verifying RBAC
976 |
977 | We can of course use `kubectl get role -n echo` and `kubectl get rolebinding -n echo` to view our newly-available RBAC, but `kubectl` also includes a very useful subcommand, `kubectl auth can-i`, which lets you check whether you're allowed to perform a given action. Cluster administrators can impersonate another user (this is very useful for SREs) with the `--as user.name` flag, but anyone can use it to check their own current abilities.
978 |
979 | ❯ kubectl config use-context echo-user-context
980 | Switched to context "echo-user-context".
981 |
982 | ❯ kubectl auth can-i get pods -n echo
983 | yes
984 |
985 | ❯ kubectl auth can-i get pods -n echo --as foobar
986 | Error from server (Forbidden): users "foobar" is forbidden: User "echo-user" cannot impersonate resource "users" in API group "" at the cluster scope
987 |
988 | ❯ kubectl auth can-i create pods -n echo
989 | no
990 |
991 | ❯ kubectl auth can-i get pods -n kube-system
992 | no
993 |
994 | ❯ kubectl auth can-i create pods --subresource exec -n echo
995 | no
996 |
997 | This last one can be problematic, and indeed, is/was the source of much pain in SRE land as developers were unable to exec into bastion pods. `exec` is a subresource of pods (`pods/exec`), and the `create` verb must be granted on it specifically. If you try without having the requisite permission, you'll see this:
998 |
999 | ❯ kubectl exec -it -n echo echo-75897c68fd-nhn64 -- sh
1000 | Error from server (Forbidden): pods "echo-75897c68fd-nhn64" is forbidden: User "echo-user" cannot create resource "pods/exec" in API group "" in the namespace "echo"
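Granting it is a matter of adding a rule for the `pods/exec` subresource to the Role - a sketch of what the expanded rules look like:

```yaml
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]
```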
1001 |
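For reference, granting exec means adding a rule for the `pods/exec` subresource. A minimal sketch of such a rule follows; its exact placement within `helm/templates/rbac.yaml` is an assumption, since the template isn't reproduced here:

```yaml
# Hypothetical addition to the Role's rules: allows `kubectl exec` in the namespace
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]
```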
1002 | ### Modifying RBAC
1003 |
1004 | cat << EOF >> helm/templates/deployment.yaml
1048 | resources:
1049 | limits:
1050 | cpu: 100m
1051 | memory: 128Mi
1052 | requests:
1053 | cpu: 50m
1054 | memory: 50Mi
1055 | EOF
1056 |
1057 | Now run a `helm upgrade` cycle (make sure you're back to the `minikube` context), then examine the deployment, and finally, the pod's YAML manifest to view changes.
1058 |
1059 | ## Exploring limits and requests
1060 |
1061 | Play around (you can use `kubectl edit deployment` to speed things up) with limits and requests, and see how the scheduler and kubelet respond to combinations.
1062 |
1063 | # More to explore
1064 |
1065 | * This application could be put behind a load balancer (you could set up [MetalLB](https://metallb.universe.tf/) locally if you'd like), with additional replicas.
1066 | * This application runs as root, which is not recommended. How could you fix that?
1067 | * User entries could be captured and sent to a database stored in a dynamically generated Persistent Volume, with additional routes enabling historical views.
1068 | * HPA (Horizontal Pod Autoscaler) could be set up, along with some load testing mechanism, to demonstrate how Kubernetes will scale the application in response to demand.
1069 | * KEDA (Kubernetes Event-driven Autoscaling) could be set up to automatically scale on metrics other than CPU or Memory.
1070 |
--------------------------------------------------------------------------------
/k8s/k8s-102.md:
--------------------------------------------------------------------------------
1 | # WIP DRAFT
2 |
3 | ### Viewing resources in a container
4 |
5 | # Memory limits are in /sys/fs/cgroup/memory/memory.limit_in_bytes
6 | # We can use the bash-ism `(())` to do math, converting it to MiB
7 | # Alternately if you have `bc`, you can use that, as well as `awk`
8 | ❯ echo $(($(< /sys/fs/cgroup/memory/memory.limit_in_bytes) / 1048576))
9 | 2048
10 |
11 | # Memory requests would be in /sys/fs/cgroup/memory/memory.soft_limit_in_bytes if
12 | # Kubernetes followed normal Linux memory accounting practices, but it doesn't
13 |
14 | ❯ cat /sys/fs/cgroup/memory/memory.soft_limit_in_bytes
15 | 9223372036854771712
16 |
17 |
18 | Wondering what on earth 9223372036854771712 bytes is? Is this a hint?
19 | ❯ printf "%x\n" $(< /sys/fs/cgroup/memory/memory.soft_limit_in_bytes)
20 | 7ffffffffffff000
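That hex value is the answer: it's the signed 64-bit maximum rounded down to the 4 KiB page size, i.e. effectively "unlimited." A quick check going the other direction:

```shell
# 0x7ffffffffffff000 == 2^63 - 4096: int64 max, aligned down to a 4 KiB page
printf '%d\n' 0x7ffffffffffff000
# 9223372036854771712
```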
21 |
22 | # CPU requests are in /sys/fs/cgroup/cpu/cpu.shares, with a single core/vCPU being equal to 1024
23 | # This is thus 256 / 1024 == 0.25
24 | ❯ cat /sys/fs/cgroup/cpu/cpu.shares
25 | 256
26 |
27 | # CPU limits have to be calculated, as it's a combination of quota and period
28 | ❯ cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
29 | 150000
30 |
31 | ❯ cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
32 | 100000
33 |
34 | # So, CPU limits are:
35 | ❯ echo $(($(< /sys/fs/cgroup/cpu/cpu.cfs_quota_us) / $(< /sys/fs/cgroup/cpu/cpu.cfs_period_us)))
36 | 1 # ???
37 |
38 | # Bash doesn't handle floats, as it turns out - the answer is 1.5 vCPUs
39 | ❯ awk -v quota="$(< /sys/fs/cgroup/cpu/cpu.cfs_quota_us)" \
40 | -v period="$(< /sys/fs/cgroup/cpu/cpu.cfs_period_us)" \
41 | '{print quota/period}' <(echo)
42 | 1.5
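If you'd rather stay in pure shell, scale the quota into millicores (Kubernetes' native unit) before dividing; the values are hard-coded here from the example above:

```shell
quota=150000
period=100000
# 1000 * quota / period gives millicores: 1500m == 1.5 vCPUs
echo "$(( quota * 1000 / period ))m"
# 1500m
```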
43 |
44 | ### Viewing resources on the host
45 |
46 | So what if you want to view a given container's resources from the host? More Linux internals, I'm afraid.
47 |
48 | This specific example comes from my homelab (so does the above), but once we have requests and limits set for our application, we can circle back and view them on the `minikube` node.
49 |
50 | # I'm going to look for an app called `radarr` that I know is running on this node
51 |
52 | dell01-k3s-worker-01 [~]$ ps -ax | grep radarr
53 | 3028 ? S 0:00 s6-supervise radarr
54 | 3030 ? Ssl 1159:40 /app/radarr/bin/Radarr -nobrowser -data=/config
55 | 10484 pts/0 R+ 0:00 grep radarr
56 |
57 | # Then, I'll look at its `/proc` filesystem entry
58 | dell01-k3s-worker-01 [~]$ cat /proc/3030/cgroup
59 | 15:name=openrc:/k3s-service
60 | 14:name=systemd:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
61 | 13:rdma:/
62 | 12:pids:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
63 | 11:hugetlb:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
64 | 10:net_prio:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
65 | 9:perf_event:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
66 | 8:net_cls:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
67 | 7:freezer:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
68 | 6:devices:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
69 | 5:memory:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
70 | 4:blkio:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
71 | 3:cpuacct:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
72 | 2:cpu:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
73 | 1:cpuset:/kubepods/burstable/pod78e3f455-3991-4e0c-a076-07ad534e7a95/2d3023473e0cc6e72b8c5b52007d7e315c6e0b283ad95b86978a315cc3028543
74 | 0::/k3s-service
75 |
76 | # cgroups inherit from their parents, incidentally, so everything here is inheriting
77 | # from both the `burstable` and `kubepods` cgroups
78 |
79 | # We'll use `awk` to grab what we want from that list, then command substitution
80 | dell01-k3s-worker-01 [~]$ ls -l /sys/fs/cgroup/memory/$(awk -F: '/memory/ {print $NF}' /proc/3030/cgroup)
81 | total 0
82 | -rw-r--r-- 1 root root 0 May 18 15:39 cgroup.clone_children
83 | --w--w--w- 1 root root 0 May 5 17:50 cgroup.event_control
84 | -rw-r--r-- 1 root root 0 May 18 15:51 cgroup.procs
85 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.failcnt
86 | --w------- 1 root root 0 May 18 15:51 memory.force_empty
87 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.failcnt
88 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.limit_in_bytes
89 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.max_usage_in_bytes
90 | -r--r--r-- 1 root root 0 May 18 15:51 memory.kmem.slabinfo
91 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.tcp.failcnt
92 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.tcp.limit_in_bytes
93 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.kmem.tcp.max_usage_in_bytes
94 | -r--r--r-- 1 root root 0 May 18 15:39 memory.kmem.tcp.usage_in_bytes
95 | -r--r--r-- 1 root root 0 May 18 15:39 memory.kmem.usage_in_bytes
96 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.limit_in_bytes
97 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.max_usage_in_bytes
98 | -rw-r--r-- 1 root root 0 May 18 15:51 memory.move_charge_at_immigrate
99 | -r--r--r-- 1 root root 0 May 18 15:39 memory.numa_stat
100 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.oom_control
101 | ---------- 1 root root 0 May 18 15:51 memory.pressure_level
102 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.soft_limit_in_bytes
103 | -r--r--r-- 1 root root 0 May 18 15:39 memory.stat
104 | -rw-r--r-- 1 root root 0 May 18 15:51 memory.swappiness
105 | -r--r--r-- 1 root root 0 May 18 15:39 memory.usage_in_bytes
106 | -rw-r--r-- 1 root root 0 May 18 15:39 memory.use_hierarchy
107 | -rw-r--r-- 1 root root 0 May 18 15:51 notify_on_release
108 | -rw-r--r-- 1 root root 0 May 18 15:51 tasks
109 |
110 | # Looks familiar, right?
111 |
112 | dell01-k3s-worker-01 [~]$ echo $(($(< /sys/fs/cgroup/memory/$(awk -F: '/memory/ {print $NF}' /proc/3030/cgroup)/memory.limit_in_bytes) / 1048576))
113 | 2048
114 |
115 | # Finding the CPU information from the host is left as an exercise for the reader.
116 |
117 | ## Setting resource limits and requests
118 |
119 | cat << EOF >> helm/templates/deployment.yaml
120 | resources:
121 | limits:
122 | cpu: 100m
123 | memory: 128Mi
124 | requests:
125 | cpu: 50m
126 | memory: 50Mi
127 | EOF
128 |
129 | Now run a `helm upgrade` cycle (make sure you're back to the `minikube` context), then exec back into the pod to examine it.
130 |
131 | /app # echo $(($(< /sys/fs/cgroup/memory/memory.usage_in_bytes) / 1048576))
132 | sh: arithmetic syntax error
133 |
134 | # As it turns out, $(< file) is a bash-ism equivalent to $(cat file), and this shell is `sh`, not `bash`
135 |
136 | /app # echo $(($(cat /sys/fs/cgroup/memory/memory.usage_in_bytes) / 1048576))
137 | 37
138 |
139 | # So, our app is using about 37 MiB of memory.
140 |
141 | /app # echo $(($(cat /sys/fs/cgroup/memory/memory.limit_in_bytes) / 1048576))
142 | 128
143 |
144 | And we can see that our 128 MiB limit has been set.
--------------------------------------------------------------------------------
/mysql/mysql-101-0.md:
--------------------------------------------------------------------------------
1 | # MySQL 101 Part I
2 |
3 | - [MySQL 101 Part I](#mysql-101-part-i)
4 | - [Prerequisites](#prerequisites)
5 | - [MySQL Client](#mysql-client)
6 | - [GUI](#gui)
7 | - [TUI](#tui)
8 | - [Introduction](#introduction)
9 | - [What is SQL?](#what-is-sql)
10 | - [What is a relational database?](#what-is-a-relational-database)
11 | - [What is ACID?](#what-is-acid)
12 | - [What is MySQL?](#what-is-mysql)
13 | - [How is it pronounced?](#how-is-it-pronounced)
14 | - [Basic definitions](#basic-definitions)
15 | - [SQL sub-languages](#sql-sub-languages)
16 | - [Other definitions](#other-definitions)
17 | - [MySQL Components](#mysql-components)
18 | - [MySQL Operations](#mysql-operations)
19 | - [Assumptions](#assumptions)
20 | - [Notes](#notes)
21 | - [Schemata](#schemata)
22 | - [Schema spelunking](#schema-spelunking)
23 | - [String literals](#string-literals)
24 | - [SQL\_MODE](#sql_mode)
25 | - [Create a schema](#create-a-schema)
26 | - [Table operations](#table-operations)
27 | - [Create tables](#create-tables)
28 | - [Data types](#data-types)
29 | - [Foreign keys](#foreign-keys)
30 | - [Why you might want foreign keys](#why-you-might-want-foreign-keys)
31 | - [Creating a foreign key](#creating-a-foreign-key)
32 | - [Demonstrating a foreign key](#demonstrating-a-foreign-key)
33 | - [Determining table size](#determining-table-size)
34 | - [Column operations](#column-operations)
35 | - [Adding columns](#adding-columns)
36 |   - [Modifying columns](#modfying-columns)
37 | - [Dropping tables with foreign keys](#dropping-tables-with-foreign-keys)
38 | - [Copied table definitions](#copied-table-definitions)
39 | - [Copied table data and truncating](#copied-table-data-and-truncating)
40 | - [Transactions](#transactions)
41 | - [Generated columns](#generated-columns)
42 | - [Invisible columns](#invisible-columns)
43 |
44 | ## Prerequisites
45 |
46 | ### MySQL Client
47 |
48 | You'll need to have a MySQL client. In order of preference, some options for GUI (graphical) and TUI (terminal) are:
49 |
50 | #### GUI
51 |
52 | - [Sequel Ace](https://sequel-ace.com/)
53 | - Install from App Store, or with [Homebrew](https://brew.sh/): `HOMEBREW_NO_AUTO_UPDATE=1 brew install --cask sequel-ace`
54 | - [MySQL Workbench](https://www.mysql.com/products/workbench/)
55 | - [DBeaver](https://dbeaver.io/)
56 |
57 | #### TUI
58 |
59 | - [mysql-client](https://dev.mysql.com/doc/refman/8.0/en/mysql.html)
60 | - Install with [Homebrew](https://brew.sh/): `HOMEBREW_NO_AUTO_UPDATE=1 brew install mysql-client`
61 |
62 |
63 | Note that the server is currently using a self-signed TLS certificate, which some clients may complain about. Sequel Ace, MySQL Workbench, and mysql-client are proven to work without issue. Also note that mysql-client is available via [Homebrew](https://formulae.brew.sh/formula/mysql-client), but it won't symlink by default, so you'll need to do something like `brew link --force mysql-client`.
64 |
65 | WARNING: MySQL Workbench may not work with M1/M2 (ARM) Macs.
66 |
67 | ## Introduction
68 |
69 | ## What is SQL?
70 |
71 | Structured Query Language. It's a domain-specific language designed to manage data in a Relational Database Management System (RDBMS). It's been extended and updated many times, both in its official ANSI definition, and in implementations of it like MySQL and PostgreSQL.
72 |
73 | ## What is a relational database?
74 |
75 | It's what most people probably think of when they think of a database. Broadly speaking, data is related to other data in some manner. For example, observe these two tables (tl;dr a logical grouping of data):
76 |
77 | ```sql
78 | SHOW COLUMNS FROM users;
79 | ```
80 |
81 | ```sql
82 | +------------+----------+------+-----+---------+----------------+
83 | | Field | Type | Null | Key | Default | Extra |
84 | +------------+----------+------+-----+---------+----------------+
85 | | id | bigint | NO | PRI | NULL | auto_increment |
86 | | first_name | char(64) | YES | | NULL | |
87 | | last_name | char(64) | YES | | NULL | |
88 | | user_id | bigint | NO | UNI | NULL | |
89 | +------------+----------+------+-----+---------+----------------+
90 | 4 rows in set (0.09 sec)
91 | ```
92 |
93 | ```sql
94 | SHOW COLUMNS FROM zaps;
95 | ```
96 |
97 | ```sql
98 | +-----------------+-----------------+------+-----+-------------------+-----------------------------+
99 | | Field | Type | Null | Key | Default | Extra |
100 | +-----------------+-----------------+------+-----+-------------------+-----------------------------+
101 | | id | bigint unsigned | NO | PRI | NULL | auto_increment |
102 | | zap_id | bigint unsigned | NO | UNI | NULL | |
103 | | created_at | timestamp | NO | | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
104 | | last_updated_at | timestamp | YES | | NULL | on update CURRENT_TIMESTAMP |
105 | | owned_by | bigint unsigned | NO | MUL | NULL | |
106 | | shared_with | json | YES | | json_array() | DEFAULT_GENERATED |
107 | +-----------------+-----------------+------+-----+-------------------+-----------------------------+
108 | 6 rows in set (0.01 sec)
109 | ```
110 |
111 | Table `users` has four columns - `id`, `first_name`, `last_name`, and `user_id`. Table `zaps` has six columns - `id`, `zap_id`, `created_at`, `last_updated_at`, `owned_by`, and `shared_with`.
112 |
113 | Although it isn't explicitly defined or enforced, there is an implicit relationship between these two tables via `users.user_id` and `zaps.owned_by`. Thus, a query like `SELECT zap_id, owned_by FROM zaps JOIN users ON user_id = owned_by;` could use that relationship. Ideally, there would be additional constraints like foreign keys established to ensure referential integrity, but this example suffices for now.
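The JOIN mentioned above, written out in full (the table and column names come from the `SHOW COLUMNS` output above; the alias style is just a sketch):

```sql
-- List each zap alongside its owner, via the implicit user_id/owned_by relationship
SELECT u.first_name, u.last_name, z.zap_id
FROM zaps AS z
JOIN users AS u
  ON u.user_id = z.owned_by;
```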
114 |
115 | Also, generally speaking, most RDBMS are ACID-compliant, though not all.
116 |
117 | ## What is ACID?
118 |
119 | ACID is a set of four properties that, if implemented correctly, guarantee data validity:
120 |
121 | - Atomicity
122 | - In a given transaction, each statement must either completely succeed, or fail. If any statement in a transaction fails, the entire transaction must fail.
123 | - Consistency
124 | - A given transaction can only move a database from one valid and consistent state to another.
125 | - Isolation
126 | - Even with concurrent transactions executing, the database must end up in the same state as if each transaction were executed sequentially.
127 | - Durability
128 | - Once a transaction is committed, it must remain committed in the event of a system failure.
129 |
130 | Note that the lack of one or more of these properties does not necessarily mean that data committed is invalid, only that the guarantees granted by that particular property must be accounted for elsewhere. A common counter-example of this is Eventual Consistency with distributed systems.
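Atomicity in particular is easy to see with a transaction. A sketch, assuming a hypothetical `accounts` table with `id` and `balance` columns:

```sql
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- If either UPDATE had failed, ROLLBACK; would leave both rows untouched
COMMIT;
```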
131 |
132 | ### What is MySQL?
133 |
134 | It's an extremely popular row-based relational database implementing and extending ANSI SQL. It's unfortunately owned by Oracle, but if you'd prefer, the MariaDB fork is essentially the same thing.
135 |
136 | #### How is it pronounced?
137 |
138 | Officially, "My Ess Que Ell," but since the SQL language was originally called SEQUEL ("Structured English Query Language"), and only changed due to trademark issues, I feel at ease saying "My Sequel." However, this tends to bring out pedants who love to haughtily correct your pronunciation, so do what you will. For what it's worth, I also pronounce kubectl (the Kubernetes CLI tool) as "kube cuddle," so I may not be the greatest influence.
139 |
140 | ## Basic definitions
141 |
142 | ### SQL sub-languages
143 |
144 | All of these can be grouped as SQL, and some of them can also be combined - `DQL` is often merged with `DML`, for example. Knowing that `DML` is generally operating on a single record at a time (but may be batched), and that `DDL` is generally operating on an entire table or schema at a time suffices for now.
145 |
146 | - DCL
147 | - Data Control Language. `GRANT`, `REVOKE`.
148 | - DDL
149 | - Data Definition Language. `ALTER`, `CREATE`, `DROP`, `TRUNCATE`.
150 | - DML
151 | - Data Manipulation Language. `CALL`, `DELETE`, `INSERT`, `LOCK`, `SELECT (with FROM or WHERE)`, `UPDATE`.
152 | - DQL
153 | - Data Query Language. `SELECT`.
154 | - TCL
155 | - Transaction Control Language. `COMMIT`, `ROLLBACK`, `SAVEPOINT`.
156 |
157 | ### Other definitions
158 |
159 | - B+ tree
160 |   - An _m_-ary tree data structure that is self-balancing, with a variable number of children per node. It differs from the `B-tree` in that an individual data node can have either keys or children, but not both. It has `O(log(n))` time complexity for insertion, search, and deletion. It is frequently used both for filesystems and for RDBMS.
161 | - Block
162 | - The lowest reasonable level of data storage (above individual bits). Historically sized at 512 bytes due to hard drive sector sizes, but generally sized at 4 KiB in modern drives, and SSDs. Enterprise drives sometimes have 520 byte block sizes (or 4160 bytes for the 4 KiB-adjacent size), with the extra 8 bytes being used for data integrity calculations.
163 | - Filesystem
164 | - A method for the operating system to store data. May include features like copy-on-write, encryption, journaling, pre-allocation, SSD management, volume management, and more. Modern examples include APFS (default for Apple products), ext4 (default for most Linux distributions), NTFS (default for Windows), XFS (default for Red Hat and its downstream), and ZFS (default for FreeBSD).
165 | - Schema
166 | - A logical grouping of database objects, e.g. tables, indices, etc. Often called a database, but technically, the database may contain any number of schemas, each with its own unique (or shared!) set of data, access policies, etc.
167 | - Table
168 | - A logical grouping of data, of varying or similar types. May contain constraints, indices, etc.
169 | - Tablespace
170 | - The link between the logical storage layer (tables, indices) and the physical storage layer (the disk's filesystem). This is an actual file that exists on the disk, contained in `$MYSQL_DATA_DIR`, nominally `/var/lib/mysql`.
171 | - As an aside, this fact, combined with [RDS MySQL file size limits](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.KnownIssuesAndLimitations.html#MySQL.Concepts.Limits.FileSize) yields some interesting information about RDS. Since they used to (anything created before April 2014) limit a table to 2 TiB*, that means that they were using ext3, as that is its maximum file size. Instances created after April 2014 are limited to 16 TiB* files, indicating that they are probably now using ext4, as that is generally its maximum file size. 16 TB is also the limit for InnoDB with 4 KB InnoDB page sizes, so it's possible the underlying disk's filesystem is XFS or something else, but since that value defaults to 16 KB, it seems unlikely.
172 |
173 |
174 | What's a TiB?
175 |
176 | A TiB (or MiB, or GiB..) is how data is actually sized, in base-2. Written out, instead of Terabytes, it's Tebibytes, and is _2^40 bytes_ instead of _10^12 bytes_ (Terabytes are base-10). Base-10 caught on for storage marketing since the number is larger and thus sounds better, but in reality you're getting less. This is why a 1 TB hard drive shows up on your computer as having 931 GB - because it's actually 931 GiB, but it gets displayed as GB since GiB as a term never caught on.
177 |
178 | In specific relation to this point, AWS' docs state that the limits are in TB (terabytes) instead of TiB (tebibytes). It's possible that their VM subsystem limits the size to n TB, but the actual filesystem is capable of n TiB.
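The 931 figure mentioned above falls out of simple integer division:

```shell
# 1 TB in decimal bytes (10^12), divided by one GiB (2^30)
echo $(( 10**12 / 2**30 ))
# 931
```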
179 |
180 |
181 |
182 | # MySQL Components
183 |
184 | As of MySQL 8.0, this is the official architecture drawing:
185 |
186 | 
187 |
188 | * Connector
189 | * Also known as the Client, this is how you interact with the database, be it manually via a CLI client tool, or via a program using the DB.
190 | * Server
191 | * Parser
192 | * This component receives a human-readable query, and translates it into machine-readable commands, via a lexical scanner and a grammar rule module.
193 | * Optimizer
194 | * This component attempts to optimize a given query using its knowledge of the stored data, such that the relative compute time of the query is minimized.
195 | * Caches/Buffers
196 | * This component has various caches to store frequently-accessed data, temporary tables created for use by other queries, etc.
197 | * SQL Interface
198 | * This component is the link between the Connector and the rest of the Server.
199 | * Storage Engine
200 |     * This component stores and manages the actual databases. Historically MySQL used the MyISAM engine, but InnoDB became the default with version 5.5. Both (and others) remain available if desired, but unless you have an extremely specific use case, you should use InnoDB.
201 |
202 | # MySQL Operations
203 |
204 | ## Assumptions
205 |
206 | - All examples here are using MySQL 8.0.23, with the InnoDB engine.
207 | - All examples here are using the mysql-client TUI program, but others may work as well.
208 |
209 | ## Notes
210 |
211 | - MySQL is case-insensitive for most, but not all operations. I'll use `UPPERCASE` to designate commands, and `lowercase` to designate arguments and schema, table, and column names, but you're welcome to use all lowercase.
212 | - The `;` suffix to commands serves as both the command terminator, and specifies that the output should be in an ASCII table.
213 | - The `\G` suffix to commands is an alternative terminator, and specifies that the output should be in a vertical, non-tabular format.
214 | - Not all clients support this. If you're using a GUI client like Sequel Ace, you can simply scroll the output window horizontally, or expand it to make it bigger.
215 | - I'm formatting my queries with statements and clauses on the left, their arguments indented by two spaces, and any qualifiers on the same line, where possible.
216 | - This was developed on a Debian VM with 16 cores of a Xeon E5-2650 v2, 64 GiB of DDR3 RAM, and a working directory which is an NFS export over a 1 GbE network, consisting of a ZFS RAIDZ2 array of spinning disks; ashift=12, blocksize=128K. Your times will vary, based mostly on the disk and RAM speed.
217 |
218 | ## Schemata
219 |
220 | A brand-new installation of MySQL will typically have four schemata - `information_schema`, `mysql`, `performance_schema`, and `sys`.
221 |
222 | - `information_schema` contains information about the schema in the database. This includes columns, column types, indices, foreign keys, and tables.
223 | - `mysql` generally contains configuration and logs.
224 | - `sys` generally contains information about the SQL engine (InnoDB here), including currently executing processes, and query metrics.
225 | - `performance_schema` contains some specific performance information about the schema in the database, such as deadlocks, locks, memory consumption, mutexes, and threads.
226 |
227 | ## Schema spelunking
228 |
229 | As mentioned, `database` is often used to mean `schema`, and in fact in MySQL they're synonyms for this statement - `SHOW SCHEMAS` and `SHOW DATABASES` produce the exact same output. You won't have the `test` or `northwind` schemata yet, but you should see the other four shown below. NOTE: I'll demonstrate both output formats here, and will switch as needed to easily display the information.
230 |
231 | ```sql
232 | SHOW schemas;
233 | ```
234 |
235 | ```sql
236 | +--------------------+
237 | | Database |
238 | +--------------------+
239 | | information_schema |
240 | | mysql |
241 | | northwind |
242 | | performance_schema |
243 | | sys |
244 | | test |
245 | +--------------------+
246 | 6 rows in set (0.01 sec)
247 | ```
248 |
249 | ```sql
250 | SHOW schemas\G
251 | ```
252 |
253 | ```sql
254 | *************************** 1. row ***************************
255 | Database: information_schema
256 | *************************** 2. row ***************************
257 | Database: mysql
258 | *************************** 3. row ***************************
259 | Database: northwind
260 | *************************** 4. row ***************************
261 | Database: performance_schema
262 | *************************** 5. row ***************************
263 | Database: sys
264 | *************************** 6. row ***************************
265 | Database: test
266 | 6 rows in set (0.01 sec)
267 | ```
268 |
269 | The `SHOW` statement behind the scenes is gathering and formatting data in a way that's easy for humans to see and understand. Often, it comes from the `information_schema` or `performance_schema` schema, as seen below. This query also demonstrates the use of the `AS` statement, which allows you to alias a column or sub-query.
270 |
271 | ```sql
272 | SELECT
273 | schema_name AS 'Database'
274 | FROM
275 | information_schema.schemata;
276 | ```
277 |
278 | ```sql
279 | +--------------------+
280 | | Database |
281 | +--------------------+
282 | | mysql |
283 | | information_schema |
284 | | performance_schema |
285 | | sys |
286 | | test |
287 | | northwind |
288 | +--------------------+
289 | 6 rows in set (0.01 sec)
290 | ```
291 |
292 | ### String literals
293 |
294 | You may have noticed that in the above examples, sometimes a column or table name was enclosed with a single quote (`'`), sometimes a backtick ( \` ), and other times nothing at all. This is deliberate.
295 |
296 | In ANSI SQL, string literals are represented with single quotation marks, e.g. `'test'`. This mode is disabled by default in MySQL, so you're free to use double quotation marks if you'd prefer; however, if you were trying to pass in a command to the client from a shell (e.g. `mysql -e 'SELECT foo FROM bar'`), you might run into shell expansion issues depending on your query. Also, since you'll probably be working with other SQL implementations like Postgres, it's best to try to stay as neutral as possible.
297 |
298 | Backticks may be used at any time, and are called quoted identifiers. They tell the SQL parser to treat anything enclosed in them as an identifier, rather than as a keyword or string literal. This may be useful if, for example, you created a table named `table` (please don't), had a column named `count`, etc. The full list of keywords / reserved words [is here](https://dev.mysql.com/doc/refman/8.0/en/keywords.html) if you want to see what to avoid.
299 |
300 | ```sql
301 | CREATE TABLE table (id INT);
302 | ```
303 |
304 | ```sql
305 | ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'table (id INT)' at line 1
306 | ```
307 |
308 | vs.
309 |
310 | ```sql
311 | CREATE TABLE `table` (id INT);
312 | ```
313 |
314 | ```sql
315 | Query OK, 0 rows affected (0.15 sec)
316 | ```
317 |
318 | #### SQL_MODE
319 |
320 | As it turns out, you can alter this behavior. First, let's check the current `SQL_MODE`. System variables can be viewed with either `SHOW VARIABLES` or `SELECT @@{GLOBAL.|SESSION.}variable_name`.
321 |
322 | ```sql
323 | SHOW VARIABLES LIKE 'sql_mode'\G
324 | ```
325 |
326 | ```sql
327 | *************************** 1. row ***************************
328 | Variable_name: sql_mode
329 | Value: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
330 | 1 row in set (0.01 sec)
331 | ```
332 |
333 | If neither `GLOBAL` nor `SESSION` is specified when using the `@@` method, the session value is returned if it exists, otherwise the global value is returned.
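You can also name the scope explicitly, which makes that fallback behavior easy to verify side by side:

```sql
SELECT @@GLOBAL.sql_mode, @@SESSION.sql_mode\G
```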
334 |
335 | ```sql
336 | SELECT @@sql_mode\G
337 | ```
338 |
339 | ```sql
340 | *************************** 1. row ***************************
341 | @@sql_mode: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
342 | 1 row in set (0.00 sec)
343 | ```
344 |
345 | We'll use the `mysql.user` table for this example. First, no quotes of any kind. As expected, we get the rows from those two columns.
346 |
347 | ```sql
348 | SELECT host, user FROM mysql.user;
349 | ```
350 |
351 | ```sql
352 | +-------------+------------------+
353 | | host | user |
354 | +-------------+------------------+
355 | | % | zapier |
356 | | % | zapier_training |
357 | | 192.168.1.% | sgarland |
358 | | localhost | mysql.infoschema |
359 | | localhost | mysql.session |
360 | | localhost | mysql.sys |
361 | | localhost | root |
362 | +-------------+------------------+
363 | 7 rows in set (0.01 sec)
364 | ```
365 |
366 | Now, we'll mix single and double quotes.
367 |
368 | ```sql
369 | SELECT 'host', "user" FROM mysql.user;
370 | ```
371 |
372 | ```sql
373 | +------+------+
374 | | host | user |
375 | +------+------+
376 | | host | user |
377 | | host | user |
378 | | host | user |
379 | | host | user |
380 | | host | user |
381 | | host | user |
382 | | host | user |
383 | +------+------+
384 | 7 rows in set (0.00 sec)
385 | ```
386 |
387 | In MySQL's default mode, these two are treated the same, and you get the respective string literals printed as rows for the selected columns.
388 |
389 | If single (or double) quotes are combined with backticks, you get partial results.
390 |
391 | ```sql
392 | SELECT 'host', `user` FROM mysql.user;
393 | ```
394 |
395 | ```sql
396 | +------+------------------+
397 | | host | user |
398 | +------+------------------+
399 | | host | zapier |
400 | | host | zapier_training |
401 | | host | sgarland |
402 | | host | mysql.infoschema |
403 | | host | mysql.session |
404 | | host | mysql.sys |
405 | | host | root |
406 | +------+------------------+
407 | 7 rows in set (0.00 sec)
408 | ```
409 |
410 | Now, we'll modify the session's `sql_mode`. You don't have permission to set any global variables, but you can set most session variables. Unlike when selecting, if you don't specify `GLOBAL` or `SESSION`, `SET` will always assume `SESSION`.
411 |
412 | ```sql
413 | SET @@sql_mode = ANSI_QUOTES;
414 | ```
415 |
416 | ```sql
417 | Query OK, 0 rows affected (0.00 sec)
418 |
419 | mysql> SELECT @@sql_mode\G
420 | *************************** 1. row ***************************
421 | @@sql_mode: ANSI_QUOTES
422 | 1 row in set (0.00 sec)
423 | ```
424 |
425 | Oh no, we've overridden all of the other settings! Luckily, the global variable hasn't been modified, so we can use it to build the correct setting. To do so, we'll use the `CONCAT_WS` function, which as the name implies, concatenates things with a separator. It takes the form `CONCAT_WS(separator, expr1, expr2, ...)`. We'll also run a `SELECT` of the global variable, nesting it as a sub-query.
426 |
427 | ```sql
428 | SET @@sql_mode = (SELECT CONCAT_WS(',', 'ANSI_QUOTES', (SELECT @@GLOBAL.sql_mode)));
429 | ```
430 |
431 | ```sql
432 | Query OK, 0 rows affected (0.01 sec)
433 | ```
434 |
435 | ```sql
436 | SELECT @@sql_mode\G
437 | ```
438 |
439 | ```sql
440 | *************************** 1. row ***************************
441 | @@sql_mode: ANSI_QUOTES,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
442 | 1 row in set (0.00 sec)
443 | ```
444 |
445 | Whew. Now we can try out the quoting differences again.
446 |
447 | ```sql
448 | SELECT 'host', "user" FROM mysql.user;
449 | ```
450 |
451 | ```sql
452 | +------+------------------+
453 | | host | USER |
454 | +------+------------------+
455 | | host | zapier |
456 | | host | zapier_training |
457 | | host | sgarland |
458 | | host | mysql.infoschema |
459 | | host | mysql.session |
460 | | host | mysql.sys |
461 | | host | root |
462 | +------+------------------+
463 | 7 rows in set (0.00 sec)
464 | ```
465 |
466 | This time, only single quotes are treated as string literals, with double quotes being treated as identifiers.
467 |
468 | Now, set the `SESSION.sql_mode` back to its original value, using a sub-query like before.
469 |
470 | ```sql
471 | SET @@sql_mode = (SELECT @@GLOBAL.sql_mode);
472 | ```
473 |
474 | ```sql
475 | Query OK, 0 rows affected (0.00 sec)
476 | ```
477 |
478 | ### Create a schema
479 |
480 | Let's create some tables! First, we need a schema. There aren't a lot of options here to be covered, so we can just create one. I'll be using `foo`, but you should substitute any name you'd like that's not already in use. Ideally, we would also enable encryption at rest. This can be globally set, or specified at schema creation - any tables in the schema inherit its setting. If you're curious, InnoDB uses AES, with ECB mode for tablespaces, and CBC mode for data. Also notably, [undo logs](https://dev.mysql.com/doc/refman/8.0/en/innodb-undo-logs.html) and [redo logs](https://dev.mysql.com/doc/refman/8.0/en/innodb-redo-log.html) have their encryption handled by separate variables. However, since this requires some additional work (all of the easy options are only available with MySQL Enterprise; MySQL Community requires you to generate and store the key yourself), we'll skip it.
481 |
482 | ```sql
483 | CREATE SCHEMA foo;
484 | ```
485 |
486 | ```sql
487 | Query OK, 1 row affected (0.02 sec)
488 | ```
489 |
490 | ## Table operations
491 |
492 | ### Create tables
493 |
494 | First, we'll select our new schema so we don't have to constantly specify it. I'll be using `foo` here, but you should substitute whatever you created in the last step.
495 |
496 | ```sql
497 | USE foo;
498 | ```
499 |
500 | Now, we'll create the `users` table.
501 |
502 | ```sql
503 | CREATE TABLE users (
504 | id BIGINT PRIMARY KEY,
505 | first_name CHAR(64),
506 | last_name CHAR(64),
507 | uid BIGINT
508 | );
509 | ```
510 |
511 | ```sql
512 | Query OK, 0 rows affected (0.17 sec)
513 | ```
514 |
515 | ```sql
516 | SHOW COLUMNS FROM users;
517 | ```
518 |
519 | ```sql
520 | +------------+----------+------+-----+---------+-------+
521 | | Field | Type | Null | Key | Default | Extra |
522 | +------------+----------+------+-----+---------+-------+
523 | | id | bigint | NO | PRI | NULL | |
524 | | first_name | char(64) | YES | | NULL | |
525 | | last_name | char(64) | YES | | NULL | |
526 | | uid | bigint | YES | | NULL | |
527 | +------------+----------+------+-----+---------+-------+
528 | 4 rows in set (0.02 sec)
529 | ```
530 |
531 | Hmm, something's not quite right compared to the original example - we're missing `AUTO_INCREMENT`! Without it, you'd have to manually specify the `id` value (this table's `PRIMARY KEY`), which is annoying. Additionally, while `id` was automatically made `NOT NULL` since it's the primary key, `uid` was not, so we need to change that (if you don't specify `NOT NULL`, MySQL defaults to `NULL`). Finally, `uid` should actually be named `user_id`, and it should have a `UNIQUE` constraint.
532 |
533 | NOTE: when redefining a column, it's like a `PUT`, not a `PATCH` - the new definition replaces the old one wholesale, so any attributes you don't re-specify will be dropped.
534 |
535 | ```sql
536 | ALTER TABLE users MODIFY uid BIGINT NOT NULL UNIQUE;
537 | ```
538 |
539 | ```sql
540 | Query OK, 0 rows affected (0.27 sec)
541 | Records: 0 Duplicates: 0 Warnings: 0
542 | ```
543 |
544 | ```sql
545 | ALTER TABLE users MODIFY id BIGINT AUTO_INCREMENT;
546 | ```
547 |
548 | ```sql
549 | Query OK, 0 rows affected (0.34 sec)
550 | Records: 0 Duplicates: 0 Warnings: 0
551 | ```
552 |
553 | ```sql
554 | SHOW COLUMNS FROM users;
555 | ```
556 |
557 | ```sql
558 | +------------+----------+------+-----+---------+----------------+
559 | | Field | Type | Null | Key | Default | Extra |
560 | +------------+----------+------+-----+---------+----------------+
561 | | id | bigint | NO | PRI | NULL | auto_increment |
562 | | first_name | char(64) | YES | | NULL | |
563 | | last_name | char(64) | YES | | NULL | |
564 | | uid | bigint | NO | UNI | NULL | |
565 | +------------+----------+------+-----+---------+----------------+
566 | 4 rows in set (0.02 sec)
567 | ```
568 |
569 | If you want to rename a column without re-specifying its definition, you can use `RENAME COLUMN`.
570 |
571 | ```sql
572 | ALTER TABLE users RENAME COLUMN uid TO user_id;
573 | ```
574 |
575 | ```sql
576 | Query OK, 0 rows affected (0.12 sec)
577 | Records: 0 Duplicates: 0 Warnings: 0
578 | ```
579 |
580 | Now, we'll make the `zaps` table. You may have noticed by now that the primary key column `id` has been the first column in all of these definitions. While nothing stops you from placing it last, or in the middle, this is a bad idea for a variety of reasons, not least of which is that it's confusing for anyone used to the normal ordering. There may be some small bin-packing gains to be made by carefully matching column widths to page sizes (the default page size for InnoDB is 16 KB, and the default sector size for most disks today is 4 KB), which can also impact performance on spinning disks. Also, prior to MySQL 8.0.13, temporary tables (usually, tables that InnoDB creates as part of a query) would silently cast `VARCHAR` and `VARBINARY` columns to their respective `CHAR` or `BINARY` types. If you had some `VARCHAR` columns with a large maximum size, this could cause the space required to store them to rapidly balloon, filling up the disk.
581 |
582 | In general, column ordering in a table doesn't tremendously matter for MySQL (but it does for queries, as we'll see later), so stick to convention.
583 |
584 | ```sql
585 | CREATE TABLE zaps (
586 | `id` BIGINT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
587 | `zap_id` BIGINT UNSIGNED NOT NULL,
588 | `created_at` TIMESTAMP NOT NULL DEFAULT NOW(),
589 | `last_updated_at` TIMESTAMP NULL ON UPDATE NOW(),
590 | `owned_by` BIGINT UNSIGNED NOT NULL,
591 | UNIQUE(zap_id)
592 | );
593 | ```
594 |
595 | ```sql
596 | SHOW COLUMNS FROM zaps;
597 | ```
598 |
599 | ```sql
600 | +-----------------+-----------------+------+-----+-------------------+-----------------------------+
601 | | Field | Type | Null | Key | Default | Extra |
602 | +-----------------+-----------------+------+-----+-------------------+-----------------------------+
603 | | id | bigint unsigned | NO | PRI | NULL | auto_increment |
604 | | zap_id | bigint unsigned | NO | UNI | NULL | |
605 | | created_at | timestamp | NO | | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
606 | | last_updated_at | timestamp | YES | | NULL | on update CURRENT_TIMESTAMP |
607 | | owned_by | bigint unsigned | NO | | NULL | |
608 | +-----------------+-----------------+------+-----+-------------------+-----------------------------+
609 | 5 rows in set (0.00 sec)
610 | ```
611 |
612 | We're introducing some new defaults here:
613 | * `DEFAULT NOW()`
614 |   * With this, much like an `AUTO_INCREMENT` column, the current timestamp is added to the `created_at` column when a new row is created. NOTE: this doesn't make the column immutable, and nothing stops someone from altering the value manually later.
615 | * `ON UPDATE NOW()`
616 |   * For `last_updated_at`, while the default is `NULL`, whenever the row is updated, the current timestamp is added.
617 |
618 | `NOW()` is an alias for `CURRENT_TIMESTAMP`, and no, I didn't forget the function call on the right. For historical reasons, `CURRENT_TIMESTAMP` may be called with or without parentheses, but `NOW()` requires them. Similarly, any default value that isn't a literal (e.g. `0`, `NULL`, etc.) generally must be wrapped in parentheses - see `(JSON_ARRAY())`. Again, for historical reasons, `TIMESTAMP` and `DATETIME` columns don't require this. Also, `JSON` _requires_ its default value to be wrapped in parentheses, even if the default is a literal (as do `BLOB`, `GEOMETRY`, and `TEXT`). See [MySQL docs on defaults](https://dev.mysql.com/doc/refman/8.0/en/data-type-defaults.html) for more information on this behavior, and [MySQL docs on timestamp initialization](https://dev.mysql.com/doc/refman/8.0/en/timestamp-initialization.html) for more information on timestamp column defaults.
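These rules can be sketched in a single table definition (the table name `defaults_demo` is purely for illustration):

```sql
CREATE TABLE defaults_demo (
    id BIGINT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- TIMESTAMP: no parentheses required
    tags JSON DEFAULT (JSON_ARRAY()),               -- JSON: the default must be parenthesized
    notes TEXT DEFAULT ('n/a')                      -- TEXT: even a literal default needs them
);
```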
619 |
620 | #### Data types
621 |
622 | What is the difference between a `VARCHAR` and a `CHAR`, and what is the integer after it? `CHAR` allocates exactly the amount of space specified. If you declare a column 64 bytes wide, then whether you're storing 1 byte or 64 bytes, the column will always consume 64 bytes - the value is right-padded with spaces, and the trailing spaces are then removed on retrieval (by default - the trimming behavior can be modified, if desired).
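You can see the pad-and-trim behavior with `CHAR_LENGTH` (the table name here is hypothetical):

```sql
CREATE TABLE char_demo (c CHAR(8));
INSERT INTO char_demo VALUES ('hi');
-- The stored value is padded to 8 bytes, but the pad spaces
-- are stripped on retrieval under the default settings
SELECT CHAR_LENGTH(c) FROM char_demo;
```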
623 |
624 | Let's try adding a 65-byte string to a column with a strict 64-byte limit - this can be done with the `LPAD` function, which takes the form `LPAD(str, len, padstr)`.
625 |
626 | ```sql
627 | INSERT INTO users
628 | (first_name, last_name, user_id)
629 | VALUES
630 | ("Stephan",
631 | (SELECT LPAD("Garland", 65, " ")),
632 | 1
633 | );
634 | ```
635 |
636 | ```sql
637 | ERROR 1406 (22001): Data too long for column 'last_name' at row 1
638 | ```
639 |
640 | Since people in different cultures may have longer names than I'm used to, allowing this column to be wider than 64 bytes is probably a good idea, especially if there isn't a storage penalty for doing so. While a `VARCHAR` can technically be up to `2^16 - 1` bytes - the same as the row width limit - it's still a good idea to have some kind of reasonable limit in place, lest someone exploit a security hole and start using your DB for Chia mining or something. 255 bytes was the historic maximum length allowed in older SQL implementations, and it's the largest `VARCHAR` that can be stored with a 1-byte length prefix. Thus, we'll modify our columns to this standard.
641 |
642 | ```sql
643 | ALTER TABLE users
644 | MODIFY first_name VARCHAR(255),
645 | MODIFY last_name VARCHAR(255);
646 | ```
647 |
648 | ```sql
649 | Query OK, 0 rows affected (0.13 sec)
650 | Records: 0 Duplicates: 0 Warnings: 0
651 | ```
652 |
653 | ```sql
654 | SHOW COLUMNS FROM users;
655 | ```
656 |
657 | ```sql
658 | +------------+--------------+------+-----+---------+----------------+
659 | | Field | Type | Null | Key | Default | Extra |
660 | +------------+--------------+------+-----+---------+----------------+
661 | | id | bigint | NO | PRI | NULL | auto_increment |
662 | | first_name | varchar(255) | YES | | NULL | |
663 | | last_name | varchar(255) | YES | | NULL | |
664 | | user_id | bigint | NO | UNI | NULL | |
665 | +------------+--------------+------+-----+---------+----------------+
666 | 4 rows in set (0.01 sec)
667 | ```
668 |
669 | What about ints? You may sometimes see an integer following an integer-type column definition, like `int(4)`. Confusingly, this has nothing to do with the maximum amount of data that can be stored in that column, and is only used for display. Even more confusingly, the MySQL client itself will ignore it, and show the entire stored number. Applications can choose whether or not to use the display width. In general, there's little reason to use this feature, and if you want to constrain display width, do so in your application.
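A quick sketch of the behavior (hypothetical table; note that recent MySQL versions also emit a deprecation warning for integer display widths):

```sql
CREATE TABLE width_demo (n INT(4));
INSERT INTO width_demo VALUES (123456);
-- The (4) doesn't constrain storage; the mysql client shows the full value
SELECT n FROM width_demo;
```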
670 |
671 | For floating points, MySQL supports `FLOAT` and `DOUBLE`, with the former being 4 bytes, and the latter 8 bytes.
672 |
673 | For exact precision numbers, MySQL supports `DECIMAL` and `NUMERIC`, and they are identical.
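The practical difference between the two families is exactness - decimal literals use exact `DECIMAL` arithmetic, while scientific-notation literals are binary floating point:

```sql
SELECT 0.1 + 0.2 = 0.3;    -- DECIMAL arithmetic: equal
SELECT 1e-1 + 2e-1 = 3e-1; -- floating point: not equal
```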
674 |
675 | There are also sub-types of `INT`, such as `SMALLINT` (2 bytes, storing a maximum value of `2^16 - 1` if unsigned), and `BIGINT`, as seen previously - it's 8 bytes, and stores a maximum value of `2^63 - 1` if signed, and `2^64 - 1` if unsigned. Since there's not much reason to have negative IDs, let's alter those definitions as well:
676 |
677 | ```sql
678 | ALTER TABLE users
679 | MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
680 | MODIFY user_id BIGINT UNSIGNED NOT NULL UNIQUE;
681 | ```
682 |
683 | ```sql
684 | Query OK, 0 rows affected, 1 warning (0.10 sec)
685 | Records: 0 Duplicates: 0 Warnings: 1
686 | ```
687 |
688 | A warning? Huh?
689 |
690 |
691 | **I don't see any warnings!**
692 |
693 | Your client may not display warnings, in which case you can just follow along in this document.
694 |
695 |
696 | ```sql
697 | SHOW WARNINGS\G
698 | ```
699 |
700 | ```sql
701 | *************************** 1. row ***************************
702 | Level: Warning
703 | Code: 1831
704 | Message: Duplicate index 'user_id' defined on the table 'test.users'. This is deprecated and will be disallowed in a future release.
705 | 1 row in set (0.00 sec)
706 | ```
707 |
708 | Let's look at the table definition.
709 |
710 | ```sql
711 | SHOW CREATE TABLE users\G
712 | ```
713 |
714 | ```sql
715 | *************************** 1. row ***************************
716 | Table: users
717 | Create Table: CREATE TABLE `users` (
718 | `id` bigint unsigned NOT NULL AUTO_INCREMENT,
719 | `first_name` varchar(255) DEFAULT NULL,
720 | `last_name` varchar(255) DEFAULT NULL,
721 | `user_id` bigint unsigned NOT NULL,
722 | PRIMARY KEY (`id`),
723 | UNIQUE KEY `uid` (`user_id`),
724 | UNIQUE KEY `user_id` (`user_id`)
725 | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
726 | 1 row in set (0.01 sec)
727 | ```
728 |
729 |
730 | **What is `SHOW CREATE TABLE`?**
731 |
732 | `SHOW CREATE TABLE` is a command that lets you view the query that would be used to create the table in its current state. It's safe to do, and is a good way to view columns, their types, indexes, foreign keys, etc. for a given table.
733 |
734 |
735 |
736 | Ah - constraints like `UNIQUE` don't have to be redefined along with the rest of the column definition, and by redefining this one, we've created a duplicate constraint. While allowed for now, it's not good practice, so we'll get rid of it.
737 |
738 | ```sql
739 | ALTER TABLE users DROP CONSTRAINT uid;
740 | ```
741 |
742 | ```sql
743 | Query OK, 0 rows affected (0.16 sec)
744 | Records: 0 Duplicates: 0 Warnings: 0
745 | ```
746 |
747 | ```sql
748 | SHOW COLUMNS FROM users;
749 | ```
750 |
751 | ```sql
752 | +------------+-----------------+------+-----+---------+----------------+
753 | | Field | Type | Null | Key | Default | Extra |
754 | +------------+-----------------+------+-----+---------+----------------+
755 | | id | bigint unsigned | NO | PRI | NULL | auto_increment |
756 | | first_name | varchar(255) | YES | | NULL | |
757 | | last_name | varchar(255) | YES | | NULL | |
758 | | user_id | bigint unsigned | NO | UNI | NULL | |
759 | +------------+-----------------+------+-----+---------+----------------+
760 | 4 rows in set (0.01 sec)
761 | ```
762 |
763 | ### Foreign keys
764 |
765 | These tables seem fine to start with, but the columns that we are implicitly designing to have relationships don't have any method of enforcement. While this is a valid design - placing all referential integrity requirements onto the application - SQL was designed to handle this for us, so let's make use of it. NOTE: foreign keys bring with them a huge array of problems that will likely not be seen until your scale is large, so keep that in mind, and have a plan to migrate off of them if necessary.
766 |
767 | #### Why you might want foreign keys
768 |
769 | Let's create a user, and give them a Zap.
770 |
771 | ```sql
772 | INSERT INTO users
773 | (first_name, last_name, user_id)
774 | VALUES
775 | ('Stephan', 'Garland', 1);
776 | ```
777 |
778 | ```sql
779 | Query OK, 1 row affected (0.02 sec)
780 | ```
781 |
782 | ```sql
783 | INSERT INTO zaps (zap_id, owned_by) VALUES (1, 1);
784 | ```
785 |
786 | ```sql
787 | Query OK, 1 row affected (0.03 sec)
788 | ```
789 |
790 | ```sql
791 | TABLE zaps;
792 | ```
793 |
794 |
795 | **What is `TABLE`?**
796 |
797 | Syntactic sugar (a shortcut) for `SELECT * FROM <table_name>`.
798 |
799 | ```sql
800 | +----+--------+---------------------+-----------------+----------+
801 | | id | zap_id | created_at | last_updated_at | owned_by |
802 | +----+--------+---------------------+-----------------+----------+
803 | | 1 | 1 | 2023-02-27 10:25:01 | NULL | 1 |
804 | +----+--------+---------------------+-----------------+----------+
805 | 1 row in set (0.00 sec)
806 | ```
807 |
808 |
809 |
810 | We can `JOIN` on this if we want.
811 |
812 | ```sql
813 | SELECT *
814 | FROM
815 | users
816 | JOIN zaps ON
817 | users.user_id = zaps.owned_by\G
818 | ```
819 |
820 | ```sql
821 | *************************** 1. row ***************************
822 | id: 1
823 | first_name: Stephan
824 | last_name: Garland
825 | user_id: 1
826 | email: NULL
827 | id: 1
828 | zap_id: 1
829 | created_at: 2023-02-27 10:25:01
830 | last_updated_at: NULL
831 | owned_by: 1
832 | 1 row in set (0.01 sec)
833 | ```
834 |
835 | That's all well and good, but what if I want to delete my account? Wouldn't it be nice if devs didn't have to worry about deleting every trace of my existence? Or what if everyone's user ID has to change for a migration? Enter foreign keys.
836 |
837 | #### Creating a foreign key
838 |
839 | ```sql
840 | ALTER TABLE
841 | zaps
842 | ADD FOREIGN KEY
843 | (owned_by)
844 | REFERENCES users
845 | (user_id)
846 | ON UPDATE CASCADE
847 | ON DELETE CASCADE;
848 | ```
849 |
850 | ```sql
851 | Query OK, 1 row affected (0.50 sec)
852 | Records: 1 Duplicates: 0 Warnings: 0
853 | ```
854 |
855 | ```sql
856 | SHOW CREATE TABLE zaps\G
857 | ```
858 |
859 | ```sql
860 | *************************** 1. row ***************************
861 | Table: zaps
862 | Create Table: CREATE TABLE `zaps` (
863 | `id` bigint unsigned NOT NULL AUTO_INCREMENT,
864 | `zap_id` bigint unsigned NOT NULL,
865 | `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
866 | `last_updated_at` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
867 | `owned_by` bigint unsigned NOT NULL,
868 | PRIMARY KEY (`id`),
869 | UNIQUE KEY `zap_id` (`zap_id`),
870 | KEY `owned_by` (`owned_by`),
871 | CONSTRAINT `zaps_ibfk_1` FOREIGN KEY (`owned_by`) REFERENCES `users` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
872 | ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
873 | 1 row in set (0.00 sec)
874 | ```
875 |
876 | Note that not only do we now have a `FOREIGN KEY` linking `zaps.owned_by` to `users.user_id`, but InnoDB has also added an index on `zaps.owned_by` - this index is required, and although the documentation says you must create it before adding the foreign key, InnoDB will actually create it for you if you don't.
877 |
878 | #### Demonstrating a foreign key
879 |
880 | ```sql
881 | UPDATE users SET user_id = 9 WHERE id = 1;
882 | ```
883 |
884 | Note the `WHERE` predicate - we'll go more into that later, but the most important thing to take away here is that there are very few instances where you should issue DML like `UPDATE` without a `WHERE`.
885 |
886 |
887 | **Why not?**
888 |
889 | If there was no predicate, the query would apply to everything in the table, e.g. every user would be modified.
890 |
891 |
892 |
893 | ```sql
894 | Query OK, 1 row affected (0.02 sec)
895 | Rows matched: 1 Changed: 1 Warnings: 0
896 | ```
897 |
898 | ```sql
899 | SELECT *
900 | FROM
901 | users
902 | JOIN zaps ON
903 | users.user_id = zaps.owned_by\G
904 | ```
905 |
906 | ```sql
907 | *************************** 1. row ***************************
908 | id: 1
909 | first_name: Stephan
910 | last_name: Garland
911 | user_id: 9
912 | email: NULL
913 | id: 1
914 | zap_id: 1
915 | created_at: 2023-02-27 10:25:01
916 | last_updated_at: NULL
917 | owned_by: 9
918 | 1 row in set (0.01 sec)
919 | ```
920 |
921 | And just like that, `zaps` has updated its `owned_by` value for that Zap to equal the new value in `users`. And if we delete the `users` entry, the same `CASCADE` action will follow.
922 |
923 | ```sql
924 | DELETE FROM users WHERE id = 1;
925 | ```
926 |
927 | ```sql
928 | Query OK, 1 row affected (0.02 sec)
929 | ```
930 |
931 | ```sql
932 | SELECT * FROM zaps;
933 | ```
934 |
935 | ```sql
936 | Empty set (0.00 sec)
937 | ```
938 |
939 | ### Determining table size
940 |
941 | There are a few ways to find out how many rows are in a table. InnoDB maintains information about tables in the `INFORMATION_SCHEMA.TABLES` table, including an estimate of the row count. However, it's just that - an estimate. It can be made accurate with `ANALYZE TABLE`, but in production this should be done carefully, since it places a table-wide read lock during the process. You can also use `SELECT COUNT(*)`, but that performs a full table scan (where the entire table is read sequentially, without indices), so it may have a performance impact on the database, as it consumes a lot of the available IOPS. Finally, assuming you have an auto-incrementing `id` column in the table, you can use `SELECT id FROM <table> ORDER BY id DESC LIMIT 1` to get the last incremented value. This is also an estimate, since it doesn't take deletions into account (auto-increment is monotonic), but it's extremely fast.
942 |
943 | ```sql
944 | SELECT table_name, table_rows
945 | FROM
946 | information_schema.tables
947 | WHERE
948 | table_schema = 'test';
949 | ```
950 |
951 | ```sql
952 | +---------------+------------+
953 | | TABLE_NAME | TABLE_ROWS |
954 | +---------------+------------+
955 | | gensql | 1000 |
956 | | ref_users | 1000 |
957 | | ref_users_big | 992839 |
958 | | ref_zaps | 0 |
959 | | ref_zaps_big | 0 |
960 | | users | 1000 |
961 | | zaps | 0 |
962 | +---------------+------------+
963 | 7 rows in set (0.01 sec)
964 | ```
965 |
966 | ```sql
967 | ANALYZE TABLE ref_zaps; ANALYZE TABLE ref_zaps_big;
968 | ```
969 |
970 | ```sql
971 | +---------------+---------+----------+----------+
972 | | Table | Op | Msg_type | Msg_text |
973 | +---------------+---------+----------+----------+
974 | | test.ref_zaps | analyze | status | OK |
975 | +---------------+---------+----------+----------+
976 | 1 row in set (0.03 sec)
977 |
978 | +-------------------+---------+----------+----------+
979 | | Table | Op | Msg_type | Msg_text |
980 | +-------------------+---------+----------+----------+
981 | | test.ref_zaps_big | analyze | status | OK |
982 | +-------------------+---------+----------+----------+
983 | 1 row in set (0.05 sec)
984 | ```
985 |
986 | ```sql
987 | SELECT table_name, table_rows
988 | FROM
989 | information_schema.tables
990 | WHERE table_schema = 'test';
991 | ```
992 |
993 | ```sql
994 | +---------------+------------+
995 | | TABLE_NAME | TABLE_ROWS |
996 | +---------------+------------+
997 | | gensql | 1000 |
998 | | ref_users | 1000 |
999 | | ref_users_big | 992839 |
1000 | | ref_zaps | 1000 |
1001 | | ref_zaps_big | 997211 |
1002 | | users | 1000 |
1003 | | zaps | 0 |
1004 | +---------------+------------+
1005 | 7 rows in set (0.02 sec)
1006 | ```
1007 |
1008 | Actual row count:
1009 |
1010 | ```sql
1011 | SELECT
1012 | 'ref_users_big' AS 'table_name',
1013 | COUNT(*) AS 'row_count'
1014 | FROM
1015 | ref_users_big
1016 | UNION
1017 | SELECT
1018 | 'ref_zaps_big',
1019 | COUNT(*)
1020 | FROM
1021 | ref_zaps_big;
1022 | ```
1023 |
1024 | ```sql
1025 | +---------------+-----------+
1026 | | table_name | row_count |
1027 | +---------------+-----------+
1028 | | ref_users_big | 1000000 |
1029 | | ref_zaps_big | 1000000 |
1030 | +---------------+-----------+
1031 | 2 rows in set (2.42 sec)
1032 | ```
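The last-value method from above is nearly instant, since it only reads one end of the primary key index (this sketch assumes `user_id` is the auto-incrementing column in these reference tables):

```sql
SELECT user_id FROM ref_users_big ORDER BY user_id DESC LIMIT 1;
```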
1033 |
1034 |
1035 | **What's a `UNION`?**
1036 |
1037 | A way to combine query results, regardless of any relation between tables or queries.
1038 |
1039 |
1040 | ## Column operations
1041 |
1042 | ### Adding columns
1043 |
1044 | Adding columns is done with `ALTER TABLE`:
1045 |
1046 | ```sql
1047 | ALTER TABLE
1048 | zaps
1049 | ADD COLUMN
1050 | shared_with
1051 | JSON;
1052 | ```
1053 |
1054 | ```sql
1055 | Query OK, 0 rows affected (0.18 sec)
1056 | Records: 0 Duplicates: 0 Warnings: 0
1057 | ```
1058 |
1059 | Just as with a table definition, the column's name (`shared_with`) and type (`JSON`) are required; additional qualifiers like `DEFAULT`, `UNIQUE`, etc. may be appended. To add some types of default values, like a JSON array, you must call the function.
1060 |
1061 | * [MySQL supports JSON](https://dev.mysql.com/doc/refman/8.0/en/json.html) as a data type! While you can of course simply store JSON strings in a text column, there are some benefits to using the native JSON datatype; among them that you can index scalars from the JSON objects, and that you can extract specific keys/values from the objects instead of the entire string.
1062 | * Please don't use this as an excuse to treat MySQL as a Document DB, though. If you want NoSQL, you should use NoSQL. RDBMS are optimized for relations. Storing some information in JSON is fine, but it shouldn't be the default.
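For instance, once the `shared_with` column exists, specific elements can be extracted with MySQL's JSON path syntax (`->>` is shorthand for `JSON_UNQUOTE(JSON_EXTRACT(...))`):

```sql
-- Pull the first element of each Zap's shared_with array
SELECT zap_id, shared_with->>'$[0]' AS first_share FROM zaps;
```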
1063 |
1064 | ### Modifying columns
1065 |
1066 | This was covered earlier during [table operations](#table-operations), but as a refresher, we'll again use `ALTER TABLE` to add a `DEFAULT` value of an empty JSON array, which must be called as its function:
1067 |
1068 | ```sql
1069 | ALTER TABLE
1070 | zaps
1071 | MODIFY COLUMN
1072 | shared_with
1073 | JSON
1074 | DEFAULT (
1075 | JSON_ARRAY()
1076 | );
1077 | ```
1078 |
1079 | ```sql
1080 | Query OK, 0 rows affected (0.09 sec)
1081 | Records: 0 Duplicates: 0 Warnings: 0
1082 | ```
1083 |
1084 | ### Dropping tables with foreign keys
1085 |
1086 | If there are foreign keys referencing the table you're trying to drop, you'll first need to either disable foreign key checks or drop those constraints before you can drop the table.
1087 |
1088 | ```sql
1089 | DROP TABLE users;
1090 | ```
1091 |
1092 | ```sql
1093 | ERROR 3730 (HY000): Cannot drop table 'users' referenced by a foreign key constraint 'zaps_ibfk_1' on table 'zaps'.
1094 | ```
1095 |
1096 | ```sql
1097 | SET foreign_key_checks = 0;
1098 | ```
1099 |
1100 | ```sql
1101 | Query OK, 0 rows affected (0.01 sec)
1102 | ```
1103 |
1104 | ```sql
1105 | DROP TABLE users;
1106 | Query OK, 0 rows affected (0.30 sec)
1107 | ```
1108 |
1109 | ```sql
1110 | SHOW CREATE TABLE zaps\G
1111 | ```
1112 |
1113 | ```sql
1114 | *************************** 1. row ***************************
1115 | Table: zaps
1116 | Create Table: CREATE TABLE `zaps` (
1117 | `id` bigint unsigned NOT NULL AUTO_INCREMENT,
1118 | `zap_id` bigint unsigned NOT NULL,
1119 | `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
1120 | `last_updated_at` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
1121 | `owned_by` bigint unsigned NOT NULL,
1122 | `shared_with` json DEFAULT (json_array()),
1123 | PRIMARY KEY (`id`),
1124 | UNIQUE KEY `zap_id` (`zap_id`),
1125 | KEY `owned_by` (`owned_by`),
1126 | CONSTRAINT `zaps_ibfk_1` FOREIGN KEY (`owned_by`) REFERENCES `users` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
1127 | ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1128 | 1 row in set (0.00 sec)
1129 | ```
1130 |
1131 | Just because MySQL let us drop the table doesn't mean it cleaned up after us.
1132 |
1133 | How can we remove the FK?
1134 |
1135 | ```sql
1136 | ALTER TABLE zaps DROP CONSTRAINT `zaps_ibfk_1`;
1137 | ```
1138 |
1139 | ```sql
1140 | Query OK, 0 rows affected (0.20 sec)
1141 | Records: 0 Duplicates: 0 Warnings: 0
1142 | ```
1143 |
1144 |
1145 | Also, don't forget to re-enable `foreign_key_checks` for your session.
1146 |
1147 | ```sql
1148 | SET foreign_key_checks = 1;
1149 | ```
1150 |
1151 | ```sql
1152 | Query OK, 0 rows affected (0.00 sec)
1153 | ```
1154 |
1155 | But wait, how are we going to get back the `users` table? We could scroll back up and find the definition, but wouldn't it be nice if we could copy the definition from somewhere else?
1156 |
1157 | ### Copied table definitions
1158 |
1159 | Luckily, this exists in the form of `CREATE TABLE ... LIKE` ([MySQL docs](https://dev.mysql.com/doc/refman/8.0/en/create-table-like.html)). You do need `SELECT` privileges on the schema/table you're copying from, which is enabled for `test.ref_%` with this user. You'll also need to specify the schema the table exists in, since it's outside of the currently selected schema.
1160 |
1161 | NOTE: This schema is somewhat different from what we created before; most of it is additional, but one big change is that there is no longer an explicit `id` column - instead, the `user_id` column takes its place.
1162 |
1163 | ```sql
1164 | CREATE TABLE users LIKE test.ref_users;
1165 | ```
1166 |
1167 | ```sql
1168 | Query OK, 0 rows affected (0.34 sec)
1169 | ```
1170 |
1171 | There are some restrictions. The documentation lists all of them, but the biggest one is that any foreign keys aren't copied. We deleted ours so it doesn't really matter, but this could catch you by surprise if you expected them to come over with the schema definition. Also, depending on the version of MySQL you're using, a bug may exist where tables copied in this manner will logically reside (that is, within a given tablespace file) in the original table's tablespace. A way around this is with this alternative query:
1172 |
1173 | ```sql
1174 | CREATE TABLE users SELECT * FROM test.ref_users LIMIT 0;
1175 | ```
1176 |
1177 | **Warning**
1178 |
1179 | The second form shown has a [large list of things](https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html) it does not do:
1180 |
1181 | - Copy any indexes, including primary keys
1182 | - Maintain the `AUTO_INCREMENT` attribute
1183 | - Maintain data types - `VARCHAR` may become `CHAR`
1184 | - Maintain default values for columns that are expressions
1185 |
1186 | Finally, note that both of these _only_ copy the schema definition, not the data. The table you're copying from actually has thousands of rows in it, but none of those will be in your table.
1187 |
1188 |
1189 | **What if you wanted to copy data as well?**
1190 |
1191 | The above alternative query hopefully hinted at it! Just take heed of the warning.
1192 |
1193 | ```sql
1194 | DROP TABLE users; CREATE TABLE users SELECT * FROM test.ref_users LIMIT 1000;
1195 | ```
1196 |
1197 | ```sql
1198 | Query OK, 0 rows affected (0.30 sec)
1199 |
1200 | Query OK, 1000 rows affected (1.14 sec)
1201 | Records: 1000 Duplicates: 0 Warnings: 0
1202 | ```
1203 |
1204 |
1205 | #### Copied table data and truncating
1206 |
1207 | Now that we have `users` back, let's actually fill it with more than just 1000 rows. `test.ref_users_big` has 1,000,000 rows. That would take a while to fill for everyone (my poor spinning disks), but 10,000 is reasonable.
1208 |
1209 | First, let's dump the existing values, but leave the table definition. While there are a few ways to do this, the fastest is `TRUNCATE` ([MySQL docs](https://dev.mysql.com/doc/refman/8.0/en/truncate-table.html)). This is a `DDL` operation vs. `DML`, as instead of iterating through the table and deleting each row, it stores the table definition, drops the table, then re-creates it. This does have several limitations, especially with foreign keys, but it works fine here.
1210 |
1211 | ```sql
1212 | TRUNCATE TABLE users;
1213 | ```
1214 |
1215 | ```sql
1216 | Query OK, 0 rows affected (0.42 sec)
1217 | ```
1218 |
1219 | `0 rows affected` may be confusing, as we in fact just affected 1000 rows, but remember that this is the same as a `DROP TABLE`, which similarly doesn't report on the number of rows removed.
1220 |
1221 | Now, we can copy into the table; but first, we're going to `DROP` the table and re-create it properly with `CREATE TABLE ... LIKE` so we don't have any issues with missing primary keys.
1222 |
1223 | ```sql
1224 | DROP TABLE users;
1225 | CREATE TABLE users LIKE test.ref_users;
1226 | INSERT INTO users SELECT * FROM test.ref_users_big LIMIT 10000;
1227 | ```
1228 |
1229 | ```sql
1230 | Query OK, 10000 rows affected (5.33 sec)
1231 | Records: 10000 Duplicates: 0 Warnings: 0
1232 | ```
1233 |
1234 | ### Transactions
1235 |
1236 | Remember the discussion about doing `DML` without a predicate? There's a fix for that.
1237 |
1238 | ```sql
1239 | START TRANSACTION;
1240 | ```
1241 |
1242 | ```sql
1243 | Query OK, 0 rows affected (0.00 sec)
1244 | ```
1245 |
1246 | ```sql
1247 | UPDATE users SET city = "Asheville";
1248 | ```
1249 |
1250 | ```sql
1251 | Query OK, 9999 rows affected (6.96 sec)
1252 | Rows matched: 10000 Changed: 9999 Warnings: 0
1253 | ```
1254 |
1255 | Uh-oh. Looks like everyone has moved to Western North Carolina.
1256 |
1257 | ```sql
1258 | ROLLBACK;
1259 | ```
1260 |
1261 | ```sql
1262 | Query OK, 0 rows affected (5.45 sec)
1263 | ```
1264 |
1265 | Whew, not fired.
1266 |
1267 | NOTE: Canceling a query (`Ctrl-C`), _regardless of whether or not you're in a transaction_, has the same effect, assuming the InnoDB storage engine is being used. This is the `A` in `ACID` at work - either the entire query succeeds, or none of it does. However, the rollback may take some time depending on how many rows have been affected. Also, if you don't manage to cancel the query before it completes, you're out of luck.
1268 |
1269 | ### Generated columns
1270 |
1271 | What if you wanted a column that automatically created data for you based on other columns?
1272 |
1273 | ```sql
1274 | ALTER TABLE
1275 | users
1276 | ADD COLUMN
1277 | full_name VARCHAR(510) GENERATED ALWAYS AS (
1278 | CONCAT_WS(', ', last_name, first_name)
1279 | );
1280 | ```
1281 |
1282 | ```sql
1283 | Query OK, 0 rows affected (0.34 sec)
1284 | Records: 0 Duplicates: 0 Warnings: 0
1285 | ```
1286 |
1287 | ```sql
1288 | SELECT user_id, full_name, city, country
1289 | FROM users
1290 | LIMIT 10;
1291 | ```
1292 |
1293 | ```sql
1294 | +---------+-------------------+-------------+----------------+
1295 | | user_id | full_name | city | country |
1296 | +---------+-------------------+-------------+----------------+
1297 | | 1 | MacPherson, Addie | Latina | Italy |
1298 | | 2 | Airla, Valaree | Pribram | Czech Republic |
1299 | | 3 | Nett, Sheppard | Hamada | Japan |
1300 | | 4 | Kirschner, Robby | Bikaner | India |
1301 | | 5 | Bilski, Lewiss | Vörderås | Sweden |
1302 | | 6 | Yamauchi, Marleah | Rotterdam | Netherlands |
1303 | | 7 | Calore, Ania | Miyakojima | Japan |
1304 | | 8 | Breger, Gratiana | Valkeakoski | Finland |
1305 | | 9 | Serafina, Janith | Morant Bay | Jamaica |
1306 | | 10 | Beckman, Pavla | Wackersdorf | Germany |
1307 | +---------+-------------------+-------------+----------------+
1308 | 10 rows in set (0.01 sec)
1309 | ```
1310 |
1311 | Note that by default, this creates a `VIRTUAL` column (you can specify `STORED` after `AS` if you'd rather have a normal column), which is not actually stored, but instead calculated at query time. While this takes no storage space, it does add some computational load, and more importantly comes with a [huge list](https://dev.mysql.com/doc/refman/8.0/en/create-table-generated-columns.html) of limitations. One large benefit, however, is that since the column values aren't actually materialized when the column is added, the operation takes only as long as a normal `ALTER TABLE` operation. If stored, the data must be written to the table, which necessitates taking write locks. Also, since the column isn't actually being written anywhere, you can place it in any table position (by default, adding a column just appends it to the end of the table) while still using the `INSTANT` algorithm, despite what the docs imply.
1312 |
1313 | The creation and deletion time in particular is markedly better when compared to `STORED` columns:
1314 |
1315 | ```sql
1316 | ALTER TABLE
1317 | users
1318 | ADD COLUMN
1319 | full_name VARCHAR(510) GENERATED ALWAYS AS (
1320 | CONCAT_WS(', ', last_name, first_name)
1321 | ) STORED;
1322 | ```
1323 |
1324 | ```sql
1325 | Query OK, 10000 rows affected (7.23 sec)
1326 | Records: 10000 Duplicates: 0 Warnings: 0
1327 | ```
1328 |
1329 | ```sql
1330 | ALTER TABLE
1331 | users
1332 | DROP COLUMN full_name;
1333 | ```
1334 |
1335 | ```sql
1336 | Query OK, 0 rows affected (2.24 sec)
1337 | Records: 0 Duplicates: 0 Warnings: 0
1338 | ```
1339 |
1340 | Demonstrating column positioning:
1341 |
1342 | ```sql
1343 | ALTER TABLE
1344 | users
1345 | ADD COLUMN
1346 | full_name VARCHAR(510) GENERATED ALWAYS AS (
1347 | CONCAT_WS(', ', last_name, first_name)
1348 | )
1349 | AFTER
1350 | last_name;
1351 | ```
1352 |
1353 | ```sql
1354 | Query OK, 0 rows affected (0.27 sec)
1355 | Records: 0 Duplicates: 0 Warnings: 0
1356 | ```
1357 |
1358 | ```sql
1359 | SELECT * FROM users LIMIT 1\G
1360 | ```
1361 |
1362 | ```sql
1363 | *************************** 1. row ***************************
1364 | user_id: 1
1365 | first_name: Addie
1366 | last_name: MacPherson
1367 | full_name: MacPherson, Addie
1368 | email: addie.macpherson@lizard.com
1369 | city: Latina
1370 | country: Italy
1371 | created_at: 2001-05-27 19:47:17
1372 | last_updated_at: NULL
1373 | 1 row in set (0.01 sec)
1374 | ```
1375 |
1376 | ### Invisible columns
1377 |
1378 | You can make columns `INVISIBLE` if you'd rather they not show up unless specifically queried for. This is done with the `INVISIBLE` keyword after the type (`VARCHAR(510)` here) when creating the column, or later with `ALTER COLUMN`:
1379 |
1380 | ```sql
1381 | ALTER TABLE users ALTER COLUMN full_name SET INVISIBLE;
1382 | ```
1383 |
1384 | ```sql
1385 | Query OK, 0 rows affected (0.19 sec)
1386 | Records: 0 Duplicates: 0 Warnings: 0
1387 | ```
1388 |
1389 | ```sql
1390 | SELECT * FROM users LIMIT 1\G
1391 | ```
1392 |
1393 | ```sql
1394 | *************************** 1. row ***************************
1395 | user_id: 1
1396 | first_name: Addie
1397 | last_name: MacPherson
1398 | email: addie.macpherson@lizard.com
1399 | city: Latina
1400 | country: Italy
1401 | created_at: 2001-05-27 19:47:17
1402 | last_updated_at: NULL
1403 | 1 row in set (0.00 sec)
1407 | ```
1408 |
1409 | To set them back to visible, use `SET VISIBLE`:
1410 |
1411 | ```sql
1412 | ALTER TABLE users ALTER COLUMN full_name SET VISIBLE;
1413 | ```
1414 |
1415 | ```sql
1416 | Query OK, 0 rows affected (0.08 sec)
1417 | Records: 0 Duplicates: 0 Warnings: 0
1418 | ```
1419 |
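For reference, a column can also be made invisible at creation time. A minimal sketch, using a hypothetical table:

```sql
-- hypothetical table; INVISIBLE follows the data type in the column definition
CREATE TABLE tokens (
    token_id INT PRIMARY KEY,
    secret VARCHAR(64) INVISIBLE
);
```

With this definition, `SELECT *` returns only `token_id`; `secret` appears only when explicitly named.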
1420 |
--------------------------------------------------------------------------------
/mysql/mysql-101-1.md:
--------------------------------------------------------------------------------
1 | # MySQL 101 Part II
2 |
3 | - [MySQL 101 Part II](#mysql-101-part-ii)
4 | - [Queries](#queries)
5 | - [Predicates](#predicates)
6 | - [WHERE](#where)
7 | - [SELECT](#select)
8 | - [Working with JSON](#working-with-json)
9 | - [Finding non-null arrays](#finding-non-null-arrays)
10 | - [Checking for a value inside an array](#checking-for-a-value-inside-an-array)
11 | - [Extracting scalars from an object](#extracting-scalars-from-an-object)
12 | - [INSERT](#insert)
13 | - [TABLE](#table)
14 | - [Joins](#joins)
15 |   - [Relational algebra](#relational-algebra)
16 | - [Types of joins](#types-of-joins)
17 | - [Cross](#cross)
18 | - [Inner Join](#inner-join)
19 | - [Left Outer Join](#left-outer-join)
20 | - [Right Outer Join](#right-outer-join)
21 | - [Full Outer Join](#full-outer-join)
22 | - [Specifying a column's table](#specifying-a-columns-table)
23 | - [Indices](#indices)
24 | - [Single indices](#single-indices)
25 | - [Partial indices](#partial-indices)
26 | - [Functional indices](#functional-indices)
27 | - [JSON / Longtext](#json--longtext)
28 | - [Composite indices](#composite-indices)
29 | - [Testing indices](#testing-indices)
30 | - [Descending indices](#descending-indices)
31 |   - [When indices aren't helpful](#when-indicies-arent-helpful)
32 | - [HAVING](#having)
33 | - [Query optimization](#query-optimization)
34 | - [SELECT \*](#select-)
35 | - [OFFSET / LIMIT](#offset--limit)
36 | - [DISTINCT](#distinct)
37 | - [Cleanup](#cleanup)
38 |
39 | ## Queries
40 |
41 | ### Predicates
42 |
43 | A predicate is a function which asserts that something is true or false. You can think of it like a filter.
44 |
45 | #### WHERE
46 |
47 | `WHERE` is the easiest to understand and apply, and will cover most of your needs.
48 |
49 | ```sql
50 | SELECT
51 | user_id, first_name, last_name
52 | FROM
53 | users
54 | WHERE
55 | country = 'Zimbabwe';
56 | ```
57 |
58 | ```sql
59 | +---------+------------+-----------+
60 | | user_id | first_name | last_name |
61 | +---------+------------+-----------+
62 | | 106 | Ivonne | Barmen |
63 | | 1149 | Myca | Flieger |
64 | | 2143 | Dallas | Nimesh |
65 | | 4401 | Jeana | Naga |
66 | | 4623 | Godiva | Adal |
67 | | 5582 | Lexie | Fenwick |
68 | | 5586 | Carrie | Nich |
69 | | 5793 | Marten | Casady |
70 | | 6072 | Feliza | Culhert |
71 | | 6467 | Wood | O'Connor |
72 | | 7093 | Miriam | Galliett |
73 | | 7669 | Cele | Belden |
74 | | 7675 | Araldo | Hoes |
75 | | 8106 | Imojean | Beaudoin |
76 | | 9438 | Sibby | Luedtke |
77 | | 9566 | Eb | Cattima |
78 | | 9606 | Alard | Frodina |
79 | +---------+------------+-----------+
80 | 17 rows in set (0.22 sec)
81 | ```
82 |
83 | Note that we filtered the results with a predicate that wasn't even in the result set (`country`).
84 |
85 | You may also have seen or used the wildcard `%` with `LIKE` and `NOT LIKE`.
86 |
87 | ```sql
88 | SELECT
89 | user_id, first_name, last_name
90 | FROM
91 | users
92 | WHERE
93 | country
94 | LIKE 'Zim%';
95 | ```
96 |
97 | ```sql
98 | +---------+------------+-----------+
99 | | user_id | first_name | last_name |
100 | +---------+------------+-----------+
101 | | 106 | Ivonne | Barmen |
102 | | 1149 | Myca | Flieger |
103 | | 2143 | Dallas | Nimesh |
104 | | 4401 | Jeana | Naga |
105 | | 4623 | Godiva | Adal |
106 | | 5582 | Lexie | Fenwick |
107 | | 5586 | Carrie | Nich |
108 | | 5793 | Marten | Casady |
109 | | 6072 | Feliza | Culhert |
110 | | 6467 | Wood | O'Connor |
111 | | 7093 | Miriam | Galliett |
112 | | 7669 | Cele | Belden |
113 | | 7675 | Araldo | Hoes |
114 | | 8106 | Imojean | Beaudoin |
115 | | 9438 | Sibby | Luedtke |
116 | | 9566 | Eb | Cattima |
117 | | 9606 | Alard | Frodina |
118 | +---------+------------+-----------+
119 | 17 rows in set (0.22 sec)
120 | ```
121 |
122 | These two are functionally equivalent queries. However, if there is an index on the predicate column and you use a leading wildcard (e.g. `LIKE '%babwe'`), MySQL cannot use the index, and will instead perform a table scan. If you can avoid using leading wildcards on large tables, do so. It's also worth noting that the query optimizer often determines that a table scan would be faster than using an index, and so performs one anyway. [Index usage can be hinted](https://dev.mysql.com/doc/refman/8.0/en/index-hints.html), forced, and ignored, although as of MySQL 8.0.20, the old hint syntax [is deprecated](https://dev.mysql.com/doc/refman/8.0/en/optimizer-hints.html#optimizer-hints-index-level) in favor of optimizer hints. Examples of both are below with an `EXPLAIN SELECT`. They're from a different schema and table, as I've already set up the index there.
123 |
124 | ```sql
125 | EXPLAIN SELECT
126 | user_id, first_name, last_name
127 | FROM
128 | test.ref_users
129 | USE INDEX (country)
130 | WHERE
131 | country
132 | LIKE 'Zim%'\G
133 | ```
134 |
135 | ```sql
136 | *************************** 1. row ***************************
137 | id: 1
138 | select_type: SIMPLE
139 | table: ref_users
140 | partitions: NULL
141 | type: range
142 | possible_keys: country
143 | key: country
144 | key_len: 1023
145 | ref: NULL
146 | rows: 3
147 | filtered: 100.00
148 | Extra: Using index condition
149 | 1 row in set, 1 warning (0.01 sec)
150 | ```
151 |
152 | ```sql
153 | EXPLAIN SELECT
154 | user_id, first_name, last_name
155 | FROM
156 | test.ref_users
157 | FORCE INDEX (country)
158 | WHERE
159 | country
160 | LIKE '%babwe'\G
161 | ```
162 |
163 | ```sql
163 | *************************** 1. row ***************************
164 | id: 1
165 | select_type: SIMPLE
166 | table: ref_users
167 | partitions: NULL
168 | type: ALL
169 | possible_keys: NULL
170 | key: NULL
171 | key_len: NULL
172 | ref: NULL
173 | rows: 1000
174 | filtered: 11.11
175 | Extra: Using where
176 | 1 row in set, 1 warning (0.00 sec)
177 | ```
178 |
179 | Even when using `FORCE INDEX`, it's not being used, because it can't.
180 |
181 | ```sql
182 | EXPLAIN SELECT /*+ INDEX(ref_users country) */
183 |     user_id, first_name, last_name
184 | FROM
185 |     test.ref_users
187 | WHERE
188 | country
189 | LIKE 'Zim%'\G
190 | ```
191 |
192 | The new syntax, which looks like a C-style comment, requires both the table and the index name to be listed.
193 |
194 | ```sql
195 | *************************** 1. row ***************************
196 | id: 1
197 | select_type: SIMPLE
198 | table: ref_users
199 | partitions: NULL
200 | type: range
201 | possible_keys: country
202 | key: country
203 | key_len: 1023
204 | ref: NULL
205 | rows: 3
206 | filtered: 100.00
207 | Extra: Using index condition
208 | 1 row in set, 1 warning (0.00 sec)
209 | ```
210 |
211 | ### SELECT
212 |
213 | [MySQL docs.](https://dev.mysql.com/doc/refman/8.0/en/select.html)
214 |
215 | You use it to select data from tables (or `/dev/stdin`). Any questions?
216 |
217 | ```sql
218 | SELECT * FROM ref_zaps LIMIT 10 OFFSET 15;
219 | ```
220 |
221 | ```sql
222 | +--------+----------+----------------------+---------------------+-----------------+
223 | | zap_id | owned_by | shared_with | created_at | last_updated_at |
224 | +--------+----------+----------------------+---------------------+-----------------+
225 | | 16 | 788 | [] | 2013-10-16 21:25:30 | NULL |
226 | | 17 | 689 | [] | 2016-07-21 03:05:33 | NULL |
227 | | 18 | 735 | [] | 2020-12-16 13:51:04 | NULL |
228 | | 19 | 802 | [] | 2009-11-22 03:33:19 | NULL |
229 | | 20 | 297 | [529, 805, 541, 498] | 1997-07-11 15:05:07 | NULL |
230 | | 21 | 649 | [] | 2015-05-18 20:08:31 | NULL |
231 | | 22 | 438 | [] | 2006-12-14 15:28:30 | NULL |
232 | | 23 | 607 | [] | 2013-04-15 17:57:19 | NULL |
233 | | 24 | 460 | [] | 2018-01-28 02:05:59 | NULL |
234 | | 25 | 677 | [] | 1995-06-07 21:46:30 | NULL |
235 | +--------+----------+----------------------+---------------------+-----------------+
236 | 10 rows in set (0.01 sec)
237 | ```
238 |
239 |
240 | Can you think of anything missing from this table? (HINT: `SHOW CREATE TABLE`)
241 |
242 | There's no foreign key linking `owned_by` to a given user! In fact, they're just randomly generated numbers that happen to correspond to existing user IDs. Let's create a foreign key now:
243 | ```sql
244 | ALTER TABLE ref_zaps ADD CONSTRAINT zap_owner_id FOREIGN KEY (owned_by) REFERENCES ref_users (user_id);
245 | ```
246 |
247 | ```sql
248 | Query OK, 1000 rows affected (0.90 sec)
249 | Records: 1000 Duplicates: 0 Warnings: 0
250 | ```
251 |
252 |
253 | #### Working with JSON
254 |
255 | Both JSON arrays and objects can be stored in JSON columns. Using them in queries isn't as straightforward as other column types.
256 |
257 | ##### Finding non-null arrays
258 |
259 | ```sql
260 | SELECT *
261 | FROM
262 | ref_zaps
263 | WHERE JSON_LENGTH(shared_with) > 0
264 | LIMIT 10;
265 | ```
266 |
267 | ```sql
268 | +--------+----------+----------------------+---------------------+-----------------+
269 | | zap_id | owned_by | shared_with | created_at | last_updated_at |
270 | +--------+----------+----------------------+---------------------+-----------------+
271 | | 20 | 297 | [529, 805, 541, 498] | 1997-07-11 15:05:07 | NULL |
272 | | 40 | 312 | [395, 721, 397, 930] | 2016-11-15 03:42:41 | NULL |
273 | | 60 | 469 | [261, 565, 326, 637] | 2011-09-21 11:40:22 | NULL |
274 | | 80 | 505 | [753, 766, 812, 521] | 2001-07-04 15:28:08 | NULL |
275 | | 100 | 459 | [884, 23, 163, 654] | 2008-08-30 12:53:32 | NULL |
276 | | 120 | 411 | [730, 484, 530, 449] | 2012-09-02 00:42:20 | NULL |
277 | | 140 | 191 | [611, 798, 984, 583] | 2004-12-14 04:08:09 | NULL |
278 | | 160 | 310 | [941, 353, 499, 668] | 2003-01-22 01:05:04 | NULL |
279 | | 180 | 463 | [679, 639, 760, 784] | 2022-01-22 04:31:00 | NULL |
280 | | 200 | 36 | [308, 955, 485, 298] | 2015-10-17 21:42:16 | NULL |
281 | +--------+----------+----------------------+---------------------+-----------------+
282 | 10 rows in set (0.02 sec)
283 | ```
284 |
285 | ##### Checking for a value inside an array
286 |
287 | ```sql
288 | SELECT
289 | zap_id,
290 | owned_by,
291 | shared_with,
292 | user_id,
293 | full_name
294 | FROM ref_zaps
295 | JOIN
296 | ref_users ON
297 | JSON_CONTAINS(shared_with, JSON_ARRAY(ref_users.user_id))
298 | LIMIT 10;
299 | ```
300 |
301 | ```sql
302 | +--------+----------+---------------------+---------+--------------------+
303 | | zap_id | owned_by | shared_with | user_id | full_name |
304 | +--------+----------+---------------------+---------+--------------------+
305 | | 240 | 697 | [3, 854, 486, 907] | 3 | Gorlin, Alene |
306 | | 100 | 459 | [884, 23, 163, 654] | 23 | Schnurr, Sissie |
307 | | 700 | 947 | [28, 173, 33, 899] | 28 | Russi, Bab |
308 | | 560 | 869 | [258, 197, 724, 31] | 31 | Quince, Caryl |
309 | | 700 | 947 | [28, 173, 33, 899] | 33 | Langille, Tonya |
310 | | 740 | 888 | [41, 221, 402, 301] | 41 | Kruter, Bonni |
311 | | 460 | 566 | [45, 793, 553, 162] | 45 | Schuh, Gasparo |
312 | | 940 | 211 | [497, 973, 323, 48] | 48 | Aylsworth, Steffen |
313 | | 260 | 861 | [313, 52, 334, 457] | 52 | Delwyn, Karoline |
314 | | 420 | 667 | [524, 527, 948, 60] | 60 | Magen, Sherill |
315 | +--------+----------+---------------------+---------+--------------------+
316 | 10 rows in set (0.88 sec)
317 | ```
318 |
319 | ##### Extracting scalars from an object
320 |
321 | You can select a JSON column mixed in with non-JSON as you'd expect, and the entire contents will be displayed.
322 |
323 | ```sql
324 | SELECT
325 | user_id,
326 | email,
327 | user_json
328 | FROM
329 | gensql
330 | LIMIT 10;
331 | ```
332 |
333 | ```sql
334 | +---------+-------------------------------+-----------------------------------------------------------------------------------------------+
335 | | user_id | email | user_json |
336 | +---------+-------------------------------+-----------------------------------------------------------------------------------------------+
337 | | 1 | abba.wilder@bodacious.com | {"a_key": "playable", "b_key": {"c_key": ["unscathed", "humongous", "surplus", "mousiness"]}} |
338 | | 2 | antonetta.bosson@chaplain.com | {"a_key": "obedience", "b_key": {"c_key": ["depletion", "carve", "driveway", "primate"]}} |
339 | | 3 | cobb.fondea@contusion.com | {"a_key": "activity", "b_key": {"c_key": ["famine", "huskiness", "unleash", "unknotted"]}} |
340 | | 4 | hanan.keelin@aspect.com | {"a_key": "iron", "b_key": {"c_key": ["exact", "postcard", "sauciness", "dispatch"]}} |
341 | | 5 | kinna.lytle@epidermis.com | {"a_key": "flannels", "b_key": {"c_key": ["sherry", "graded", "crusader", "rumble"]}} |
342 | | 6 | carolynn.sewoll@starch.com | {"a_key": "extrude", "b_key": {"c_key": ["harmony", "ferris", "confirm", "elevate"]}} |
343 | | 7 | ola.pride@defile.com | {"a_key": "blurt", "b_key": {"c_key": ["expectant", "half", "coming", "remover"]}} |
344 | | 8 | orella.acima@subwoofer.com | {"a_key": "grape", "b_key": {"c_key": ["wrist", "galley", "fragment", "scurvy"]}} |
345 | | 9 | odilia.thorr@daredevil.com | {"a_key": "numbing", "b_key": {"c_key": ["glutinous", "repacking", "reliant", "polygon"]}} |
346 | | 10 | berrie.marybella@undertow.com | {"a_key": "unadvised", "b_key": {"c_key": ["grove", "cornhusk", "darkening", "grazing"]}} |
347 | +---------+-------------------------------+-----------------------------------------------------------------------------------------------+
348 | 10 rows in set (0.01 sec)
349 | ```
350 |
351 | You can also extract specific keys:
352 |
353 | ```sql
354 | -- the ->> operator is shorthand for JSON_UNQUOTE(JSON_EXTRACT())
355 | SELECT
356 |     user_id,
357 |     email,
358 |     user_json->>'$.b_key'
358 | FROM
359 | gensql
360 | LIMIT 10;
361 | ```
362 |
363 | ```sql
364 | +---------+-------------------------------+---------------------------------------------------------------+
365 | | user_id | email | user_json->>'$.b_key' |
366 | +---------+-------------------------------+---------------------------------------------------------------+
367 | | 1 | abba.wilder@bodacious.com | {"c_key": ["unscathed", "humongous", "surplus", "mousiness"]} |
368 | | 2 | antonetta.bosson@chaplain.com | {"c_key": ["depletion", "carve", "driveway", "primate"]} |
369 | | 3 | cobb.fondea@contusion.com | {"c_key": ["famine", "huskiness", "unleash", "unknotted"]} |
370 | | 4 | hanan.keelin@aspect.com | {"c_key": ["exact", "postcard", "sauciness", "dispatch"]} |
371 | | 5 | kinna.lytle@epidermis.com | {"c_key": ["sherry", "graded", "crusader", "rumble"]} |
372 | | 6 | carolynn.sewoll@starch.com | {"c_key": ["harmony", "ferris", "confirm", "elevate"]} |
373 | | 7 | ola.pride@defile.com | {"c_key": ["expectant", "half", "coming", "remover"]} |
374 | | 8 | orella.acima@subwoofer.com | {"c_key": ["wrist", "galley", "fragment", "scurvy"]} |
375 | | 9 | odilia.thorr@daredevil.com | {"c_key": ["glutinous", "repacking", "reliant", "polygon"]} |
376 | | 10 | berrie.marybella@undertow.com | {"c_key": ["grove", "cornhusk", "darkening", "grazing"]} |
377 | +---------+-------------------------------+---------------------------------------------------------------+
378 | 10 rows in set (0.00 sec)
379 | ```
380 |
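Since `->>` is shorthand for `JSON_UNQUOTE(JSON_EXTRACT())`, the same extraction can be written with the full function calls; this sketch should produce the same result:

```sql
-- long form of user_json->>'$.b_key'
SELECT
    email,
    JSON_UNQUOTE(JSON_EXTRACT(user_json, '$.b_key'))
FROM
    gensql
LIMIT 10;
```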
381 |
382 | Or nest extractions:
383 |
384 | ```sql
385 | SELECT
386 | user_id,
387 | email,
388 | user_json->>'$.b_key.c_key'
389 | FROM
390 | gensql
391 | LIMIT 10;
392 | ```
393 |
394 | ```sql
395 | +---------+-------------------------------+----------------------------------------------------+
396 | | user_id | email | user_json->>'$.b_key.c_key' |
397 | +---------+-------------------------------+----------------------------------------------------+
398 | | 1 | abba.wilder@bodacious.com | ["unscathed", "humongous", "surplus", "mousiness"] |
399 | | 2 | antonetta.bosson@chaplain.com | ["depletion", "carve", "driveway", "primate"] |
400 | | 3 | cobb.fondea@contusion.com | ["famine", "huskiness", "unleash", "unknotted"] |
401 | | 4 | hanan.keelin@aspect.com | ["exact", "postcard", "sauciness", "dispatch"] |
402 | | 5 | kinna.lytle@epidermis.com | ["sherry", "graded", "crusader", "rumble"] |
403 | | 6 | carolynn.sewoll@starch.com | ["harmony", "ferris", "confirm", "elevate"] |
404 | | 7 | ola.pride@defile.com | ["expectant", "half", "coming", "remover"] |
405 | | 8 | orella.acima@subwoofer.com | ["wrist", "galley", "fragment", "scurvy"] |
406 | | 9 | odilia.thorr@daredevil.com | ["glutinous", "repacking", "reliant", "polygon"] |
407 | | 10 | berrie.marybella@undertow.com | ["grove", "cornhusk", "darkening", "grazing"] |
408 | +---------+-------------------------------+----------------------------------------------------+
409 | 10 rows in set (0.01 sec)
410 | ```
411 |
412 | ```sql
413 | -- the -> operator is shorthand for JSON_EXTRACT()
414 | -- arrays are 0-indexed, so this is a slice, like lst[1:3]
415 | SELECT
416 | email,
417 | user_json->'$.b_key.c_key[1 to 2]'
418 | FROM
419 | gensql
420 | LIMIT 10;
421 | ```
422 |
423 | ```sql
424 | +-------------------------------+------------------------------------+
425 | | email                         | user_json->'$.b_key.c_key[1 to 2]' |
426 | +-------------------------------+------------------------------------+
427 | | abba.wilder@bodacious.com     | ["humongous", "surplus"]           |
428 | | antonetta.bosson@chaplain.com | ["carve", "driveway"]              |
429 | | cobb.fondea@contusion.com     | ["huskiness", "unleash"]           |
430 | | hanan.keelin@aspect.com       | ["postcard", "sauciness"]          |
431 | | kinna.lytle@epidermis.com     | ["graded", "crusader"]             |
432 | | carolynn.sewoll@starch.com    | ["ferris", "confirm"]              |
433 | | ola.pride@defile.com          | ["half", "coming"]                 |
434 | | orella.acima@subwoofer.com    | ["galley", "fragment"]             |
435 | | odilia.thorr@daredevil.com    | ["repacking", "reliant"]           |
436 | | berrie.marybella@undertow.com | ["cornhusk", "darkening"]          |
437 | +-------------------------------+------------------------------------+
438 | 10 rows in set (0.02 sec)
439 | ```
440 |
441 | See [MySQL docs](https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html) for much more about JSON operations.
442 |
443 | ### INSERT
444 |
445 | [MySQL docs.](https://dev.mysql.com/doc/refman/8.0/en/insert.html)
446 |
447 | `INSERT` is used to insert rows into a table. There is also an `UPSERT` equivalent, with the `ON DUPLICATE KEY UPDATE` clause. With this, if an `INSERT` would cause a key collision with a `UNIQUE` index (explicit or implicit, e.g. `PRIMARY KEY`), then an `UPDATE` of that row occurs instead.
448 |
449 | ```sql
450 | INSERT INTO users
451 | (first_name, last_name, user_id)
452 | VALUES
453 | ('Leeroy', 'Jenkins', 42);
454 | ```
455 |
456 | ```sql
457 | ERROR 1062 (23000): Duplicate entry '42' for key 'users.PRIMARY'
458 | ```
459 |
460 | Expectedly, that failed since `user_id`, which is our primary key, already has an entry at `42`.
461 |
462 | ```sql
463 | SELECT * FROM
464 | users
465 | WHERE
466 | user_id = 42\G
467 | ```
468 |
469 | ```sql
470 | *************************** 1. row ***************************
471 | user_id: 42
472 | first_name: Ramona
473 | last_name: Odelet
474 | full_name: Odelet, Ramona
475 | email: ramona.odelet@lucid.com
476 | city: Foligno
477 | country: Italy
478 | created_at: 2003-07-29 07:34:15
479 | last_updated_at: NULL
480 | 1 row in set (0.01 sec)
481 | ```
482 |
483 | Now we can try again, this time with an instruction to perform an UPSERT.
484 |
485 | ```sql
486 | INSERT INTO users
487 | (first_name, last_name, user_id)
488 | VALUES
489 | ("Leeroy", "Jenkins", 42) AS vals
490 | ON DUPLICATE KEY UPDATE
491 | first_name = vals.first_name,
492 | last_name = vals.last_name;
493 | ```
494 |
495 | ```sql
496 | Query OK, 2 rows affected (0.21 sec)
497 | ```
498 |
499 | ```sql
500 | SELECT * FROM users WHERE user_id = 42\G
501 | ```
502 |
503 | ```sql
504 | *************************** 1. row ***************************
505 | user_id: 42
506 | first_name: Leeroy
507 | last_name: Jenkins
508 | full_name: Jenkins, Leeroy
509 | email: ramona.odelet@lucid.com
510 | city: Foligno
511 | country: Italy
512 | created_at: 2003-07-29 07:34:15
513 | last_updated_at: 2023-02-27 13:24:26
514 | 1 row in set (0.01 sec)
515 | ```
516 |
517 | `full_name` updated automatically, since it's a `GENERATED` column, but `email` is now incorrect. Also, note that `last_updated_at` has changed from `NULL`, since we've modified the row.
518 |
519 | Let's put the row back to how it was before.
520 |
521 |
522 | How can this be accomplished?
523 |
524 | ```sql
525 | -- first, let's be safe with a transaction
526 | START TRANSACTION;
527 | ```
528 |
529 | ```sql
530 | Query OK, 0 rows affected (0.01 sec)
531 | ```
532 |
533 | ```sql
534 | -- then, use UPDATE
535 | UPDATE users SET first_name = 'Ramona', last_name = 'Odelet' WHERE user_id = 42;
536 | ```
537 |
538 | ```sql
539 | Query OK, 1 row affected (0.01 sec)
540 | Rows matched: 1 Changed: 1 Warnings: 0
541 | ```
542 |
543 | ```sql
544 | -- next, verify the work
545 | SELECT * FROM users WHERE user_id = 42\G
546 | ```
547 |
548 | ```sql
549 | *************************** 1. row ***************************
550 | user_id: 42
551 | first_name: Ramona
552 | last_name: Odelet
553 | full_name: Odelet, Ramona
554 | email: ramona.odelet@lucid.com
555 | city: Foligno
556 | country: Italy
557 | created_at: 2003-07-29 07:34:15
558 | last_updated_at: 2023-02-27 13:30:10
559 | 1 row in set (0.00 sec)
560 | ```
561 |
562 | ```sql
563 | -- finally, commit the result
564 | COMMIT;
565 | ```
566 |
567 | ```sql
568 | Query OK, 0 rows affected (0.08 sec)
569 | ```
570 |
571 |
572 | ### TABLE
573 |
574 | [MySQL docs.](https://dev.mysql.com/doc/refman/8.0/en/table.html)
575 |
576 | `TABLE` is syntactic sugar for `SELECT * FROM <table>`. It works great if you know the table is small, but be careful on large tables!
577 |
578 | ```sql
579 | TABLE users\G
580 | ```
581 |
582 | ```sql
583 | -- 9999 rows are above this...
584 | *************************** 10000. row ***************************
585 | user_id: 10000
586 | first_name: Gabrila
587 | last_name: Lemmueu
588 | full_name: Lemmueu, Gabrila
589 | email: gabrila.lemmueu@urgent.com
590 | city: Itanagar
591 | country: India
592 | created_at: 2020-12-10 01:58:35
593 | last_updated_at: NULL
594 | 10000 rows in set (0.48 sec)
595 | ```
596 |
597 | ## Joins
598 |
599 | ### Relational algebra
600 |
601 | Not a lot of it, I promise; just what we need to discuss joins.
602 |
603 | * Union: `R ∪ S --- R OR S`
604 | * Implemented in MySQL via the `UNION` keyword
605 | * Intersection: `R ∩ S --- R AND S`
606 | * Implemented in MySQL via `INNER JOIN`, or in MySQL 8.0.31, the `INTERSECT` keyword
607 | * Difference: `R \ S --- R - S`
608 | * Implemented in MySQL 8.0.31 via the `EXCEPT` keyword, and can be emulated using `UNION` and `NOT IN`
609 |
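On versions prior to 8.0.31, the difference can be emulated with a subquery; a sketch with hypothetical tables `r` and `s`, each having an `id` column:

```sql
-- rows of r whose id does not appear in s (R - S)
-- note: NOT IN matches nothing if the subquery yields any NULLs
SELECT id FROM r
WHERE id NOT IN (SELECT id FROM s);
```
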
610 | If you're interested in exploring relational algebra, [this application](https://dbis-uibk.github.io/relax/calc/local/uibk/local/3) is quite useful for converting SQL to relational algebra and displaying the results.
611 |
612 | ### Types of joins
613 |
614 | #### Cross
615 |
616 | Before we demonstrate a cross join, you should have two small (very small, like < 10 rows) tables. You can either use what we learned earlier to create a new table from an existing one, or you can use any two of the following tables: `northwind.orders_status`, `northwind.tax_status_name`, `test.ref_users_tiny`, `test.ref_zaps_tiny`. You can cross join across schemas if you'd like, although I can't promise the information will make any sense.
617 |
618 | Also called a Cartesian Join. This produces `n x m` rows for the two groups being joined. That said, every other join can be thought of as a cross join with a predicate. In fact, `CROSS JOIN`, `JOIN`, and `INNER JOIN` are actually syntactically equivalent in MySQL (not ANSI SQL!), but for readability, it's preferred to only use `CROSS JOIN` if you actually intend to use it.
619 |
620 | ```sql
621 | SELECT
622 | z.zap_id,
623 | u.user_id,
624 | u.full_name
625 | FROM
626 | ref_users_tiny u
627 | CROSS JOIN
628 | ref_zaps_tiny z;
629 | ```
630 |
631 | ```sql
632 | +--------+---------+-------------------+
633 | | zap_id | user_id | full_name |
634 | +--------+---------+-------------------+
635 | | 1 | 4 | McGrody, Cointon |
636 | | 1 | 3 | Gorlin, Alene |
637 | | 1 | 2 | Marienthal, Shirl |
638 | | 1 | 1 | Jemena, Wyatt |
639 | | 2 | 4 | McGrody, Cointon |
640 | | 2 | 3 | Gorlin, Alene |
641 | | 2 | 2 | Marienthal, Shirl |
642 | | 2 | 1 | Jemena, Wyatt |
643 | | 3 | 4 | McGrody, Cointon |
644 | | 3 | 3 | Gorlin, Alene |
645 | | 3 | 2 | Marienthal, Shirl |
646 | | 3 | 1 | Jemena, Wyatt |
647 | | 4 | 4 | McGrody, Cointon |
648 | | 4 | 3 | Gorlin, Alene |
649 | | 4 | 2 | Marienthal, Shirl |
650 | | 4 | 1 | Jemena, Wyatt |
651 | +--------+---------+-------------------+
652 | 16 rows in set (0.01 sec)
653 | ```
654 |
655 | #### Inner Join
656 |
657 | The default (i.e. `JOIN` == `INNER JOIN`). This is `users AND zaps` with a predicate.
658 |
659 | ```sql
660 | SELECT
661 | z.zap_id,
662 | u.full_name,
663 | u.city,
664 | u.country
665 | FROM
666 | ref_users u
667 | JOIN
668 | ref_zaps z
669 | ON
670 | u.user_id = z.owned_by
671 | LIMIT 10;
672 | ```
673 |
674 | ```sql
675 | +--------+-------------------+-------------+----------------+
676 | | zap_id | full_name | city | country |
677 | +--------+-------------------+-------------+----------------+
678 | | 411 | MacPherson, Addie | Latina | Italy |
679 | | 794 | Airla, Valaree | Pribram | Czech Republic |
680 | | 830 | Kirschner, Robby | Bikaner | India |
681 | | 697 | Bilski, Lewiss | Vörderås | Sweden |
682 | | 110 | Yamauchi, Marleah | Rotterdam | Netherlands |
683 | | 942 | Yamauchi, Marleah | Rotterdam | Netherlands |
684 | | 772 | Calore, Ania | Miyakojima | Japan |
685 | | 676 | Breger, Gratiana | Valkeakoski | Finland |
686 | | 715 | Serafina, Janith | Morant Bay | Jamaica |
687 | | 405 | Beckman, Pavla | Wackersdorf | Germany |
688 | +--------+-------------------+-------------+----------------+
689 | 10 rows in set (0.02 sec)
690 | ```
691 |
692 | #### Left Outer Join
693 |
694 | Left and Right Joins are both a type of Outer Join, and often just called Left or Right Join. This is `users OR zaps` with a predicate and default value (`NULL`) for `zaps`.
695 |
696 | ```sql
697 | SELECT
698 | u.user_id,
699 | u.full_name,
700 | z.zap_id,
701 | z.owned_by
702 | FROM
703 | ref_users u
704 | LEFT JOIN
705 | ref_zaps_joins z
706 | ON
707 | u.user_id = z.owned_by
708 | LIMIT 10;
709 | ```
710 |
711 | ```sql
712 | +---------+-------------------+--------+----------+
713 | | user_id | full_name | zap_id | owned_by |
714 | +---------+-------------------+--------+----------+
715 | | 1 | MacPherson, Addie | 411 | 1 |
716 | | 2 | Airla, Valaree | 794 | 2 |
717 | | 3 | Nett, Sheppard | NULL | NULL |
718 | | 4 | Kirschner, Robby | 830 | 4 |
719 | | 5 | Bilski, Lewiss | 697 | 5 |
720 | | 6 | Yamauchi, Marleah | 942 | 6 |
721 | | 6 | Yamauchi, Marleah | 110 | 6 |
722 | | 7 | Calore, Ania | 772 | 7 |
723 | | 8 | Breger, Gratiana | 676 | 8 |
724 | | 9 | Serafina, Janith | 715 | 9 |
725 | +---------+-------------------+--------+----------+
726 | 10 rows in set (0.09 sec)
727 | ```
728 |
729 | Of course, we previously put a foreign key on `zaps.owned_by`, precisely to prevent this kind of thing from happening. Still, you can see how this kind of query could be useful.
730 |
731 | #### Right Outer Join
732 |
733 | This is the same thing, but with the tables reversed:
734 |
735 | ```sql
736 | SELECT
737 | u.user_id,
738 | u.full_name,
739 | z.zap_id,
740 | z.owned_by
741 | FROM
742 | ref_users u
743 | RIGHT JOIN
744 | ref_zaps_joins z
745 | ON
746 | u.user_id = z.owned_by
747 | LIMIT 10;
748 | ```
749 |
750 | ```sql
751 | +---------+------------------+--------+----------+
752 | | user_id | full_name | zap_id | owned_by |
753 | +---------+------------------+--------+----------+
754 | | 602 | Hirz, Datha | 1 | 602 |
755 | | 593 | Meldoh, Vergil | 2 | 593 |
756 | | NULL | NULL | 3 | 0 |
757 | | 548 | Philps, Ardelia | 4 | 548 |
758 | | 957 | Joash, Electra | 5 | 957 |
759 | | 777 | Levinson, Lenore | 6 | 777 |
760 | | 648 | Vas, Tiphanie | 7 | 648 |
761 | | 959 | Brink, Kaia | 8 | 959 |
762 | | 569 | Lasser, Garrard | 9 | 569 |
763 | | 429 | Adamsen, Justen | 10 | 429 |
764 | +---------+------------------+--------+----------+
765 | 10 rows in set (0.09 sec)
766 | ```
767 |
768 | You can translate any `LEFT JOIN` to a `RIGHT JOIN` simply by swapping the order of the tables being joined:
769 |
770 | ```sql
771 | SELECT
772 | u.user_id,
773 | u.full_name,
774 | z.zap_id,
775 | z.owned_by
776 | FROM
777 | ref_zaps_joins z
778 | RIGHT JOIN
779 | ref_users u
780 | ON
781 | u.user_id = z.owned_by
782 | LIMIT 10;
783 | ```
784 |
785 | ```sql
786 | +---------+-------------------+--------+----------+
787 | | user_id | full_name | zap_id | owned_by |
788 | +---------+-------------------+--------+----------+
789 | | 1 | MacPherson, Addie | 411 | 1 |
790 | | 2 | Airla, Valaree | 794 | 2 |
791 | | 3 | Nett, Sheppard | NULL | NULL |
792 | | 4 | Kirschner, Robby | 830 | 4 |
793 | | 5 | Bilski, Lewiss | 697 | 5 |
794 | | 6 | Yamauchi, Marleah | 942 | 6 |
795 | | 6 | Yamauchi, Marleah | 110 | 6 |
796 | | 7 | Calore, Ania | 772 | 7 |
797 | | 8 | Breger, Gratiana | 676 | 8 |
798 | | 9 | Serafina, Janith | 715 | 9 |
799 | +---------+-------------------+--------+----------+
800 | 10 rows in set (0.15 sec)
801 | ```
802 |
803 | #### Full Outer Join
804 |
805 | This is `users OR zaps` with a predicate and a default value (`NULL`) for both tables. MySQL doesn't support `FULL JOIN` as a keyword, but it can be emulated by combining a `LEFT JOIN` and a `RIGHT JOIN` with `UNION` (or, as here, `UNION ALL` plus a `WHERE ... IS NULL` filter on the second query to avoid duplicate rows).
806 |
807 | NOTE: This query will produce 1150 rows as written.
808 |
809 | ```sql
810 | SELECT
811 | u.user_id,
812 | u.full_name,
813 | z.zap_id,
814 | z.owned_by
815 | FROM
816 | ref_users u
817 | LEFT JOIN ref_zaps_joins z ON u.user_id = z.owned_by
818 | UNION ALL
819 | SELECT
820 | u.user_id,
821 | u.full_name,
822 | z.zap_id,
823 | z.owned_by
824 | FROM
825 | ref_users u
826 | RIGHT JOIN ref_zaps_joins z ON u.user_id = z.owned_by
827 | WHERE
828 | u.user_id IS NULL;
829 | ```
830 |
831 | To efficiently see what it's doing, you can run two queries, appending `ORDER BY -user_id DESC` and `ORDER BY user_id`, which represent the top and bottom of the result. Don't forget to add a `LIMIT` as well!
832 |
833 | <details>
834 | <summary>What is -user_id?</summary>
835 | 
836 | It's shorthand for the math expression `(0 - user_id)`, which effectively is the same thing as `ORDER BY ... ASC`, but it places `NULL` values last. Postgres avoids this weird trick and just has the `NULLS {FIRST, LAST}` option for ordering.
837 | 
838 | </details>
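The "ascending, with NULLs last" ordering can be sketched in Python by sorting on a `(is_null, value)` key:

```python
# Emulate "ascending, NULLs last": None rows sort after all real values
# because False < True in the key tuple, and None is never compared to an int.
ids = [3, None, 1, None, 2]
ordered = sorted(ids, key=lambda x: (x is None, x))
assert ordered == [1, 2, 3, None, None]
```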
839 | ### Specifying a column's table
840 |
841 | You may have noticed that we've used aliases for many tables, e.g. `ref_users u`, and then notating columns with that alias as a prefix, e.g. `u.user_id`. This is not required for single tables, of course, nor is it required with joins if every column name is unique. However, it's considered a good practice when using multiple tables.
842 |
843 | ### Indices
844 |
845 | Indices, or indexes, _may_ speed up queries. Each table **should** have a primary key (it's not required*, but please don't omit one), which is one index. Additional indices, on single or multiple columns, may be created. Most of them are stored in [B+ trees](https://en.wikipedia.org/wiki/B%2B_tree), which are similar to [B-trees](https://en.wikipedia.org/wiki/B-tree).
846 |
847 | Indices aren't free, however - when you create an index on a column, that column's values are copied to the aforementioned B+ tree. While disk space is relatively cheap, creating dozens of indices for columns that are infrequently queried should be avoided. Also, since `INSERTs` must also write to the index, they'll be slowed down somewhat. Finally, InnoDB limits a given table to a maximum of 64 secondary indices (that is, other than primary keys).
848 |
849 | <details>
850 | <summary>Obscure facts about tables without primary keys</summary>
851 | 
852 | \* Prior to MySQL 8.0.30, if you don't create a primary key, the first `UNIQUE NOT NULL` index created is automatically promoted to become the primary key. If you don't have one of those either, the table will have no primary key†. Starting with MySQL 8.0.30, if no primary key is declared, an invisible column called `my_row_id` will be created and set as the primary key.
853 | 
854 | † Not entirely true. A hidden index named `GEN_CLUST_INDEX` is created on an invisible (but a special kind of invisible, that you can never view) column named `ROW_ID` containing row IDs, but it's a monotonically increasing counter shared globally across all InnoDB tables in the instance, not just that table. Don't make InnoDB do this.
855 | </details>
856 |
857 | #### Single indices
858 |
859 | Here, we'll switch over to `%_big` tables, which have 1,000,000 rows each.
860 |
861 | ```sql
862 | SELECT
863 | user_id,
864 | full_name,
865 | city,
866 | country
867 | FROM
868 | ref_users_big
869 | WHERE
870 | last_name = 'Safko';
871 | ```
872 |
873 | ```sql
874 | +---------+------------------+------------------------+----------------+
875 | | user_id | full_name | city | country |
876 | +---------+------------------+------------------------+----------------+
877 | | 66826 | Safko, Elwyn | Arad | Romania |
878 | | 68759 | Safko, Vance | Saint-Jérôme | Canada |
879 | | 81384 | Safko, Robinett | Hornchurch | United Kingdom |
880 | | 92580 | Safko, Daisi | Sherwood Park | Canada |
881 | | 121219 | Safko, Karalee | Miami Gardens | United States |
882 | | 124408 | Safko, Kyrstin | Hawick | United Kingdom |
883 | | 150615 | Safko, Kleon | Leigh | United Kingdom |
884 | | 151266 | Safko, Elita | Abag Qi | China |
885 | | 155926 | Safko, Berthe | Tullebølle | Denmark |
886 | | 168897 | Safko, Hazlett | Valletta | Malta |
887 | | ... | ... | ... | ... |
888 | | 900935 | Safko, Tommy | Paris | France |
889 | | 925514 | Safko, Rancell | Nampa | United States |
890 | | 928486 | Safko, Garry | Bardhaman | India |
891 | | 932457 | Safko, Desiree | Kherson | Ukraine |
892 | | 945316 | Safko, Courtnay | Saint Marys | Canada |
893 | | 947072 | Safko, Leonie | Durango | Mexico |
894 | | 948263 | Safko, Jarred | Las Vegas | United States |
895 | | 959464 | Safko, Gordie | Madison | United States |
896 | | 972002 | Safko, Adriena | Ubud | Indonesia |
897 | | 982089 | Safko, Gan | Milpitas | United States |
898 | +---------+------------------+------------------------+----------------+
899 | 76 rows in set (12.24 sec)
900 | ```
901 |
902 | Let's create an index on the last name.
903 |
904 | ```sql
905 | CREATE INDEX last_name ON ref_users_big (last_name);
906 | ```
907 |
908 | ```sql
909 | Query OK, 0 rows affected (45.08 sec)
910 | Records: 0 Duplicates: 0 Warnings: 0
911 | ```
912 |
913 | ```sql
914 | SELECT * FROM ref_users_big WHERE last_name = 'Safko';
915 | ```
916 |
917 | ```sql
918 | -- the same results as above
919 | 76 rows in set (0.04 sec)
920 | ```
921 |
922 | The lookup is now essentially instantaneous. If this is a frequently performed query, this may be a wise decision. There are also times when you may not need an index - for example, remember that a `UNIQUE` constraint is also an index. Since all of our users in this table have an email address which is `first.last@domain.com`, you might be tempted to add a predicate of `WHERE email LIKE '%safko%'` instead of adding an index, but alas - leading wildcards disallow the use of indexes, so it requires a full table scan.
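A rough intuition for why the leading wildcard hurts, treating the index as a sorted list (the names here are made-up stand-ins):

```python
import bisect

# An index on last_name behaves like a sorted list: a prefix search
# (LIKE 'Saf%') can binary-search to the start of the matching range,
# but a substring search (LIKE '%afk%') has no usable ordering and
# must examine every entry - a full scan.
names = sorted(["Adams", "Jones", "Safka", "Safko", "Smith"])

lo = bisect.bisect_left(names, "Saf")
hi = bisect.bisect_left(names, "Saf\xff")  # just past the prefix range
prefix_matches = names[lo:hi]
assert prefix_matches == ["Safka", "Safko"]

substring_matches = [n for n in names if "afk" in n]  # scans every row
assert substring_matches == ["Safka", "Safko"]
```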
923 |
924 | #### Partial indices
925 |
926 | You can also create an index on a prefix of a column for string types (`CHAR`, `VARCHAR`, etc.); for `TEXT` and `BLOB` columns, you must do this.
927 |
928 | This will create an index on the first 3 characters of last_name:
929 |
930 | ```sql
931 | ALTER TABLE ref_users_big DROP INDEX last_name;
932 | CREATE INDEX last_name_partial ON ref_users_big (last_name(3));
933 | ```
934 |
935 | ```sql
936 | Query OK, 0 rows affected (0.31 sec)
937 | Records: 0 Duplicates: 0 Warnings: 0
938 |
939 | Query OK, 0 rows affected (37.85 sec)
940 | Records: 0 Duplicates: 0 Warnings: 0
941 | ```
942 |
943 | Speed for the new query is slower than before (0.16 seconds vs. 0.04 seconds), as expected, but 160 milliseconds for matching on just three characters honestly isn't that bad. If you have tremendously large tables, limited disk space, or are worried about the write performance impact, this may be a good option for you.
944 |
945 | #### Functional indices
946 |
947 | Starting with MySQL 8.0.13, you can also create an index that is itself an expression:
948 |
949 | ```sql
950 | CREATE INDEX
951 | created_month
952 | ON ref_users_big ((MONTH(created_at)));
953 | ```
954 |
955 | Note the double parentheses around the expression.
956 |
957 | ```sql
958 | Query OK, 0 rows affected (41.15 sec)
959 | Records: 0 Duplicates: 0 Warnings: 0
960 | ```
961 |
962 | What this specifically allows you to do is treat the `created_at` month value as an integer:
963 |
964 | ```sql
965 | EXPLAIN ANALYZE SELECT
966 | user_id, email, created_at
967 | FROM
968 | ref_users_big
969 | WHERE
970 | MONTH(created_at) = 6\G
971 | ```
972 |
973 | ```sql
974 | *************************** 1. row ***************************
975 | EXPLAIN: -> Index lookup on ref_users_big using created_month (month(created_at)=6) (cost=19952.91 rows=153858) (actual time=2.303..12051.690 rows=82815 loops=1)
976 |
977 | 1 row in set (15.49 sec)
978 | ```
979 |
980 | Note that in this case, it's actually _slower_ with the index, likely due to the low cardinality of the month - with only 12 distinct values, each index entry matches a huge number of rows.
981 |
982 | ```sql
983 | EXPLAIN ANALYZE SELECT
984 | user_id, email, created_at
985 | FROM
986 | ref_users_big
987 | USE INDEX()
988 | WHERE
989 | MONTH(created_at) = 6\G
990 | ```
991 |
992 | ```sql
993 | *************************** 1. row ***************************
994 | EXPLAIN: -> Filter: (month(ref_users_big.created_at) = 6) (cost=100955.37 rows=994330) (actual time=1.114..11135.192 rows=82815 loops=1)
995 | -> Table scan on ref_users_big (cost=100955.37 rows=994330) (actual time=1.010..9733.530 rows=1000000 loops=1)
996 |
997 | 1 row in set (11.43 sec)
998 | ```
999 |
1000 | #### JSON / Longtext
1001 |
1002 | JSON has its own special requirements to be indexed, especially if you're storing strings. First, you must select a specific part of the column's rows to be the indexed key, known as a functional key part. Additionally, the key has to have a prefix length assigned to it. Depending on the version of MySQL you're using, there may also be collation differences between the return value of various JSON functions and native storage of strings. Finally, this requires the stored data to be `k:v` objects, rather than arrays.
1003 |
1004 | Here, we're using a multi-valued index, which behind the scenes is creating a virtual, invisible column to store the extracted JSON array as a character array.
1005 |
1006 | ```sql
1007 | CREATE INDEX user_json_array_key ON gensql (
1008 | (
1009 | CAST(
1010 | user_json -> '$.b_key.c_key' AS CHAR(64) ARRAY
1011 | )
1012 | )
1013 | );
1014 | ```
1015 |
1016 | See [MySQL docs](https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-multi-valued) for more information on indexing JSON values, and properly using them.
1017 |
1018 | #### Composite indices
1019 |
1020 | An index can also be created across multiple columns - for InnoDB, up to 16.
1021 |
1022 | ```sql
1023 | CREATE INDEX full_name ON ref_users (first_name, last_name);
1024 | ```
1025 |
1026 | ```sql
1027 | Query OK, 0 rows affected (40.09 sec)
1028 | Records: 0 Duplicates: 0 Warnings: 0
1029 | ```
1030 |
1031 | First, we'll use `IGNORE INDEX` to direct MySQL to ignore the index we just created. This query counts the duplicate name tuples. Since `id` is being included, and `GROUP`ing by it would result in an empty set (as it's the primary key, and thus guaranteed to be unique), `ANY_VALUE()` must be specified to let MySQL know that the result can be non-deterministic. Finally, `EXPLAIN ANALYZE` is being used to run the query and explain what it's doing. This differs from `EXPLAIN`, which estimates what would be done, but doesn't actually perform the query. Be careful using `EXPLAIN ANALYZE`, especially with destructive actions, since those queries will actually be performed!
1032 |
1033 | ```sql
1034 | EXPLAIN ANALYZE
1035 | SELECT
1036 | ANY_VALUE(id),
1037 | first_name,
1038 | last_name,
1039 | COUNT(*) c
1040 | FROM
1041 | ref_users_big
1042 | IGNORE INDEX(full_name)
1043 | GROUP BY
1044 | first_name,
1045 | last_name
1046 | HAVING
1047 | c > 1\G
1048 | ```
1049 |
1050 | ```sql
1051 | *************************** 1. row ***************************
1052 | EXPLAIN: -> Filter: (c > 1) (actual time=23295.903..24686.641 rows=4318 loops=1)
1053 | -> Table scan on (actual time=0.005..903.621 rows=995670 loops=1)
1054 | -> Aggregate using temporary table (actual time=23295.727..24415.358 rows=995670 loops=1)
1055 | -> Table scan on ref_users_big (cost=104920.32 rows=995522) (actual time=2.329..10156.102 rows=1000000 loops=1)
1056 |
1057 | 1 row in set (25.26 sec)
1058 | ```
1059 |
1060 | The query took 25.26 seconds, and resulted in 4318 rows. The output is read from the bottom up - a table scan was performed on the entire table, then a temporary table with the `GROUP BY` aggregation was created, and finally a second table scan on that temporary table was performed to find the duplicated tuples.
1061 |
1062 | If you're curious, `actual time` is in milliseconds, and consists of two timings - the first is the time to initiate the step and return the first row; the second is the time to initiate the step and return all rows. `cost` is an arbitrary number indicating what the query cost optimizer thinks the query costs to perform; it has no unit, and is only useful for comparing candidate plans, not as an absolute measure.
1063 |
1064 | ```sql
1065 | EXPLAIN ANALYZE
1066 | SELECT
1067 | ANY_VALUE(id),
1068 | first_name,
1069 | last_name,
1070 | COUNT(*) c
1071 | FROM
1072 | ref_users_big
1073 | GROUP BY
1074 | first_name,
1075 | last_name
1076 | HAVING
1077 | c > 1\G
1078 | ```
1079 |
1080 | ```sql
1081 | *************************** 1. row ***************************
1082 | EXPLAIN: -> Filter: (c > 1) (actual time=6.318..12202.646 rows=4318 loops=1)
1083 | -> Group aggregate: count(0) (actual time=0.864..11447.233 rows=995670 loops=1)
1084 | -> Index scan on ref_users_big using full_name (cost=104920.32 rows=995522) (actual time=0.815..7315.098 rows=1000000 loops=1)
1085 |
1086 | 1 row in set (12.32 sec)
1087 | ```
1088 |
1089 | With the index in place, an index scan is performed instead of two table scans, resulting in a ~2x speedup.
1090 |
1091 | Another example, retrieving a specific doubled tuple that I know exists:
1092 |
1093 | ```sql
1094 | SELECT
1095 | user_id,
1096 | full_name,
1097 | email,
1098 | city,
1099 | country
1100 | FROM
1101 | ref_users_big
1102 | WHERE
1103 | first_name = 'Ashlie'
1104 | AND
1105 | last_name = 'Godred';
1106 | ```
1107 |
1108 | ```sql
1109 | +---------+----------------+-------------------------+----------+--------------+
1110 | | user_id | full_name | email | city | country |
1111 | +---------+----------------+-------------------------+----------+--------------+
1112 | | 974206 | Godred, Ashlie | ashlie.godred@mushy.com | Mikkeli | Finland |
1113 | | 987301 | Godred, Ashlie | ashlie.godred@suave.com | Pretoria | South Africa |
1114 | +---------+----------------+-------------------------+----------+--------------+
1115 | 2 rows in set (0.01 sec)
1116 | ```
1117 |
1118 | vs. if `USE INDEX()` is added to the query:
1119 |
1120 | ```sql
1121 | +---------+----------------+-------------------------+----------+--------------+
1122 | | user_id | full_name | email | city | country |
1123 | +---------+----------------+-------------------------+----------+--------------+
1124 | | 974206 | Godred, Ashlie | ashlie.godred@mushy.com | Mikkeli | Finland |
1125 | | 987301 | Godred, Ashlie | ashlie.godred@suave.com | Pretoria | South Africa |
1126 | +---------+----------------+-------------------------+----------+--------------+
1127 | 2 rows in set (14.60 sec)
1128 | ```
1129 |
1130 | Note that `USE INDEX()` is valid syntax to tell MySQL to ignore all indexes.
1131 |
1132 | If instead, either the `full_name` or `last_name_partial` index we made previously is ignored on its own, its complement will be used, and they're effectively equally fast due to the filtered result set - here, using the partial index on `last_name` dropped the candidate tuples from 1,000,000 to 1,066.
1133 |
1134 | ```sql
1135 | EXPLAIN ANALYZE
1136 | SELECT
1137 | user_id,
1138 | full_name,
1139 | email,
1140 | city,
1141 | country
1142 | FROM
1143 | ref_users_big IGNORE INDEX(full_name)
1144 | WHERE
1145 | first_name = 'Ashlie'
1146 | AND
1147 | last_name = 'Godred'\G
1148 | ```
1149 |
1150 | ```sql
1151 | *************************** 1. row ***************************
1152 | EXPLAIN: -> Filter: ((ref_users_big.last_name = 'Godred') and (ref_users_big.first_name = 'Ashlie')) (cost=641.79 rows=0) (actual time=315.346..322.278 rows=2 loops=1)
1153 | -> Index lookup on ref_users_big using last_name_partial (last_name='Godred') (cost=641.79 rows=1066) (actual time=6.602..317.360 rows=1066 loops=1)
1154 |
1155 | 1 row in set (0.34 sec)
1156 | ```
1157 | #### Testing indices
1158 |
1159 | MySQL 8 added the ability to toggle an index on and off, without actually dropping it. This way, if you want to test whether or not an index is helpful, you can toggle it off, observe query performance, and then decide whether or not to leave it.
1160 |
1161 | ```sql
1162 | ALTER TABLE ref_users_big ALTER INDEX full_name INVISIBLE;
1163 | ```
1164 |
1165 | ```sql
1166 | Query OK, 0 rows affected (0.28 sec)
1167 | Records: 0 Duplicates: 0 Warnings: 0
1168 | ```
1169 |
1170 | ```sql
1171 | EXPLAIN ANALYZE
1172 | SELECT
1173 | user_id,
1174 | full_name,
1175 | email,
1176 | city,
1177 | country
1178 | FROM
1179 | ref_users_big
1180 | WHERE
1181 | first_name = 'Ashlie'
1182 | AND
1183 | last_name = 'Godred'\G
1184 | ```
1185 |
1186 | ```sql
1187 | *************************** 1. row ***************************
1188 | EXPLAIN: -> Filter: ((ref_users_big.last_name = 'Godred') and (ref_users_big.first_name = 'Ashlie')) (cost=641.79 rows=0) (actual time=315.346..322.278 rows=2 loops=1)
1189 | -> Index lookup on ref_users_big using last_name_partial (last_name='Godred') (cost=641.79 rows=1066) (actual time=6.602..317.360 rows=1066 loops=1)
1190 |
1191 | 1 row in set (0.34 sec)
1192 | ```
1193 |
1194 | #### Descending indices
1195 |
1196 | By default, indices are sorted in ascending order. While they can still be used when reversed, it's not as fast (although the performance difference may be minimal - test your theory before committing to it). If you are frequently querying something with `ORDER BY DESC`, it may be helpful to instead create an index in descending order.
1197 |
1198 | ```sql
1199 | CREATE INDEX first_desc ON ref_users_big (first_name DESC);
1200 | ```
1201 |
1202 | ```sql
1203 | Query OK, 0 rows affected (41.18 sec)
1204 | Records: 0 Duplicates: 0 Warnings: 0
1205 | ```
1206 |
1207 | #### When indices aren't helpful
1208 |
1209 | You may have noticed in a few of the previous `EXPLAIN ANALYZE` statements two different kinds of inner joins - `nested loop inner join`, and `inner hash join`. A nested loop join is exactly what it sounds like:
1210 |
1211 | ```python
1212 | for tuple_i in table_1:
1213 |     for tuple_j in table_2:
1214 | if join_is_satisfied(tuple_i, tuple_j):
1215 | yield (tuple_i, tuple_j)
1216 | ```
1217 |
1218 | This has `O(MN)` time complexity, where `M` and `N` are the number of tuples in each table. If there's an index, the second loop uses it for the lookup rather than another table scan, which brings the time complexity down to `O(M log N)`, but at large sizes this is still quite bad. Here is an example on two tables with one million rows each:
1219 |
1220 | ```sql
1221 | EXPLAIN ANALYZE
1222 | SELECT
1223 | full_name
1224 | FROM
1225 | ref_users_big
1226 | JOIN
1227 | ref_zaps_big
1228 | ON
1229 | ref_users_big.user_id = ref_zaps_big.owned_by\G
1230 | ```
1231 |
1232 | ```sql
1233 | *************************** 1. row ***************************
1234 | EXPLAIN: -> Nested loop inner join (cost=498015.60 rows=993197) (actual time=6.998..360927.896 rows=1000000 loops=1)
1235 | -> Table scan on zaps (cost=100160.95 rows=993197) (actual time=6.685..8804.370 rows=1000000 loops=1)
1236 | -> Single-row index lookup on ref_users using user_id (user_id=zaps.owned_by) (cost=0.30 rows=1) (actual time=0.350..0.350 rows=1 loops=1000000)
1237 |
1238 | 1 row in set (6 min 2.41 sec)
1239 | ```
1240 |
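The index-assisted nested loop seen in that plan can be sketched the same way, with a sorted list and binary search standing in for the B+ tree (rows here are made-up stand-ins):

```python
import bisect

# table_1 rows are (join_key, value); index_2 is sorted on the join key,
# standing in for an index on the inner table (unique keys assumed).
table_1 = [(1, "a"), (2, "b"), (3, "c")]
index_2 = sorted([(2, "x"), (3, "y"), (4, "z")])
keys = [k for k, _ in index_2]

def index_nested_loop(outer):
    # For each outer tuple, an O(log N) index lookup replaces the inner scan.
    for key, val in outer:
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            yield (key, val, index_2[i][1])

assert list(index_nested_loop(table_1)) == [(2, "b", "x"), (3, "c", "y")]
```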
1241 | A better solution is a hash join, specifically a grace hash join, named after the GRACE database created in the 1980s at the University of Tokyo, which pioneered this method.
1242 |
1243 | ```python
1244 | # build a hash table on one table's join key, then probe it with the other
1245 | hash_table = {}
1246 | for join_key, row in table_1:
1247 |     hash_table.setdefault(join_key, []).append(row)
1248 | for join_key, row in table_2:
1249 |     yield from ((match, row) for match in hash_table.get(join_key, []))
1250 | ```
1251 |
1252 | There are details I've glossed over about the partitioning method (it's recursive), but the key point is that hash lookups are (optimally) `O(1)`, which speeds things up tremendously. The total I/O cost for this method is `3(M+N)`.
1253 |
1254 | MySQL [added a hash join in 8.0.18](https://dev.mysql.com/blog-archive/hash-join-in-mysql-8/), but it comes with some limitations; chiefly that a table must fit into memory, and annoyingly, that the optimizer will often decide to use a nested loop if indexes exist. If it can be used, though, compare the difference:
1255 |
1256 | ```sql
1257 | EXPLAIN ANALYZE
1258 | SELECT
1259 | full_name
1260 | FROM
1261 | ref_users
1262 | IGNORE INDEX (user_id)
1263 | JOIN
1264 | zaps
1265 | ON
1266 | ref_users.user_id = zaps.owned_by\G
1267 | ```
1268 |
1269 | ```sql
1270 | *************************** 1. row ***************************
1271 | EXPLAIN: -> Inner hash join (ref_users.user_id = zaps.owned_by) (cost=98991977261.77 rows=993197) (actual time=7814.295..21403.160 rows=1000000 loops=1)
1272 | -> Table scan on ref_users (cost=0.03 rows=996699) (actual time=0.402..9319.650 rows=1000000 loops=1)
1273 | -> Hash
1274 | -> Table scan on zaps (cost=100160.95 rows=993197) (actual time=4.566..6810.026 rows=1000000 loops=1)
1275 |
1276 | 1 row in set (21.93 sec)
1277 | ```
1278 |
1279 | #### HAVING
1280 |
1281 | Earlier, we used `HAVING` in a `GROUP BY` aggregation. The difference between `WHERE` and `HAVING` is that `WHERE` filters rows before they're sent to be aggregated, whereas `HAVING` filters after aggregation, and thus predicates relying on the aggregation result can be used. It's not limited to only aggregation results, though - a common use case is to allow the use of aliases or subquery results in filtering. Be aware that it's generally more performant to use `WHERE` if possible (consider re-writing your query if it isn't), but sometimes, you need `HAVING`.
1282 |
1283 | ```sql
1284 | SELECT
1285 | ref_users_big.city,
1286 | COUNT(ref_zaps_big.zap_id) as zap_count
1287 | FROM
1288 | ref_users_big
1289 | LEFT JOIN
1290 | ref_zaps_big
1291 | ON
1292 | ref_users_big.user_id = ref_zaps_big.owned_by
1293 | GROUP BY
1294 | ref_users_big.city
1295 | HAVING
1296 | zap_count > 250;
1297 | ```
1298 |
1299 | ```sql
1300 | +----------+-----------+
1301 | | city | zap_count |
1302 | +----------+-----------+
1303 | | Hsin-chu | 260 |
1304 | | Vitória | 293 |
1305 | | Cordoba | 290 |
1306 | | Gdañsk | 292 |
1307 | +----------+-----------+
1308 | 4 rows in set (32.86 sec)
1309 | ```
1310 |
1311 | ## Query optimization
1312 |
1313 | Finally into the fun stuff!
1314 |
1315 | First, I'll spoil a lot of this - it's likely that you won't have to do much of this. MySQL's optimizer is actually pretty decent. That said, there are times when you will, and knowing what _should_ be happening, and how to compare it to what is actually happening is a useful skill.
1316 |
1317 | ### SELECT *
1318 |
1319 | If you're just exploring a schema, there's nothing wrong with `SELECT * FROM <table> LIMIT 10` or some other small number (< ~1000). It will be nearly instantaneous. However, the problem arises when you're also using `ORDER BY`. Recall that we had a composite index on `(first_name, last_name)` called `full_name`. Compare these two:
1320 |
1321 | ```sql
1322 | EXPLAIN ANALYZE
1323 | SELECT
1324 | *
1325 | FROM
1326 | ref_users_big
1327 | ORDER BY
1328 | first_name,
1329 | last_name\G
1330 | ```
1331 |
1332 | ```sql
1333 | *************************** 1. row ***************************
1334 | EXPLAIN: -> Sort: ref_users.first_name, ref_users.last_name (cost=100495.40 rows=996699) (actual time=12199.513..12603.379 rows=1000000 loops=1)
1335 | -> Table scan on ref_users (cost=100495.40 rows=996699) (actual time=1.755..7039.004 rows=1000000 loops=1)
1336 |
1337 | 1 row in set (13.68 sec)
1338 | ```
1339 |
1340 | ```sql
1341 | EXPLAIN ANALYZE
1342 | SELECT
1343 | user_id,
1344 | first_name,
1345 | last_name
1346 | FROM
1347 | ref_users_big
1348 | ORDER BY
1349 | first_name,
1350 | last_name\G
1351 | ```
1352 |
1353 | ```sql
1354 | *************************** 1. row ***************************
1355 | EXPLAIN: -> Index scan on ref_users using full_name (cost=100495.40 rows=996699) (actual time=0.433..5413.188 rows=1000000 loops=1)
1356 |
1357 | 1 row in set (6.39 sec)
1358 | ```
1359 |
1360 | Since `SELECT *` requests columns not covered by the index, it would take longer to use the index and then look up the remaining columns for every row than to just do a table scan. Observe:
1361 |
1362 | ```sql
1363 | EXPLAIN ANALYZE
1364 | SELECT
1365 | *
1366 | FROM
1367 | ref_users
1368 | FORCE INDEX(full_name)
1369 | ORDER BY
1370 | first_name,
1371 | last_name\G
1372 | ```
1373 |
1374 | ```sql
1375 | *************************** 1. row ***************************
1376 | EXPLAIN: -> Index scan on ref_users using full_name (cost=348844.90 rows=996699) (actual time=11.273..65858.816 rows=1000000 loops=1)
1377 |
1378 | 1 row in set (1 min 7.13 sec)
1379 | ```
1380 |
1381 | In comparison, if your `ORDER BY` is covered by the index (the primary key - `user_id` here - is implicitly part of indices, and thus doesn't cause a slowdown), queries can use it, and are much faster! If you're writing software that will be accessing a database, and you don't actually need all of the columns, don't request them. Take the time to be deliberate in what you request.
1382 |
1383 | ### OFFSET / LIMIT
1384 |
1385 | If you need to get `n` rows from the middle of a table, unless you have a really good reason to do so, please don't do this:
1386 |
1387 | ```sql
1388 | -- The alternate form (and, IMO, the clearer one) is LIMIT 10 OFFSET 500000
1389 | SELECT
1390 | user_id,
1391 | full_name
1392 | FROM
1393 | ref_users_big
1394 | LIMIT 500000,10;
1395 | ```
1396 |
1397 | ```sql
1398 | +---------+-------------------+
1399 | | user_id | full_name |
1400 | +---------+-------------------+
1401 | | 500001 | Ader, Wilona |
1402 | | 500002 | Lindsley, Angy |
1403 | | 500003 | Scarito, Vladimir |
1404 | | 500004 | Hoenack, Rossy |
1405 | | 500005 | Cooley, Theobald |
1406 | | 500006 | Pineda, Gaven |
1407 | | 500007 | Harberd, Odie |
1408 | | 500008 | Engleman, Mendy |
1409 | | 500009 | Michon, Dionysus |
1410 | | 500010 | Seaden, Leigha |
1411 | +---------+-------------------+
1412 | 10 rows in set (6.29 sec)
1413 | ```
1414 |
1415 | Doing this causes a table scan up to the specified offset. Far better, if you have a known monotonic number (like `id`), is to use a `WHERE` predicate:
1416 |
1417 | ```sql
1418 | SELECT
1419 | user_id,
1420 | full_name
1421 | FROM
1422 | ref_users_big
1423 | WHERE user_id > 500000
1424 | LIMIT 10;
1425 | ```
1426 |
1427 | ```sql
1428 | +---------+-------------------+
1429 | | user_id | full_name |
1430 | +---------+-------------------+
1431 | | 500001 | Ader, Wilona |
1432 | | 500002 | Lindsley, Angy |
1433 | | 500003 | Scarito, Vladimir |
1434 | | 500004 | Hoenack, Rossy |
1435 | | 500005 | Cooley, Theobald |
1436 | | 500006 | Pineda, Gaven |
1437 | | 500007 | Harberd, Odie |
1438 | | 500008 | Engleman, Mendy |
1439 | | 500009 | Michon, Dionysus |
1440 | | 500010 | Seaden, Leigha |
1441 | +---------+-------------------+
1442 | 10 rows in set (0.02 sec)
1443 | ```
1444 |
1445 | Using `user_id` as the filter allows it to be used for an index range scan, which is nearly instant. If you were doing this programmatically to support pagination, the last value of `id` could be used for the next iteration's predicate.
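That pagination loop can be sketched in Python (an in-memory list stands in for the table; the SQL in the comment is the assumed equivalent):

```python
# Keyset pagination: remember the last seen id and use it as the next
# page's predicate, instead of making the database scan past the OFFSET.
rows = [(i, f"user_{i}") for i in range(1, 101)]  # (user_id, name) stand-ins

def next_page(last_id, page_size=10):
    # Equivalent to: WHERE user_id > %s ORDER BY user_id LIMIT %s
    return [r for r in rows if r[0] > last_id][:page_size]

page = next_page(0)
assert [r[0] for r in page] == list(range(1, 11))
page = next_page(page[-1][0])  # feed the last id back in for the next page
assert [r[0] for r in page] == list(range(11, 21))
```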
1446 |
1447 | ### DISTINCT
1448 |
1449 | `DISTINCT` is a very useful keyword for many operations when you want to not show duplicates. Unfortunately, it also adds a fairly hefty load to the database. That's not to say you _can't_ use it, but when writing code that will end up using this, ask yourself if you could instead handle de-duplication in the application. This also comes with tradeoffs, of course - you're now pulling more data over the network, and increasing load on the application. Generally speaking, databases are bound first by disk and memory, rather than CPU or network, so using compression (increased CPU load) and/or sending more data (not using `DISTINCT`) tends to increase overall performance, but you should experiment and profile your code.
1450 |
1451 | This also tends to be something that works well early on with little load, but as either the database or application grows, it becomes unwieldy.
1452 |
1453 | ```sql
1454 | EXPLAIN ANALYZE
1455 | SELECT
1456 | first_name,
1457 | last_name
1458 | FROM
1459 | ref_users_big\G
1460 | ```
1461 |
1462 | ```sql
1463 | *************************** 1. row ***************************
1464 | EXPLAIN: -> Table scan on ref_users_big (cost=101365.53 rows=995522) (actual time=1.815..7213.716 rows=1000000 loops=1)
1465 |
1466 | 1 row in set (8.13 sec)
1467 | ```
1468 |
1469 | ```sql
1470 | EXPLAIN ANALYZE
1471 | SELECT DISTINCT
1472 | first_name,
1473 | last_name
1474 | FROM
1475 | ref_users_big\G
1476 | ```
1477 |
1478 | ```sql
1479 | EXPLAIN: -> Table scan on (actual time=0.005..765.220 rows=995670 loops=1)
1480 | -> Temporary table with deduplication (cost=101050.45 rows=995522) (actual time=15306.678..16296.289 rows=995670 loops=1)
1481 | -> Table scan on ref_users_big (cost=101050.45 rows=995522) (actual time=0.825..8718.651 rows=1000000 loops=1)
1482 |
1483 | 1 row in set (17.73 sec)
1484 | ```
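
Application-side de-duplication, as suggested above, can be as simple as streaming rows through a set. A minimal sketch (again using an in-memory SQLite stand-in for the MySQL table - an assumption for the sake of a runnable example):

```python
import sqlite3

# Stand-in table containing duplicate names (assumption: mirrors the
# first_name/last_name columns of ref_users_big).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ref_users_big (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO ref_users_big VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Ada", "Lovelace"), ("Grace", "Hopper")],
)

# Plain SELECT - no DISTINCT, so the server skips the "temporary table
# with deduplication" step; the application deduplicates as rows stream in.
seen = set()
unique_rows = []
for row in conn.execute("SELECT first_name, last_name FROM ref_users_big"):
    if row not in seen:
        seen.add(row)
        unique_rows.append(row)
```

The memory cost of `seen` now lives in the application, where it's usually cheaper to scale than on the database server.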
1485 | ## Cleanup
1486 |
1487 | This isn't something you'll do often, if at all, so we may as well do so now, eh?
1488 |
1489 | ```sql
1490 | DROP SCHEMA foo;
1491 | ```
1492 |
1493 | ```sql
1494 | Query OK, 0 rows affected (0.05 sec)
1495 | ```
1496 |
--------------------------------------------------------------------------------
/mysql/mysql-102.md:
--------------------------------------------------------------------------------
1 | # MySQL 102 - WIP
2 |
3 | ### WITH (Common Table Expressions)
4 |
5 | [MySQL docs.](https://dev.mysql.com/doc/refman/8.0/en/with.html)
6 |
7 | `WITH` can be used to create a temporary named result set, scoped to the statement in which it exists. CTEs can also be recursive. What follows is a demonstration that's probably not useful in reality, but it does show how MySQL can be made to use indexes, even when it normally couldn't. Here, we're trying to select a random row from a large table. The row ID is selected with a sub-query that multiplies the output of `RAND()` (a float between 0-1) by the highest `id` in the table.
8 |
9 | ```sql
10 | mysql>
11 | EXPLAIN ANALYZE SELECT
12 | *
13 | FROM
14 | ref_users
15 | WHERE
16 | id = (
17 | SELECT
18 | FLOOR(
19 | (
20 | SELECT
21 | RAND() * (
22 | SELECT
23 | id
24 | FROM
25 | ref_users
26 | ORDER BY
27 | id DESC
28 | LIMIT
29 | 1
30 | )
31 | )
32 | )
33 | );
34 | *************************** 1. row ***************************
35 | EXPLAIN: -> Filter: (ref_users.id = floor((rand() * (select #4)))) (cost=10799.04 rows=99735) (actual time=1545.462..8220.073 rows=3 loops=1)
36 | -> Table scan on ref_users (cost=10799.04 rows=997354) (actual time=0.441..6723.994 rows=1000000 loops=1)
37 | -> Select #4 (subquery in condition; run only once)
38 | -> Limit: 1 row(s) (cost=0.00 rows=1) (actual time=0.079..0.079 rows=1 loops=1)
39 | -> Index scan on ref_users using PRIMARY (reverse) (cost=0.00 rows=1) (actual time=0.077..0.077 rows=1 loops=1)
40 |
41 | 1 row in set, 2 warnings (8.22 sec)
42 | ```
43 |
44 | Since `RAND()` is evaluated for every row [when used with WHERE](https://dev.mysql.com/doc/refman/8.0/en/mathematical-functions.html#function_rand), it's not constant, and thus can't be used with indices. Also, you may wind up with more than one result!
45 |
46 | If instead the `RAND()` call is placed into a CTE, it can be optimized:
47 |
48 | ```sql
49 | mysql>
50 | EXPLAIN ANALYZE
51 | WITH rand AS (
52 | SELECT
53 | FLOOR(
54 | (
55 | SELECT
56 | RAND() * (
57 | SELECT
58 | id
59 | FROM
60 | ref_users
61 | ORDER BY
62 | id DESC
63 | LIMIT
64 | 1
65 | )
66 | )
67 | )
68 | )
69 | SELECT
70 | *
71 | FROM
72 | ref_users
73 | WHERE
74 | id IN (TABLE rand);
75 | *************************** 1. row ***************************
76 | EXPLAIN: -> Nested loop inner join (cost=0.55 rows=1) (actual time=0.569..0.583 rows=1 loops=1)
77 | -> Filter: (``.`FLOOR((SELECT RAND() * (SELECT id FROM ref_users ORDER BY id DESC LIMIT 1)))` is not null) (cost=0.20 rows=1) (actual time=0.085..0.095 rows=1 loops=1)
78 | -> Table scan on (cost=0.20 rows=1) (actual time=0.005..0.012 rows=1 loops=1)
79 | -> Materialize with deduplication (cost=0.00 rows=1) (actual time=0.082..0.090 rows=1 loops=1)
80 | -> Filter: (rand.`FLOOR((SELECT RAND() * (SELECT id FROM ref_users ORDER BY id DESC LIMIT 1)))` is not null) (cost=0.00 rows=1) (actual time=0.017..0.023 rows=1 loops=1)
81 | -> Table scan on rand (cost=2.61 rows=1) (actual time=0.010..0.014 rows=1 loops=1)
82 | -> Materialize CTE rand (cost=0.00 rows=1) (actual time=0.013..0.018 rows=1 loops=1)
83 | -> Rows fetched before execution (cost=0.00 rows=1) (never executed)
84 | -> Select #5 (subquery in projection; run only once)
85 | -> Limit: 1 row(s) (cost=0.00 rows=1) (actual time=0.313..0.314 rows=1 loops=1)
86 | -> Index scan on ref_users using PRIMARY (reverse) (cost=0.00 rows=1) (actual time=0.310..0.310 rows=1 loops=1)
87 | -> Filter: (ref_users.id = ``.`FLOOR((SELECT RAND() * (SELECT id FROM ref_users ORDER BY id DESC LIMIT 1)))`) (cost=0.35 rows=1) (actual time=0.477..0.479 rows=1 loops=1)
88 | -> Single-row index lookup on ref_users using PRIMARY (id=``.`FLOOR((SELECT RAND() * (SELECT id FROM ref_users ORDER BY id DESC LIMIT 1)))`) (cost=0.35 rows=1) (actual time=0.468..0.469 rows=1 loops=1)
89 |
90 | 1 row in set, 1 warning (0.00 sec)
91 | ```
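
As for the recursive variety mentioned above, here's a minimal sketch (not tied to any table in this schema) - the CTE references itself until the `WHERE` condition stops the recursion, producing the numbers 1 through 5:

```sql
WITH RECURSIVE seq (n) AS (
	SELECT 1
	UNION ALL
	SELECT n + 1 FROM seq WHERE n < 5
)
SELECT n FROM seq;
```

Recursive CTEs are handy for generating series like this, or for walking hierarchical data such as org charts stored with parent-child rows.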
92 |
93 | ## Stored Procedures
94 |
95 | [MySQL docs.](https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html)
96 |
97 | Stored Procedures (and Stored Functions) are a way to write SQL as functions, to be called as needed. Most normal SQL queries are accepted, as well as conditionals, loops, and the ability to accept arguments and return values. The main difference between the two is that a Stored Procedure accepts arguments and can write data out via `OUT` parameters, whereas a Stored Function accepts arguments and returns a single value, which lets it be used inline in expressions.
98 |
99 | Their main advantage is that known, tested queries can be stored and later called from an application. Their main disadvantage is that they require people with reasonably good SQL skills to write them, else it's unlikely they'll exceed the performance of an ORM like Django's.
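
To make the Procedure/Function distinction concrete, here's a tiny Stored Function (a hypothetical helper, not part of this schema) - note the `RETURNS` clause, and that it can be called anywhere an expression is allowed:

```sql
DELIMITER //
CREATE FUNCTION clamp_pct(p float) RETURNS float DETERMINISTIC
BEGIN
	-- Clamp a value into the 0-1 range, returning a single value
	RETURN LEAST(GREATEST(p, 0), 1);
END //
DELIMITER ;

SELECT clamp_pct(1.4); -- usable inline, unlike a procedure (which needs CALL)
```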
100 |
101 | As an example, I used this to fill `zaps` with data (NOTE: this is not an example of a well-designed stored procedure, merely one that demonstrates a variety of concepts):
102 |
103 | ```sql
104 | DELIMITER // -- This is needed so that the individual commands don't end the stored procedure
105 | CREATE PROCEDURE insert_zaps(IN num_rows int, IN pct_shared float) -- Two input args are needed
106 | BEGIN
107 | DECLARE loop_count bigint; -- Variables are initialized with a type
108 | DECLARE len_table bigint;
109 | DECLARE rand_base float;
110 | DECLARE rand_offset float;
111 | DECLARE rand_ts timestamp;
112 | DECLARE rand_user bigint;
113 | DECLARE shared_with_user bigint;
114 | SELECT id INTO len_table FROM test.ref_users ORDER BY id DESC LIMIT 1; -- SELECT INTO can be used
115 | SET loop_count = 1; -- Or, if the value is simple, simply assigned
116 | WHILE loop_count <= num_rows DO
117 | SET rand_base = RAND();
118 | SET rand_offset = RAND();
119 | SET rand_ts = TIMESTAMP(
120 | FROM_UNIXTIME(
121 | UNIX_TIMESTAMP(NOW()) - FLOOR(
122 | 0 + (
123 | RAND() * 86400 * 365 * 10
124 | )
125 | )
126 | )
127 | ); -- This creates a random timestamp between now and 10 years ago
128 | WITH rand AS (
129 | SELECT
130 | FLOOR(
131 | (
132 | SELECT
133 | rand_base * len_table
134 | )
135 | )
136 | )
137 | SELECT
138 | id
139 | INTO rand_user
140 | FROM
141 | test.ref_users
142 | WHERE
143 | 		id IN (TABLE rand); -- The CTE pattern demonstrated earlier, used here to pick a random existing user id
144 | INSERT INTO zaps (zap_id, created_at, owned_by) VALUES (loop_count, rand_ts, rand_user);
145 | IF ROUND(rand_base, 1) > (1 - pct_shared) THEN -- Roughly determine the amount of shared Zaps
146 | SELECT CAST(FLOOR(rand_base * rand_offset * len_table) AS unsigned) INTO shared_with_user;
147 | UPDATE
148 | zaps
149 | SET
150 | shared_with = JSON_ARRAY_APPEND(
151 | shared_with,
152 | '$',
153 | shared_with_user
154 | ) -- JSON_ARRAY_APPEND(array, key, value)
155 | WHERE
156 | id = loop_count;
157 | END IF;
158 | SET loop_count = loop_count + 1;
159 | END WHILE;
160 | END //
161 | DELIMITER ;
162 | ```
163 |
--------------------------------------------------------------------------------
/terraform/tf-101.md:
--------------------------------------------------------------------------------
1 | # Introduction
2 |
3 | ## What is Terraform?
4 |
5 | It's an Infrastructure-As-Code tool. It allows for declaratively creating infrastructure on cloud providers, in colos, and even your homelab. There are providers (plugins) for practically everything you can think of; if you're missing one (and you know Golang), you can write it.
6 |
7 | ## What is declarative?
8 |
9 | Computer languages generally fall into one of two types: imperative and declarative. Most are imperative, which means that you explicitly tell the language what to do. With a declarative language, you describe what you want, and it figures out how to get there. That makes it sound fancier and easier than it is; in reality, you have to describe in very specific terms what it is you want.
10 |
11 | Terraform is mostly declarative, with some recent nods to imperative programming, such as the `for` expressions introduced in version 0.12 - prior to that, your main repetition tool was the `count` of a resource, which makes `n` copies of it.
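
That `count` style looks something like this (a sketch - the AMI and resource names are placeholders, not from this repo):

```hcl
# Repetition via the count meta-argument: three near-identical instances.
resource "aws_instance" "web" {
  count         = 3
  ami           = "ami-0123456789abcdef0" # placeholder
  instance_type = "t3.micro"

  tags = {
    # count.index distinguishes the copies: web-0, web-1, web-2
    Name = "web-${count.index}"
  }
}
```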
12 |
13 | # Terraform Basics
14 |
15 | ## Resources vs. Modules
16 |
17 | Broadly speaking, resources specifically instantiate a named resource, like an EC2 instance, or a DNS record, whereas a module generically defines those things - usually with default values assigned - and you can later call them, saving typing. You _can_ define your entire infrastructure solely with resources, but you'll be missing out on a huge advantage of Terraform.
18 |
19 | The (redacted) example module creates a Redis instance, a Postgres instance, security groups for both, and a Cloudwatch metric for Redis. Going further, looking at its `variables.tf` file, we see that there are quite a few options - the type and size of storage for the DB, the version of both Redis and Postgres, encryption and snapshot options, and more. For more information on what is required to be passed to the resource, you can consult Terraform's documentation - here is the [Elasticache (Redis) page.](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/elasticache_replication_group)
20 |
21 | ## Variables
22 |
23 | Any variable with a `default` value will pass that default into the Terraform module when it's called, unless it's overridden. For example, `aws_db_instance.rds.storage_type` has as its value `var.db_storage_type`, which instructs Terraform to look at the variable `db_storage_type` - it's set to `gp2`, so we don't have to change it. `db_allocated_storage` has a default of `100` (in GiB, which is helpfully displayed as the variable's description), but it's overridden in the calling module to `20`.
24 |
25 | You may have noticed that some of the variables defined have `type` set. Terraform is dynamically typed, but much like Python with mypy, allows for static typing if desired.
26 |
27 | Locals, seen at the top of `main.tf`, are just that - variables local to the module. They can be declared anywhere, but conventionally are placed at the top of the file. They're generally used as seen here, to compress what would otherwise be bulky ternary-laden code into something cleaner for later use. They're referenced with `local.varname` instead of `var.varname`.
28 |
29 | Terraform underwent a large syntax change between v0.11 and v0.12. In 0.11, all variables were encased in the `"${var.foo}"` syntax you may see scattered around. That has been simplified to `var.foo` or `local.foo`, whichever is correct for the variable. The exception is for string interpolation - using variables along with plaintext (or concatenating strings without the use of the `join` function) requires all variables to be wrapped in `"${}"`. People comfortable with Bash programming will feel at home here.
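
Putting typed variables, a local with a ternary, and string interpolation together in one sketch (all names and values here are placeholders, not from this repo):

```hcl
variable "environment" {
  type    = string # static typing, if desired
  default = "staging"
}

# A ternary tucked into a local keeps the resource blocks clean.
locals {
  instance_type = var.environment == "production" ? "m5.large" : "t3.micro"
}

resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0" # placeholder
  instance_type = local.instance_type     # bare reference - no "${}" needed

  tags = {
    # Interpolation inside a larger string still needs the "${}" wrapper:
    Name = "app-${var.environment}"
  }
}
```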
30 |
31 | Modules may also include a `terraform.tfvars` file, which has a `key=value` mapping for variable assignment. These are often used to have production and staging versions of infrastructure.
32 |
33 | Variable definition precedence takes the following order, from first to last, with the latest definition standing: env vars --> `terraform.tfvars` --> `-var $foo` on the command line. In general, you'll want to mimic what you see in use in the repository.
34 |
35 | ## Functions
36 |
37 | Terraform includes many built-in functions. One of them seen here is `flatten`. [Here is Terraform's](https://www.terraform.io/language/functions/flatten) documentation on the function, but you may be able to guess that it's flattening lists or lists of lists into a single list. Read through the documentation to get an idea of the rest of them.
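
A quick sketch of `flatten` in action (placeholder subnet IDs, purely illustrative):

```hcl
locals {
  # flatten() merges nested lists into a single flat list:
  subnet_ids = flatten([
    ["subnet-aaa", "subnet-bbb"],
    ["subnet-ccc"],
  ])
  # local.subnet_ids is now ["subnet-aaa", "subnet-bbb", "subnet-ccc"]
}
```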
38 |
39 | ## Plans and Applies
40 |
41 | This is the main draw of Terraform. When you run `terraform plan`, it looks at the existing infrastructure, compares it to its statefile, and generates a human-readable diff. It also includes things that have changed outside of its scope (for example, if someone manually creates a database using the AWS console), and at the bottom, a summary saying how many entities will be created, changed, and destroyed. You can then save this plan and apply it later - this is what Atlantis does. Additionally, during this time the statefile is locked, so no other changes can be made. This ensures that your expected output is applied with no surprises due to someone else making a change at the same time.
42 |
43 | To destroy infrastructure, in general you'll delete the resource/module from the code, and then run a plan. Terraform will detect that it exists in infrastructure but not in code, and generate a plan to destroy it which you can apply. In practice, some resources have protection enabled that prevents destroys. To destroy them, you either have to do two plan/apply cycles (one to remove the deletion protection, and another to destroy the resource), or manually delete it from the AWS console or command line, and then run the plan/apply. You can see this in `main.tf` on L144, with an explanation comment block above it.
44 |
45 | Targeted applies (where you specifically instruct Terraform to only affect a specific resource) also exist, but these are rarely needed and shouldn't be relied upon. Similarly, you can import pre-existing resources into the statefile, although the syntax can be a bit confusing, and there are also occasional bizarre gotchas such as needing a region to be hard-coded in the infrastructure code.
--------------------------------------------------------------------------------