├── .github └── workflows │ └── lint.yml ├── .gitignore ├── .lint.yml ├── LICENSE ├── README.md ├── argo ├── README.md └── assets │ ├── argo-workflow-mertic.png │ ├── argo-workflow-trigger.png │ ├── overview.jpeg │ └── swagger-snapshot.png ├── assets └── images │ └── iTerm2-background.jpg ├── devops ├── 01.hello-vueproject.md └── cicd-pipeline.md ├── docker ├── README.md ├── assets │ ├── docker storage volume │ ├── dockerfile.cmd │ ├── dockerfile.extrypoint │ ├── dockerfile.run │ └── images │ │ ├── build-high-level-arch.png │ │ ├── docker-architecture.png │ │ ├── dockerfile01.png │ │ └── dockerfile02.png ├── codes │ └── execsignal │ │ ├── Dockerfile.v1 │ │ ├── Dockerfile.v2 │ │ ├── README.md │ │ ├── deploy │ │ ├── dockerv1-deployment.yml │ │ └── dockerv2-deployment.yml │ │ ├── go.mod │ │ ├── go.sum │ │ └── main.go ├── dockerfile-guide.md └── multi-arch-dockerfile.md ├── gitlabci └── README.md ├── golang-tips.md ├── jenkins ├── README.md ├── jenkins-pipeline-enhance.md ├── topic001 │ └── README.md ├── topic002 │ ├── README.md │ ├── assets │ │ ├── k8s-auth.png │ │ └── k8s-cloud-setup.png │ └── deploy │ │ ├── README.md │ │ ├── incluster │ │ ├── deployment.yml │ │ ├── rabc.yml │ │ ├── svc-clusterip │ │ │ ├── ingress.yml │ │ │ └── service.yml │ │ └── svc-nodeport │ │ │ └── nodeport.yml │ │ └── outcluster │ │ └── rbac.yml └── topic003 │ ├── README.md │ ├── assets │ ├── pipeline.png │ ├── realworld-pipeline-flow.png │ └── use-pipeline-through-blueoccean.png │ ├── demo │ ├── 01-hello.Jenkinsfile │ ├── 02-ParamsAndEnv.Jenkinsfile │ └── 03-Parallel.Jenkinsfile │ └── start.sh ├── kubernetes ├── README.md ├── ascend │ ├── README.md │ └── images │ │ ├── avi-001.png │ │ ├── avi-002.png │ │ ├── avi-003.png │ │ ├── cann-01.png │ │ ├── cann-02.png │ │ ├── cann-03.png │ │ ├── mindxdl-01.png │ │ └── mindxdl-02.png ├── concepts │ ├── admission-webhook.md │ └── service.md ├── images │ ├── Kubernetes-Admission-controllers-00-featured.png │ └── Kubernetes-Admission-controllers-01-flow-diagram.jpeg ├── mysql │ └── README.md ├── operator │ ├── README.md │ └── operator-sample.md └── static-pod.md ├── logger └── README.md ├── middleware └── kafka │ └── README.md └── os └── README.md /.github/workflows/lint.yml: -------------------------------------------------------------------------------- 1 | name: MarkdownLint 2 | 3 | on: 4 | push: 5 | branches: 6 | - master 7 | pull_request: 8 | branches: 9 | - master 10 | 11 | jobs: 12 | linting: 13 | name: "Markdown linting" 14 | runs-on: ubuntu-latest 15 | 16 | steps: 17 | - uses: actions/checkout@v2 18 | name: Check out the code 19 | - name: Lint Code Base 20 | uses: docker://avtodev/markdown-lint:v1 21 | with: 22 | args: "**/*.md" 23 | config: '.lint.yml' 24 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | **/.vscode 2 | assets/images/*.png 3 | 4 | *.psd 5 | *.mov 6 | -------------------------------------------------------------------------------- /.lint.yml: -------------------------------------------------------------------------------- 1 | default: false # includes/excludes all rules by default 2 | 3 | # Heading levels should only increment by one level at a time 4 | MD001: true 5 | 6 | # Heading style 7 | MD003: true 8 | 9 | # Unordered list style 10 | MD004: true 11 | 12 | # Inconsistent indentation for list items at the same level 13 | MD005: true 14 | 15 | # Consider starting bulleted lists at the beginning of the line 16 | MD006: true 
17 | 18 | # Unordered list indentation 19 | MD007: true 20 | 21 | # Trailing spaces 22 | MD009: true 23 | 24 | # Hard tabs 25 | MD010: true 26 | 27 | # Reversed link syntax 28 | MD011: true 29 | 30 | # Multiple consecutive blank lines 31 | MD012: true 32 | 33 | # Line length 34 | MD013: false 35 | 36 | # Dollar signs used before commands without showing output 37 | MD014: false 38 | 39 | # No space after hash on atx style heading 40 | MD018: true 41 | 42 | # Multiple spaces after hash on atx style heading 43 | MD019: true 44 | 45 | # No space inside hashes on closed atx style heading 46 | MD020: true 47 | 48 | # Multiple spaces inside hashes on closed atx style heading 49 | MD021: true 50 | 51 | # Headings should be surrounded by blank lines 52 | MD022: true 53 | 54 | # Headings must start at the beginning of the line 55 | MD023: true 56 | 57 | # Multiple headings with the same content 58 | MD024: 59 | allow_different_nesting: true 60 | 61 | # Multiple top level headings in the same document 62 | MD025: true 63 | 64 | # Trailing punctuation in heading 65 | MD026: true 66 | 67 | # Multiple spaces after blockquote symbol 68 | MD027: true 69 | 70 | # Blank line inside blockquote 71 | MD028: false 72 | 73 | # Ordered list item prefix 74 | MD029: false 75 | 76 | # Spaces after list markers 77 | MD030: true 78 | 79 | # Fenced code blocks should be surrounded by blank lines 80 | MD031: true 81 | 82 | # Lists should be surrounded by blank lines 83 | MD032: true 84 | 85 | # Inline HTML 86 | MD033: true 87 | 88 | # Bare URL used 89 | MD034: true 90 | 91 | # Horizontal rule style 92 | MD035: 93 | style: '---' 94 | 95 | # Emphasis used instead of a heading 96 | MD036: true 97 | 98 | # Spaces inside emphasis markers 99 | MD037: true 100 | 101 | # Spaces inside code span elements 102 | MD038: true 103 | 104 | # Spaces inside link text 105 | MD039: true 106 | 107 | # Fenced code blocks should have a language specified 108 | MD040: true 109 | 110 | # First line in file should be a top level heading 111 | MD041: true 112 | 113 | # No empty links 114 | MD042: true 115 | 116 | # Required heading structure 117 | MD043: false 118 | 119 | # Proper names should have the correct capitalization 120 | MD044: false 121 | 122 | # Images should have alternate text (alt text) 123 | MD045: false 124 | 125 | # Code block style 126 | MD046: 127 | style: 'fenced' 128 | 129 | # Files should end with a single newline character 130 | MD047: true 131 | 132 | # Code fence style 133 | MD048: 134 | style: 'backtick' 135 | 136 | # Custom rules: 137 | CHANGELOG-RULE-001: true 138 | CHANGELOG-RULE-002: true 139 | CHANGELOG-RULE-003: true 140 | CHANGELOG-RULE-004: true 141 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Kubernetes-Best-Pratice 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
-------------------------------------------------------------------------------- /README.md: --------------------------------------------------------------------------------
1 | # Contents
2 | 
3 | ![Wechat](https://img.shields.io/badge/-colynnliu-%2307C160?style=flat&logo=Wechat&logoColor=white)
4 | [![Twitter](https://img.shields.io/badge/-Twitter-%231DA1F2?style=flat&logo=Twitter&logoColor=white)](https://twitter.com/colynnliu)
5 | 
6 | ## Jenkins CI in Practice
7 | 
8 | * topic001: How to customize images [bilibili > video](https://www.bilibili.com/video/BV1zt4y1a7F1/)
9 | * topic002: A hands-on Jenkins + Kubernetes CI/CD solution
10 |   * Avoiding the pitfalls of Jenkins + Kubernetes CI/CD - practical notes - part 2 [bilibili > video](https://www.bilibili.com/video/BV1A5411V7zm/)
11 |   * Avoiding the pitfalls of Jenkins + Kubernetes CI/CD - practical notes - part 3 [bilibili > video](https://www.bilibili.com/video/BV1G5411V7mU/)
12 | * topic003: How to write a good Jenkinsfile/Pipeline [bilibili > video](https://www.bilibili.com/video/BV1ph411W7Ek/)
13 | * topic004: Custom Jenkins pipelines based on [workflow](https://github.com/go-atomci/workflow) [bilibili > video](https://www.bilibili.com/video/BV1zb4y127EQ)
14 | * topic005: How to install and deploy AtomCI, a CI/CD platform built on cloud-native ideas [bilibili > video](https://www.bilibili.com/video/BV1qq4y1N7mZ/)
15 | * topic006: Is AtomCI, the cloud-native DevOps platform, really that good? [bilibili > video](https://www.bilibili.com/video/BV1K3411m78Q/)
16 | * topic007: AtomCI, the cloud-native DevOps platform - the full workflow in 5 minutes [bilibili > video](https://www.bilibili.com/video/BV18F411a7Rk/)
17 | 
18 | ## Golang Tips
19 | 
20 | ## Dockerfile
21 | 
22 | * Part 1: How to write a good Dockerfile - what is Docker [bilibili > video](https://www.bilibili.com/video/BV1sq4y117E8/)
23 | * Part 2: How to write a good Dockerfile - what is a Dockerfile [bilibili > video](https://www.bilibili.com/video/BV1ri4y1X7WU/)
24 | * Part 3: How to write a good Dockerfile - Dockerfile structure and syntax [bilibili > video](https://www.bilibili.com/video/BV1UY411a7tK/)
25 | * Part 4: How to write a good Dockerfile - common instructions FROM/RUN [bilibili > video](https://www.bilibili.com/video/BV1wL411c7gn/)
26 | * Part 5: How to write a good Dockerfile - common instructions WORKDIR/COPY/ADD [bilibili > video](https://www.bilibili.com/video/BV1PY411b7xC/)
-------------------------------------------------------------------------------- /argo/README.md: --------------------------------------------------------------------------------
1 | # argo workflow
2 | 
3 | ## Argo Workflow Overview
4 | 
5 | ![image](./assets/overview.jpeg)
6 | 
7 | ## Argo Core Concepts
8 | 
9 | * **Workflow**: a Kubernetes resource defining the execution of one or more **templates**. Workflows are named.
10 | 
11 | * **Template**: a **step**, **steps**, or **dag**.
12 | 
13 | * **Step**: a single step of a **workflow**; it typically runs a container based on **inputs** and captures the **outputs**.
14 | 
15 | * **Inputs**: **parameters** and **artifacts** passed to the **step**
16 | * **Outputs**: **parameters** and **artifacts** output by a **step**
17 | 
18 | * **Parameters**: objects, strings, booleans, arrays
19 | 
20 | * **Artifacts**: files saved by a container
21 | * **Artifact Repository**: a place where **artifacts** are stored
22 | 
23 | * **Executor**: the method to execute a container, e.g. Docker, PNS ([learn more](https://argoproj.github.io/argo-workflows/workflow-executors/))
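24 | 
25 | A minimal `Workflow` manifest shows how these concepts fit together: one named **workflow**, one **template**, one container **step**. This is an illustrative sketch; the image and names are placeholders:
26 | 
27 | ```yaml
28 | apiVersion: argoproj.io/v1alpha1
29 | kind: Workflow
30 | metadata:
31 |   generateName: hello-world-   # workflows are named; here the name is generated
32 | spec:
33 |   entrypoint: main             # the template to run first
34 |   templates:
35 |     - name: main
36 |       container:               # a single step: run one container
37 |         image: busybox
38 |         command: [echo, "hello argo"]
39 | ```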
40 | 
41 | ## Multiple Event Sources / Triggers
42 | 
43 | ![Image](./assets/argo-workflow-trigger.png)
44 | 
45 | ## Workflow Notifications
46 | 
47 | There are a number of use cases where you may wish to notify an external system when a workflow completes:
48 | 
49 | 1. Send an email.
50 | 2. Send a Slack message (or another instant message).
51 | 3. Send a message to Kafka (or another message bus).
52 | 
53 | You have options (a sketch of the first follows this list):
54 | 
55 | 1. For individual workflows, you can add an exit handler to your workflow, [for example](https://raw.githubusercontent.com/argoproj/argo-workflows/master/examples/exit-handlers.yaml).
56 | 1. If you want the same for every workflow, you can add an exit handler to [the default workflow spec](default-workflow-specs.md).
57 | 1. Use a service (e.g. [Heptio Labs EventRouter](https://github.com/heptiolabs/eventrouter)) to react to the [Workflow events](workflow-events.md) we emit.
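58 | 
59 | As a sketch of the exit-handler option: the handler is just another template referenced by `onExit`, and it runs whether the workflow succeeds or fails. The notification command and endpoint below are placeholders:
60 | 
61 | ```yaml
62 | spec:
63 |   entrypoint: main
64 |   onExit: notify                  # runs after main completes, on success or failure
65 |   templates:
66 |     - name: main
67 |       container:
68 |         image: busybox
69 |         command: [echo, "doing work"]
70 |     - name: notify
71 |       container:
72 |         image: curlimages/curl
73 |         # placeholder: report completion to your own endpoint
74 |         args: ["-X", "POST", "https://example.com/hooks/workflow-done"]
75 | ```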
76 | 
77 | ## Argo OpenAPI
78 | 
79 | ![image](./assets/swagger-snapshot.png)
80 | 
81 | See the full [argo openapi-spec](https://github.com/argoproj/argo-workflows/blob/master/api/openapi-spec/swagger.json)
82 | 
83 | ## Argo Metrics
84 | 
85 | ![image](assets/argo-workflow-mertic.png)
86 | 
-------------------------------------------------------------------------------- /argo/assets/argo-workflow-mertic.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/argo/assets/argo-workflow-mertic.png
-------------------------------------------------------------------------------- /argo/assets/argo-workflow-trigger.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/argo/assets/argo-workflow-trigger.png
-------------------------------------------------------------------------------- /argo/assets/overview.jpeg: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/argo/assets/overview.jpeg
-------------------------------------------------------------------------------- /argo/assets/swagger-snapshot.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/argo/assets/swagger-snapshot.png
-------------------------------------------------------------------------------- /assets/images/iTerm2-background.jpg: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/assets/images/iTerm2-background.jpg
-------------------------------------------------------------------------------- /devops/01.hello-vueproject.md: --------------------------------------------------------------------------------
1 | # How to create a Vue project
2 | 
3 | ## Prerequisites
4 | 
5 | * `npm` or `yarn` installed
6 | 
7 | ## Env
8 | 
9 | * Ubuntu 16.04
10 | 
11 | ## Install vue cli
12 | 
13 | ```sh
14 | yarn global add @vue/cli
15 | ```
16 | 
17 | __Note__: `vue` is installed under `/usr/local/bin/`; confirm that regular users have permission on the relevant links. Add it if missing, and `vue` will work normally.
18 | 
19 | ## Create new project
20 | 
21 | ```sh
22 | vue create nodejs-app-demo
23 | ```
-------------------------------------------------------------------------------- /devops/cicd-pipeline.md: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/devops/cicd-pipeline.md
-------------------------------------------------------------------------------- /docker/README.md: --------------------------------------------------------------------------------
1 | 
2 | # Docker Docs
3 | 
4 | ## outline
5 | 
6 | * [x] [Dockerfile usage guide](https://github.com/warm-native/docs/blob/master/docker/dockerfile-guide.md)
7 | * [ ] [dockerfile multi-arch]
-------------------------------------------------------------------------------- /docker/assets/docker storage volume: --------------------------------------------------------------------------------
-------------------------------------------------------------------------------- /docker/assets/dockerfile.cmd: --------------------------------------------------------------------------------
1 | FROM colynn/ops-debug
2 | 
3 | ARG HostFile=/etc/hosts
4 | 
5 | # Only the last CMD in a Dockerfile takes effect; the three forms below are alternatives for comparison.
6 | 
7 | # method1: shell form - runs via /bin/sh -c, so the shell expands $HostFile at container start
8 | CMD tail -f $HostFile
9 | 
10 | # method2: exec form - no shell is involved, so "$HostFile" is passed literally and is NOT expanded
11 | CMD ["tail", "-f","$HostFile"]
12 | 
13 | # method3: exec form that invokes a shell explicitly, so $HostFile is expanded as in method1
14 | CMD ["sh", "-c", "tail -f $HostFile"]
-------------------------------------------------------------------------------- /docker/assets/dockerfile.extrypoint: --------------------------------------------------------------------------------
1 | FROM colynn/maven:3.6.0-jdk-8-alpine-sh-settings as build-stage
2 | WORKDIR /app
3 | COPY . . 
4 | RUN mvn clean install -U >/dev/null 5 | 6 | FROM colynn/alpine-oraclejdk8:slim as production-stage 7 | COPY --from=build-stage /app/target/api.jar /app/api.jar 8 | WORKDIR /app 9 | ENV JAVA_OPTS="-XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m -Xms1536m -Xmx1536m -Xmn768m -Xss256k" 10 | ENTRYPOINT java ${JAVA_OPTS} -Djava.security.egd=file:/dev/./urandom -jar api.jar 11 | -------------------------------------------------------------------------------- /docker/assets/dockerfile.run: -------------------------------------------------------------------------------- 1 | FROM colynn/ops-debug 2 | 3 | ARG WORKDIR=/app 4 | 5 | # method1 6 | RUN mkdir $WORKDIR 7 | 8 | # method2 9 | RUN ["mkdir", "$WORKDIR"] 10 | 11 | # method3 12 | RUN ["sh", "-c", "mkdir $WORKDIR"] -------------------------------------------------------------------------------- /docker/assets/images/build-high-level-arch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/docker/assets/images/build-high-level-arch.png -------------------------------------------------------------------------------- /docker/assets/images/docker-architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/docker/assets/images/docker-architecture.png -------------------------------------------------------------------------------- /docker/assets/images/dockerfile01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/docker/assets/images/dockerfile01.png -------------------------------------------------------------------------------- /docker/assets/images/dockerfile02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/docker/assets/images/dockerfile02.png -------------------------------------------------------------------------------- /docker/codes/execsignal/Dockerfile.v1: -------------------------------------------------------------------------------- 1 | FROM golang:1.15-buster AS build-env 2 | 3 | ADD . /go/src/execsignal 4 | 5 | WORKDIR /go/src/execsignal 6 | 7 | RUN export GOPROXY=https://goproxy.io CGO_ENABLED=0 && go build -o execsignal main.go && ls 8 | 9 | # FROM alpine:3.9 10 | FROM ubuntu:18.04 11 | 12 | LABEL maintainer="Colynn Liu " 13 | 14 | WORKDIR /app 15 | 16 | COPY --from=build-env /go/src/execsignal/execsignal /app 17 | 18 | ENV PATH $PATH:/app 19 | 20 | EXPOSE 8080 21 | # ENTRYPOINT ./execsignal 22 | CMD ./execsignal 23 | -------------------------------------------------------------------------------- /docker/codes/execsignal/Dockerfile.v2: -------------------------------------------------------------------------------- 1 | FROM golang:1.15-buster AS build-env 2 | 3 | ADD . 
/go/src/execsignal 4 | 5 | WORKDIR /go/src/execsignal 6 | 7 | RUN export GOPROXY=https://goproxy.io CGO_ENABLED=0 && go build -o execsignal main.go 8 | 9 | # FROM alpine:3.9 10 | FROM ubuntu:18.04 11 | 12 | LABEL maintainer="Colynn Liu " 13 | 14 | WORKDIR /app 15 | 16 | COPY --from=build-env /go/src/execsignal/execsignal /app 17 | 18 | ENV PATH $PATH:/app 19 | 20 | EXPOSE 8080 21 | 22 | # ENTRYPOINT exec ./execsignal 23 | CMD exec ./execsignal 24 | -------------------------------------------------------------------------------- /docker/codes/execsignal/README.md: -------------------------------------------------------------------------------- 1 | # execsignal 2 | 3 | ```sh 4 | # it can not catch exit signal 5 | docker build -f Dockerfile.v1 . -t colynn/signal:dockerv1 6 | ``` 7 | 8 | ```sh 9 | # it can catch exit signal 10 | docker build -f Dockerfile.v2 . -t colynn/signal:dockerv2 11 | ``` 12 | -------------------------------------------------------------------------------- /docker/codes/execsignal/deploy/dockerv1-deployment.yml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: dockerv1-deployment 5 | labels: 6 | app: dockerv1 7 | spec: 8 | replicas: 1 9 | selector: 10 | matchLabels: 11 | app: dockerv1 12 | template: 13 | metadata: 14 | labels: 15 | app: dockerv1 16 | spec: 17 | containers: 18 | - name: dockerv1 19 | image: colynn/signal:dockerv1 20 | ports: 21 | - containerPort: 8080 22 | command: 23 | - /bin/sh 24 | - -c 25 | - /app/execsignal 26 | -------------------------------------------------------------------------------- /docker/codes/execsignal/deploy/dockerv2-deployment.yml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: dockerv2-deployment 5 | labels: 6 | app: dockerv2 7 | spec: 8 | replicas: 1 9 | selector: 10 | matchLabels: 11 | app: dockerv2 12 | template: 13 | metadata: 14 | labels: 15 | app: dockerv2 16 | spec: 17 | containers: 18 | - name: dockerv2 19 | image: colynn/signal:dockerv2 20 | ports: 21 | - containerPort: 8080 22 | -------------------------------------------------------------------------------- /docker/codes/execsignal/go.mod: -------------------------------------------------------------------------------- 1 | module github.com/warm-native/docs/docker/codes/execsignal 2 | 3 | go 1.15 4 | 5 | require github.com/gin-gonic/gin v1.8.1 6 | -------------------------------------------------------------------------------- /docker/codes/execsignal/go.sum: -------------------------------------------------------------------------------- 1 | github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E= 2 | github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 3 | github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 4 | github.com/gin-contrib/sse v0.1.0 h1:Y/yl/+YNO8GZSjAhjMsSuLt29uWRFHdHYUb5lYOV9qE= 5 | github.com/gin-contrib/sse v0.1.0/go.mod h1:RHrZQHXnP2xjPF+u1gW/2HnVO7nvIa9PG3Gm+fLHvGI= 6 | github.com/gin-gonic/gin v1.8.1 h1:4+fr/el88TOO3ewCmQr8cx/CtZ/umlIRIs5M4NTNjf8= 7 | github.com/gin-gonic/gin v1.8.1/go.mod h1:ji8BvRH1azfM+SYow9zQ6SZMvR8qOMZHmsCuWR9tTTk= 8 | github.com/go-playground/assert/v2 v2.0.1/go.mod h1:VDjEfimB/XKnb+ZQfWdccd7VUvScMdVu0Titje2rxJ4= 9 | github.com/go-playground/locales v0.14.0 h1:u50s323jtVGugKlcYeyzC0etD1HifMjqmJqb8WugfUU= 10 | 
github.com/go-playground/locales v0.14.0/go.mod h1:sawfccIbzZTqEDETgFXqTho0QybSa7l++s0DH+LDiLs= 11 | github.com/go-playground/universal-translator v0.18.0 h1:82dyy6p4OuJq4/CByFNOn/jYrnRPArHwAcmLoJZxyho= 12 | github.com/go-playground/universal-translator v0.18.0/go.mod h1:UvRDBj+xPUEGrFYl+lu/H90nyDXpg0fqeB/AQUGNTVA= 13 | github.com/go-playground/validator/v10 v10.10.0 h1:I7mrTYv78z8k8VXa/qJlOlEXn/nBh+BF8dHX5nt/dr0= 14 | github.com/go-playground/validator/v10 v10.10.0/go.mod h1:74x4gJWsvQexRdW8Pn3dXSGrTK4nAUsbPlLADvpJkos= 15 | github.com/goccy/go-json v0.9.7 h1:IcB+Aqpx/iMHu5Yooh7jEzJk1JZ7Pjtmys2ukPr7EeM= 16 | github.com/goccy/go-json v0.9.7/go.mod h1:6MelG93GURQebXPDq3khkgXZkazVtN9CRI+MGFi0w8I= 17 | github.com/golang/protobuf v1.5.0/go.mod h1:FsONVRAS9T7sI+LIUmWTfcYkHO4aIWwzhcaSAoJOfIk= 18 | github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= 19 | github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg= 20 | github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnrnM= 21 | github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo= 22 | github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo= 23 | github.com/kr/pretty v0.2.1/go.mod h1:ipq/a2n7PKx3OHsz4KJII5eveXtPO4qwEXGdVfWzfnI= 24 | github.com/kr/pretty v0.3.0/go.mod h1:640gp4NfQd8pI5XOwp5fnNeVWj67G7CFk/SaSQn7NBk= 25 | github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ= 26 | github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI= 27 | github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE= 28 | github.com/leodido/go-urn v1.2.1 h1:BqpAaACuzVSgi/VLzGZIobT2z4v53pjosyNd9Yv6n/w= 29 | github.com/leodido/go-urn v1.2.1/go.mod h1:zt4jvISO2HfUBqxjfIshjdMTYS56ZS/qv49ictyFfxY= 30 | github.com/mattn/go-isatty v0.0.14 h1:yVuAays6BHfxijgZPzw+3Zlu5yQgKGP2/hcQbHb7S9Y= 31 | github.com/mattn/go-isatty v0.0.14/go.mod h1:7GGIvUiUoEMVVmxf/4nioHXj79iQHKdU27kJ6hsGG94= 32 | github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421 h1:ZqeYNhU3OHLH3mGKHDcjJRFFRrJa6eAM5H+CtDdOsPc= 33 | github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q= 34 | github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9Gz0M= 35 | github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk= 36 | github.com/pelletier/go-toml/v2 v2.0.1 h1:8e3L2cCQzLFi2CR4g7vGFuFxX7Jl1kKX8gW+iV0GUKU= 37 | github.com/pelletier/go-toml/v2 v2.0.1/go.mod h1:r9LEWfGN8R5k0VXJ+0BkIe7MYkRdwZOjgMj2KwnJFUo= 38 | github.com/pkg/diff v0.0.0-20210226163009-20ebb0f2a09e/go.mod h1:pJLUxLENpZxwdsKMEsNbx1VGcRFpLqf3715MtcvvzbA= 39 | github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= 40 | github.com/rogpeppe/go-internal v1.6.1/go.mod h1:xXDCJY+GAPziupqXw64V24skbSoqbTEfhy4qGm1nDQc= 41 | github.com/rogpeppe/go-internal v1.8.0/go.mod h1:WmiCO8CzOY8rg0OYDC4/i/2WRWAB6poM+XZ2dLUbcbE= 42 | github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= 43 | github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI= 44 | github.com/stretchr/testify v1.6.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= 45 | github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= 46 | github.com/stretchr/testify v1.7.1/go.mod 
h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= 47 | github.com/ugorji/go v1.2.7 h1:qYhyWUUd6WbiM+C6JZAUkIJt/1WrjzNHY9+KCIjVqTo= 48 | github.com/ugorji/go v1.2.7/go.mod h1:nF9osbDWLy6bDVv/Rtoh6QgnvNDpmCalQV5urGCCS6M= 49 | github.com/ugorji/go/codec v1.2.7 h1:YPXUKf7fYbp/y8xloBqZOw2qaVggbfwMlI8WM3wZUJ0= 50 | github.com/ugorji/go/codec v1.2.7/go.mod h1:WGN1fab3R1fzQlVQTkfxVtIBhWDRqOviHU95kRgeqEY= 51 | golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f97 h1:/UOmuWzQfxxo9UtlXMwuQU8CMgg1eZXqTRwkSQJWKOI= 52 | golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f97/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc= 53 | golang.org/x/net v0.0.0-20210226172049-e18ecbb05110 h1:qWPm9rbaAMKs8Bq/9LRpbMqxWRVUAQwMI9fVrssnTfw= 54 | golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg= 55 | golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= 56 | golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= 57 | golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= 58 | golang.org/x/sys v0.0.0-20210806184541-e5e7981a1069 h1:siQdpVirKtzPhKl3lZWozZraCFObP8S1v6PRp0bLrtU= 59 | golang.org/x/sys v0.0.0-20210806184541-e5e7981a1069/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= 60 | golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo= 61 | golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= 62 | golang.org/x/text v0.3.6 h1:aRYxNxv6iGQlyVaZmk6ZgYEDa+Jg18DxebPSrd6bg1M= 63 | golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= 64 | golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= 65 | golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= 66 | google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp09yW+WbY/TyQbw= 67 | google.golang.org/protobuf v1.28.0 h1:w43yiav+6bVFTBQFZX0r7ipe9JQ1QsbMgHwbBziscLw= 68 | google.golang.org/protobuf v1.28.0/go.mod h1:HV8QOd/L58Z+nl8r43ehVNZIU/HEI6OcFqwMG9pJV4I= 69 | gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= 70 | gopkg.in/check.v1 v1.0.0-20180628173108-788fd7840127/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= 71 | gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q= 72 | gopkg.in/errgo.v2 v2.1.0/go.mod h1:hNsd1EY+bozCKY1Ytp96fpM3vjJbqLJn88ws8XvfDNI= 73 | gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY= 74 | gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ= 75 | gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= 76 | gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= 77 | -------------------------------------------------------------------------------- /docker/codes/execsignal/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "context" 5 | "log" 6 | "net/http" 7 | "os" 8 | "os/signal" 9 | "syscall" 10 | "time" 11 | 12 | "github.com/gin-gonic/gin" 13 | ) 14 | 15 | func main() { 16 | router := gin.Default() 17 | router.GET("/", func(c *gin.Context) { 18 | 
time.Sleep(5 * time.Second) // simulate a slow request so graceful shutdown can be observed
19 | 		c.String(http.StatusOK, "Welcome Gin Server")
20 | 	})
21 | 
22 | 	srv := &http.Server{
23 | 		Addr:    ":8080",
24 | 		Handler: router,
25 | 	}
26 | 
27 | 	go func() {
28 | 		// service connections
29 | 		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
30 | 			log.Fatalf("listen: %s\n", err)
31 | 		}
32 | 	}()
33 | 	// Wait for interrupt signal to gracefully shutdown the server with
34 | 	// a timeout of 5 seconds.
35 | 	quit := make(chan os.Signal, 1) // buffered, so a signal delivered during Notify registration is not lost
36 | 	// kill (no param) sends syscall.SIGTERM by default
37 | 	// kill -2 is syscall.SIGINT
38 | 	// kill -9 is syscall.SIGKILL, but it can't be caught, so there is no need to add it
39 | 	signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
40 | 	<-quit
41 | 	log.Println("Shutdown Server ...")
42 | 
43 | 	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
44 | 	defer cancel()
45 | 	if err := srv.Shutdown(ctx); err != nil {
46 | 		log.Fatal("Server Shutdown:", err)
47 | 	}
48 | 	// catching ctx.Done(). timeout of 5 seconds.
49 | 	select {
50 | 	case <-ctx.Done():
51 | 		log.Println("timeout of 5 seconds.")
52 | 	}
53 | 	log.Println("Server exiting")
54 | 
55 | }
-------------------------------------------------------------------------------- /docker/dockerfile-guide.md: --------------------------------------------------------------------------------
1 | # Dockerfile详解
2 | 
3 | - [Dockerfile详解](#dockerfile详解)
4 |   - [前置 - docker是什么](#前置---docker是什么)
5 |   - [Dockerfile是什么](#dockerfile是什么)
6 |     - [概述](#概述)
7 |     - [如何使用](#如何使用)
8 |   - [Dockerfile结构](#dockerfile结构)
9 |     - [示例dockerfile](#示例dockerfile)
10 |   - [Dockerfile语法格式](#dockerfile语法格式)
11 |     - [.dockerignore file](#dockerignore-file)
12 |   - [Dockerfile 常用命令](#dockerfile-常用命令)
13 |     - [From/RUN](#fromrun)
14 |       - [1. `FROM`](#1-from)
15 |       - [2. `RUN`](#2-run)
16 |     - [WORKDIR / ADD / COPY](#workdir--add--copy)
17 |       - [1. `WORKDIR`](#1-workdir)
18 |       - [2. `COPY`](#2-copy)
19 |       - [3. `ADD`](#3-add)
20 |     - [EXPOSE / CMD / ENTRYPOINT](#expose--cmd--entrypoint)
21 |       - [1.EXPOSE](#1expose)
22 |       - [2. CMD](#2-cmd)
23 |       - [3. ENTRYPOINT](#3-entrypoint)
24 |     - [ARG / ENV](#arg--env)
25 |       - [1. ARG](#1-arg)
26 |       - [2. ENV](#2-env)
27 |     - [ONBUILD](#onbuild)
28 |     - [Healthcheck](#healthcheck)
29 |     - [Volume](#volume)
30 |   - [Dockerfile 最佳实践](#dockerfile-最佳实践)
31 |     - [Exclude with `.dockerignore`](#exclude-with--dockerignore)
32 |     - [Use multi-stage builds](#use-multi-stage-builds)
33 |     - [Don’t install unnecessary packages](#dont-install-unnecessary-packages)
34 |   - [自定义构建镜像](#自定义构建镜像)
35 |   - [docker镜像的生成方式](#docker镜像的生成方式)
36 |     - [方式](#方式)
37 |   - [彩蛋 - BuildKit](#彩蛋---buildkit)
38 |     - [使用BuildKit的内置ARGs](#使用buildkit的内置args)
39 |     - [Differences between legacy builder and BuildKit](#differences-between-legacy-builder-and-buildkit)
40 | 
41 | ## 前置 - docker是什么
42 | 
43 | - Docker是一套平台即服务(PaaS)产品,它使用操作系统级的虚拟化,以容器的形式来交付软件。
44 | 
45 | - Docker的主要好处是它允许用户将一个应用程序及其所有的依赖关系打包成一个标准化的单元(容器),用于软件开发、交付。与虚拟机不同,主机上所有的容器都共享一个操作系统内核的服务,从而能够更有效地利用底层系统和资源。
46 | 
47 | - 容器之间相互隔离,它们可以通过明确定义的通道相互通信。
48 | 
49 | ![Image](./assets/images/docker-architecture.png)
50 | 
51 | ## Dockerfile是什么
52 | 
53 | ### 概述
54 | 
55 | ![Image](./assets/images/dockerfile01.png)
56 | 
57 | - Docker可以通过读取`Dockerfile`中的指令自动构建镜像。
58 | - `Dockerfile`是一个文本文件,它从一个基础镜像开始定义,包含了用户可以在命令行上调用的所有命令,最后可以通过`docker build` 组装成一个镜像。
59 | 
60 | ### 如何使用
61 | 
62 | - 一般情况下,`Dockerfile`会使用执行`docker build`命令的根目录下的`Dockerfile`, 当然你也可以使用`-f`指定`Dockerfile`的路径
63 | 
64 | ```sh
65 | # Dockerfile use current root path
66 | $ docker build . 
67 | 68 | # Dockerfile use -f flag to point. 69 | $ docker build -f /path/to/a/Dockerfile . 70 | ``` 71 | 72 | - 构建镜像时指定镜像的tag 73 | 74 | ```sh 75 | # image default use latest tag 76 | $ docker build -t shykes/myapp . 77 | 78 | # image use 1.0.2 tag 79 | $ docker build -t shykes/myapp:1.0.2 . 80 | 81 | # 当前你也可以同时指定多个镜像tag 82 | $ docker build -t shykes/myapp:1.0.2 -t shykes/myapp:latest . 83 | ``` 84 | 85 | _注_: docker守护程序在`Dockerfile`中运行指令之前,会先执行一个初步验证,如果语法不正确将会返回错误,结束运行。 86 | 87 | ## Dockerfile结构 88 | 89 | ### 示例dockerfile 90 | 91 | ```dockerfile 92 | # syntax=docker/dockerfile:1 93 | FROM ubuntu:18.04 94 | LABEL maintainer="Colynn Liu " 95 | 96 | WORKDIR /app 97 | COPY . /app 98 | RUN mkdir /app/logs 99 | 100 | EXPOSE 8080 101 | CMD python /app/app.py 102 | ``` 103 | 104 | ![Image](./assets/images/dockerfile02.png) 105 | 106 | 注解: 107 | 108 | 1. Dockerfile 头信息 109 | 2. Dockerfile 命令集合 110 | 3. Dockerfile 运行时声明 111 | 112 | ## Dockerfile语法格式 113 | 114 | ```Dockerfile 115 | # Comment 116 | INSTRUCTION arguments 117 | ``` 118 | 119 | - Dockerfile命令是不区别大小写的,而我们习惯性用大写,因为这样可以更好与参数区别。 120 | 121 | - Docker会按照顺序运行Dockerfile内的指令。__`Dockerfile`必须以`FROM`指令开始__,当然`FROM`也是可以在[解析器指令](https://docs.docker.com/engine/reference/builder/#parser-directives)、注释或是全局范围的[ARG](https://docs.docker.com/engine/reference/builder/#arg)之后。 122 | 123 | _注_: [解析器指令](https://docs.docker.com/engine/reference/builder/#parser-directives)是可选的(很少用到),目前仅有`syntax`/`escape` 这两个解析器指令的定义。 124 | 125 | ### .dockerignore file 126 | 127 | 当使用 `ADD` or `COPY` . 时,为了避免将不需要的大文件或是敏感文件拷贝进容器,可以使用 `.dockerignore` 来忽略它们。 128 | 129 | - 我们来看一个 `.dockerignore`的示例: 130 | 131 | ```sh 132 | # comment 133 | */temp* # eg: /somedir/temporary.txt 134 | */*/temp* # eg: /somedir/subdir/temporary.txt 135 | temp? # eg: /tempa and /tempb 136 | **/*.go # 137 | ``` 138 | 139 | - 更多 140 | 141 | ```sh 142 | *.md 143 | !README*.md 144 | README-secret.md 145 | ``` 146 | 147 | ## Dockerfile 常用命令 148 | 149 | ### From/RUN 150 | 151 | #### 1. `FROM` 152 | 153 | ```dockerfile 154 | FROM [--platform=] [AS ] 155 | ``` 156 | 157 | Or 158 | 159 | ```dockerfile 160 | FROM [--platform=] [:] [AS ] 161 | ``` 162 | 163 | Or 164 | 165 | ```dockerfile 166 | FROM [--platform=] [@] [AS ] 167 | ``` 168 | 169 | Tips: 170 | 171 | - `FROM` can appear multiple times within a single Dockerfile to create multiple images 172 | - Optionally a name can be given to a new build stage by adding `AS name` to the `FROM` instruction. The name can be used in subsequent `FROM` and `COPY --from=` instructions to refer to the image built in this stage. 173 | - The `tag` or `digest` values are optional. If you omit either of them, the builder assumes a `latest` tag by default. The builder returns an error if it cannot find the `tag` value. 174 | - `ARG` is the only instruction that may precede `FROM` in the Dockerfile 175 | - An `ARG` declared before a `FROM` is outside of a build stage(在构建阶段之外), so it can’t be used in any instruction after a `FROM`. To use the default value of an ARG declared before the first `FROM` use an `ARG` instruction without a value inside of a build stage: 176 | 177 | ```dockerfile 178 | ARG VERSION=latest 179 | FROM busybox:$VERSION 180 | ARG VERSION 181 | RUN echo $VERSION > image_version 182 | ``` 183 | 184 | #### 2. 
`RUN` 185 | 186 | `RUN` has 2 forms: 187 | 188 | - `RUN ` (shell form, the command is run in a shell, which by default is `/bin/sh -c` on Linux or `cmd /S /C` on Windows) 189 | - `RUN ["executable", "param1", "param2"]` (exec form) 190 | 191 | _Notes_: 192 | 193 | 1. The __exec form__ is parsed as a JSON array, which means that you must use double-quotes (`“`) around words not single-quotes (`‘`). 194 | 2. `RUN [ "echo", "$HOME" ]`与 `RUN [ "sh", "-c", "echo $HOME" ]` 有什么区别,一起来看下例子 [`./assets/dockerfile.run`](https://github.com/warm-native/docs/tree/master/docker/assets/dockerfile.run) 195 | 3. The cache for `RUN` instructions is validated automatically during the next build. `docker build`时可以添加 `--no-cache`来解除缓存, 当然也可以通过`COPY`、`ADD`指令来使缓存无效。 196 | 197 | ### WORKDIR / ADD / COPY 198 | 199 | #### 1. `WORKDIR` 200 | 201 | `WORKDIR`为`Dockerfile`中的`RUN`, `CMD`, `ENTRYPOINT`, `COPY`, `ADD`指令设置工作目录,如果不存在将会被创建(即使后面没有使用这个工作目录),允许多次定义,也允许使用相对路径,也可以解析之前通过`ENV`设置的路径。 202 | 203 | _注_: 204 | 205 | 1. 使用相对路径时会相对之前的`WORKDIR`指令来创建,如果是首次创建,也就是在根目录`/`. 206 | 2. 为了`Dockerfile`的易维护性,建议`WORKDIR`只声明一次. 207 | 208 | #### 2. `COPY` 209 | 210 | ```dockerfile 211 | COPY [--chown=:] ... 212 | COPY [--chown=:] ["",... ""] 213 | ``` 214 | 215 | _注_: 如果路径中包含空格只能采用第二种方式 216 | 217 | `COPY` obeys the following rules: 218 | 219 | - `` 目录必须在构建的上下文内; 你不能执行类似于这样的:`COPY ../something /something` 220 | - 如果 `` 是一个目录,这个目录下的全部内容均会拷贝进入(__注意__: 目录本身并不会拷贝,只会拷贝目录下的内容) 221 | 222 | - 如果 `` 是任何类型的文件. 223 | - 如果 `` 以 `/`结尾,则``将会作为目录,``的目录将会拷贝至 ``目录之下; 224 | - 如果`` 不以`/`结尾,则``会被覆盖成``的内容。 225 | 226 | - 如果多个 `` 资源被指定时,(不管是文件/目录或是使用通配符的资源), `` 必须是一个目录,也就是说必须依赖 `/`结尾; 另外如果拷贝多个资源时,`` 不以`/`结尾, builder也提示语法错误。 227 | 228 | - 如果 `` 不存在, 它将和其路径中所有缺失的目录一起被创建。 229 | 230 | #### 3. `ADD` 231 | 232 | ```dockerfile 233 | ADD [--chown=:] ... 234 | ADD [--chown=:] ["",... ""] 235 | ``` 236 | 237 | `ADD`相对于`COPY`更强大: 238 | 239 | - It can handle remote URLs 240 | - It can also auto-extract tar files. 241 | 242 | `ADD` obeys the following rules: 243 | 244 | - 如果 `` 是一个URL, 245 | - 如果 `` 不是以`/`结尾, 则``会被覆盖成``的内容. 246 | - 如果 `` 以 `/`结尾,则``将会作为目录,``的目录将会拷贝至 ``目录之下; 247 | - `[Waring]` The URL must __have a nontrivial path__ so that an appropriate filename can be discovered in this case (`http://example.com` will not work). 248 | 249 | - 如果 `` 是一个本地的`tar`包(identity, gzip, bzip2 or xz), 它会被解压为一个目录. When a directory is copied or unpacked, it has the same behavior as `tar -x`. 250 | - `[Note]` Resources from remote URLs are not decompressed. 251 | - 将一个``是否判断为压缩包是依赖文件的内容,而不是文件的名称,比如你直接`ADD hello.tar.gz /`, 只是会将这个文件拷贝进去,并不会解压也不会报错。 252 | 253 | ### EXPOSE / CMD / ENTRYPOINT 254 | 255 | #### 1.EXPOSE 256 | 257 | - 格式 258 | 259 | ```sh 260 | EXPOSE [/...] 261 | ``` 262 | 263 | ```sh 264 | # the default is TCP if the protocol is not specified. 265 | EXPOSE 80/tcp 266 | EXPOSE 80/udp 267 | ``` 268 | 269 | - `EXPOSE` 指令不是确切的发布的端口, It functions as a type of documentation __between__ the person who builds the image __and__ the person who runs the container, about which ports are intended to be published. 270 | 271 | - 运行容器真正发布的端口,其实是通过`docker run`添加`-p`的参数来发布和映射一个或多个端口,或是使用`-P` 发布`EXPOSE`声明的端口并将其映射到高阶端口上。 272 | 273 | #### 2. CMD 274 | 275 | The CMD instruction has three forms: 276 | 277 | - `CMD ["executable","param1","param2"]` (exec form, this is the preferred form) 278 | - `CMD ["param1","param2"]` (as default parameters to ENTRYPOINT) 279 | - `CMD command param1 param2` (shell form) 280 | 281 | 概述: 282 | 283 | - 在`Dockerfile`中只能有一个`cmd`指令, 如果你定义了很多,只有最后一下能够起作用. 
284 | 285 | - __The main purpose of a `CMD` is to provide defaults for an executing container.__ 这个默认值可以被执行,或是指定一个`ENTRYPOINT'指令来忽略. 286 | - `CMD`是第二种方式运行时(为`ENTRYPOINT`提供默认的参数), `CMD` 和 `ENTRYPOINT` 均应该是 `JSON`格式. 287 | - If the user specifies arguments to `docker run` then they will override the default specified in `CMD`. eg: `docker run -it --rm colynn/ops-debug /bin/sh` 288 | 289 | Tips: 290 | 291 | - Do not confuse `RUN` with `CMD`. `RUN` actually runs a command and commits the result; `CMD`在构建时不执行任何东西,但指定镜像的预期命令。 292 | - 大家可以看到`CMD`也是存在`exec`、`shell`模型的,所以也存在类似于`RUN`的变量解析的问题。 293 | 294 | #### 3. ENTRYPOINT 295 | 296 | The ENTRYPOINT instruction has two forms: 297 | 298 | - `ENTRYPOINT ["executable", "param1", "param2"]` # exec form 299 | - `ENTRYPOINT command param1 param2` # shell form 300 | 301 | Tips: 302 | 303 | - Only the last `ENTRYPOINT` instruction in the `Dockerfile` will have an effect. 304 | - The __exec form__ is parsed as a JSON array, which means that you must use double-quotes (“) around words not single-quotes (‘) 305 | - 类似`RUN`/`CMD` , `ENTRYPOINT [ "echo", "$HOME" ]` will not do variable substitution on `$HOME` 306 | - __shell form__ You can specify a plain string for the `ENTRYPOINT` and it will execute in `/bin/sh -c`. This form will use shell processing to substitute shell environment variables, and will ignore any `CMD` or `docker run` command line arguments. 307 | - 所以在 __shell form__ 下,为了确保`docker stop`能够正确停止ENTRYPOINT的可执行程序,记得用exec启动它, eg: `ENTRYPOINT exec top -b` ?, 其实写一个测试代码发现不论是加不加`exec` [测试代码](https://github.com/warm-native/docs/tree/master/docker/codes/execsignal)均可以收到 `SIGTERM` 信号,这个点值得我们再去[讨论](https://github.com/docker/cli/issues/3198), 308 | 309 | ### ARG / ENV 310 | 311 | #### 1. ARG 312 | 313 | The ARG instruction defines a variable that users can pass at build-time to the builder with the docker build command using the `--build-arg =` flag. If a user specifies a build argument that was not defined in the Dockerfile, the build outputs a warning. 314 | 315 | 1. Default values/Scope 316 | 317 | ```Dockerfile 318 | FROM busybox 319 | ARG user1=someuser 320 | ARG buildno=1 321 | 322 | USER ${user:-some_user} 323 | ARG user 324 | USER $user 325 | # ... 326 | ``` 327 | 328 | 2. Predefined ARGs 329 | 330 | - `HTTP_PROXY` 331 | - `http_proxy` 332 | - `HTTPS_PROXY` 333 | - `https_proxy` 334 | - `FTP_PROXY` 335 | - `ftp_proxy` 336 | - `NO_PROXY` 337 | - `no_proxy` 338 | 339 | 3. How to use 340 | 341 | ```sh 342 | docker build --build-arg HTTPS_PROXY=https://my-proxy.example.com . 343 | ``` 344 | 345 | #### 2. ENV 346 | 347 | 语法: 348 | 349 | ```Dockerfile 350 | ENV = ... 351 | ENV MY_VAR my-value 352 | ``` 353 | 354 | 示例: 355 | 356 | ```Dockerfile 357 | ENV MY_NAME="John Doe" 358 | ENV MY_DOG=Rex\ The\ Dog 359 | ENV MY_CAT=fluffy 360 | ``` 361 | 362 | Tips: 363 | 364 | - `ENV` which is persisted in the final image 365 | - `ARG` which is not persisted in the final image 366 | 367 | ### ONBUILD 368 | 369 | ```dockerfile 370 | ONBUILD 371 | ``` 372 | 373 | 举个例子: 374 | 375 | - 场景: 376 | 377 | 如果你的镜像是一个可重复使用的`python`应用程序构建器,那就需要应用的代码添加至指定的目录,而且再添加完代码后,可能还需要一个构建脚本被触发; 378 | 因为你没有应用代码的权限,而且对于每个应用的构建也有可能不同。 379 | 380 | - 解决办法: 381 | 382 | 你可以提供一个`Dockerfile`的模板给开发者,但是这样是很不高效的,而且应用的代码混合在一起,有错误也难以更新。 383 | 384 | - 最优解 385 | 386 | 如下面的命令,使用`ONBUILD`来注册提前指令,以便在以后的下一个构建阶段运行。 387 | 388 | ```dockerfile 389 | ONBUILD ADD . 
/app/src
390 | ONBUILD RUN /usr/local/bin/python-build --dir /app/src
391 | ```
392 | 
393 | ### Healthcheck
394 | 
395 | - `HEALTHCHECK [OPTIONS] CMD command` (check container health by running a command inside the container)
396 | - `HEALTHCHECK NONE` (disable any healthcheck inherited from the base image)
397 | 
398 | 作用:
399 | 
400 | `HEALTHCHECK` 指令告诉Docker如何测试一个容器,以检查它是否仍在工作。比如网络服务器陷入了无限循环,无法处理新的连接,尽管服务器进程仍在运行。
401 | 
402 | The options that can appear before `CMD` are:
403 | 
404 | - `--interval=DURATION` (default: `30s`)
405 | - `--timeout=DURATION` (default: `30s`)
406 | - `--start-period=DURATION` (default: `0s`)
407 | - `--retries=N` (default: `3`)
408 | 
409 | The command's exit status indicates the health status of the container. The possible values are:
410 | 
411 | - 0: success - the container is healthy and ready for use
412 | - 1: unhealthy - the container is not working correctly
413 | - 2: reserved - do not use this exit code
414 | 
415 | 示例:
416 | 
417 | ```dockerfile
418 | HEALTHCHECK --interval=5m --timeout=3s \
419 |     CMD curl -f http://localhost/ || exit 1
420 | ```
421 | 
422 | ### Volume
423 | 
424 | ```dockerfile
425 | VOLUME ["/data"]
426 | ```
427 | 
428 | The `VOLUME` instruction creates a mount point with the specified name and marks it as holding externally mounted volumes from native host or other containers.
429 | 
430 | Notes about specifying volumes:
431 | 
432 | - __You cannot specify a volume source in the `Dockerfile`__:
433 | - __What does the VOLUME line do?__: Every time you create a container from this image, docker will force that directory to be a volume. If you do not provide a volume in your run command, or compose file, the only option for docker is to create __an anonymous volume__.
434 | - __Changing the volume from within the Dockerfile__: If any build steps change the data within the volume after it has been declared, those changes will be discarded.
435 | - __The host directory is declared at container run-time__: The host directory (the mountpoint) is, by its nature, host-dependent. This is to preserve image portability, since a given host directory can’t be guaranteed to be available on all hosts. For this reason, you can’t mount a host directory from within the Dockerfile. The VOLUME instruction does not support specifying a host-dir parameter. You must specify the mountpoint when you create or run the container.
436 | 
437 | 如果这个镜像是你的应用镜像,你不再把它作为基础镜像继续使用的话,`volume`在`dockerfile`的场景可以简单概述下:
438 | 
439 | - 适合的场景(logs、temp folders)
440 | - 不适合的场景(static files, configs, code)
441 | 
442 | 其实,对于`logs`、`temp folders`可以存放于`write layer`层,或是对于需要保存的数据,通过`docker run`或是`docker-compose`的方式声明`volume`更好些。
443 | 
444 | ## Dockerfile 最佳实践
445 | 
446 | - 
447 | 
448 | ### Exclude with `.dockerignore`
449 | 
450 | ### Use multi-stage builds
451 | 
452 | - 
453 | 
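454 | A minimal multi-stage sketch (modeled on this repo's `docker/codes/execsignal/Dockerfile.v1`; tags and paths here are illustrative): the Go toolchain stays in the build stage, and only the compiled binary is copied into the runtime image.
455 | 
456 | ```dockerfile
457 | FROM golang:1.15-buster AS build-env    # build stage: full toolchain
458 | WORKDIR /src
459 | COPY . .
460 | RUN CGO_ENABLED=0 go build -o app main.go
461 | 
462 | FROM alpine:3.9                         # runtime stage: only the binary
463 | COPY --from=build-env /src/app /usr/local/bin/app
464 | CMD ["app"]
465 | ```
466 | 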
467 | ### Don’t install unnecessary packages
468 | 
469 | Each instruction you create in your Dockerfile results in a new image layer being created. Each layer brings additional data that are not always part of the resulting image. For example, if you add a file in one layer, but remove it in another layer later, the final image’s size will include the added file size in a form of a special "whiteout" file although you removed it. In addition, every layer contains separate metadata that add up to the overall image size as well.
470 | 
471 | ## 自定义构建镜像
472 | 
473 | - 
474 | 
475 | - 
476 | 
477 | ## docker镜像的生成方式
478 | 
479 | 我们所说的`Docker images` 实际上是由一个或是多个镜像层构建的。镜像中的层是以父子关系连接在一起的,每个层代表最终镜像的某些部分。
480 | 
481 | ### 方式
482 | 
483 | - 通过docker容器生成镜像
484 | - 通过dockerfile生成镜像
485 | 
486 | ## 彩蛋 - BuildKit
487 | 
488 | ```sh
489 | export DOCKER_BUILDKIT=1
490 | ```
491 | 
492 | ### 使用BuildKit的内置ARGs
493 | 
494 | __BuildKit__ supports a predefined set of ARG variables with information on the platform of the node performing the build (build platform) and on the platform of the resulting image (target platform). The target platform can be specified with the --platform flag on docker build.
495 | 
496 | The following ARG variables are set automatically:
497 | 
498 | - `TARGETPLATFORM` - platform of the build result. Eg linux/amd64, linux/arm/v7, windows/amd64.
499 | - `TARGETOS` - OS component of TARGETPLATFORM
500 | - `TARGETARCH` - architecture component of TARGETPLATFORM
501 | - `TARGETVARIANT` - variant component of TARGETPLATFORM
502 | - `BUILDPLATFORM` - platform of the node performing the build.
503 | - `BUILDOS` - OS component of BUILDPLATFORM
504 | - `BUILDARCH` - architecture component of BUILDPLATFORM
505 | - `BUILDVARIANT` - variant component of BUILDPLATFORM
506 | 
507 | 但是要使用上面的这些ARGs,需要先在`Dockerfile`里通过`ARG xx`的方式声明后,才能获取到相应参数的值, 更多信息参看[官网链接](https://docs.docker.com/reference/dockerfile/#automatic-platform-args-in-the-global-scope)
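508 | 
509 | A short sketch of how these fit together (the Go project layout and image tags are assumptions for illustration): declare the automatic ARGs after `FROM`, then cross-compile for whatever `--platform` requests:
510 | 
511 | ```dockerfile
512 | # syntax=docker/dockerfile:1
513 | FROM --platform=$BUILDPLATFORM golang:1.19 AS build
514 | ARG TARGETOS
515 | ARG TARGETARCH
516 | WORKDIR /src
517 | COPY . .
518 | # cross-compile for the requested target platform
519 | RUN GOOS=$TARGETOS GOARCH=$TARGETARCH CGO_ENABLED=0 go build -o /out/app .
520 | 
521 | FROM alpine
522 | COPY --from=build /out/app /usr/local/bin/app
523 | ```
524 | 
525 | Built with e.g. `docker buildx build --platform linux/amd64,linux/arm64 .`, the build stage always runs on the builder's own platform, while each output binary matches its target.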
526 | 
527 | ### Differences between legacy builder and BuildKit
528 | 
529 | The legacy Docker Engine builder processes all stages of a Dockerfile leading up to the selected `--target`. It will build a stage even if the selected target doesn’t depend on that stage.
530 | 
531 | [BuildKit](https://docs.docker.com/build/buildkit/) only builds the stages that the target stage depends on.
532 | 
533 | For example, given the following Dockerfile:
534 | 
535 | ```dockerfile
536 | # syntax=docker/dockerfile:1
537 | FROM ubuntu AS base
538 | RUN echo "base"
539 | 
540 | FROM base AS stage1
541 | RUN echo "stage1"
542 | 
543 | FROM base AS stage2
544 | RUN echo "stage2"
545 | ```
546 | 
547 | With [BuildKit enabled](https://docs.docker.com/build/buildkit/#getting-started), building the __stage2__ target in this Dockerfile means only _base_ and _stage2_ are processed. There is no dependency on _stage1_, so it’s skipped.
-------------------------------------------------------------------------------- /docker/multi-arch-dockerfile.md: --------------------------------------------------------------------------------
1 | # multi-arch
2 | 
3 | ## 写在前面
4 | 
5 | ### Docker Build architecture
6 | 
7 | Docker Build implements a client-server architecture, where:
8 | 
9 | - Buildx is the client and the user interface for running and managing builds
10 | - BuildKit is the server, or builder, that handles the build execution.
11 | 
12 | ![Image](./assets/images/build-high-level-arch.png)
13 | 
14 | > As of Docker Engine 23.0 and Docker Desktop 4.19, Buildx is the default build client.
15 | 
16 | There are currently __four different ways__ that one can build locally with Docker:
17 | 
18 | - The legacy builder in Docker Engine: `DOCKER_BUILDKIT=0 docker build .`
19 | - BuildKit in Docker Engine: `DOCKER_BUILDKIT=1 docker build .`
20 | - Buildx CLI plugin with the Docker driver: `docker buildx build .`
21 | - Buildx CLI plugin with the Container driver: `docker buildx create && docker buildx build .`
22 | 
23 | ## Without Using Docker BuildX
24 | 
25 | ```sh
26 | export DOCKER_CLI_EXPERIMENTAL=enabled
27 | 
28 | docker manifest create
29 | 
30 | docker manifest push
31 | ```
32 | 
33 | ## buildx
34 | 
35 | 
36 | 
37 | ### Buildx CLI with driver
38 | 
39 | 
40 | 
41 | [Set buildx as default builder](https://github.com/docker/cli/pull/3314)
42 | 
43 | [Docker container build driver](https://docs.docker.com/build/drivers/docker-container/)
44 | 
45 | ## buildKit
46 | 
47 | ## 应用
48 | 
49 | 如上我们了解到了 buildx、buildKit, 那么如何在自己的环境内使用呢?以下载[buildx v0.13.1](https://github.com/docker/buildx/releases/tag/v0.13.1)为例:
50 | 
51 | ```sh
52 | mkdir -p ~/.docker/cli-plugins/
53 | 
54 | # download the plugin binary directly into the CLI plugin directory
55 | wget https://github.com/docker/buildx/releases/download/v0.13.1/buildx-v0.13.1.linux-amd64 -O ~/.docker/cli-plugins/docker-buildx
56 | 
57 | chmod +x ~/.docker/cli-plugins/docker-buildx
58 | 
59 | # verify the buildx plugin is picked up
60 | docker info
61 | ```
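62 | 
63 | With the plugin in place, a typical multi-arch build looks like the sketch below (the builder name and image tag are placeholders):
64 | 
65 | ```sh
66 | # create and select a container-driver builder, which supports multi-platform builds
67 | docker buildx create --name mybuilder --use
68 | 
69 | # build for two platforms and push the resulting multi-arch image in one step
70 | docker buildx build --platform linux/amd64,linux/arm64 -t myrepo/app:latest --push .
71 | ```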
72 | 
73 | ## golang project sample
74 | 
75 | 
76 | 
77 | ## java project sample
-------------------------------------------------------------------------------- /gitlabci/README.md: --------------------------------------------------------------------------------
1 | # GitlabCI
2 | 
3 | ## How it works (CI/CD process overview)
4 | 
5 | - [Ensure you have runners available](https://docs.gitlab.com/ee/ci/quick_start/#ensure-you-have-runners-available) to run your jobs. GitLab SaaS provides runners, so if you’re using GitLab.com, you can skip this step.
6 | 
7 |   If you don’t have a runner, [install GitLab Runner](https://docs.gitlab.com/runner/install/) and [register a runner](https://docs.gitlab.com/runner/register/) for your instance, project, or group.
8 | 
9 | - [Create a .gitlab-ci.yml file](https://docs.gitlab.com/ee/ci/quick_start/#create-a-gitlab-ciyml-file) at the root of your repository. This file is where you define your CI/CD jobs; a minimal example follows.
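10 | 
11 | A minimal `.gitlab-ci.yml` sketch (the stage name, image, and script are placeholders to adapt):
12 | 
13 | ```yaml
14 | stages:
15 |   - test
16 | 
17 | unit-test:
18 |   stage: test
19 |   image: golang:1.19   # any image your runner can pull
20 |   script:
21 |     - go test ./...
22 | ```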
82 | -------------------------------------------------------------------------------- /jenkins/README.md: -------------------------------------------------------------------------------- 1 | 2 | # Jenkins 持续集成实战 3 | 4 | ## jenkins pipeline 5 | 6 | * topic001: 如何定制镜像 [bilibili > 视频链接](https://www.bilibili.com/video/BV1zt4y1a7F1/) 7 | * topic002: Jenkins + Kubernetes CI/CD 解决方案实战 8 | * jenkins kubernetes ci/cd避免踩坑 - 实践注意项 - 第2篇 [bilibili > 视频链接](https://www.bilibili.com/video/BV1A5411V7zm/) 9 | * jenkins kubernetes ci/cd避免踩坑 - 实践注意项 - 第3篇 [bilibili > 视频链接](https://www.bilibili.com/video/BV1G5411V7mU/) 10 | * topic003: 如何写好 Jenkinsfile/Pipeline [bilibili > 视频链接](https://www.bilibili.com/video/BV1ph411W7Ek/) 11 | 12 | ## jenkins in read world 13 | 14 | * topic004: 基于[workflow](https://github.com/go-atomci/workflow) 实现 Jenkins自定义Pipeline [bilibili > 视频链接](https://www.bilibili.com/video/BV1zb4y127EQ) 15 | * topic005: 分享基于云原生理念的 cicd平台-atomci 如何安装部署 [bilibili > 视频链接](https://www.bilibili.com/video/BV1qq4y1N7mZ/) 16 | * topic006: AtomCI 云原生的devops平台真得这么香吗 [bilibili > 视频链接](https://www.bilibili.com/video/BV1K3411m78Q/) 17 | * topic007: AtomCI 云原生的devops平台 - 5分钟全流程体验 [bilibili > 视频链接](https://www.bilibili.com/video/BV18F411a7Rk/) 18 | 19 | ## jenkins enhance 20 | 21 | * topic008: Jenkins Shared library 22 | 23 | ## FAQ 24 | 25 | ### 1. 如果获取jenkins job的配置 26 | 27 | ```sh 28 | Job URL: http://YOUR_JENKINS/job/YOUR_DEPLOY/ 29 | Conig URL: http://YOUR_JENKINS/job/YOUR_DEPLOY/config.xml 30 | ``` 31 | -------------------------------------------------------------------------------- /jenkins/jenkins-pipeline-enhance.md: -------------------------------------------------------------------------------- 1 | # 2 | -------------------------------------------------------------------------------- /jenkins/topic001/README.md: -------------------------------------------------------------------------------- 1 | 2 | # 如何定制镜像- jenkins + docker 持续集成实战前置篇 3 | 4 | ## 1.定制镜像jnlp、kaniko、kubectl 5 | 6 | ### 为什么需要定制镜像 7 | 8 | 1. 你是不是有想直接在容器内 telnet或tcpdump 9 | 2. 或是需要某语言的执行环境,如python 10 | 11 | ### 探究如何定制镜像 12 | 13 | 1. 一起来学习下镜像是如何一步步产生的 14 | 2. 基于最基础的镜像的来做定制,为什么 15 | 16 | ### 举例 17 | 18 | #### jnlp 19 | 20 | * 添加python 运行环境 21 | 22 | #### kaniko 23 | 24 | * [kaniko](https://github.com/GoogleContainerTools/kaniko) 25 | 26 | #### kubectl 27 | 28 | * [install kubectl](https://v1-16.docs.kubernetes.io/docs/tasks/tools/install-kubectl/) 29 | -------------------------------------------------------------------------------- /jenkins/topic002/README.md: -------------------------------------------------------------------------------- 1 | # Topic2. Jenkins + Kubernetes CI/CD 解决方案实战 2 | 3 | [详细的配置教程](https://colynn.github.io/2019-10-22-kubernetes-ci-cd/) 4 | 5 | ## 1. 环境准备 6 | 7 | ### 1.1 jenkins部署 8 | 9 | * 主机形式或docker或是集群内安装; 10 | * kubernetes/git 插件 11 | 12 | ```sh 13 | # start jenkins docker 14 | $ docker run -d -p 8091:8080 -p50000:50000 --name jenkins -v $(pwd)/data:/var/jenkins_home colynn/jenkins:2.277.1-lts-alpine 15 | ``` 16 | 17 | ### 1.2 基础镜像准备 18 | 19 | * `colynn/jenkins-jnlp-agent:latest`: Jenkins jnlp agent, 还有另外一种ssh agent, 但还是推荐使用`jnlp-agent` 20 | * `colynn/kaniko-executor:debug`: 用于镜像制作及镜像推送 21 | 22 | > jnlp-agnet的基础镜像是必须的,对于其他的镜像可以根据需要定义`Pod`的`template` 23 | 24 | ## 2. 
jenkins Auth 25 | 26 | > 创建jenkins连接至kubernetes的auh信息 27 | 28 | ### 2.1 创建 service account 29 | 30 | > 请根据`jenkins`部署在k8s的集群内或外选择[`incluster`](https://github.com/warm-native/docs/tree/master/topic002/deploy/incluster) or [`outcluster`](https://github.com/warm-native/docs/tree/master/topic002/deploy/outcluster) 31 | > 注意默认授权的是`devops` 命名空间,可以根据需要修改 32 | 33 | ### 2.2 配置 Jenkins Credentials 34 | 35 | 1. 获取 service account auth信息 36 | 37 | ```sh 38 | kubectl -n devops describe serviceaccount jenkins-admin 39 | kubectl -n devops describe secret [jenkins-admin-token-name] 40 | ``` 41 | 42 | 2. 创建 __Secret text__ 类型的Credentials 43 | 44 | ![Image](./assets/k8s-auth.png) 45 | 46 | 3. git auth 47 | 48 | > 根据需要进行配置,如果不需要检出代码,可以不配置。 49 | 50 | ## 3. Jenkins add kubernetes cloud 51 | 52 | ![Image](./assets/k8s-cloud-setup.png) 53 | 54 | __注意__: 这里使用的`Kubernetes Namespace` 注意要和创建的 service account的 namesapce一致。 55 | 56 | ## 4. jenkins agent (提供agent的3种形式) 57 | 58 | 1. yaml in declarative pipeline 59 | 60 | > 示例链接:[>yaml in declarative pipeline](https://github.com/jenkinsci/kubernetes-plugin#declarative-pipeline) 61 | 62 | ```yaml 63 | pipeline { 64 | agent { 65 | kubernetes { 66 | yaml """ 67 | 68 | """ 69 | } 70 | } 71 | ... 72 | ``` 73 | 74 | 2. configuration in scripted pipeline 75 | 76 | > 示例链接:[>container configuration](https://github.com/jenkinsci/kubernetes-plugin#container-configuration) 77 | 78 | ```yaml 79 | podTemplate(cloud: 'kubernetes', containers: [ 80 | containerTemplate( 81 | 82 | ]) { 83 | ... 84 | } 85 | ``` 86 | 87 | 3. configuration in Jenkins UI 88 | 89 | ![image](https://user-images.githubusercontent.com/5203608/101015878-008b4280-35a3-11eb-9e6b-02eaf3567ffd.png) 90 | 91 | __注意事项__: 92 | 93 | 1. 配置 Kubernetes Pod Template 时注意: 94 | * 如果pipeline 没有指定agent 的标签,而是使用的 agent any, 那么 Usage 选项注意选择 Use this node as much as possible 95 | * 如果pipeline 指定的具体的agent 标签,那么 Usage 选项注意选择 ONly build jobs with label expressions matching this role, 而且 Lables 选项添加对应的标签。 96 | 97 | ```sh 98 | # 定义 pod template 时指定的标签是 atom-ci, 那么Jenkinsfile里的 agent也要添加上指定的标签 99 | ... 100 | agent { 101 | label 'atom-ci' 102 | } 103 | ... 104 | ``` 105 | 106 | 2. 添加 jnlp-agent 类型的 __Container Template__ 时注意: 107 | * __Command to run__ 和 __Arguments to pass to command__ 保持为空 108 | * 确保你拥有正确的 jenkins-jnlp-agent 镜像, 没有必要建议不要修改该镜像,直接使用默认的即可。 109 | 110 | ```Dockerfile 111 | COPY jenkins-slave /usr/local/bin/jenkins-slave 112 | ENTRYPOINT ["jenkins-slave"] 113 | ``` 114 | 115 | ## 5. 拓展 116 | 117 | 1. 中间产物 构建目录 118 | 2. 
jenkins as code 119 | -------------------------------------------------------------------------------- /jenkins/topic002/assets/k8s-auth.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/jenkins/topic002/assets/k8s-auth.png -------------------------------------------------------------------------------- /jenkins/topic002/assets/k8s-cloud-setup.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/jenkins/topic002/assets/k8s-cloud-setup.png -------------------------------------------------------------------------------- /jenkins/topic002/deploy/README.md: -------------------------------------------------------------------------------- 1 | # Jenkins agent service-account Get 2 | 3 | ## 概述 4 | 5 | 获取service-account分为两种形式, 6 | 7 | * `cluster`  jenkins master部署在kubernetes内部; 8 | * `outcluster`  jenkins-master 部署在kubernetes外部,比如通过docker或是二进制形式启动. 9 | 10 | ## How to run 11 | 12 | 点击视频链接:  13 | -------------------------------------------------------------------------------- /jenkins/topic002/deploy/incluster/deployment.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: devops 5 | 6 | 7 | apiVersion: extensions/v1beta1 8 | kind: Deployment 9 | metadata: 10 | annotations: 11 | creator: admin 12 | creationTimestamp: "2019-11-21T07:55:50Z" 13 | name: jenkins 14 | namespace: devops 15 | spec: 16 | progressDeadlineSeconds: 600 17 | replicas: 1 18 | revisionHistoryLimit: 2 19 | template: 20 | metadata: 21 | labels: 22 | app: jenkins 23 | name: jenkins 24 | namespace: devops 25 | spec: 26 | containers: 27 | - image: jenkins/jenkins:lts-alpine 28 | #imagePullPolicy: Always 29 | imagePullPolicy: IfNotPresent 30 | name: jenkins 31 | volumeMounts: 32 | - name: jenkins-volume 33 | mountPath: /var/jenkins_home 34 | - name: jenkins-localtime 35 | mountPath: /etc/localtime 36 | env: 37 | - name: JAVA_OPTS 38 | value: '-Xms256m -Xmx1024m -Duser.timezone=Asia/Shanghai' 39 | - name: TRY_UPGRADE_IF_NO_MARKER 40 | value: 'true' 41 | ports: 42 | - name: http 43 | containerPort: 8080 44 | - name: agent 45 | containerPort: 50000 46 | resources: 47 | requests: 48 | cpu: 200m 49 | memory: 500M 50 | limits: 51 | cpu: 800m 52 | memory: 1.5Gi 53 | securityContext: 54 | privileged: false 55 | terminationMessagePath: /dev/termination-log 56 | terminationMessagePolicy: File 57 | dnsPolicy: ClusterFirst 58 | restartPolicy: Always 59 | serviceAccountName: jenkins-admin 60 | schedulerName: default-scheduler 61 | securityContext: {} 62 | terminationGracePeriodSeconds: 30 63 | volumes: 64 | - name: jenkins-localtime 65 | hostPath: 66 | path: /etc/localtime 67 | - name: jenkins-volume 68 | hostPath: 69 | path: /data/jenkins_home 70 | -------------------------------------------------------------------------------- /jenkins/topic002/deploy/incluster/rabc.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: devops 5 | --- 6 | 7 | apiVersion: v1 8 | kind: ServiceAccount 9 | metadata: 10 | labels: 11 | k8s-app: jenkins 12 | name: jenkins-admin 13 | namespace: devops 14 | 15 | --- 16 | 17 | apiVersion: rbac.authorization.k8s.io/v1 18 | kind: ClusterRole 19 | metadata: 20 | name: jenkins-rbac 21 | 
namespace: devops 22 | rules: 23 | - apiGroups: [""] 24 | resources: ["pods"] 25 | verbs: ["create","delete","get","list","patch","update","watch"] 26 | - apiGroups: [""] 27 | resources: ["pods/exec"] 28 | verbs: ["create","delete","get","list","patch","update","watch"] 29 | - apiGroups: [""] 30 | resources: ["pods/log"] 31 | verbs: ["get","list","watch"] 32 | - apiGroups: [""] 33 | resources: ["secrets"] 34 | verbs: ["get"] 35 | - apiGroups: [""] 36 | resources: ["nodes"] 37 | verbs: ["get", "list", "watch"] 38 | 39 | --- 40 | 41 | apiVersion: rbac.authorization.k8s.io/v1 42 | kind: ClusterRoleBinding 43 | metadata: 44 | name: jenkins-admin 45 | namespace: devops 46 | subjects: 47 | - kind: ServiceAccount 48 | name: jenkins-admin 49 | namespace: devops 50 | roleRef: 51 | kind: ClusterRole 52 | name: jenkins-rbac 53 | apiGroup: rbac.authorization.k8s.io 54 | -------------------------------------------------------------------------------- /jenkins/topic002/deploy/incluster/svc-clusterip/ingress.yml: -------------------------------------------------------------------------------- 1 | apiVersion: extensions/v1beta1 2 | kind: Ingress 3 | metadata: 4 | name: jenkins-ingress 5 | namespace: devops 6 | spec: 7 | rules: 8 | # TODO: xxx.com 需要修改为你使用的域名 9 | - host: xxx.com 10 | http: 11 | paths: 12 | - backend: 13 | serviceName: jenkins-service 14 | servicePort: 8080 15 | path: / -------------------------------------------------------------------------------- /jenkins/topic002/deploy/incluster/svc-clusterip/service.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: devops 5 | 6 | --- 7 | 8 | apiVersion: v1 9 | kind: Service 10 | metadata: 11 | name: jenkins-service 12 | namespace: devops 13 | spec: 14 | ports: 15 | - name: http 16 | protocol: TCP 17 | port: 8080 18 | targetPort: 8080 19 | - port: 50000 20 | targetPort: 50000 21 | name: agent 22 | selector: 23 | app: jenkins -------------------------------------------------------------------------------- /jenkins/topic002/deploy/incluster/svc-nodeport/nodeport.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: devops 5 | 6 | --- 7 | 8 | apiVersion: v1 9 | kind: Service 10 | metadata: 11 | name: jenkins-service 12 | namespace: devops 13 | spec: 14 | type: NodePort 15 | ports: 16 | - name: http 17 | protocol: TCP 18 | port: 8080 19 | targetPort: 8080 20 | - port: 50000 21 | targetPort: 50000 22 | name: agent 23 | selector: 24 | app: jenkins -------------------------------------------------------------------------------- /jenkins/topic002/deploy/outcluster/rbac.yml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Namespace 3 | metadata: 4 | name: devops 5 | 6 | --- 7 | apiVersion: v1 8 | kind: ServiceAccount 9 | metadata: 10 | labels: 11 | k8s-app: jenkins 12 | name: jenkins-admin 13 | namespace: devops 14 | 15 | --- 16 | 17 | apiVersion: rbac.authorization.k8s.io/v1 18 | kind: Role 19 | metadata: 20 | name: jenkins-rbac 21 | namespace: devops 22 | rules: 23 | - apiGroups: [""] 24 | resources: ["pods"] 25 | verbs: ["create","delete","get","list","patch","update","watch"] 26 | - apiGroups: [""] 27 | resources: ["pods/exec"] 28 | verbs: ["create","delete","get","list","patch","update","watch"] 29 | - apiGroups: [""] 30 | resources: ["pods/log"] 31 | verbs: ["get","list","watch"] 32 | - apiGroups: [""] 
33 | resources: ["secrets"] 34 | verbs: ["get"] 35 | - apiGroups: [""] 36 | resources: ["nodes"] 37 | verbs: ["get", "list", "watch"] 38 | - apiGroups: [""] 39 | resources: ["events"] 40 | verbs: ["get", "watch"] 41 | 42 | --- 43 | 44 | apiVersion: rbac.authorization.k8s.io/v1 45 | kind: RoleBinding 46 | metadata: 47 | name: jenkins-admin 48 | namespace: devops 49 | subjects: 50 | - kind: ServiceAccount 51 | name: jenkins-admin 52 | namespace: devops 53 | roleRef: 54 | kind: Role 55 | name: jenkins-rbac 56 | apiGroup: rbac.authorization.k8s.io 57 | -------------------------------------------------------------------------------- /jenkins/topic003/README.md: -------------------------------------------------------------------------------- 1 | # 如何写好 Jenkinsfile | Jenkins Pipeline 2 | 3 | ![Wechat](https://img.shields.io/badge/-colynnliu-%2307C160?style=flat&logo=Wechat&logoColor=white) 4 | [![Twitter](https://img.shields.io/badge/-Twitter-%231DA1F2?style=flat&logo=Twitter&logoColor=white)](https://twitter.com/colynnliu) 5 | 6 | ## 前置条件 7 | 8 | * jenkins 9 | * jenkins plugins (pipeline/blue occean) 10 | 11 | 12 | 13 | ## What is Pipeline / Jenkinsfile 14 | 15 | ![image](./assets/pipeline.png) 16 | 17 | ## 为什么需要 pipeline 18 | 19 | 我们均知道 Jenkins是一个支持多种自动化模式的自动化引擎。 Pipeline在Jenkins上添加了一套强大的自动化工具,支持从简单的持续集成到全面的CD管道的用例。 通过对一系列相关任务进行建模。 20 | 21 | ![image](./assets/realworld-pipeline-flow.png) 22 | 23 | ## Pipeline concepts 24 | 25 | * Pipeline 26 | * Node 27 | * Stage 28 | * Step 29 | 30 | `Pipeline` 下支持 `Parallel`, `Node`不支持`Parallel` 31 | 32 | ## Pipeline Syntax 33 | 34 | ### 1. Declarative Pipeline 35 | 36 | ```Jenkinsfile 37 | pipeline { 38 | /* insert Declarative Pipeline here */ 39 | } 40 | ``` 41 | 42 | ### 2. Scripted Pipeline 43 | 44 | ```Jenkinsfile 45 | node { 46 | stage('Example') { 47 | if (env.BRANCH_NAME == 'master') { 48 | echo 'I only execute on the master branch' 49 | } else { 50 | echo 'I execute elsewhere' 51 | } 52 | } 53 | } 54 | ``` 55 | 56 | ### 3. Synatx Comparison 57 | 58 | __Scripted Pipeline__ offers a tremendous amount of flexibility and extensibility to Jenkins users. The Groovy learning-curve isn’t typically desirable for all members of a given team, so __Declarative Pipeline__ was created to offer a simpler and more opinionated syntax for authoring Jenkins Pipeline. 59 | 60 | ### 4. Use Pipeline through API 61 | 62 | * 63 | 64 | ### 5. Use Pipeline through Blue Occean 65 | 66 | ![image](./assets/use-pipeline-through-blueoccean.png) 67 | 68 | ## Jenkinsfile work with Kubernetes plugin 69 | 70 | [Kubernetes plugin for Jenkins GitHub](https://github.com/jenkinsci/kubernetes-plugin/blob/master/README.md) 71 | 72 | [Kubernetes plugin docs](https://www.jenkins.io/doc/pipeline/steps/kubernetes/#kubernetes-plugin) 73 | 74 | ## Refer to 75 | 76 | 1. 
77 | -------------------------------------------------------------------------------- /jenkins/topic003/assets/pipeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/jenkins/topic003/assets/pipeline.png -------------------------------------------------------------------------------- /jenkins/topic003/assets/realworld-pipeline-flow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/jenkins/topic003/assets/realworld-pipeline-flow.png -------------------------------------------------------------------------------- /jenkins/topic003/assets/use-pipeline-through-blueoccean.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/jenkins/topic003/assets/use-pipeline-through-blueoccean.png -------------------------------------------------------------------------------- /jenkins/topic003/demo/01-hello.Jenkinsfile: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/jenkins/topic003/demo/01-hello.Jenkinsfile -------------------------------------------------------------------------------- /jenkins/topic003/demo/02-ParamsAndEnv.Jenkinsfile: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/jenkins/topic003/demo/02-ParamsAndEnv.Jenkinsfile -------------------------------------------------------------------------------- /jenkins/topic003/demo/03-Parallel.Jenkinsfile: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/jenkins/topic003/demo/03-Parallel.Jenkinsfile -------------------------------------------------------------------------------- /jenkins/topic003/start.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | docker run -d -p 8090:8080 -p 50000:50000 -v ~/Testenv/jenkins_home:/var/jenkins_home --name jenkins jenkins/jenkins:2.277.1-lts-alpine -------------------------------------------------------------------------------- /kubernetes/README.md: -------------------------------------------------------------------------------- 1 | # FAQ 2 | 3 | ## 1. 问题1 - about node taint 4 | 5 | * 问题描述 6 | 7 | ```sh 8 | Warning FailedScheduling 4m38s (x426 over 8h) default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate. 9 | ``` 10 | 11 | * 解决方案 12 | 13 | ```yaml 14 | # /var/lib/kubelet/config.yaml 15 | evictionHard: 16 | imagefs.available: 0% 17 | nodefs.available: 0% 18 | ``` 19 | 20 | ```sh 21 | service kubelet restart 22 | ``` 23 | 24 | > Refer To: 25 | 26 | ## 2. 
问题2 - unable to retrieve the complete list of server APIs
27 | 
28 | * 问题描述
29 | 
30 | ```sh
31 | $ kubectl api-resources
32 | 
33 | Error: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
34 | ```
35 | 
36 | * 解决方案
37 | 
38 | ```sh
39 | # 找到出问题的api service
40 | $ kubectl get apiservice
41 | 
42 | # delete api service
43 | $ kubectl delete apiservice [service-name]
44 | ```
45 | 
46 | ## 3. 问题3 - Error: container has runAsNonRoot and image has non-numeric user (memcache), cannot verify user is non-root
47 | 
48 | * 问题描述
49 | 
50 | ```sh
51 | $ kubectl describe pod [pod-name]
52 | 
53 | Warning  Failed  25m (x12 over 27m)  kubelet, k8s-host1  Error: container has runAsNonRoot and image has non-numeric user (memcache), cannot verify user is non-root
54 | ```
55 | 
56 | * 问题原因
57 | 
58 | Here is the [implementation](https://github.com/kubernetes/kubernetes/blob/v1.25.0/pkg/kubelet/kuberuntime/security_context_others.go#L48) of the verification:
59 | 
60 | ```golang
61 | case uid == nil && len(username) > 0:
62 | 	return fmt.Errorf("container has runAsNonRoot and image has non-numeric user (%s), cannot verify user is non-root (pod: %q, container: %s)", username, format.Pod(pod), container.Name)
63 | ```
64 | 
65 | And here is the [validation](https://github.com/kubernetes/kubernetes/blob/v1.25.0/pkg/kubelet/kuberuntime/kuberuntime_container.go) call with the comment:
66 | 
67 | ```golang
68 | // Verify RunAsNonRoot. Non-root verification only supports numeric user.
69 | if err := verifyRunAsNonRoot(pod, container, uid, username); err != nil {
70 | 	return nil, cleanupAction, err
71 | }
72 | ```
73 | 
74 | As you can see, the only reason for that message in your case is `uid == nil`. Based on the comment in the source code, we need to set a numeric user value.
75 | 
76 | * 解决方案
77 | 
78 | So, for the user with UID=999 you can set it in your pod definition [like that](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod):
79 | 
80 | ```yaml
81 | securityContext:
82 |   runAsUser: 999
83 | ```
84 | 
85 | ## 4. 问题4 - kubernetes节点上拉取镜像被hang住,等一段时间后,镜像又可以正常被拉取下来
86 | 
87 | ### 问题描述
88 | 
89 | 如标题,场景是一次性通过helm的方式部署了很多应用,因为业务服务依赖于TiDB,观察发现TiDB的服务在拉取镜像时消耗了很多时间(10mins+,这与具体环境有关),但等了一段时间后
90 | TiDB又正常拉取到镜像并运行起来了。
91 | 
92 | ### 问题分析
93 | 
94 | 我们知道镜像的拉取动作属于kubelet组件的职责(不了解的读者可以看下『一个Pod是如何在kubernetes上运行起来』这类介绍文章),我们从TiDB的events事件中只能看到拉取镜像消耗了近10分钟的时间,而且均是`Normal`事件,观察对应节点kubelet的日志也没有看到明显的错误,所以推断有可能是配置不匹配导致的;下一步来到kubernetes官网查看[kubelet与image相关的配置](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/)
95 | 
96 | ### 问题原因
97 | 
98 | 默认情况下`serializeImagePulls=true`. In other words, `kubelet` sends only one image pull request to the image service at a time. Other image pull requests have to wait until the one being processed is complete.
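
排查时可以先确认节点上当前生效的配置(示例;`<node-name>`为占位符,且假设本机已安装`jq`):

```sh
# 默认会输出 true;若已改为并行拉取则输出 false
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz" | jq '.kubeletconfig.serializeImagePulls'
```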
99 | 100 | 我们的场景下是通过helm触发很多charts应用的部署, 虽然镜像仓库均在本机,但有些服务的镜像太大(6G+),在加上默认情况下`serializeImagePulls=true`的影响,而且`kubelet` 镜像拉取的请求的串行发送只是控制的发送端(换句话说,kubelet并不能保证docker service是串行拉取镜像的?),从而也就符合我们在TiDB的事件日志看到的类似如下的现象了。 101 | 102 | ```sh 103 | Normal Pulling 14m kubelet Pulling image "registry.sensenebula.io:5000/pingcap/busybox:1.34.1" 104 | Normal Pulled 3m39s kubelet Successfully pulled image "registry.sensenebula.io:5000/pingcap/busybox:1.34.1" in 11m12.669101707s 105 | Normal Created 3m38s kubelet Created container slowlog 106 | ``` 107 | 108 | ### 解决方案 109 | 110 | 修改`serializeImagePulls=false`, 111 | 112 | When `serializeImagePulls` is set to `false`, the kubelet defaults to no limit on the maximum number of images being pulled at the same time. If you would like to limit the number of parallel image pulls, you can set the field `maxParallelImagePulls` in kubelet configuration. With `maxParallelImagePulls` set to `n`, only `n` images can be pulled at the same time, and any image pull beyond `n` will have to wait until at least one ongoing image pull is complete. 113 | 114 | Limiting the number parallel image pulls would prevent image pulling from consuming too much network bandwidth or disk I/O, when parallel image pulling is enabled. 115 | 116 | You can set `maxParallelImagePulls` to a positive number that is greater than or equal to 1. If you set `maxParallelImagePulls` to be greater than or equal to 2, you must set the `serializeImagePulls` to false. The kubelet will fail to start with invalid `maxParallelImagePulls` settings. 117 | 118 | ### 扩展 - 写给依然有疑惑的你 119 | 120 | 也许你通过上面显示的事件日志看到`Pulling`到`Pulled`消耗了很多的时间,你是否怀疑一个镜像会为什么会拉取这么久,其实在发送`Pulling`只是表示进入了`image pull requests`的队列, 我们就从下面的kubelet的代码给你答案, 121 | 122 | ```go 123 | // pkg/kubelet/images/image_manager.go 124 | 125 | // EnsureImageExists pulls the image for the specified pod and container, and returns 126 | // (imageRef, error message, error). 127 | func (m *imageManager) EnsureImageExists(ctx context.Context, pod *v1.Pod, container *v1.Container, pullSecrets []v1.Secret, podSandboxConfig *runtimeapi.PodSandboxConfig, podRuntimeHandler string) (string, string, error) { 128 | ... 129 | m.podPullingTimeRecorder.RecordImageStartedPulling(pod.UID) 130 | m.logIt(ref, v1.EventTypeNormal, events.PullingImage, logPrefix, fmt.Sprintf("Pulling image %q", container.Image), klog.Info) 131 | startTime := time.Now() 132 | pullChan := make(chan pullResult) 133 | m.puller.pullImage(ctx, spec, pullSecrets, pullChan, podSandboxConfig) 134 | imagePullResult := <-pullChan 135 | if imagePullResult.err != nil { 136 | m.logIt(ref, v1.EventTypeWarning, events.FailedToPullImage, logPrefix, fmt.Sprintf("Failed to pull image %q: %v", container.Image, imagePullResult.err), klog.Warning) 137 | m.backOff.Next(backOffKey, m.backOff.Clock.Now()) 138 | 139 | msg, err := evalCRIPullErr(container, imagePullResult.err) 140 | return "", msg, err 141 | } 142 | m.podPullingTimeRecorder.RecordImageFinishedPulling(pod.UID) 143 | imagePullDuration := time.Since(startTime).Truncate(time.Millisecond) 144 | m.logIt(ref, v1.EventTypeNormal, events.PulledImage, logPrefix, fmt.Sprintf("Successfully pulled image %q in %v (%v including waiting). Image size: %v bytes.", 145 | ... 146 | ``` 147 | 148 | ```go 149 | // pkg/kubelet/images/puller.go 150 | 151 | // Maximum number of image pull requests than can be queued. 
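// 注(笔者补充):下面的常量限制的是"排队中"的拉取请求数;pullRequests channel
// 写满之后,新的 pullImage 调用会在发送处阻塞,而此时 "Pulling" 事件早已打印。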
152 | const maxImagePullRequests = 10 153 | 154 | type serialImagePuller struct { 155 | imageService kubecontainer.ImageService 156 | pullRequests chan *imagePullRequest 157 | } 158 | 159 | func newSerialImagePuller(imageService kubecontainer.ImageService) imagePuller { 160 | imagePuller := &serialImagePuller{imageService, make(chan *imagePullRequest, maxImagePullRequests)} 161 | go wait.Until(imagePuller.processImagePullRequests, time.Second, wait.NeverStop) 162 | return imagePuller 163 | } 164 | 165 | type imagePullRequest struct { 166 | ctx context.Context 167 | spec kubecontainer.ImageSpec 168 | pullSecrets []v1.Secret 169 | pullChan chan<- pullResult 170 | podSandboxConfig *runtimeapi.PodSandboxConfig 171 | } 172 | 173 | func (sip *serialImagePuller) pullImage(ctx context.Context, spec kubecontainer.ImageSpec, pullSecrets []v1.Secret, pullChan chan<- pullResult, podSandboxConfig *runtimeapi.PodSandboxConfig) { 174 | sip.pullRequests <- &imagePullRequest{ 175 | ctx: ctx, 176 | spec: spec, 177 | pullSecrets: pullSecrets, 178 | pullChan: pullChan, 179 | podSandboxConfig: podSandboxConfig, 180 | } 181 | } 182 | 183 | func (sip *serialImagePuller) processImagePullRequests() { 184 | for pullRequest := range sip.pullRequests { 185 | startTime := time.Now() 186 | imageRef, err := sip.imageService.PullImage(pullRequest.ctx, pullRequest.spec, pullRequest.pullSecrets, pullRequest.podSandboxConfig) 187 | var size uint64 188 | if err == nil && imageRef != "" { 189 | // Getting the image size with best effort, ignoring the error. 190 | size, _ = sip.imageService.GetImageSize(pullRequest.ctx, pullRequest.spec) 191 | } 192 | pullRequest.pullChan <- pullResult{ 193 | imageRef: imageRef, 194 | imageSize: size, 195 | err: err, 196 | // Note: pullDuration includes credential resolution and getting the image size. 197 | pullDuration: time.Since(startTime), 198 | } 199 | } 200 | } 201 | 202 | ``` 203 | 204 | ### 代码注解 205 | 206 | 通过上面的函数可以看到,大致逻辑是这样的: 207 | 208 | 1. `image_manager.go` : 209 | a. 在开始拉取镜像前先输出一条 `Pulling image %q`的日志, 210 | b. 再创建一个接收拉取镜像结果的channel, 211 | c. 调用`imagePuller`的`pullimage`方法,将镜像拉取的请求添加至队列内, 212 | d. `imagePullResult := <-pullChan` 等待镜像拉取的结果。 213 | 214 | 2. `puller.go`: 215 | a. `pullImage`方法其实只是将拉取镜像的请求添加至队列, 216 | b. 再依赖`processImagePullRequests` 循环调用imageService拉取镜像。 217 | 218 | 3. 所以你在日志中看到的`Pulling`事件并不能代表当前此镜像正在拉取,只能说明此镜像拉取的请求即将被添加至队列内(因为队列最大值是 10,更多的请求将会pending), 219 | 220 | ### 附录 221 | 222 | 1. 查看当前`kubelet`的所有配置参数 223 | 224 | ```sh 225 | kubectl get --raw "/api/v1/nodes//proxy/configz" | jq 226 | ``` 227 | 228 | Just make sure you replace`` with your node name. And if you don't have `jq` installed, leave out the `| jq` part as that's only for formatting. 
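
2. 修改并重启`kubelet`(仅为示例;配置文件路径因部署方式而异,`maxParallelImagePulls`的取值请按节点的带宽与磁盘IO能力评估)

```sh
# 在 /var/lib/kubelet/config.yaml 中调整如下两个字段:
#   serializeImagePulls: false
#   maxParallelImagePulls: 4
systemctl restart kubelet
```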
229 | -------------------------------------------------------------------------------- /kubernetes/ascend/README.md: -------------------------------------------------------------------------------- 1 | # atlas300V Pro run on Kubernetes 2 | 3 | - [atlas300V Pro run on Kubernetes](#atlas300v-pro-run-on-kubernetes) 4 | - [前言](#前言) 5 | - [环境信息](#环境信息) 6 | - [硬件信息](#硬件信息) 7 | - [软件信息](#软件信息) 8 | - [名词解释](#名词解释) 9 | - [CANN 异构计算架构](#cann-异构计算架构) 10 | - [总体架构](#总体架构) 11 | - [MindX DL组件](#mindx-dl组件) 12 | - [组件安装](#组件安装) 13 | - [概述](#概述) 14 | - [安装驱动及固件](#安装驱动及固件) 15 | - [安装toolbox](#安装toolbox) 16 | - [下载链接](#下载链接) 17 | - [安装容器化支持](#安装容器化支持) 18 | - [集群调度(Kubernetes)](#集群调度kubernetes) 19 | - [安装部署](#安装部署) 20 | - [通用操作](#通用操作) 21 | - [Ascend Device Plugin 镜像](#ascend-device-plugin-镜像) 22 | - [Ascend for volcano 镜像](#ascend-for-volcano-镜像) 23 | - [下载chart包](#下载chart包) 24 | - [至此结束了](#至此结束了) 25 | - [虚拟化](#虚拟化) 26 | - [应用场景及方案](#应用场景及方案) 27 | - [虚拟化规则](#虚拟化规则) 28 | - [使用约束](#使用约束) 29 | - [静态虚拟化](#静态虚拟化) 30 | - [创建vNPU](#创建vnpu) 31 | - [销毁指定vNPU](#销毁指定vnpu) 32 | - [挂载vNPU](#挂载vnpu) 33 | - [动态虚拟化](#动态虚拟化) 34 | - [如何使用](#如何使用) 35 | - [资源监测](#资源监测) 36 | - [前置条件](#前置条件) 37 | - [npu-exporter镜像](#npu-exporter镜像) 38 | - [面板](#面板) 39 | - [命令行的使用](#命令行的使用) 40 | - [写在前面](#写在前面) 41 | - [命令行使用示例](#命令行使用示例) 42 | - [npu服务的Dockerfile声明](#npu服务的dockerfile声明) 43 | 44 | ## 前言 45 | 46 | 随着国产化信创的要求,对于arm形态(鲲鹏架构)的atlas NPU卡做了些适配改造的工作,由于前期对于国产化硬件了解不多,导致踩了很多坑,所以今天就一起梳理下,如果你也刚好有这个需求,不妨接着往下看,期望对你会有些收获。 47 | 48 | 对于Atlas系列的硬件,有加速卡、服务器、开发者套件等,我们此文只讨论加速卡,对于atlas的推理、训练服务器我们不涉及,当然不管你是采购的何种型号的服务器,只要你使用的AI加速卡也是Atlas 那你就可以继续阅读,(不同型号或是规格的服务器可能导致你可使用的加速卡的种类和个数不同而已。) 49 | 50 | [Ascend官网](https://www.hiascend.com)的文档其实比较全面,但是因为相对有些分散(涉及到很多的名词,默认值并没有按照前后逻辑一步步说明清楚),所以本篇文档的作用更多的是一个串联,可以让你更好的去阅读Ascend社区的相关文档。 51 | 52 | ## 环境信息 53 | 54 | ### 硬件信息 55 | 56 | - Atlas 800 推理服务器(型号:3000): CPU: 鲲鹏920*2 / AI加速卡: 2个Atlas 300V Pro 57 | - KunLun G2289 GPU服务器: CPU: 鲲鹏920*2 / AI加速卡: 2个Atlas 300I Duo 58 | 59 | > Atlas 800 推理服务器(型号:3000): 是基于鲲鹏920与昇腾310芯片的AI推理服务器,最大可支持8个Atlas 300I/V Pro,提供强大的实时推理能力和视频分析能力,广泛应用于中心侧AI推理场景。 60 | 61 | > KunLun G2280 GPU服务器: 一是款2U2路的GPU服务器,是基于鲲鹏920与昇腾310芯片的AI推理服务器,最大可支持8个Atlas 300I/V Pro推理卡,为深度学习和推理提供强大算力,广泛应用于中心侧AI推理如智慧城市、智慧交通、智慧金融等多场景。 62 | 63 | > AI加速卡 Atlas 300V Pro ([规格详情](https://support.huawei.com/enterprise/zh/doc/EDOC1100209004/e6767cc0)) 与Atlas 300I Duo卡均是配置的Ascend 310P AI处理器产品规格,对于ascend AI芯片想了解更多请点击[这里](https://zhuanlan.zhihu.com/p/662674649), 另外我们只有使用了昇腾的AI加速卡,其他的可以通过[昇腾AI产品形态说明](https://www.hiascend.com/document/detail/zh/canncommercial/700/productform/hardwaredesc_0001.html)或是[昇腾官网产品页了解](https://www.hiascend.com/zh/ecosystem/industry) 64 | 65 | ### 软件信息 66 | 67 | - OS: OpenEuler 23.04 68 | - Kubernetes: v1.20.11 69 | - Docker: 26.1.2 70 | 71 | 对于操作系统的磁盘分区略过,可参看[这里](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/clusterschedulingig/dlug_installation_006.html#ZH-CN_TOPIC_0000001882784853__section02091655142310) 72 | 73 | ## 名词解释 74 | 75 | ### CANN 异构计算架构 76 | 77 | 异构计算架构CANN(Compute Architecture for Neural Networks)是华为针对AI场景推出的异构计算架构,向上支持多种AI框架,包括MindSpore、PyTorch、TensorFlow等,向下服务AI处理器与编程,发挥承上启下的关键作用,是提升昇腾AI处理器计算效率的关键平台。同时针对多样化应用场景,提供多层次编程接口,支持用户快速构建基于昇腾平台的AI应用和业务。 78 | 79 | #### 总体架构 80 | 81 | CANN提供了功能强大、适配性好、可自定义开发的AI异构计算架构, 更多请访问[官网链接](https://www.hiascend.com/document/detail/zh/canncommercial/80RC1/quickstart/quickstart/quickstart_18_0003.html) 82 | 83 | ![Image](./images/cann-01.png) 84 | 85 | ### MindX DL组件 86 | 87 | MindX DL(昇腾深度学习组件)是支持 Atlas训练卡、推理卡的深度学习组件,提供昇腾 AI 处理器集群调度、昇腾 AI 
处理器性能测试、模型保护等基础功能,快速使能合作伙伴进行深度学习平台开发, 更多请参加[官方文档](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/description/productdescription/dlug_description_001.html) 88 | 89 | ![Image](./images/mindxdl-01.png) 90 | 91 | ## 组件安装 92 | 93 | ### 概述 94 | 95 | 我们先整体看下开发运行环境的搭建流程,如下图 96 | ![Image](./images/cann-02.png) 97 | 98 | 注解:『环境检查』我们这篇文章就先略过,重点在『驱动和固件安装』这个步骤,因为我们实际的运行环境是在kubernetes平台上的容器内,所以还要再添加上容器化环境准备的步骤(也就是部署『MindX DL』相关的组件), 对于『CANN软件安装』的步骤也就内化到运行的容器内了。 99 | 100 | 对于不同形态(物理机/虚拟机/容器)的运行环境安装的组件也会有些差异,我再补充一张图,[原图在这里](https://www.hiascend.com/document/detail/zh/canncommercial/700/envdeployment/instg/instg_0060.html) 101 | 102 | ![Image](./images/cann-03.png) 103 | 104 | ### 安装驱动及固件 105 | 106 | - [原文链接](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha003/quickstart/quickstart/quickstart_18_0005.html) 107 | 108 | 下载对应版本的驱动及固件,说明『社区版』/『商用版』软件版本上没有区别??, 109 | 110 | Atlas 300V pro加速卡的社区版本[下载链接示例](https://www.hiascend.com/hardware/firmware-drivers/community?product=2&model=16&cann=8.0.RC2.alpha003&driver=1.0.22.alpha) 111 | 112 | - Ascend-hdk-310p-npu-driver 113 | - Ascend-hdk-310p-npu-firmware 114 | 115 | ```sh 116 | # 驱动及固件安装 117 | # 下载驱动及固件至服务器 118 | 119 | # 创建驱动运行用户HwHiAiUser。 120 | groupadd -g HwHiAiUser 121 | useradd -g HwHiAiUser -d /home/HwHiAiUser -m HwHiAiUser -s /bin/bash 122 | 123 | # 增加对软件包的可执行权限。 124 | chmod +x Ascend-hdk--npu-driver_23.0.2_linux-x86-64.run 125 | chmod +x Ascend-hdk--npu-firmware_7.1.0.4.220.run 126 | 127 | # 执行如下命令,校验安装包的一致性和完整性。 128 | ./Ascend-hdk--npu-driver_23.0.2_linux-x86-64.run --check 129 | ./Ascend-hdk--npu-firmware_7.1.0.4.220.run --check 130 | 131 | # 执行如下命令安装驱动。 132 | ./Ascend-hdk--npu-driver_23.0.2_linux-x86-64.run --full --install-for-all 133 | 134 | # 出现类似如下回显信息,说明安装成功。 135 | # Driver package installed successfully! 136 | 137 | # 执行如下命令安装固件。 138 | ./Ascend-hdk--npu-firmware_7.1.0.4.220.run --full 139 | 140 | #出现类似如下回显信息,说明安装成功。 141 | Firmware package installed successfully! 
Reboot now or after driver installation for the installation/upgrade to take effect 142 | 143 | # 驱动和固件安装完成后,重启系统。 144 | systemctl reboot 145 | ``` 146 | 147 | > 对于『CANN软件安装』步骤如上面我们所提及的,调整至容器的`Dockerfile`来完成声明安装。 148 | 149 | - Ascend-hdk-310p-mcu 150 | 151 | - Ascend-mindx-toolbox 152 | - Ascend-cann-nnrt-6.0.2.1 [安装文档](https://www.hiascend.com/document/detail/zh/canncommercial/700/envdeployment/instg/instg_0060.html) 153 | 154 | - Ascend-docker-runtime 155 | 156 | ### 安装toolbox 157 | 158 | #### 下载链接 159 | 160 | 161 | 162 | 安装过程可直接参考官方文档[链接](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/toolbox/ascenddmi/toolboxug_0004.html) 163 | 164 | ### 安装容器化支持 165 | 166 | __Ascend Docker Runtime__(又称Ascend Docker,又称昇腾容器运行时)是MindX DL的基础组件,用于为所有的训练或推理作业提供昇腾AI处理器(Ascend NPU)容器化支持,使用户AI作业能够以Docker容器的方式平滑运行在昇腾设备之上,Ascend Docker Runtime逻辑接口如下图所示。 167 | 168 | ![Image](./images/mindxdl-02.png) 169 | 170 | 安装Ascend Docker Runtime 可直接参看[文档](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/dockerruntimeug/dlruntime_ug_006.html) 171 | 172 | 至此,你就已经具备了docker客户端的方式来启动自己的npu计算类服务了, 更多的可以参看官方文档[链接](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/dockerruntimeug/dlruntime_ug_011.html) 173 | 174 | ## 集群调度(Kubernetes) 175 | 176 | 集群调度组件基于业界流行的集群调度系统Kubernetes,增加了昇腾AI处理器(下文出现的NPU表示昇腾AI处理器)的支持,提供昇腾AI处理器资源管理和查看、优化调度和分布式训练集合通信配置等基础功能。深度学习平台开发厂商可以有效减少底层资源调度相关软件开发工作量,快速使能合作伙伴基于MindX DL开发深度学习平台。 177 | 178 | - 官方[文档链接](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/clusterschedulingsd/dl_present_001.html) 179 | 180 | ### 安装部署 181 | 182 | 我们基于手动安装的方式来部署,参考[获取软件包链接](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/clusterschedulingig/dlug_installation_010.html), 根据需要启用的功能部署相应的MindX DL组件,并非所有的软件包均需要下载安装。 183 | 184 | 下面我只能基于我们使用的功能来对『安装部署』做一个说明,本实践我们只是基于动态虚拟化实现集群调度,所以只需要部署Ascend Docker Runtime、Ascend Device Plugin、Volcano三个组件。 185 | 186 | 『安装容器化支持』的步骤时我们已经说明了`Ascend Docker Runtime`的安装部署过程,对于另外两个组件可以通过上面的[软件包链接](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/clusterschedulingig/dlug_installation_010.html)选择下载,也可以直接通过ascend社区的gitee group来选择相应的版本来下载安装. 187 | 188 | #### 通用操作 189 | 190 | 在安装两个组件前,在对应的安装节点上我们需要执行如下 [__通用操作__](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/clusterschedulingig/dlug_installation_015.html) 191 | > 说明:只列举目前操作,其他操作会包含在chart包实现,请配合下文的chart包链接一起测试实践。 192 | 193 | 1. 创建节点标签 194 | 195 | ```sh 196 | # 因为我们使用的均是Ascend 310P AI处理器,所以选择如下标签 197 | kubectl label nodes 主机名称 node-role.kubernetes.io/worker=worker 198 | kubectl label nodes 主机名称 workerselector=dls-worker-node 199 | kubectl label nodes 主机名称 host-arch=huawei-arm或host-arch=huawei-x86 200 | kubectl label nodes 主机名称 accelerator=huawei-Ascend310P 201 | ``` 202 | 203 | > 如果你是其他的产品类型,可根据[链接](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/clusterschedulingig/dlug_installation_014.html)选择. 204 | 205 | 2. 创建用户 206 | 207 | ```sh 208 | # HwHiAiUser是驱动或CANN软件包需要的软件运行用户. 209 | # ubuntu操作系统 210 | useradd -d /home/hwMindX -u 9000 -m -s /usr/sbin/nologin hwMindX 211 | usermod -a -G HwHiAiUser hwMindX 212 | 213 | # CentOS操作系统 214 | useradd -d /home/hwMindX -u 9000 -m -s /sbin/nologin hwMindX 215 | usermod -a -G HwHiAiUser hwMindX 216 | ``` 217 | 218 | > 更多关于组件用户的说明参看[链接](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/clusterschedulingig/dlug_installation_015.html) 219 | 220 | 3. 
命名空间 221 | 222 | 不要随意改变命名空间,请参看[文档要求](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/clusterschedulingig/dlug_installation_017.html), 因为看代码有些命名空间已经固定为常量了,所以不要随意切换命名空间,不然跑不起来就要自己看日志撸代码排查了。 223 | 224 | - `Ascend Device Plugin` 对应[Gitee Repo Release](https://gitee.com/ascend/ascend-device-plugin/releases), 部署的命名空间: `kube-system` 225 | - `Volcano` 对应[Gitee Repo Release](https://gitee.com/ascend/ascend-for-volcano/releases),部署的命名空间:`volcano-system` 226 | 227 | > 此次实践安装的版本为`v5.0.1` 228 | 229 | #### Ascend Device Plugin 镜像 230 | 231 | 1. 准备镜像 232 | 233 | ```sh 234 | # 下载版本包 235 | wget https://gitee.com/ascend/ascend-device-plugin/releases/download/v5.0.1-Patch1/Ascend-mindxdl-device-plugin_5.0.1.1_linux-aarch64.zip 236 | 237 | # 解压制作镜像 238 | mkdir device-plugin 239 | unzip Ascend-mindxdl-device-plugin_5.0.1.1_linux-aarch64.zip -d device-plugin/ 240 | 241 | cd device-plugin 242 | # tag根据自己的要求来定义 243 | docker build -t colynn/ascend-device-plugin:5.0.1 -f Dockerfile . 244 | ``` 245 | 246 | #### Ascend for volcano 镜像 247 | 248 | 1. 准备镜像 249 | 250 | ```sh 251 | # 下载版本包 252 | wget https://gitee.com/ascend/ascend-for-volcano/releases/download/v5.0.1/Ascend-mindxdl-volcano_5.0.1_linux-aarch64.zip 253 | 254 | # 解压制作镜像 255 | mkdir volcano 256 | unzip Ascend-mindxdl-volcano_5.0.1_linux-aarch64.zip -d volcano/ 257 | # 解压后会有volcano-v1.4.0/volcano-v1.7.0两个版本,请根据你环境的kubernetes版本来选择. v1.4.0版本支持的K8s版本为1.16.x~1.21.x;v1.7.0版本支持的K8s版本为1.17.x~1.25.x。请根据K8s版本选择合适的Volcano版本。 258 | cd volcano/volcano-v1.7.0 259 | 260 | # tag根据自己的要求来定义 261 | docker build -t colynn/ascend-volcano-controller:5.0.1-v1.7.0 -f Dockerfile-controller . 262 | docker build -t colynn/ascend-volcano-scheduler:5.0.1-v1.7.0 -f Dockerfile-scheduler . 263 | ``` 264 | 265 | #### 下载chart包 266 | 267 | > 1.说明:当前chart包可以理解为只是多个yaml文件的集合(可以实现ascend-device-plugin/ascend for volcano/npu-exporter的部署),并非最佳实践仅供参考。 268 | 269 | > 2.默认此chart包部署的是基于动态虚拟化的版本,也就是依赖npu的资源也需要将schedulerName改为volcano,还有其他的labels设置,可以参看这个[链接](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/cpaug/cpaug_0019.html)的『使用方法』章节调整 270 | 271 | charts包地址: 272 | 273 | 根据上面步骤自己生成的镜像tag调整values.yaml文件,然后通过helm部署 274 | 275 | ```sh 276 | cd ascend-device-plugin 277 | helm install -n kube-system ascend-device-plugin . 278 | ``` 279 | 280 | ### 至此结束了 281 | 282 | 如果你已经看到了这里,基于atlas 300V Pro就已经完成了基于动态虚拟化的集群调度部署,接下来的章节就再讨论下 __虚拟化__ 、__资源监测__ 、命令行的使用、还有 __npu服务的Dockerfile声明__ . 
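
在进入下一章节前,可以先简单验证下部署结果(示例;节点名为占位符,资源名以你环境中device-plugin实际上报的为准):

```sh
# device-plugin 正常运行后,节点应上报 huawei.com/Ascend310P 资源
kubectl -n kube-system get pods -o wide | grep device-plugin
kubectl describe node <节点名称> | grep -i 'huawei.com/Ascend310P'
```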
283 | 284 | ## 虚拟化 285 | 286 | 昇腾虚拟化实例功能是指通过资源虚拟化的方式将物理机配置的NPU(昇腾AI处理器)切分成若干份vNPU(虚拟NPU)挂载到容器中使用,虚拟化管理方式能够实现统一不同规格资源的分配和回收处理,满足多用户反复申请/释放的资源操作请求。 287 | 288 | 其他的就不在这里摘抄了,可以直接阅读原[链接](https://www.hiascend.com/document/detail/zh/computepoweralloca/300/cpaug/cpaug/cpaug_00002.html) 289 | 290 | ### 应用场景及方案 291 | 292 | 昇腾虚拟化实例功能适用于多用户多任务并行,且每个任务算力需求较小的场景。 293 | 294 | 在昇腾解决方案中,昇腾虚拟化实例功能当前支持以下应用方案: 295 | 296 | - 原生Docker:结合原生Docker使用。通过npu-smi工具创建多个vNPU,通过Docker拉起运行容器时将vNPU挂载到容器。 297 | - Ascend Docker Runtime:结合Ascend Docker Runtime(容器引擎插件)使用。通过npu-smi工具创建多个vNPU,通过Ascend Docker拉起运行容器时将vNPU挂载到容器。 298 | - 集群调度组件:结合MindX DL中的集群调度组件Ascend Device Plugin、Volcano使用,支持静态虚拟化方案。 299 | 静态虚拟化方式下,通过npu-smi工具提前创建多个vNPU,当用户需要使用vNPU资源时,基于Ascend Device Plugin组件的设备发现、设备分配、设备健康状态上报功能,分配vNPU资源提供给上层用户使用,此方案下,集群调度组件的Volcano组件为可选。 300 | 301 | ### 虚拟化规则 302 | 303 | 对于不管是静态虚拟化还是动态虚拟化,其实均依赖于[虚拟化模板](),只是静态虚拟化需要你自己使用`npu-smi`基于虚拟化模板来手动创建vNPU, 而动态虚拟化则是将这个操作交给了`ascend-device-plugins`, 你可以简单理解为npu只能按照虚拟化模板的组合的方式来分配,而不能自己随意的申请使用资源, 比如我们使用的是一个310P处理器最大支持切分7个虚拟化实例,如下图Ascend 310P支持虚拟化实例组合 304 | 305 | ![Image](./images/avi-002.png) 306 | 307 | ### 使用约束 308 | 309 | > 参看[链接](https://www.hiascend.com/document/detail/zh/computepoweralloca/300/cpaug/cpaug/cpaug_00008.html) 310 | 311 | - 物理NPU虚拟化出vNPU后,不支持再将该物理NPU挂载到容器使用;如果物理机上创建了虚拟机,不支持再将该物理NPU直通到虚拟机使用。 312 | - Atlas 300I Duo 推理卡上两路NPU的工作模式必须一致。即均使用虚拟化实例功能,或均整卡使用。请根据业务自行规划。 313 | - 虚拟化实例模板是用于对整台服务器上所有标卡进行资源切分,不支持不同规格的标卡混插。如Atlas 300V Pro 视频解析卡支持24G和48G内存规格,不支持这两种内存规格的卡混插进行虚拟化。 314 | - 使用动态虚拟化时,在芯片复位或系统重启后,已创建的vNPU会自动销毁,需要重新创建. 参看[接口文档说明](https://support.huawei.com/enterprise/zh/doc/EDOC1100388862/4bad8e23?idPath=23710424|251366513|22892968|252309113|254184887), 对于我们私有化服务器重启的场景必然是会存在的,当前的解决方案是通过kubernetes的event事件侦听Pod的状态,匹配对应的异常时,删除Pod触发volcano的重新调度即可修复。 315 | 316 | ### 静态虚拟化 317 | 318 | > 参看[链接](https://www.hiascend.com/document/detail/zh/computepoweralloca/300/cpaug/cpaug/cpaug_00010.html) 319 | > 注意: 静态虚拟化模式下通过`npu-smi`创建的vNPU分组需要`ascend-device-plugin`才能生效哦(具体加载的npu资源信息可以通过ascend-device-plugin可以查看到)。 320 | 321 | #### 创建vNPU 322 | 323 | - 设置虚拟化模式 324 | 命令格式:`npu-smi set -t vnpu-mode -d mode` 325 | 326 | 示例:`npu-smi set -t vnpu-mode -d 0 # 虚拟化实例功能容器模式` 327 | 328 | - 创建vNPU 329 | 命令格式:`npu-smi set -t create-vnpu -i id -c chip_id -f vnpu_config [-v vnpu_id] [-g vgroup_id]` 330 | 331 | 对于创建vNPU我们这里只解释下`-f`这个参数, 通过[虚拟化实例组合](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc2/clusterscheduling/ref/cpaug/cpaug_0007.html)你可以了解到Ascend 310P支持虚拟化实例组合如下图所示,1个Ascend 310P处理器最大支持切分7个虚拟化实例,用户需要按照组合规格进行对NPU硬件资源进行虚拟化切分,为`-f`参数选择`vir04`/`vir04_3c`/`vir02`/`vir02_1c`/`vir01`等虚拟化实例模板。 332 | 333 | ![Image](./images/avi-001.png) 334 | 335 | - 查询vNPU信息。 336 | 命令格式:`npu-smi info -t info-vnpu -i id -c chip_id` 337 | 338 | #### 销毁指定vNPU 339 | 340 | 命令格式:`npu-smi set -t destroy-vnpu -i id -c chip_id -v vnpu_id` 341 | 342 | 示例:执行`npu-smi set -t destroy-vnpu -i 1 -c 0 -v 103`销毁设备1编号0的芯片中编号为103的vNPU设备。回显以下信息表示销毁成功。 343 | 344 | ```sh 345 | Status : OK 346 | Message : Destroy vnpu 103 success 347 | ``` 348 | 349 | > 说明: 在销毁指定vNPU之前,请确保此设备未被使用。 350 | 351 | #### 挂载vNPU 352 | 353 | 对于静态虚拟化的使用有原生docker、Ascend Docker、MindX DL集群组件(Ascend Device Plugin)三种方式,官方文档已经介绍的很清楚了,不在再赘述了,参看[链接](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc2/clusterscheduling/ref/cpaug/cpaug_0014.html) 354 | 355 | ### 动态虚拟化 356 | 357 | 动态vNPU调度(推理)特性只支持使用Volcano作为调度器,不支持使用其他调度器, 
对于组件部署在上面的『集群调度』章节已经说明过了,另外对于[官方文档](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc2/clusterscheduling/clusterscheduling/clusterscheduling/mxdlug_scheduling_058.html)里说明的需要部署的`ClusterD`/`NodeD`组件 358 | 主要用于推理卡故障重调度、推理卡故障恢复,所以就没有在部署环境说明。 359 | 360 | 那么我们就再说明下动态虚拟化的[实现原理](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc2/clusterscheduling/clusterscheduling/clusterscheduling/mxdlug_scheduling_059.html),还有如何使用。 361 | 362 | ![Image](./images/avi-003.png) 363 | 364 | #### 如何使用 365 | 366 | > 参看[文档](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc2/clusterscheduling/clusterscheduling/clusterscheduling/mxdlug_scheduling_068.html) 367 | 368 | 当然你可以根据文档中的建议将创建、查询或删除操作任务的动作转换成K8s官方API中定义的对象,通过官方库里面提供的API发送给K8s的API Server或者将yaml内容转换成以JSON格式直接发送给K8s的API Server。 369 | 370 | 还有另外一种方案将[示例yaml](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc2/clusterscheduling/clusterscheduling/clusterscheduling/mxdlug_scheduling_063.html)的配置集成至自己服务的chart包内,通过helm包来完成推理服务的创建。 371 | 372 | ## 资源监测 373 | 374 | ### 前置条件 375 | 376 | > 在使用资源监测特性前,需要确保NPU-Exporter组件已经安装, 前面『下载chart包』章节的chart就已经包含了NPU-Exporter, 是通过ServiceMonitor的方式定义将数据推送给prometheus, 因为对于ascend社区制作好的镜像下载均存在授权申请的问题,所以下面我们就就补充下`npu-exporter`的镜像准备,可以自己根据[gitee的对应版本](https://gitee.com/ascend/ascend-npu-exporter/releases)进行下载, 其他的可参考[手动安装/NPU-Exporter文档](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc2/clusterscheduling/clusterschedulingig/clusterschedulingig/dlug_installation_031.html) 377 | 378 | #### npu-exporter镜像 379 | 380 | ```sh 381 | # 下载版本包 382 | wget https://gitee.com/ascend/ascend-npu-exporter/releases/download/v6.0.0-RC2/Ascend-mindxdl-npu-exporter_6.0.RC2_linux-aarch64.zip 383 | 384 | # 解压制作镜像 385 | mkdir npu-exporter 386 | unzip Ascend-mindxdl-npu-exporter_6.0.RC2_linux-aarch64.zip -d npu-exporter/ 387 | 388 | cd npu-exporter 389 | # tag根据自己的要求来定义 390 | docker build -t colynn/ascend-npu-exporter:6.0.2 -f Dockerfile . 
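# (可选)推送到你自己的镜像仓库,仓库地址与tag仅为示例
# docker push colynn/ascend-npu-exporter:6.0.2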
391 | ``` 392 | 393 | 对于npu-exporter的部署,可以参考上面章节的[ascend-device-plugin chart](https://github.com/colynn/ascend-device-plugin), templates下已经包含npu-exporter及ServiceMonitor的定义。 394 | 395 | ##### 面板 396 | 397 | 首先你已经安装配置了grafana, 导入[ascend-npu-exporter 面板](https://grafana.com/grafana/dashboards/20592-ascend-npu-exporter/), 目前的面板数据对于vNPU的监控还不是很完善,现在看ascend社区组件的更新速度,应该很快可以解决。 398 | 399 | 如果你觉得配置面板有些复杂,那么你也可以使用下面的章节通过`npu-smi` / `ascend-dmi`来查看资源信息。 400 | 401 | ## 命令行的使用 402 | 403 | ### 写在前面 404 | 405 | 如果你follow上面的实践步骤,确保驱动及固件已经安装,那么你可以直接使用`npu-smi`这个cli命令,对于`ascend-dmi`可以参考[性能测试/安装ToolBox](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc2/toolbox/ascenddmi/toolboxug_0004.html)来进行安装。 406 | 407 | > 注意:`npu-smi` / `ascend-dmi`工具仅支持在NPU设备上使用,不支持在vNPU设备上使用。 408 | > 更多的ascend-dmi工具的使用请参看[官方文档](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc2/toolbox/ascenddmi/toolboxug_0011.html) 409 | 410 | ### 命令行使用示例 411 | 412 | ```sh 413 | # 使用率 414 | npu-smi info -t usages -i [card-id] 415 | 416 | # 温度 417 | npu-smi info -t power -i [card-id] 418 | 419 | ## 另外如果你也安装toolbox,可以直接使用ascend-dmi 420 | # 需要下载安装对应版本的 Ascend-mindx-toolbox 421 | ascend-dmi -i // 温度、功耗、NPU占用率、内存占用率 422 | ``` 423 | 424 | ## npu服务的Dockerfile声明 425 | 426 | WIP 427 | -------------------------------------------------------------------------------- /kubernetes/ascend/images/avi-001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/kubernetes/ascend/images/avi-001.png -------------------------------------------------------------------------------- /kubernetes/ascend/images/avi-002.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/kubernetes/ascend/images/avi-002.png -------------------------------------------------------------------------------- /kubernetes/ascend/images/avi-003.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/kubernetes/ascend/images/avi-003.png -------------------------------------------------------------------------------- /kubernetes/ascend/images/cann-01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/kubernetes/ascend/images/cann-01.png -------------------------------------------------------------------------------- /kubernetes/ascend/images/cann-02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/kubernetes/ascend/images/cann-02.png -------------------------------------------------------------------------------- /kubernetes/ascend/images/cann-03.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/kubernetes/ascend/images/cann-03.png -------------------------------------------------------------------------------- /kubernetes/ascend/images/mindxdl-01.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/kubernetes/ascend/images/mindxdl-01.png -------------------------------------------------------------------------------- /kubernetes/ascend/images/mindxdl-02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/kubernetes/ascend/images/mindxdl-02.png -------------------------------------------------------------------------------- /kubernetes/concepts/admission-webhook.md: -------------------------------------------------------------------------------- 1 | # Admission Webhook 2 | 3 | > [Dynamic Admission Control](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) 4 | 5 | - [Admission Webhook](#admission-webhook) 6 | - [写在前面](#写在前面) 7 | - [Admission controllers](#admission-controllers) 8 | - [Why need admission controllers](#why-need-admission-controllers) 9 | - [Which plugins(admission controllers) are enabled by default](#which-pluginsadmission-controllers-are-enabled-by-default) 10 | - [Extending Kubernetes admission controllers with webhooks](#extending-kubernetes-admission-controllers-with-webhooks) 11 | - [What are admission webhooks](#what-are-admission-webhooks) 12 | - [Reference To](#reference-to) 13 | 14 | ## 写在前面 15 | 16 | 在我们聊 __Admission Webhook__ 之前先聊聊Admission controllers. 17 | 18 | ## Admission controllers 19 | 20 | Admission controllers are a powerful Kubernetes-native feature that helps you define and customize what is allowed to run on your cluster. 21 | 22 | ![Image](../images/Kubernetes-Admission-controllers-00-featured.png) 23 | 24 | A Kubernetes admission controller is code that evaluates requests to the Kubernetes API server, then determines whether or not to allow the request. 25 | 26 | The evaluation happens after the API server has already authenticated and authorized the request, but before the request is granted and implemented. 27 | 28 | In other words, even if the API server has determined a request to be valid (which it would do based on the RBAC Roles and ClusterRoles you have set up), admission controllers will evaluate the request and make a determination about whether or not to accept it based on their own set of rules. 29 | 30 | ## Why need admission controllers 31 | 32 | Many advanced features in Kubernetes require an admission controller to be enabled in order to properly support the feature. As a result, a Kubernetes API server that is not properly configured with the right set of admission controllers is an incomplete server and will not support all the features you expect. 33 | 34 | For example, they can mitigate denial of service (DoS) attacks on multitenant clusters. Consider the [`LimitRanger`](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#limitranger) plugin, which — as the name suggests — enforces limit ranges. `LimitRanger` define mandatory ranges of resource consumption on a per-namespace basis(在每个命名空间的基础上定义资源消耗的强制性范围). This prevents tenants from depleting one another’s resources(这可以防止租户彼此耗尽资源). [LimitRanger相关的代码](https://github.com/kubernetes/kubernetes/blob/v1.26.0-alpha.2/plugin/pkg/admission/limitranger/admission.go#L441) 35 | 36 | ## Which plugins(admission controllers) are enabled by default 37 | 38 | The recommended admission controllers are enabled by default, so you do not need to explicitly specify them. 
You can enable additional admission controllers beyond the default set using the `--enable-admission-plugins` flag (order doesn't matter).
39 | 
40 | ```sh
41 | kube-apiserver -h | grep enable-admission-plugins
42 | 
43 | # 如果你的kube-apiserver是以pod形式启动的,可以执行如下命令查看:
44 | kubectl -n kube-system exec -it kube-apiserver-pod-name -- kube-apiserver -h | grep enable-admission-plugins
45 | ```
46 | 
47 | - In v1.20.11, the default ones are:
48 | 
49 | ```sh
50 | NamespaceLifecycle, LimitRanger, ServiceAccount, TaintNodesByCondition, Priority, DefaultTolerationSeconds, DefaultStorageClass, StorageObjectInUseProtection, PersistentVolumeClaimResize, RuntimeClass, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, MutatingAdmissionWebhook, ValidatingAdmissionWebhook, ResourceQuota
51 | ```
52 | 
53 | ## Extending Kubernetes admission controllers with webhooks
54 | 
55 | All of these features we've been describing so far are critical to running reliable and secure services.
56 | 
57 | However, as each organization has its own policies and default set of best practices, such highly specific controls may not be enough.
58 | 
59 | Fortunately, Kubernetes has you covered.
60 | 
61 | You can extend and customize the Kubernetes API functionality, without adding complexity to its base code, by using __webhooks__.
62 | 
63 | The Kubernetes API server will call a registered webhook, which is a rather standard interface. This makes admission controllers easy to integrate with any third-party code.
64 | 
65 | ## What are admission webhooks
66 | 
67 | Admission webhooks are HTTP callbacks that receive admission requests and do something with them. You can define two types of admission webhooks, [validating admission webhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#validatingadmissionwebhook) and [mutating admission webhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook). Mutating admission webhooks are invoked first, and can modify objects sent to the API server to enforce custom defaults. After all object modifications are complete, and after the incoming object is validated by the API server, validating admission webhooks are invoked and can reject requests to enforce custom policies.
68 | 
69 | ![Image](../images/Kubernetes-Admission-controllers-01-flow-diagram.jpeg)
70 | 
71 | __Note__: 如果 Admission Webhook 需要保证它们所看到的是对象的最终状态以实施某种策略,则应该使用 validating admission webhook,因为对象被 mutating webhook 看到之后仍然可能被修改.
72 | 
73 | ## Reference To
74 | 
75 | 1. 
76 | 
-------------------------------------------------------------------------------- /kubernetes/concepts/service.md: --------------------------------------------------------------------------------
1 | # Service
2 | 
3 | - [Service](#service)
4 |   - [Headless Services](#headless-services)
5 |     - [为什么需要Headless services](#为什么需要headless-services)
6 |     - [示例](#示例)
7 |     - [一些疑问](#一些疑问)
8 |   - [References](#refeneces)
9 | 
12 | ## Headless Services
13 | 
14 | > 你了解有状态服务的Headless service吗?
15 | 
16 | Headless service 是一个常规的Kubernetes服务,需要满足如下两点:
17 | 
18 | 1. 其中`spec.clusterIP`被明确设置为 "None",
19 | 2. 还有就是`spec.type`被设置为 "ClusterIP"。
20 | 
21 | 对于Headless service,ClusterIP是没有分配的,kube-proxy不会处理这类服务,它们也没有负载均衡和代理的概念,而是由kubernetes的DNS根据具体的标签(selector)直接解析到对应的后端Pod.
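
这一点可以直接用 DNS 解析来验证(示例;service 与 namespace 名称均为占位符):

```sh
# 解析 headless service 时,返回的是每个后端 Pod 的 IP,而不是某个 ClusterIP
kubectl run dns-test --rm -it --image=busybox:1.36 --restart=Never -- \
  nslookup my-headless-svc.default.svc.cluster.local
```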
22 | 23 | ### 为什么需要Headless services 24 | 25 | 对于无状态的服务kubernetes可以轻松管理,但是对于有状态服务的部署及复制提出了很多的挑战: 26 | 27 | - 为了保持相同的状态,每个Pod均有自己的存储,并且pod之间还会存在持续的数据同步。 28 | 29 | - Pod序次是不能互换的。`Pod`副本在任何重新调度中都有持久的标识符。 30 | 31 | - 最重要的是, 有状态的Pod往往要直接到达特定的pod(例如,在数据库写操作期间)或有pod-pod通信(比如数据同步、选举重节点等),而不通过负载均衡。 32 | 33 | 根据我们上段对于Headless Services的介绍,它无疑是解决上述问题的最佳方案, 但是你也是可以通过为一个Pod创建一个service来解决这个问题(很明显这个不是很优雅),来我们一直看一个示例。 34 | 35 | ### 示例 36 | 37 | > 示例中均只是显示helm charts中的部分template代码。 38 | 39 | - 示例1:kafka 部署 40 | 41 | 通过Headless service的方式来声明kafka的service 42 | 43 | ```yaml 44 | apiVersion: v1 45 | kind: Service 46 | metadata: 47 | name: {{ include "kafka.fullname" . }} 48 | labels: 49 | app.kubernetes.io/name: {{ include "kafka.name" . }} 50 | helm.sh/chart: {{ include "kafka.chart" . }} 51 | app.kubernetes.io/instance: {{ .Release.Name }} 52 | app.kubernetes.io/managed-by: {{ .Release.Service }} 53 | spec: 54 | type: {{ .Values.service.type }} 55 | clusterIP: None 56 | ports: 57 | - port: {{ .Values.service.server }} 58 | name: server 59 | - port: {{ .Values.service.metrics }} 60 | name: metrics 61 | selector: 62 | app.kubernetes.io/name: {{ include "kafka.name" . }} 63 | app.kubernetes.io/instance: {{ .Release.Name }} 64 | ``` 65 | 66 | - 示例2:zookeeper 部署 67 | 68 | 通过普通ClusterIP的方式来为每一个节点声明service 69 | 70 | ```yaml 71 | {{- range $i, $e := until (int .Values.replicas) }} 72 | --- 73 | apiVersion: v1 74 | kind: Service 75 | metadata: 76 | name: "{{ $fullName }}-{{$i}}" 77 | labels: 78 | app.kubernetes.io/name: {{ $name }} 79 | helm.sh/chart: {{ $chart }} 80 | app.kubernetes.io/instance: {{ $release.Name }} 81 | app.kubernetes.io/managed-by: {{ $release.Service }} 82 | spec: 83 | type: ClusterIP 84 | ports: 85 | - port: 2181 86 | name: client 87 | - port: 2888 88 | name: server 89 | - port: 3888 90 | name: leader-election 91 | selector: 92 | app.kubernetes.io/name: {{ $name }} 93 | app.kubernetes.io/instance: {{ $release.Name }} 94 | statefulset.kubernetes.io/pod-name: "{{ $fullName }}-{{$i}}" 95 | {{- end }} 96 | ``` 97 | 98 | ### 一些疑问 99 | 100 | 1. 为什么 headless service 无法直接编辑yaml改为nodePort类型的服务? 101 | 102 | > 因为headless service的clusterIP没有分配,而NodePort类型的服务却是要强依赖于clusterIP, 所以编辑后[api-server校验service信息](https://github.com/kubernetes/kubernetes/blob/v1.25.0/pkg/apis/core/validation/validation.go#L4621)时就会报错了。 103 | 104 | ## Refeneces 105 | 106 | 1. 
## References

1. [services-networking#headless-services](https://kubernetes.io/docs/concepts/services-networking/service/#headless-services)

--------------------------------------------------------------------------------
/kubernetes/images/Kubernetes-Admission-controllers-00-featured.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/kubernetes/images/Kubernetes-Admission-controllers-00-featured.png

--------------------------------------------------------------------------------
/kubernetes/images/Kubernetes-Admission-controllers-01-flow-diagram.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/warm-native/docs/1675289732f305fdc9b0618af731e475219a765f/kubernetes/images/Kubernetes-Admission-controllers-01-flow-diagram.jpeg

--------------------------------------------------------------------------------
/kubernetes/mysql/README.md:
--------------------------------------------------------------------------------

# mysql operator

Based on [`percona-xtradb-cluster-operator`](https://github.com/percona/percona-xtradb-cluster-operator)

## Overview

## Architecture

## Deployment

## Backup and restore

## FAQ

## Appendix

1. [galera-cluster-mysql-tutorial](https://severalnines.com/resources/whitepapers/galera-cluster-mysql-tutorial)
2. [galera-documentation](https://galeracluster.com/library/galera-documentation.pdf)
3. [galera](https://github.com/codership/galera)

--------------------------------------------------------------------------------
/kubernetes/operator/README.md:
--------------------------------------------------------------------------------

# Operator

## What is an Operator

The goal of an Operator is to put operational knowledge into software. Previously this knowledge lived only in the minds of administrators, in various combinations of shell scripts, or in automation software like Ansible; it sat outside your Kubernetes cluster and was hard to integrate. With Operators, CoreOS __changed that__.

Operators implement and automate common Day-1 (installation, configuration, etc.) and Day-2 (re-configuration, update, failover, restore, etc.) activities in a piece of software running inside your Kubernetes cluster, by integrating natively with Kubernetes concepts and APIs.

We call this a Kubernetes-native application. With Operators, you can stop treating an application as a collection of primitives like Pods, Deployments, Services or ConfigMaps, and instead treat it as a single object that only exposes the knobs that make sense for the application.

* Software that runs within Kubernetes
* Interacts with the Kubernetes API to create/manage objects
* Works on the model of eventual consistency

## What is the Operator Framework

The Operator Framework is an open-source toolkit for managing Kubernetes-native applications, called Operators, in an effective, automated, and scalable way.

## What is Operator SDK

The Operator SDK is a component of the Operator Framework that makes it easier to build Kubernetes-native applications, a process that can otherwise require deep, application-specific operational knowledge.

## What can I do with Operator SDK

The Operator SDK provides the tools to build, test, and package Operators. Initially, the SDK facilitates the marriage of an application's business logic (for example, how to scale, upgrade, or back up) with the Kubernetes API to execute those operations. Over time, the SDK can allow engineers to make applications smarter and have the user experience of cloud services.
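For a taste of the workflow, bootstrapping a new Go-based operator is roughly two CLI calls (the domain, repo, group, and kind below are placeholder values):

```sh
# scaffold a new operator project
operator-sdk init --domain example.com --repo github.com/example/memcached-operator

# add an API (the CRD types) and a controller skeleton for it
operator-sdk create api --group cache --version v1alpha1 --kind Memcached --resource --controller
```

From there, the scaffolded project's Makefile drives building, testing, and deploying the operator.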
Leading practices and code patterns that are shared across Operators are included in the SDK to help prevent reinventing the wheel.

The Operator SDK is a framework that uses the controller-runtime library to make writing operators easier by providing:

* High-level APIs and abstractions to write the operational logic more intuitively
* Tools for scaffolding and code generation to bootstrap a new project fast
* Extensions to cover common Operator use cases

## How can I write an operator with Operator SDK

### 1. Install the SDK CLI

[install guide](https://sdk.operatorframework.io/docs/installation)

### 2. Read the user guides

Operators can be created with the SDK using [Ansible](https://sdk.operatorframework.io/docs/building-operators/ansible/quickstart/), [Helm](https://sdk.operatorframework.io/docs/building-operators/helm/quickstart/), or [Go](https://sdk.operatorframework.io/docs/building-operators/golang/quickstart/).

## Operator Sample

[memcached-operator sample](./operator-sample.md)

--------------------------------------------------------------------------------
/kubernetes/operator/operator-sample.md:
--------------------------------------------------------------------------------

# Operator Sample

## Before we start

Let's walk through creating and running an Operator, based on the `Go` Operator workflow.

## Prerequisites

1. Follow the [official guide](https://sdk.operatorframework.io/docs/building-operators/golang/installation) to install the base dependencies ([operator-sdk](https://sdk.operatorframework.io/docs/installation/), git, go 1.18, docker 17.03+, kubectl).
2. Make sure the cluster behind your current kubeconfig context grants you `cluster-admin` privileges.
   * You can use a real kubernetes cluster, or create a local one with [kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installation).
   * You can check or switch the active context with the following commands:

    ```sh
    # show the context currently used by your kubeconfig
    $ kubectl config current-context
    # switch to another context
    $ kubectl config use-context CONTEXT_NAME
    ```

3. An image registry you can push to (hub.docker.com, a self-hosted harbor, etc.) for storing the operator images you build.

## Next step

You can follow the [memcached-operator ChangeLog](https://github.com/colynn/memcached-operator/blob/master/CHANGELOG.md) to see, step by step, how the [`memcached-operator`](https://github.com/colynn/memcached-operator) takes shape.

--------------------------------------------------------------------------------
/kubernetes/static-pod.md:
--------------------------------------------------------------------------------

# static pod

--------------------------------------------------------------------------------
/logger/README.md:
--------------------------------------------------------------------------------

# logger

- [logger](#logger)
  - [Loki](#loki)
    - [what is Loki](#what-is-loki)
    - [Design philosophy](#design-philosophy)
    - [Log collection](#log-collection)
    - [Log storage](#log-storage)
    - [Log querying](#log-querying)
      - [a filter expression](#a-filter-expression)
    - [Best Practices](#best-practices)

## Loki

### what is Loki

Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system.

### Design philosophy

Loki:

- does not do full text indexing on logs. By storing compressed, unstructured logs and indexing only metadata, Loki is simpler to operate and cheaper to run.
- indexes and groups log streams using the same labels you're already using with Prometheus, enabling you to seamlessly switch between metrics and logs.
- is an especially good fit for storing Kubernetes Pod logs. Metadata such as Pod labels is automatically scraped and indexed.
- has native support in Grafana (needs Grafana v6.0).

The sections below look at Loki from three angles:

- how logs are collected
- how logs are stored
- how logs are queried (indexed)

The efficient indexing of log data distinguishes Loki from other logging systems. Unlike other logging systems, a Loki index is built from __labels__, leaving the original log message unindexed.

### Log collection

A unique combination of labels defines a stream; the log entries of a stream are then batched up, compressed, and stored as chunks.

```sh
{job="apache",status_code="200"} 11.11.11.11 - frank [25/Jan/2000:14:00:01 -0500] "GET /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
{job="apache",status_code="404"} 11.11.11.11 - frank [25/Jan/2000:14:00:01 -0500] "GET /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
```

So, for Loki to be __efficient and cost-effective__, we have to use labels responsibly. The next section will explore this in more detail.

Which files promtail tails, and which labels it attaches to them, are controlled by its `scrape_configs` rules (a minimal sketch is given at the end of this page).

### Log storage

### Log querying

People coming to Loki from index-heavy solutions (e.g. ELK) often feel obligated to define a lot of labels in order to query their logs effectively. After all, many other logging solutions are all about the index, and _this is the common way of thinking_.

When using Loki, please forget that habit: Loki's superpower is breaking up queries into small pieces and dispatching them in parallel, so that you can query huge amounts of log data in a small amount of time.

When we talk about cardinality we are referring to the combination of labels and values and the number of streams they create.

Keeping the label set small reduces the number of streams created, while parallel execution still provides fast queries.

This drives the fixed operating costs to a minimum while still allowing for incredibly fast query capability.

#### a filter expression

(A sketch is given at the end of this page.)

### Best Practices

For any single log stream, logs must always be sent in increasing time order. If a log is received with a timestamp older than the most recent log received for that stream, that log will be dropped.
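As promised above, here is a minimal `promtail` `scrape_configs` sketch (paths and label values are illustrative; the `server`/`clients`/`positions` sections of a full promtail config are omitted):

```yaml
scrape_configs:
  - job_name: apache
    static_configs:
      - targets:
          - localhost
        labels:
          job: apache                        # becomes the {job="apache"} stream label
          __path__: /var/log/apache2/*.log   # glob of files for promtail to tail
```

And a sketch of a filter expression: a LogQL query first selects streams by label matchers, then filters the log lines themselves.

```sh
# select the apache streams, keep lines containing "GET", drop favicon noise
{job="apache"} |= "GET" != "favicon.ico"
```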
--------------------------------------------------------------------------------
/middleware/kafka/README.md:
--------------------------------------------------------------------------------

# kafka

--------------------------------------------------------------------------------
/os/README.md:
--------------------------------------------------------------------------------

# Exploring the Linux boot and startup process

## Before we start

This came up while installing openEuler 22: starting with kernel 5.1, the registration of sd devices on the sys bus has [become asynchronous](https://github.com/torvalds/linux/blob/f883675bf6522b52cd75dc3de791680375961769/drivers/scsi/sd.c#L610), so on a server with multiple disks an unattended (silent) installation may end up putting the OS on an unexpected disk. You can try patching the kernel to turn the registration [back from asynchronous to synchronous](https://gitee.com/openeuler/community/issues/I66HWX), but today we offer a different solution.

## Another approach

Use a system running the older openEuler 20 kernel to bootstrap the installation of the openEuler 22 OS.

## Summary

POST power-on self-test --> BIOS (boot sequence) --> load the MBR (bootloader) of the selected boot device --> the primary boot device loads its BootLoader --> kernel initialization --> initrd --> the /etc/init process loads /etc/inittab

1. The BIOS loads the hardware information, runs the self-test, and picks the first bootable device according to its settings;
2. The boot loader in the MBR of the first boot device is read and executed;
3. The kernel is loaded according to the boot loader's settings; the kernel then detects hardware and loads drivers;
4. Once the kernel has loaded, it invokes the init process, and init determines the run-level;
5. init runs rc.sysinit to initialize the operating environment (network, time zone, etc.);
6. init starts the services of that run-level;
7. the user logs in.

> Note that although init appears as a single step here, it actually accounts for a large share of the boot process.
> The kernel boot stage and the init stage are detailed below.

### Kernel boot stage

```sh
3.1 /boot/kernel and kernel parameters
    kernel initialization, loading the basic hardware drivers
3.2 /boot/initrd
    the initrd is decompressed and loaded
3.2.1 stage one: release a root filesystem into memory for the kernel to use
    run init from the initrd filesystem to finish loading the other driver modules
3.2.2 stage two: execute /sbin/init from the real root filesystem
```

### Sys V init stage

```sh
4.1 /sbin/init
4.1.1 /etc/inittab
    the init process reads /etc/inittab to determine the run-level to start
4.1.2 /etc/rc.d/rc.sysinit
    run the system initialization script to apply the basic configuration
4.1.3 /etc/rc.d/rcN.d
    start the services of the run-level determined earlier
4.1.4 /etc/rc.d/rc.local
    run user-defined startup programs
4.2 login
4.2.1 /sbin/mingetty (command-line login)
    on successful authentication, run /etc/login
    load /etc/profile ~/.bash_profile ~/.bash_login ~/profile
    obtain a non-login shell
4.2.2 /etc/X11/prefdm (graphical login)
    gdm kdm xdm
    Xinit
    load ~/.xinitrc ~/.xserverrc
```

## References

1.
2.
3. [Linux 的启动流程](https://www.ruanyifeng.com/blog/2013/08/linux_boot_process.html)
4. [Linux基础:启动流程](https://wuchong.me/blog/2014/07/14/linux-boot-process/)

## TODO

1. There are still a lot of blind spots here: how does the boot system's `systemd` pivot from the `initrd.img` root filesystem to carry out the OS installation, and how does the installed system relate to the grub2.cfg configuration afterwards?
2. Custom Linux ISO

--------------------------------------------------------------------------------