├── README.md
├── eci-spark
│   ├── pics
│   │   ├── spark-1.png
│   │   ├── spark-2.png
│   │   ├── spark-3.png
│   │   ├── spark-4.png
│   │   ├── spark-25.png
│   │   ├── 017c1cd4c74a5936acd4f9b93f089e81.png
│   │   ├── 79efc84b99359e069b9e3e9d42e2dc8d.png
│   │   ├── 7c5a3b2d598506b828c1d4707a08b4c8.png
│   │   ├── a4235604c1c2c5cc9a19f089d73f426d.png
│   │   ├── b645a2d7a0a5b7fe918cb24d9b22d592.png
│   │   ├── f22c2388786e00b104d677b545c69bc9.png
│   │   ├── 1574232926380-2e1dba72-2c79-4018-835c-4451f8e19feb.png
│   │   ├── 1574232962771-ef09e6c7-c5f1-4dcf-97fc-39494bc7f14f.png
│   │   ├── 1574233020714-fbe3f048-91c8-451f-87af-5b38f99a23c5.png
│   │   ├── 1574233108538-d986c0b4-6846-49ac-9bb4-82d2044cc855.png
│   │   ├── 1574233483142-9359d5e3-81c9-4154-8242-ed3a37a4e37b.png
│   │   ├── 1574233499811-efed418f-649b-45f0-b035-cdb09a15fa3d.png
│   │   ├── 1574233531691-89664643-2afe-40fe-8ac0-462a8dba1910.png
│   │   ├── 1574233651888-ade8ea24-4e36-4189-817d-26572230970a.png
│   │   ├── 1574233670142-d818a7c8-2edf-4d4f-ac68-816d18eb1b55.png
│   │   ├── 1574233691702-743d4526-f45a-4f92-8b06-397d7086d5fc.png
│   │   ├── 1574233705801-03b05378-4723-4584-ae8f-1d62beb971cd.png
│   │   ├── 1574411721752-3d77457d-4aa1-4008-938f-eb291bf16ce6.png
│   │   └── 1574432924198-263f5929-062d-4cfe-886a-7979dde56d21.png
│   ├── wordcount-operator-example.yaml
│   ├── wordcount-operator-example-ack.yaml
│   ├── wordcount-spark-driver-svc.yaml
│   └── README.md
├── eci-gitlab-runner
│   ├── java-demo
│   │   ├── Dockerfile
│   │   ├── src
│   │   │   └── main
│   │   │       └── webapp
│   │   │           ├── index.jsp
│   │   │           └── WEB-INF
│   │   │               └── web.xml
│   │   ├── .gitignore
│   │   ├── deployment.yaml
│   │   ├── pom.xml
│   │   └── .gitlab-ci.yml
│   ├── mvn-pvc.yaml
│   ├── nas-pvc.yaml
│   ├── secret.yaml
│   ├── imagecache.yaml
│   ├── mvn-pv.yaml
│   ├── nas-pv.yaml
│   ├── gitlab-runner-deployment.yaml
│   ├── imagecache-crd.yaml
│   ├── README.md
│   └── config-map.yaml
├── eci-gpu-tensorflow
│   ├── imagecache.yaml
│   └── gpu_pod.yaml
├── eci-wordpress
│   ├── create.json
│   ├── wordpress-all-in-one-pod.yaml
│   └── README.md
└── .gitignore
/README.md:
--------------------------------------------------------------------------------
# BestPractice-Serverless-Kubernetes
--------------------------------------------------------------------------------
/eci-spark/pics/spark-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/spark-1.png
--------------------------------------------------------------------------------
/eci-spark/pics/spark-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/spark-2.png
--------------------------------------------------------------------------------
/eci-spark/pics/spark-3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/spark-3.png
--------------------------------------------------------------------------------
/eci-spark/pics/spark-4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/spark-4.png
--------------------------------------------------------------------------------
/eci-spark/pics/spark-25.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/spark-25.png
--------------------------------------------------------------------------------
/eci-gitlab-runner/java-demo/Dockerfile:
--------------------------------------------------------------------------------
FROM registry.cn-beijing.aliyuncs.com/acs-sample/tomcat
ADD target/demo.war /usr/local/tomcat/webapps/demo.war
--------------------------------------------------------------------------------
/eci-spark/pics/017c1cd4c74a5936acd4f9b93f089e81.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/017c1cd4c74a5936acd4f9b93f089e81.png
--------------------------------------------------------------------------------
/eci-spark/pics/79efc84b99359e069b9e3e9d42e2dc8d.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/79efc84b99359e069b9e3e9d42e2dc8d.png
--------------------------------------------------------------------------------
/eci-spark/pics/7c5a3b2d598506b828c1d4707a08b4c8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/7c5a3b2d598506b828c1d4707a08b4c8.png
--------------------------------------------------------------------------------
/eci-spark/pics/a4235604c1c2c5cc9a19f089d73f426d.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/a4235604c1c2c5cc9a19f089d73f426d.png
--------------------------------------------------------------------------------
/eci-spark/pics/b645a2d7a0a5b7fe918cb24d9b22d592.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/b645a2d7a0a5b7fe918cb24d9b22d592.png
--------------------------------------------------------------------------------
/eci-spark/pics/f22c2388786e00b104d677b545c69bc9.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/f22c2388786e00b104d677b545c69bc9.png
--------------------------------------------------------------------------------
/eci-spark/pics/1574232926380-2e1dba72-2c79-4018-835c-4451f8e19feb.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574232926380-2e1dba72-2c79-4018-835c-4451f8e19feb.png
--------------------------------------------------------------------------------
/eci-spark/pics/1574232962771-ef09e6c7-c5f1-4dcf-97fc-39494bc7f14f.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574232962771-ef09e6c7-c5f1-4dcf-97fc-39494bc7f14f.png
--------------------------------------------------------------------------------
/eci-spark/pics/1574233020714-fbe3f048-91c8-451f-87af-5b38f99a23c5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233020714-fbe3f048-91c8-451f-87af-5b38f99a23c5.png
--------------------------------------------------------------------------------
/eci-spark/pics/1574233108538-d986c0b4-6846-49ac-9bb4-82d2044cc855.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233108538-d986c0b4-6846-49ac-9bb4-82d2044cc855.png
--------------------------------------------------------------------------------
/eci-spark/pics/1574233483142-9359d5e3-81c9-4154-8242-ed3a37a4e37b.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233483142-9359d5e3-81c9-4154-8242-ed3a37a4e37b.png
--------------------------------------------------------------------------------
/eci-spark/pics/1574233499811-efed418f-649b-45f0-b035-cdb09a15fa3d.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233499811-efed418f-649b-45f0-b035-cdb09a15fa3d.png
--------------------------------------------------------------------------------
/eci-spark/pics/1574233531691-89664643-2afe-40fe-8ac0-462a8dba1910.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233531691-89664643-2afe-40fe-8ac0-462a8dba1910.png
--------------------------------------------------------------------------------
/eci-spark/pics/1574233651888-ade8ea24-4e36-4189-817d-26572230970a.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233651888-ade8ea24-4e36-4189-817d-26572230970a.png
--------------------------------------------------------------------------------
/eci-spark/pics/1574233670142-d818a7c8-2edf-4d4f-ac68-816d18eb1b55.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233670142-d818a7c8-2edf-4d4f-ac68-816d18eb1b55.png
--------------------------------------------------------------------------------
/eci-spark/pics/1574233691702-743d4526-f45a-4f92-8b06-397d7086d5fc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233691702-743d4526-f45a-4f92-8b06-397d7086d5fc.png
--------------------------------------------------------------------------------
/eci-spark/pics/1574233705801-03b05378-4723-4584-ae8f-1d62beb971cd.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233705801-03b05378-4723-4584-ae8f-1d62beb971cd.png
--------------------------------------------------------------------------------
/eci-spark/pics/1574411721752-3d77457d-4aa1-4008-938f-eb291bf16ce6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574411721752-3d77457d-4aa1-4008-938f-eb291bf16ce6.png
--------------------------------------------------------------------------------
/eci-spark/pics/1574432924198-263f5929-062d-4cfe-886a-7979dde56d21.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574432924198-263f5929-062d-4cfe-886a-7979dde56d21.png
--------------------------------------------------------------------------------
/eci-gpu-tensorflow/imagecache.yaml:
--------------------------------------------------------------------------------
apiVersion: eci.alibabacloud.com/v1
kind: ImageCache
metadata:
  name: tensorflow
spec:
  images:
    - registry-vpc.cn-zhangjiakou.aliyuncs.com/eci/tensorflow:1.0 # image of the training job; hosting it in an Alibaba Cloud VPC private registry is recommended
--------------------------------------------------------------------------------
/eci-gitlab-runner/mvn-pvc.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gitlab-runner-maven-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  volumeName: gitlab-runner-maven-pv
--------------------------------------------------------------------------------
/eci-gitlab-runner/nas-pvc.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gitlab-runner-cache-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  volumeName: gitlab-runner-cache-pv
--------------------------------------------------------------------------------
/eci-gitlab-runner/java-demo/src/main/webapp/index.jsp:
--------------------------------------------------------------------------------
<%@ page import="java.net.InetAddress" %>
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<html>
<head>
<title>Aliyun Container Service</title>
</head>
<body>
Hello Gitlab
</body>
</html>
--------------------------------------------------------------------------------
/eci-gitlab-runner/secret.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: Secret
metadata:
  name: gitlab-runner-secret
type: kubernetes.io/tls
data:
  ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FUR***********
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS*********
  tls.key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSB********
--------------------------------------------------------------------------------
/eci-gitlab-runner/java-demo/src/main/webapp/WEB-INF/web.xml:
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE web-app PUBLIC
  "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
  "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
</web-app>
--------------------------------------------------------------------------------
/eci-wordpress/create.json:
--------------------------------------------------------------------------------
{
    "cluster_type": "ManagedKubernetes",
    "profile": "Serverless",
    "name": "wordpress-demo",
    "region_id": "cn-hangzhou",
    "endpoint_public_access": true,
    "snat_entry": true,
    "addons": [
        {
            "name": "csi-provisioner",
            "config": ""
        }
    ],
    "zoneid": "cn-hangzhou-j"
}
--------------------------------------------------------------------------------
/eci-gitlab-runner/imagecache.yaml:
--------------------------------------------------------------------------------
apiVersion: eci.alibabacloud.com/v1
kind: ImageCache
metadata:
  name: gitlab-runner
spec:
  images:
    - gitlab/gitlab-runner-helper:x86_64-latest
    - gitlab/gitlab-runner:latest
    - registry.cn-hangzhou.aliyuncs.com/eci/kaniko:1.0
    - registry.cn-hangzhou.aliyuncs.com/eci/kubectl:1.0
    - registry.cn-hangzhou.aliyuncs.com/eci/java-demo:1.0
--------------------------------------------------------------------------------
/eci-gitlab-runner/mvn-pv.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gitlab-runner-maven-pv
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 100Gi
  mountOptions:
    - nolock,noresvport,noacl,hard
    - vers=3
    - rsize=1048576
    - wsize=1048576
    - proto=tcp
    - timeo=600
    - retrans=2
  nfs:
    path: /share
    server: 0079f226-s3vx.cn-hangzhou.extreme.nas.aliyuncs.com
--------------------------------------------------------------------------------
/eci-gitlab-runner/nas-pv.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gitlab-runner-cache-pv
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 100Gi
  mountOptions:
    - nolock,noresvport,noacl,hard
    - vers=3
    - rsize=1048576
    - wsize=1048576
    - proto=tcp
    - timeo=600
    - retrans=2
  nfs:
    path: /share
    server: 0024d203-lbsy.cn-hangzhou.extreme.nas.aliyuncs.com
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# maven ignore
target/
*.jar
!.mvn/wrapper/*
*.war
*.zip
*.tar
*.tar.gz
.flattened-pom.xml

# eclipse ignore
.settings/
.project
.classpath

# idea ignore
.idea/
*.ipr
*.iml
*.iws

# temp ignore
*.log
*.cache
*.diff
*.patch
*.tmp

# system ignore
.DS_Store
Thumbs.db
*.orig

# license check result
license-list

# grpc compiler
compiler/gradle.properties
compiler/build/*
compiler/.gradle/*
--------------------------------------------------------------------------------
/eci-gitlab-runner/java-demo/.gitignore:
--------------------------------------------------------------------------------
# maven ignore
target/
*.jar
!.mvn/wrapper/*
*.war
*.zip
*.tar
*.tar.gz
.flattened-pom.xml

# eclipse ignore
.settings/
.project
.classpath

# idea ignore
.idea/
*.ipr
*.iml
*.iws

# temp ignore
*.log
*.cache
*.diff
*.patch
*.tmp

# system ignore
.DS_Store
Thumbs.db
*.orig

# license check result
license-list

# grpc compiler
compiler/gradle.properties
compiler/build/*
compiler/.gradle/*
--------------------------------------------------------------------------------
/eci-gitlab-runner/java-demo/deployment.yaml:
--------------------------------------------------------------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: java-demo
  template:
    metadata:
      labels:
        app: java-demo
      annotations:
        k8s.aliyun.com/eci-image-cache: "true"
    spec:
      containers:
        - name: java-demo
          image: registry.cn-hangzhou.aliyuncs.com/eci/java-demo:IMAGE_TAG
          imagePullPolicy: Always
          ports:
            - containerPort: 8080

---
apiVersion: v1
kind: Service
metadata:
  name: java-demo
spec:
  ports:
    - port: 80
      targetPort: 8080
      name: java-demo
  selector:
    app: java-demo
  type: LoadBalancer
--------------------------------------------------------------------------------
/eci-gpu-tensorflow/gpu_pod.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: Pod
metadata:
  name: tensorflow
  annotations:
    k8s.aliyun.com/eci-gpu-type: "P4" # GPU type; alternatively, specify the same instanceType as ECS, e.g. k8s.aliyun.com/eci-instance-type: "ecs.gn5i-c2g1.large"
    k8s.aliyun.com/eci-image-cache: "true" # enable automatic image cache matching
spec:
  containers:
    - image: registry-vpc.cn-zhangjiakou.aliyuncs.com/eci/tensorflow:1.0 # image of the training job
      name: tensorflow
      command:
        - "sh"
        - "-c"
        - "python models/tutorials/image/imagenet/classify_image.py" # script that launches the training job
      resources:
        limits:
          nvidia.com/gpu: "1" # number of GPUs the container needs
      volumeMounts:
        - name: nfs-pv
          mountPath: /tmp/imagenet
  volumes:
    - name: nfs-pv # persist training results to NAS file storage
      nfs:
        path: /share
        server: 0912430d-1nsl.cn-zhangjiakou.extreme.nas.aliyuncs.com
  restartPolicy: OnFailure
--------------------------------------------------------------------------------
/eci-gitlab-runner/java-demo/pom.xml:
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.aliyun</groupId>
    <artifactId>jenkins-demo-web</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <packaging>war</packaging>

    <build>
        <finalName>demo</finalName>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF8</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
--------------------------------------------------------------------------------
/eci-gitlab-runner/java-demo/.gitlab-ci.yml:
--------------------------------------------------------------------------------
cache:
  paths:
    - /cache
stages:
  - package
  - build
  - deploy
mvn_package_job:
  image: registry.cn-hangzhou.aliyuncs.com/eci/kaniko:1.0
  stage: package
  tags:
    - test
  script:
    - mvn clean package -DskipTests
    - cp -f target/demo.war /cache
build_and_publish_docker_image_job:
  image: registry.cn-hangzhou.aliyuncs.com/eci/kaniko:1.0
  stage: build
  tags:
    - test
  script:
    - mkdir target
    - cp /cache/demo.war target/demo.war
    - echo $CI_PIPELINE_ID
    - kaniko -f `pwd`/Dockerfile -c `pwd` --destination=registry.cn-hangzhou.aliyuncs.com/eci/java-demo:$CI_PIPELINE_ID
deploy_k8s_job:
  image: registry.cn-hangzhou.aliyuncs.com/eci/kubectl:1.0
  stage: deploy
  tags:
    - test
  script:
    - mkdir -p ~/.kube
    - echo $kube_config |base64 -d > ~/.kube/config
    - sed -i "s/IMAGE_TAG/$CI_PIPELINE_ID/g" deployment.yaml
    - cat deployment.yaml
    - kubectl apply -f deployment.yaml
--------------------------------------------------------------------------------
/eci-gitlab-runner/gitlab-runner-deployment.yaml:
--------------------------------------------------------------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-runner
spec:
  selector:
    matchLabels:
      app: gitlab-runner
  template:
    metadata:
      labels:
        app: gitlab-runner
      annotations:
        k8s.aliyun.com/eci-image-cache: "true"
    spec:
      containers:
        - image: gitlab/gitlab-runner:latest
          imagePullPolicy: IfNotPresent
          name: gitlab-runner
          volumeMounts:
            - mountPath: /etc/gitlab-runner
              name: config
      volumes:
        - name: config
          projected:
            defaultMode: 420
            sources:
              - secret:
                  items:
                    - key: ca.crt
                      path: ca.crt
                    - key: tls.crt
                      path: tls.crt
                    - key: tls.key
                      path: tls.key
                  name: gitlab-runner-secret
              - configMap:
                  items:
                    - key: config.toml
                      path: config.toml
                  name: gitlab-runner-config
--------------------------------------------------------------------------------
/eci-spark/wordcount-operator-example.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: "sparkoperator.k8s.io/v1beta2"
2 | kind: SparkApplication
3 | metadata:
4 | name: wordcount
5 | namespace: default
6 | spec:
7 | type: Java
8 | mode: cluster
9 | image: "registry.cn-beijing.aliyuncs.com/liumi/spark:2.4.4-example"
10 | imagePullPolicy: IfNotPresent
11 | mainClass: com.aliyun.liumi.spark.example.WordCount
12 | mainApplicationFile: "local:///opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar"
13 | sparkVersion: "2.4.4"
14 | restartPolicy:
15 | type: OnFailure
16 | onFailureRetries: 2
17 | onFailureRetryInterval: 5
18 | onSubmissionFailureRetries: 2
19 | onSubmissionFailureRetryInterval: 10
20 | timeToLiveSeconds: 36000
21 | sparkConf:
22 | "spark.kubernetes.allocation.batch.size": "10"
23 | driver:
24 | cores: 2
25 | memory: "4096m"
26 | labels:
27 | version: 2.4.4
28 | spark-app: spark-wordcount
29 | role: driver
30 | annotations:
31 | k8s.aliyun.com/eci-image-cache: "true"
32 | serviceAccount: spark
33 | executor:
34 | cores: 1
35 | instances: 100
36 | memory: "1024m"
37 | labels:
38 | version: 2.4.4
39 | role: executor
40 | annotations:
41 | k8s.aliyun.com/eci-image-cache: "true"
--------------------------------------------------------------------------------
/eci-gitlab-runner/imagecache-crd.yaml:
--------------------------------------------------------------------------------
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: imagecaches.eci.alibabacloud.com
spec:
  group: eci.alibabacloud.com
  version: v1
  names:
    kind: ImageCache
    plural: imagecaches
    shortNames:
      - ic
    categories:
      - all
  scope: Cluster
  subresources:
    status: {}
  validation:
    openAPIV3Schema:
      required:
        - spec
      properties:
        spec:
          type: object
          required:
            - images
          properties:
            imagePullSecrets:
              type: array
              items:
                type: string
            images:
              minItems: 1
              type: array
              items:
                type: string
            imageCacheSize:
              type: integer
  additionalPrinterColumns:
    - name: Age
      type: date
      JSONPath: .metadata.creationTimestamp
    - name: CacheId
      type: string
      JSONPath: .status.imageCacheId
    - name: Phase
      type: string
      JSONPath: .status.phase
    - name: Progress
      type: string
      JSONPath: .status.progress
--------------------------------------------------------------------------------
/eci-spark/wordcount-operator-example-ack.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: "sparkoperator.k8s.io/v1beta2"
2 | kind: SparkApplication
3 | metadata:
4 | name: wordcount
5 | namespace: default
6 | spec:
7 | type: Java
8 | mode: cluster
9 | image: "registry.cn-beijing.aliyuncs.com/liumi/spark:2.4.4-example"
10 | imagePullPolicy: IfNotPresent
11 | mainClass: com.aliyun.liumi.spark.example.WordCount
12 | mainApplicationFile: "local:///opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar"
13 | sparkVersion: "2.4.4"
14 | restartPolicy:
15 | type: OnFailure
16 | onFailureRetries: 2
17 | onFailureRetryInterval: 5
18 | onSubmissionFailureRetries: 2
19 | onSubmissionFailureRetryInterval: 10
20 | timeToLiveSeconds: 36000
21 | sparkConf:
22 | "spark.kubernetes.allocation.batch.size": "10"
23 | driver:
24 | cores: 2
25 | memory: "4096m"
26 | labels:
27 | version: 2.4.4
28 | spark-app: spark-wordcount
29 | role: driver
30 | annotations:
31 | k8s.aliyun.com/eci-image-cache: "true"
32 | serviceAccount: spark
33 | executor:
34 | cores: 1
35 | instances: 100
36 | memory: "1024m"
37 | labels:
38 | version: 2.4.4
39 | role: executor
40 | annotations:
41 | k8s.aliyun.com/eci-image-cache: "true"
42 | #nodeName: virtual-kubelet
43 | nodeSelector:
44 | type: virtual-kubelet
45 | tolerations:
46 | - key: virtual-kubelet.io/provider
47 | operator: Exists
--------------------------------------------------------------------------------
/eci-gitlab-runner/README.md:
--------------------------------------------------------------------------------
### 1. Prepare an ASK cluster
https://cs.console.aliyun.com/?spm=5176.eciconsole.0.0.68254a9cNv12zh#/k8s/cluster/createV2/serverless
Create a standard serverless k8s cluster in the Container Service console.


### 2. Prepare PVs/PVCs
Prepare two NAS volumes, one as the gitlab runner cache and one as the maven repository; replace the NAS server address and path with your own.

``` shell
kubectl apply -f mvn-pv.yaml
kubectl apply -f mvn-pvc.yaml
kubectl apply -f nas-pv.yaml
kubectl apply -f nas-pvc.yaml
```

### 3. Prepare the Secrets
* Copy the certificate public and private keys from your kubeconfig into the secret, secret.yaml
``` shell
kubectl apply -f secret.yaml
```

* docker-registry credentials; ECI supports secret-free image pulls, but pushing docker images still needs them
``` shell
kubectl create secret docker-registry registry-auth-secret --docker-server=registry.cn-hangzhou.aliyuncs.com --docker-username=${xxx} --docker-password=${xxx}
```

Inspect the generated secret with
``` shell
kubectl get secret registry-auth-secret --output=yaml
```

### 4. Prepare the ConfigMap
Copy the gitlab runner url and token, and the ASK cluster's api server address, into config-map.yaml
``` shell
kubectl apply -f config-map.yaml
```
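
The deploy stage in java-demo/.gitlab-ci.yml additionally expects a CI variable kube_config holding your base64-encoded kubeconfig (the job runs `echo $kube_config | base64 -d`); one way to produce the value, assuming GNU coreutils:
``` shell
# encode the kubeconfig for use as the kube_config CI variable in GitLab
base64 -w0 ~/.kube/config
```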

### 5. Prepare the imageCache (optional; saves image pull time)
ASK currently installs the imagecache CRD by default. Check with the commands below, and install it yourself if it is missing:
``` shell
# check whether the image cache crd is installed
kubectl get crd
# install the image cache crd
kubectl apply -f imagecache-crd.yaml
# build the imagecache
kubectl apply -f imagecache.yaml
```
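
You can then watch the cache build; the name comes from imagecache.yaml, and the CacheId/Phase/Progress columns come from the CRD's additionalPrinterColumns:
``` shell
kubectl get ic gitlab-runner
```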

### 6. Deploy the gitlab runner
``` shell
kubectl apply -f gitlab-runner-deployment.yaml
```

### 7. Import the git repo; see the java-demo directory for a Java demo
--------------------------------------------------------------------------------
/eci-wordpress/wordpress-all-in-one-pod.yaml:
--------------------------------------------------------------------------------
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-disk
provisioner: diskplugin.csi.alibabacloud.com
parameters:
  type: cloud_auto,cloud_essd,cloud_ssd # with this setting the disk type is chosen adaptively by priority; the type actually created depends on the node instance, disk availability in the zone, and other factors
  fstype: ext4
  volumeExpandAutoSnapshot: "forced" # this setting only takes effect when the created disk type is cloud_essd
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wordpress-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 20Gi
  storageClassName: alicloud-disk
---
apiVersion: v1
kind: Pod
metadata:
  name: wordpress
  annotations:
    "k8s.aliyun.com/eci-with-eip": "true" # automatically attach an EIP
spec:
  containers:
    - image: mysql:5.6
      name: mysql
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: "123456"
      livenessProbe:
        tcpSocket:
          port: 3306
      ports:
        - containerPort: 3306
          name: mysql
    - image: wordpress:4.8-apache
      name: wordpress
      env:
        - name: WORDPRESS_DB_HOST
          value: 127.0.0.1
        - name: WORDPRESS_DB_PASSWORD
          value: "123456"
      ports:
        - containerPort: 80
          name: wordpress
      volumeMounts:
        - name: wordpress-persistent-storage
          mountPath: /var/www/html
  volumes:
    - name: wordpress-persistent-storage
      persistentVolumeClaim:
        claimName: wordpress-pvc
/eci-gitlab-runner/config-map.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: ConfigMap
metadata:
  name: gitlab-runner-config
data:
  config.toml: |
    concurrent = 2
    check_interval = 0
    [[runners]]
      name = "gitlab-runner"
      url = "https://gitlab.com/"
      token = $token
      executor = "kubernetes"
      output_limit = 51200
      [runners.kubernetes]
        host = "https://xxx.xxx.xxx.xxx:6443"
        cert_file = "/etc/gitlab-runner/tls.crt"
        key_file = "/etc/gitlab-runner/tls.key"
        ca_file = "/etc/gitlab-runner/ca.crt"
        namespace = "default"
        pull_policy = "if-not-present"
        cpu_limit = "0.5"
        cpu_request = "0.5"
        memory_limit = "1Gi"
        memory_request = "1Gi"
        helper_cpu_limit = "0.5"
        helper_cpu_request = "0.5"
        helper_memory_limit = "1Gi"
        helper_memory_request = "1Gi"
        helper_image = "gitlab/gitlab-runner-helper:x86_64-latest"
        [runners.kubernetes.pod_annotations]
          "k8s.aliyun.com/eci-image-cache" = "true"
        [runners.kubernetes.volumes]
          [[runners.kubernetes.volumes.pvc]]
            name = "gitlab-runner-cache-pvc"
            mount_path = "/cache"
            readonly = false
          [[runners.kubernetes.volumes.pvc]]
            name = "gitlab-runner-maven-pvc"
            mount_path = "/root/.m2"
            readonly = false
          [[runners.kubernetes.volumes.secret]]
            name = "registry-auth-secret"
            mount_path = "/root/.docker"
            read_only = false
            [runners.kubernetes.volumes.secret.items]
              ".dockerconfigjson" = "config.json"
--------------------------------------------------------------------------------
/eci-wordpress/README.md:
--------------------------------------------------------------------------------
# Experience WordPress in one minute with ECI + ACK Serverless

Enter the WordPress directory
```bash
cd eci-wordpress
```

## Create a Serverless Kubernetes cluster

You can create a cluster conveniently with the Aliyun CLI

```bash
aliyun cs POST /clusters --header "Content-Type=application/json" --body "$(cat create.json)"
```

The `create.json` file holds the parameters for creating the Serverless Kubernetes cluster; customize it to configure your own cluster.

- cluster_type: cluster type; for Serverless Kubernetes clusters this is "ManagedKubernetes"
- profile: cluster profile; with cluster_type set to ManagedKubernetes and this parameter set to Serverless, an ACK Serverless cluster is created.
- name: cluster name
- region_id: ID of the region the cluster resides in
- endpoint_public_access: whether to expose the API server publicly
- snat_entry: whether to create a NAT gateway in the VPC and configure SNAT rules
- addons: list of components to install in the Kubernetes cluster
- zoneid: availability zone ID within the cluster's region; if vpcid and vswitch_ids are not specified, zoneid must be specified.

After creation succeeds, you will see output like the following in the console:

```json
{
    "cluster_id": "c486508d6416045a9a434b0******",
    "instanceId": "c486508d6416045a9a434b0******",
    "request_id": "075417EF-8F86-51E6-******",
    "task_id": "T-6524fe49265d5c06******"
}
```

where `cluster_id` is the unique id of the cluster you created.

You can now log in to the [Container Service console](https://cs.console.aliyun.com) to view the Serverless Kubernetes cluster created via the Aliyun CLI.

## Install WordPress

**Note: make sure the Serverless Kubernetes cluster created in the previous step has finished initializing (usually 3-5 minutes) before starting the steps below; one way to check is sketched next**
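
A minimal polling sketch, assuming the same generic `aliyun cs` API-call style as above (the exact response field names may vary):
```bash
# poll every 10s until the cluster reports the running state
while ! aliyun cs GET /clusters/${cluster_id} | grep -q '"state": "running"'; do
  sleep 10
done
```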

Use Cloud Shell to manage the Serverless Kubernetes cluster created in the previous step
```bash
source use-k8s-cluster ${Cluster ID}
```

Apply the WordPress installation yaml file
```bash
kubectl apply -f wordpress-all-in-one-pod.yaml
```

Watch the installation progress until STATUS is Running
```bash
kubectl get pods
```

Query the EIP address
```bash
kubectl get -o json pod wordpress |grep "k8s.aliyun.com/allocated-eipAddress"
```
Expected output
```shell
"k8s.aliyun.com/allocated-eipAddress": "39.105.XX.XX"
```

Since the security group does not open port 80 by default, an ACL for port 80 must be added to the security group

First get the security group ID

```bash
kubectl get -o json pod wordpress |grep "k8s.aliyun.com/eci-security-group"
```

Expected output
```shell
"k8s.aliyun.com/eci-security-group": "sg-2zef08a606ey91******"
```

Authorize the security group
```bash
aliyun ecs AuthorizeSecurityGroup --RegionId ${Region ID} --SecurityGroupId ${Security Group ID} --IpProtocol tcp --PortRange 80/80 --SourceCidrIp 0.0.0.0/0 --Priority 100
```
Replace the Region ID and security group ID in the command above with your actual values. Example:
```bash
aliyun ecs AuthorizeSecurityGroup --RegionId cn-hangzhou --SecurityGroupId sg-2zef08a606ey91****** --IpProtocol tcp --PortRange 80/80 --SourceCidrIp 0.0.0.0/0 --Priority 100
```

## Use WordPress

Enter the EIP address obtained above into your browser to start using WordPress
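
Optionally check reachability from the command line first (EIP masked as in the output above):
```bash
curl -I http://39.105.XX.XX/
```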
--------------------------------------------------------------------------------
/eci-spark/wordcount-spark-driver-svc.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-serverless
  namespace: default

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-serverless-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
  - kind: ServiceAccount
    name: spark-serverless
    namespace: default

---
apiVersion: v1
kind: Service
metadata:
  name: wordcount-spark-driver-svc
  namespace: default
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-private-zone-enable: "true"
spec:
  clusterIP: None
  ports:
    - name: driver-rpc-port
      port: 7078
      protocol: TCP
      targetPort: 7078
    - name: blockmanager
      port: 7079
      protocol: TCP
      targetPort: 7079
  selector:
    spark-app-selector: spark-9b7952456a86413b94c70fe2b3f8496c
    spark-role: driver
  sessionAffinity: None
  type: ClusterIP

---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    spark-app-name: WordCount
    k8s.aliyun.com/eci-image-cache: "true"
  labels:
    spark-app-selector: spark-9b7952456a86413b94c70fe2b3f8496c
    spark-role: driver
  name: wordcount-spark-driver
  namespace: default
spec:
  containers:
    - args:
        - driver
      env:
        - name: SPARK_DRIVER_MEMORY
          value: 1g
        - name: SPARK_DRIVER_CLASS
          value: com.aliyun.liumi.spark.example.WordCount
        - name: SPARK_DRIVER_ARGS
        - name: SPARK_DRIVER_BIND_ADDRESS
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: SPARK_MOUNTED_CLASSPATH
          value: >-
            /opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar:/opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar
        - name: SPARK_JAVA_OPT_0
          value: '-Dspark.submit.deployMode=cluster'
        - name: SPARK_JAVA_OPT_1
          value: '-Dspark.driver.blockManager.port=7079'
        - name: SPARK_JAVA_OPT_2
          value: '-Dspark.master=k8s://https://47.99.132.xxx:6443'
        - name: SPARK_JAVA_OPT_3
          value: '-Dspark.app.id=spark-9b7952456a86413b94c70fe2b3f8496c'
        - name: SPARK_JAVA_OPT_4
          value: '-Dspark.kubernetes.authenticate.driver.serviceAccountName=spark'
        - name: SPARK_JAVA_OPT_5
          value: >-
            -Dspark.kubernetes.driver.pod.name=wordcount-spark-driver
        - name: SPARK_JAVA_OPT_6
          value: '-Dspark.app.name=WordCount'
        - name: SPARK_JAVA_OPT_7
          value: >-
            -Dspark.kubernetes.container.image=registry.cn-beijing.aliyuncs.com/liumi/spark:2.3.0-hdfs-1.0
        - name: SPARK_JAVA_OPT_8
          value: '-Dspark.executor.instances=10'
        - name: SPARK_JAVA_OPT_9
          value: >-
            -Dspark.jars=/opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar,/opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar
        - name: SPARK_JAVA_OPT_10
          value: >-
            -Dspark.driver.host=wordcount-spark-driver-svc.default.svc.cluster.local.c132a4a4826814d579c14bf2c5cf933af
        - name: SPARK_JAVA_OPT_11
          value: >-
            -Dspark.kubernetes.executor.podNamePrefix=wordcount-spark
        - name: SPARK_JAVA_OPT_12
          value: '-Dspark.driver.port=7078'
        - name: SPARK_JAVA_OPT_13
          value: >-
            -Dspark.kubernetes.executor.annotation.k8s.aliyun.com/eci-image-cache=true
        - name: SPARK_JAVA_OPT_14
          value: >-
            -Dspark.kubernetes.allocation.batch.size=10
      image: 'registry.cn-beijing.aliyuncs.com/liumi/spark:2.3.0-hdfs-1.0'
      imagePullPolicy: IfNotPresent
      name: spark-kubernetes-driver
      resources:
        limits:
          memory: 16384Mi
        requests:
          cpu: '8'
          memory: 16Gi
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
  dnsPolicy: None
  dnsConfig:
    nameservers:
      - 100.100.2.136
      - 100.100.2.138
    searches:
      - default.svc.cluster.local.c132a4a4826814d579c14bf2c5cf933af
      - svc.cluster.local.c132a4a4826814d579c14bf2c5cf933af
      - cluster.local.c132a4a4826814d579c14bf2c5cf933af
      - c132a4a4826814d579c14bf2c5cf933af
    options:
      - name: ndots
        value: "5"
  hostAliases:
    - ip: "47.99.132.xxx"
      hostnames:
        - "kubernetes.default.svc"
  priority: 0
  restartPolicy: Never
  serviceAccount: spark-serverless
  serviceAccountName: spark-serverless
  terminationGracePeriodSeconds: 30
  tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
/eci-spark/README.md:
--------------------------------------------------------------------------------
## Background

Since Google published the papers on its three core technologies, GFS (2003), MapReduce (2004), and BigTable (2006), big data processing frameworks represented by Hadoop have stepped onto the stage of history and entered a golden age. Apache Hadoop is the most successful of these open source projects, putting enterprise-grade big data processing within everyone's reach. Academic research and industrial exploration around Hadoop stayed red-hot for over a decade.

On another timeline, container technology finally saw six years of rapid growth after Docker appeared. Meanwhile Kubernetes, the open source container orchestration system, emerged from several years of fierce competition and, propelled by the CNCF community and the rise of cloud native, quickly became the industry's de facto standard for container orchestration. Today almost every cloud vendor has a container ecosystem built around Kubernetes; at Alibaba Cloud, for example, we have ACK, ASK (Serverless Kubernetes), EDAS, and ECI (Alibaba Cloud Elastic Container Instance).



Data from Google Trends

ASF (Apache Software Foundation) and CNCF (Cloud Native Computing Foundation), two relatively independent camps, have quietly arrived at a historical inflection point, and we are all curious what sparks will fly between them. Clearly, [Spark 2.3.0]() starting to experiment with native support for running on Kubernetes is an important milestone on that path. This article shares the takeaways from a recent investigation of Spark on Kubernetes.






## Starting from Hadoop

Hadoop consists of two main parts: the Hadoop Distributed File System (HDFS) and a distributed compute engine implementing Google's MapReduce idea. Hadoop was for a time the standard for large-scale distributed data storage and processing.

### Hadoop to Spark

While Hadoop was being widely adopted across the industry, it has always had a number of problems:

1. Only Map and Reduce operators are supported; complex algorithms and business logic are hard to express, so the logic ends up buried inside the operators. Besides making the code hard to maintain, this leaves the scheduler no room for optimization: it can only schedule along the single dimension of task count.

2. Intermediate results must also be written to HDFS, an unnecessary IO cost.

3. The TaskTracker divides resources into map slots and reduce slots, which is inflexible: when one stage has no work, resource utilization drops sharply.

4. ...

Research on Hadoop has essentially revolved around optimizing resource scheduling, the MapReduce computation model, HDFS storage, and generality. Spark is the most successful of the many derived systems, arguably a milestone, after which research on Hadoop itself quieted down considerably. Spark, developed in 2009 at UC Berkeley's AMPLab, quickly became a top-level Apache open source project. [Apache Spark](https://spark.apache.org/) is a big data processing framework built on in-memory computing, supporting operators far more complex than MapReduce and covering batch, streaming, and many other scenarios.



Spark module diagram

The main concepts in Spark:

- **Application**: similar to a MapReduce job in Hadoop, a Spark application is the program the user writes. Compared to Hadoop it supports far richer operators, and the built-in libraries make it easy to develop applications in fields such as machine learning and graph computing.

- **Job**: a parallel computation made up of many Tasks; a job usually involves a set of RDDs and the operators acting on them.

- **Stage**: each job is split into groups of Tasks; each group is a TaskSet, also called a Stage; one job consists of multiple Stages.

- **Task**: a unit of work assigned to an Executor; a Task can be understood as a piece of logic waiting to be scheduled onto one of the Executor's threads.

- **Operations**: the operators, divided into 1) Actions, such as reduce, collect, count; and 2) Transformations, such as map, join, reduceByKey. Actions cut the job into multiple Stages.

- **Executor**: a process of the Application running on a Worker node, responsible for running Tasks; each Application has its own set of Executors. The number of Executors can be set statically or managed via dynamic resource allocation.

- **Driver**: the Driver creates the SparkContext for the submitted Application, i.e. prepares the program's runtime environment. The SparkContext communicates with the Cluster Manager to request resources and assign tasks; once all Executors have finished, the Driver closes the SparkContext.

- **Worker**: any node in the cluster that can run Application tasks.

- **Cluster Manager**: the service that schedules resources in the cluster; the Master in Standalone mode, the ResourceManager in YARN mode.



### Hadoop to YARN

Early Hadoop clusters could already reach several thousand nodes, and as data processing demand kept growing, brute-force node additions strained the original scheduler badly. Application management and resource management logic all lived inside Hadoop's JobTracker, which could not scale horizontally and was buckling under the load. A design was needed that separated application management from resource management and decoupled the computation model from the JobTracker; YARN was born against this background. These days, "Hadoop" usually actually refers to YARN.




YARN's role in the cluster





YARN module diagram

Spark's scheduling was designed to be open from the start, and the relationships between its scheduling modules map very naturally onto YARN's concepts.

The Spark Master corresponds to the ResourceManager, a Spark Worker to a NodeManager, the Spark Driver to the Application Master, and Spark Executors to Containers. The number of Tasks an Executor can run in parallel is determined by the CPU cores of the Container allocated to it.

After a client submits an application to the YARN ResourceManager, the Applications Manager accepts the request and finds a Container in which to create the application's Application Master; the Application Master registers itself with the ResourceManager so the client can reach it. The Application Master is where the Spark Driver runs. The Application Master requests Containers and starts them; the Spark Driver then starts Spark Executors inside the Containers and schedules Spark Tasks onto the Executors' threads. Once all Tasks have finished, the Application Master deregisters and releases the resources.

#### The benefits

1. YARN, as the cluster's unified resource-scheduling and application-management layer, reduces the complexity of resource management while being open to all application types, i.e. MapReduce, Spark and others can be co-located, raising overall cluster utilization.

2. Two-level scheduling greatly reduces the pressure on the ResourceManager and improves the cluster's scalability.

3. The computation model is decoupled from resource scheduling. The scheduling layer hides the differences between the computation models of MapReduce, Spark, Flink and the other frameworks, letting each framework focus solely on optimizing compute performance.

4. YARN's advanced features become available, for example: 1) scheduling policies beyond the native FIFO: [CapacityScheduler](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) & [FairScheduler](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html); 2) queue-based resource isolation and allocation.



### YARN to Kubernetes

Hadoop and Spark owe much of their position as today's most widely used big data frameworks to the strength of YARN. Some criticize its pessimistic locking for limiting concurrency granularity, and the resource-visibility issues of two-level scheduling, but beyond that YARN itself has no major flaws and remains the first choice as the scheduling substrate for big data. History being what it is, incumbents are rarely taken down by direct rivals but by emerging forces that at first seem to belong to another field. This is the challenge YARN inevitably faces now that the Google-led Kubernetes ecosystem has matured: if in the future 80% of a company's workloads already run on Kubernetes, will it still be willing to maintain a separate YARN cluster for the remaining 20% of big data workloads?

#### Advantages of Kubernetes

Advantages of Spark on Kubernetes over traditional deployments such as on YARN:

1. Unified resource management. Jobs of any type can run in one unified Kubernetes cluster; there is no longer a need to maintain a separate YARN cluster for big data jobs.

2. Elastic cluster infrastructure. Both the resource layer and the application layer provide rich elasticity policies: we can scale out with ECS virtual machines, X-Dragon bare metal, or GPU instances according to workload needs; beyond the strong scaling capability of the Kubernetes cluster itself, the ecosystem can be plugged in as well, e.g. virtual kubelet.

3. Resource isolation and limits for complex distributed applications become easy, freeing us from YARN's complex queue management and queue allocation.

4. The advantages of containerization. Each application packages its own dependencies, even its own Spark version, into a docker image and runs in an independent environment; all applications are isolated from one another.

5. Big data on the cloud. There are currently two common ways to move big data workloads to the cloud: 1) build your own YARN (or other) cluster on ECS; 2) buy the EMR service. Now there is a third choice: Kubernetes. You keep full cluster-level control, escape the complexity of cluster management and operations, and still enjoy the elasticity and cost advantages of the cloud.

Starting with 2.3.0, Spark experimentally supports a new deployment mode besides Standalone, on YARN, and on Mesos: [Running Spark on Kubernetes](), continuously strengthened in subsequent releases.



The rest of this article is hands-on: deploying and running a Spark application in three scenarios: an ordinary Kubernetes cluster, a Serverless Kubernetes cluster, and Kubernetes + virtual kubelet.



## Spark on Kubernetes

### Prepare the data and the Spark application image
#### References:

[Accessing HDFS data from ECI](https://help.aliyun.com/document_detail/146235.html)

[Accessing OSS data from ECI](https://help.aliyun.com/document_detail/146237.html)



### Create a Kubernetes cluster

If you already have an Alibaba Cloud ACK cluster, skip this step.

For the creation procedure, see [Create a managed Kubernetes cluster]().



### Submit a job

#### Create an RBAC role for Spark

Create the service account (default namespace)

```bash
kubectl create serviceaccount spark
```

Bind the role

```bash
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
```

#### Submit directly with spark-submit (not the recommended way)

```bash
liumihustdeMacBook-Pro:spark-on-k8s liumihust$ ./spark-2.3.0-bin-hadoop2.6/bin/spark-submit \
 --master k8s://121.199.47.XX:6443 \
 --deploy-mode cluster \
 --name WordCount \
 --class com.aliyun.liumi.spark.example.WordCount \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
 --conf spark.executor.instances=2 \
 --conf spark.kubernetes.container.image=registry.cn-beijing.aliyuncs.com/liumi/spark:2.4.4-example \
 local:///opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar
```

#### Parameter notes

--master: the API server of the k8s cluster; this is what determines whether Spark runs on the k8s cluster or on YARN.

--deploy-mode: in client mode the driver runs in the submitting process; in cluster mode the driver runs inside the cluster.

spark.executor.instances: the number of executors.

spark.kubernetes.container.image: the packaged Spark image (containing driver, executor, and the application; they can also be configured separately).



#### Basic submission flow


Running Spark on Kubernetes

1. Spark first creates the Spark Driver (pod) in the k8s cluster.

2. Once the Driver is up, it calls the k8s API to create the Executors (pods); the Executors are what actually execute the job.

3. When the computation finishes, the Executor pods are reclaimed automatically, while the Driver pod moves to the Completed state (a terminal state) so the user can still inspect logs and so on (a sketch of doing so follows this list).

4. The Driver pod is only removed when the user cleans it up manually or the k8s GC reclaims it.

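This lifecycle can be followed from the command line; a minimal sketch for the WordCount submission above (the generated driver pod name is illustrative):

```bash
# watch driver and executor pods get created, run, and terminate
kubectl get pods -w

# after completion, read the job output from the Completed driver pod
# (the actual generated pod name will differ)
kubectl logs wordcount-1574930160313-driver
```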

#### Analyzing the results

Screenshots from the run:



Our 30 GB of data took about 20 minutes to process with two 1C1G executors.

Check the result after the job finishes:

```bash
[root@liumi-hdfs ~]# $HADOOP_HOME/bin/hadoop fs -cat /pod/data/A-Game-of-Thrones-Result/*
(142400000,the)
(78400000,and)
(77120000,)
(62200000,to)
(56690000,of)
(56120000,a)
(43540000,his)
(35160000,was)
(30480000,he)
(29060000,in)
(26640000,had)
(26200000,her)
(23050000,as)
(22210000,with)
(20450000,The)
(19260000,you)
(18300000,I)
(17510000,she)
(16960000,that)
(16450000,He)
(16090000,not)
(15980000,it)
(15080000,at)
(14710000,for)
(14410000,on)
(12660000,but)
(12470000,him)
(12070000,is)
(11240000,from)
(10300000,my)
(10280000,have)
(10010000,were)
```

At this point we can deploy and run Spark jobs on a Kubernetes cluster.



## Spark on Serverless Kubernetes

Compared with an ordinary Kubernetes cluster, a major advantage of Serverless Kubernetes (ASK) is that no resources need to be reserved before a job is submitted and there is no cluster scaling to worry about: all resources are requested automatically as the job is submitted and released automatically when it finishes. Afterwards only a SparkApplication object and the terminal-state Driver pod remain (holding management data only). The principle is illustrated below:


Running Spark on Serverless Kubernetes

ASK schedules pods onto Alibaba Cloud Elastic Container Instances through virtual kubelet. Although the architecture differs markedly from ACK, both are fully compatible with the Kubernetes standard, so preparation for Spark on ASK is basically the same as for Spark on Kubernetes above: HDFS data preparation, the Spark base image, the Spark application image, and so on. The differences are mainly a slightly different submission method plus some extra basic environment configuration.



### Create a serverless Kubernetes cluster

Choose a standard serverless cluster:


Basic parameters:

1. Choose a cluster name.

2. Choose the region and availability zone.

3. For the VPC, you can use an existing one or let Container Service create one automatically.

4. Whether to expose the API server publicly; enable it if you need it.

5. Enable privatezone; this is required.

6. Log collection; recommended.



##### Notes:

1. Before submitting, be sure to upgrade the cluster's virtual kubelet to the latest version (newly created clusters can skip this); only the latest VK can run Spark jobs.

2. ASK clusters rely on privatezone for service discovery, so privatezone must be enabled for the cluster; check the option when creating it. If it was not checked at creation time, contact us to have it enabled; otherwise the Spark executors will not find the driver service.



### *Build the image cache

Since we may later launch at large scale, we cache the Spark application image on ECI ahead of time to speed up container startup, using the standard k8s CRD approach. For the detailed procedure see [Accelerate pod creation with a CRD]().

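A minimal sketch of such a cache for the application image used in this article, assuming the ImageCache CRD is installed (the same CRD shipped in this repo as eci-gitlab-runner/imagecache-crd.yaml; the cache name is illustrative):

```bash
# declare an ImageCache for the Spark application image, then watch its Phase/Progress
cat <<EOF | kubectl apply -f -
apiVersion: eci.alibabacloud.com/v1
kind: ImageCache
metadata:
  name: spark-example
spec:
  images:
    - registry.cn-beijing.aliyuncs.com/liumi/spark:2.4.4-example
EOF
kubectl get ic spark-example
```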


### Submitting

Since spark-submit currently supports very few of the parameters we need, direct spark-submit is not recommended in the ASK scenario; use the [Spark Operator]() instead. Before the Spark Operator existed, jobs could also be submitted with kubernetes-native yaml. Both approaches are described below.
#### Option 1: the native way, hand-written yaml

Create the resources with custom, standard kubernetes yaml.

The complete yaml file I tested with (based on Spark 2.3.0):

wordcount-spark-driver-svc.yaml:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-serverless
  namespace: default

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-serverless-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
  - kind: ServiceAccount
    name: spark-serverless
    namespace: default

---
apiVersion: v1
kind: Service
metadata:
  name: wordcount-spark-driver-svc
  namespace: default
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-private-zone-enable: "true"
spec:
  clusterIP: None
  ports:
    - name: driver-rpc-port
      port: 7078
      protocol: TCP
      targetPort: 7078
    - name: blockmanager
      port: 7079
      protocol: TCP
      targetPort: 7079
  selector:
    spark-app-selector: spark-9b7952456a86413b94c70fe2b3f8496c
    spark-role: driver
  sessionAffinity: None
  type: ClusterIP

---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    spark-app-name: WordCount
    k8s.aliyun.com/eci-image-cache: "true"
  labels:
    spark-app-selector: spark-9b7952456a86413b94c70fe2b3f8496c
    spark-role: driver
  name: wordcount-spark-driver
  namespace: default
spec:
  containers:
    - args:
        - driver
      env:
        - name: SPARK_DRIVER_MEMORY
          value: 1g
        - name: SPARK_DRIVER_CLASS
          value: com.aliyun.liumi.spark.example.WordCount
        - name: SPARK_DRIVER_ARGS
        - name: SPARK_DRIVER_BIND_ADDRESS
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: SPARK_MOUNTED_CLASSPATH
          value: >-
            /opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar:/opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar
        - name: SPARK_JAVA_OPT_0
          value: '-Dspark.submit.deployMode=cluster'
        - name: SPARK_JAVA_OPT_1
          value: '-Dspark.driver.blockManager.port=7079'
        - name: SPARK_JAVA_OPT_2
          value: '-Dspark.master=k8s://https://47.99.132.xxx:6443'
        - name: SPARK_JAVA_OPT_3
          value: '-Dspark.app.id=spark-9b7952456a86413b94c70fe2b3f8496c'
        - name: SPARK_JAVA_OPT_4
          value: '-Dspark.kubernetes.authenticate.driver.serviceAccountName=spark'
        - name: SPARK_JAVA_OPT_5
          value: >-
            -Dspark.kubernetes.driver.pod.name=wordcount-spark-driver
        - name: SPARK_JAVA_OPT_6
          value: '-Dspark.app.name=WordCount'
        - name: SPARK_JAVA_OPT_7
          value: >-
            -Dspark.kubernetes.container.image=registry.cn-beijing.aliyuncs.com/liumi/spark:2.3.0-hdfs-1.0
        - name: SPARK_JAVA_OPT_8
          value: '-Dspark.executor.instances=10'
        - name: SPARK_JAVA_OPT_9
          value: >-
            -Dspark.jars=/opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar,/opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar
        - name: SPARK_JAVA_OPT_10
          value: >-
            -Dspark.driver.host=wordcount-spark-driver-svc.default.svc.cluster.local.c132a4a4826814d579c14bf2c5cf933af
        - name: SPARK_JAVA_OPT_11
          value: >-
            -Dspark.kubernetes.executor.podNamePrefix=wordcount-spark
        - name: SPARK_JAVA_OPT_12
          value: '-Dspark.driver.port=7078'
        - name: SPARK_JAVA_OPT_13
          value: >-
            -Dspark.kubernetes.executor.annotation.k8s.aliyun.com/eci-image-cache=true
        - name: SPARK_JAVA_OPT_14
          value: >-
            -Dspark.kubernetes.allocation.batch.size=10
      image: 'registry.cn-beijing.aliyuncs.com/liumi/spark:2.3.0-hdfs-1.0'
      imagePullPolicy: IfNotPresent
      name: spark-kubernetes-driver
      resources:
        limits:
          memory: 16384Mi
        requests:
          cpu: '8'
          memory: 16Gi
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
  dnsPolicy: None
  dnsConfig:
    nameservers:
      - 100.100.2.136
      - 100.100.2.138
    searches:
      - default.svc.cluster.local.c132a4a4826814d579c14bf2c5cf933af
      - svc.cluster.local.c132a4a4826814d579c14bf2c5cf933af
      - cluster.local.c132a4a4826814d579c14bf2c5cf933af
      - c132a4a4826814d579c14bf2c5cf933af
    options:
      - name: ndots
        value: "5"
  hostAliases:
    - ip: "47.99.132.xxx"
      hostnames:
        - "kubernetes.default.svc"
  priority: 0
  restartPolicy: Never
  serviceAccount: spark-serverless
  serviceAccountName: spark-serverless
  terminationGracePeriodSeconds: 30
  tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
```

The yaml file defines four resources:

**ServiceAccount**: spark-serverless. The Driver needs to access the cluster's api server from inside the pod, hence a ServiceAccount. It does not need to be re-created for every submission.

**ClusterRoleBinding**: spark-serverless-role. Binds the RBAC role to this ServiceAccount, granting it permission to operate on resources. It does not need to be re-created for every submission.

**Service**: the Driver service, exposing the Driver pod; this is how the executors reach the Driver.

**Pod**: the Driver pod. No executor pod yaml needs to be written: the executor pods' parameters are set through the Driver's -Dspark.kubernetes.* environment variables.

Submit with kubectl:

```bash
liumihustdeMacBook-Pro:spark-on-k8s liumihust$ kubectl create -f wordcount-spark-driver-svc.yaml
serviceaccount/spark-serverless created
clusterrolebinding.rbac.authorization.k8s.io/spark-serverless-role created
service/wordcount-spark-driver-svc created
pod/wordcount-spark-driver created
```

#### Option 2: Spark Operator

The direct k8s yaml declaration above also runs Spark jobs on kubernetes' native scheduling and, with small modifications, works on any cluster, but: 1) it is hard to maintain, with many custom parameters that are unintuitive (especially for users who only know Spark); 2) the notion of a Spark application disappears, leaving bare pods and services, so once there are many applications the maintenance cost rises and a unified management mechanism is missing.

The [Spark Operator]() was developed precisely to deploy and maintain Spark applications on Kubernetes clusters. It is the classic CRD + Controller combination, i.e. an implementation of the Kubernetes Operator pattern. The story of how Kubernetes Operators came about is quite legendary; interested readers can look it up. Operators opened a window for running stateful, domain-specific, complex applications on Kubernetes, and the Spark Operator is a representative example.



The main concepts of the Spark Operator:

**SparkApplication**: a standard k8s CRD, and where there is a CRD there is a corresponding Controller. The Controller listens for create, update, and delete events on the CRD and takes the corresponding actions.

**ScheduledSparkApplication**: an extension of SparkApplication that supports job submission with custom time-based scheduling policies, e.g. cron.

**Submission runner**: runs spark-submit for the creation requests issued by the Controller.

**Spark pod monitor**: watches the status and events of the Spark pods and reports them to the Controller.



##### Installing the Spark Operator

[helm 3.0]() is recommended

```bash
helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
helm install incubator/sparkoperator --namespace default --set operatorImageName=registry.cn-hangzhou.aliyuncs.com/eci_open/spark-operator --set operatorVersion=v1beta2-1.0.1-2.4.4 --generate-name --set enableWebhook=true
```

After installation you can see a new spark operator pod in the cluster.


Option notes:

1. --set operatorImageName: specifies the operator image; the default Google image cannot be pulled from within Alibaba Cloud ECI, so pull it locally first and push it to ACR (see the sketch after this list).

2. --set operatorVersion: the image repository name and version must not be written together.

3. --generate-name lets you skip setting an install name explicitly.

4. --set enableWebhook: off by default. Users who want ACK + ECI will need advanced features such as nodeSelector and tolerations, for which the webhook must be enabled, as discussed later.
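
A sketch of the mirroring mentioned in option 1, using the gcr.io image named in the appendix (the target ACR namespace is illustrative):

```bash
# pull the default operator image from a network that can reach gcr.io,
# retag it, and push it to your own ACR repository
docker pull gcr.io/spark-operator/spark-operator:v1beta2-1.0.1-2.4.4
docker tag gcr.io/spark-operator/spark-operator:v1beta2-1.0.1-2.4.4 \
  registry.cn-hangzhou.aliyuncs.com/<your-namespace>/spark-operator:v1beta2-1.0.1-2.4.4
docker push registry.cn-hangzhou.aliyuncs.com/<your-namespace>/spark-operator:v1beta2-1.0.1-2.4.4
```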

##### Note:

When creating the spark operator, make absolutely sure the image can be pulled. It is best to use the image provided by eci_open directly, because uninstalling the spark operator starts a cleanup job from the same image; if the image cannot be pulled, the cleanup job hangs too and all resources have to be cleaned up by hand, which is quite painful.



Declare the wordcount SparkApplication:

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: wordcount
  namespace: default
spec:
  type: Java
  mode: cluster
  image: "registry.cn-beijing.aliyuncs.com/liumi/spark:2.4.4-example"
  imagePullPolicy: IfNotPresent
  mainClass: com.aliyun.liumi.spark.example.WordCount
  mainApplicationFile: "local:///opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar"
  sparkVersion: "2.4.4"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 2
    onFailureRetryInterval: 5
    onSubmissionFailureRetries: 2
    onSubmissionFailureRetryInterval: 10
  timeToLiveSeconds: 36000
  sparkConf:
    "spark.kubernetes.allocation.batch.size": "10"

  driver:
    cores: 2
    memory: "4096m"
    labels:
      version: 2.4.4
      spark-app: spark-wordcount
      role: driver
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
    serviceAccount: spark
  executor:
    cores: 1
    instances: 100
    memory: "1024m"
    labels:
      version: 2.4.4
      role: executor
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
```

Note: most parameters can be set directly through what the SparkApplication CRD already supports; for the full list of supported parameters see [SparkApplication CRD](). Parameters can also be passed in directly in sparkConf form.

##### Submit:

```bash
kubectl create -f wordcount-operator-example.yaml
```
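
Once submitted, the operator tracks the job as a first-class object; the resource name matches the metadata above:

```bash
# high-level state the operator reports for the application
kubectl get sparkapplication wordcount

# full status, including driver pod info and the application state
kubectl describe sparkapplication wordcount
```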


### Analyzing the results

We launch 100 1C1G executors concurrently; the application image is about 500 MB.

Screenshots of the job run:



All 100 concurrently launched pods finish starting within roughly 30 s, and 93% of them start within 20 seconds.

Now the job execution time (which includes the time for vk to schedule the 100 executor pods, resource preparation for each executor pod, and the actual execution of the job):

```yaml
  exitCode: 0
  finishedAt: '2019-11-16T07:31:59Z'
  reason: Completed
  startedAt: '2019-11-16T07:29:01Z'
```

Only 178 s in total; the time dropped by an order of magnitude.



### ACK + ECI

In Spark, the Driver and the Executors start strictly in sequence. Although ECI shows excellent concurrency when creating executor pods, the ASK architecture makes this serialization between Driver and Executors quite visible: starting a Driver pod on ECI usually takes about 20 s, and only then does the large-scale executor startup begin. For latency-sensitive applications, the Driver's startup speed may matter more than the executors' job time. In that case we can use ACK + ECI, i.e. a traditional Kubernetes cluster + virtual kubelet:


For the user, only the few simple steps below are needed to schedule the executors onto ECI's virtual node.

#### 1. Install ECI's virtual kubelet in the ACK cluster

Open the App Catalog in the Container Service console and search for "ack-virtual-node":



Click through and choose the cluster to install into.


Reference values for the required parameters:

```yaml
virtualNode:
  image:
    repository: registry.cn-hangzhou.aliyuncs.com/acs/virtual-nodes-eci
    tag: v1.0.0.1-aliyun

affinityAdminssion:
  enabled: true
  image:
    repository: registry.cn-hangzhou.aliyuncs.com/ask/virtual-node-affinity-admission-controller
    tag: latest

env:
  ECI_REGION: "cn-hangzhou" # region the cluster resides in
  ECI_VPC: vpc-bp187fy2e7l123456 # the cluster's VPC, same as at cluster creation; see the cluster overview page
  ECI_VSWITCH: vsw-bp1bqf53ba123456 # vswitch for the resources, as above
  ECI_SECURITY_GROUP: sg-bp12ujq5zp12346 # security group for the resources, as above
  ECI_ACCESS_KEY: XXXXX # account AccessKey ID
  ECI_SECRET_KEY: XXXXX # account AccessKey Secret
  ALIYUN_CLUSTERID: virtual-kubelet
```



#### 2. Modify the application's yaml

Just add the following parameters for the executor:

```yaml
nodeSelector:
  type: virtual-kubelet
tolerations:
  - key: virtual-kubelet.io/provider
    operator: Exists
```

The complete application spec looks like this:

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: wordcount
  namespace: default
spec:
  type: Java
  mode: cluster
  image: "registry.cn-beijing.aliyuncs.com/liumi/spark:2.4.4-example"
  imagePullPolicy: IfNotPresent
  mainClass: com.aliyun.liumi.spark.example.WordCount
  mainApplicationFile: "local:///opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar"
  sparkVersion: "2.4.4"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 2
    onFailureRetryInterval: 5
    onSubmissionFailureRetries: 2
    onSubmissionFailureRetryInterval: 10
  timeToLiveSeconds: 36000
  sparkConf:
    "spark.kubernetes.allocation.batch.size": "10"

  driver:
    cores: 2
    memory: "4096m"
    labels:
      version: 2.4.4
      spark-app: spark-wordcount
      role: driver
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
    serviceAccount: spark
  executor:
    cores: 1
    instances: 100
    memory: "1024m"
    labels:
      version: 2.4.4
      role: executor
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
    #nodeName: virtual-kubelet
    nodeSelector:
      type: virtual-kubelet
    tolerations:
      - key: virtual-kubelet.io/provider
        operator: Exists
```

This schedules the Driver onto ACK and the executors onto ECI: a perfect complement.
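
A quick way to confirm the split placement after submitting (output will vary per cluster):

```bash
# the virtual-kubelet node appears alongside the regular ACK nodes
kubectl get nodes

# the driver should sit on a regular node, the executors on the virtual node
kubectl get pods -o wide
```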


#### 3. Submit

The result looks like this:


Job execution time:

```yaml
  exitCode: 0
  finishedAt: '2019-11-16T07:25:05Z'
  reason: Completed
  startedAt: '2019-11-16T07:22:40Z'
```

145 seconds in total, and more importantly the Driver started locally, taking only about 2 seconds to come up.



## Summary

Job execution time is not an absolute advantage of Kubernetes + ECI; with enough node resources prepared on ACK, the same level can be reached.

Our advantages are:

##### 1) Elasticity and cost

Whether you use ACK + ECI or ASK + ECI, no resources need to be reserved before a job is submitted and there is no cluster scaling to manage: all resources are requested automatically on submission and released automatically when the job completes. Afterwards only a SparkApplication and the terminal-state Driver pod remain (holding management data only). Beyond that, ACK + ECI offers richer scheduling choices: 1) the Driver and the executors can be scheduled separately; 2) different scheduling resources can be chosen per job type and cost profile, satisfying a wider range of usage scenarios.

##### 2) Separation of compute and storage

Data storage has always been a thorny problem for big data on Kubernetes, and it is even more acute on serverless Kubernetes: with no nodes at all, building an HDFS/YARN cluster is out of the question. In fact, running Spark on top of an HDFS cluster is no longer a necessity; see references [1, 2]. Alibaba Cloud's HDFS storage solves exactly this pain point for us, and in our tests its read/write performance is also very good. We can separate compute from storage: jobs in the kubernetes cluster access HDFS data directly and natively. Besides HDFS, Alibaba Cloud NAS and OSS are also options for data storage.

##### 3) Scheduling

Scheduling generally divides into two-level scheduling, represented by YARN, and centralized scheduling. In two-level scheduling, a central scheduler handles macro-level resource decisions while fine-grained application scheduling is done by lower-level schedulers. Centralized scheduling handles all resource requests in one place; Kubernetes is the typical representative: it caches the whole cluster's resource information locally and schedules optimistically on that local data to improve scheduler performance.

Once a kubernetes cluster reaches a certain scale, performance hits a bottleneck; see reference [3]. YARN is the product of many years of hardening in the big data field, and whether kubernetes' native scheduler can hold up when scheduling Spark jobs is still an open question.

Serverless Kubernetes, however, turns this into a kind of two-level scheduling: kubernetes' part of the scheduling is greatly simplified, since its scheduler only has to place resources onto the virtual kubelet, while the actual fine-grained scheduling sinks down into the powerful scheduler of Alibaba Cloud's elastic compute.

The larger the data being processed and the larger the burst of executor pods being launched, the more pronounced our advantage becomes.



## References

[1] [HDFS vs. Cloud Storage: Pros, cons and migration tips]()

[2] [New release of Cloud Storage Connector for Hadoop: Improving performance, throughput and more]()

[3] [Understanding Scalability and Performance in the Kubernetes Master](), Xingyu Chen, Fansong Zeng, Alibaba Cloud



## Appendix
### Spark base image:
This sample uses Google's gcr.io/spark-operator/spark:v2.4.4

ECI has already mirrored it to the ACR repository; the per-region addresses are:

Public address: registry.{regionId}.aliyuncs.com/eci_open/spark:2.4.4

VPC address: registry-vpc.{regionId}.aliyuncs.com/eci_open/spark:2.4.4
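
For example, pulling the base image over the public address in cn-hangzhou (substitute your own region id):

```bash
docker pull registry.cn-hangzhou.aliyuncs.com/eci_open/spark:2.4.4
```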


### Spark Operator image
This sample uses Google's gcr.io/spark-operator/spark-operator:v1beta2-1.0.1-2.4.4

ECI has already mirrored it to the ACR repository; the per-region addresses are:

Public address: registry.{regionId}.aliyuncs.com/eci_open/spark-operator:v1beta2-1.0.1-2.4.4

VPC address: registry-vpc.{regionId}.aliyuncs.com/eci_open/spark-operator:v1beta2-1.0.1-2.4.4
--------------------------------------------------------------------------------