├── README.md
├── eci-spark
│   ├── pics
│   │   ├── spark-1.png
│   │   ├── spark-2.png
│   │   ├── spark-3.png
│   │   ├── spark-4.png
│   │   ├── spark-25.png
│   │   ├── 017c1cd4c74a5936acd4f9b93f089e81.png
│   │   ├── 79efc84b99359e069b9e3e9d42e2dc8d.png
│   │   ├── 7c5a3b2d598506b828c1d4707a08b4c8.png
│   │   ├── a4235604c1c2c5cc9a19f089d73f426d.png
│   │   ├── b645a2d7a0a5b7fe918cb24d9b22d592.png
│   │   ├── f22c2388786e00b104d677b545c69bc9.png
│   │   ├── 1574232926380-2e1dba72-2c79-4018-835c-4451f8e19feb.png
│   │   ├── 1574232962771-ef09e6c7-c5f1-4dcf-97fc-39494bc7f14f.png
│   │   ├── 1574233020714-fbe3f048-91c8-451f-87af-5b38f99a23c5.png
│   │   ├── 1574233108538-d986c0b4-6846-49ac-9bb4-82d2044cc855.png
│   │   ├── 1574233483142-9359d5e3-81c9-4154-8242-ed3a37a4e37b.png
│   │   ├── 1574233499811-efed418f-649b-45f0-b035-cdb09a15fa3d.png
│   │   ├── 1574233531691-89664643-2afe-40fe-8ac0-462a8dba1910.png
│   │   ├── 1574233651888-ade8ea24-4e36-4189-817d-26572230970a.png
│   │   ├── 1574233670142-d818a7c8-2edf-4d4f-ac68-816d18eb1b55.png
│   │   ├── 1574233691702-743d4526-f45a-4f92-8b06-397d7086d5fc.png
│   │   ├── 1574233705801-03b05378-4723-4584-ae8f-1d62beb971cd.png
│   │   ├── 1574411721752-3d77457d-4aa1-4008-938f-eb291bf16ce6.png
│   │   └── 1574432924198-263f5929-062d-4cfe-886a-7979dde56d21.png
│   ├── wordcount-operator-example.yaml
│   ├── wordcount-operator-example-ack.yaml
│   ├── wordcount-spark-driver-svc.yaml
│   └── README.md
├── eci-gitlab-runner
│   ├── java-demo
│   │   ├── Dockerfile
│   │   ├── src
│   │   │   └── main
│   │   │       └── webapp
│   │   │           ├── index.jsp
│   │   │           └── WEB-INF
│   │   │               └── web.xml
│   │   ├── .gitignore
│   │   ├── deployment.yaml
│   │   ├── pom.xml
│   │   └── .gitlab-ci.yml
│   ├── mvn-pvc.yaml
│   ├── nas-pvc.yaml
│   ├── secret.yaml
│   ├── imagecache.yaml
│   ├── mvn-pv.yaml
│   ├── nas-pv.yaml
│   ├── gitlab-runner-deployment.yaml
│   ├── imagecache-crd.yaml
│   ├── README.md
│   └── config-map.yaml
├── eci-gpu-tensorflow
│   ├── imagecache.yaml
│   └── gpu_pod.yaml
├── eci-wordpress
│   ├── create.json
│   ├── wordpress-all-in-one-pod.yaml
│   └── README.md
└── .gitignore

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# BestPractice-Serverless-Kubernetes

--------------------------------------------------------------------------------
/eci-spark/pics/spark-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/spark-1.png

--------------------------------------------------------------------------------
/eci-spark/pics/spark-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/spark-2.png

--------------------------------------------------------------------------------
/eci-spark/pics/spark-3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/spark-3.png

--------------------------------------------------------------------------------
/eci-spark/pics/spark-4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/spark-4.png

--------------------------------------------------------------------------------
/eci-spark/pics/spark-25.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/spark-25.png
--------------------------------------------------------------------------------
/eci-gitlab-runner/java-demo/Dockerfile:
--------------------------------------------------------------------------------
FROM registry.cn-beijing.aliyuncs.com/acs-sample/tomcat
ADD target/demo.war /usr/local/tomcat/webapps/demo.war

--------------------------------------------------------------------------------
/eci-spark/pics/017c1cd4c74a5936acd4f9b93f089e81.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/017c1cd4c74a5936acd4f9b93f089e81.png

--------------------------------------------------------------------------------
/eci-spark/pics/79efc84b99359e069b9e3e9d42e2dc8d.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/79efc84b99359e069b9e3e9d42e2dc8d.png

--------------------------------------------------------------------------------
/eci-spark/pics/7c5a3b2d598506b828c1d4707a08b4c8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/7c5a3b2d598506b828c1d4707a08b4c8.png

--------------------------------------------------------------------------------
/eci-spark/pics/a4235604c1c2c5cc9a19f089d73f426d.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/a4235604c1c2c5cc9a19f089d73f426d.png

--------------------------------------------------------------------------------
/eci-spark/pics/b645a2d7a0a5b7fe918cb24d9b22d592.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/b645a2d7a0a5b7fe918cb24d9b22d592.png

--------------------------------------------------------------------------------
/eci-spark/pics/f22c2388786e00b104d677b545c69bc9.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/f22c2388786e00b104d677b545c69bc9.png

--------------------------------------------------------------------------------
/eci-spark/pics/1574232926380-2e1dba72-2c79-4018-835c-4451f8e19feb.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574232926380-2e1dba72-2c79-4018-835c-4451f8e19feb.png

--------------------------------------------------------------------------------
/eci-spark/pics/1574232962771-ef09e6c7-c5f1-4dcf-97fc-39494bc7f14f.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574232962771-ef09e6c7-c5f1-4dcf-97fc-39494bc7f14f.png

--------------------------------------------------------------------------------
/eci-spark/pics/1574233020714-fbe3f048-91c8-451f-87af-5b38f99a23c5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233020714-fbe3f048-91c8-451f-87af-5b38f99a23c5.png

--------------------------------------------------------------------------------
/eci-spark/pics/1574233108538-d986c0b4-6846-49ac-9bb4-82d2044cc855.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233108538-d986c0b4-6846-49ac-9bb4-82d2044cc855.png

--------------------------------------------------------------------------------
/eci-spark/pics/1574233483142-9359d5e3-81c9-4154-8242-ed3a37a4e37b.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233483142-9359d5e3-81c9-4154-8242-ed3a37a4e37b.png

--------------------------------------------------------------------------------
/eci-spark/pics/1574233499811-efed418f-649b-45f0-b035-cdb09a15fa3d.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233499811-efed418f-649b-45f0-b035-cdb09a15fa3d.png

--------------------------------------------------------------------------------
/eci-spark/pics/1574233531691-89664643-2afe-40fe-8ac0-462a8dba1910.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233531691-89664643-2afe-40fe-8ac0-462a8dba1910.png

--------------------------------------------------------------------------------
/eci-spark/pics/1574233651888-ade8ea24-4e36-4189-817d-26572230970a.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233651888-ade8ea24-4e36-4189-817d-26572230970a.png

--------------------------------------------------------------------------------
/eci-spark/pics/1574233670142-d818a7c8-2edf-4d4f-ac68-816d18eb1b55.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233670142-d818a7c8-2edf-4d4f-ac68-816d18eb1b55.png

--------------------------------------------------------------------------------
/eci-spark/pics/1574233691702-743d4526-f45a-4f92-8b06-397d7086d5fc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233691702-743d4526-f45a-4f92-8b06-397d7086d5fc.png

--------------------------------------------------------------------------------
/eci-spark/pics/1574233705801-03b05378-4723-4584-ae8f-1d62beb971cd.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574233705801-03b05378-4723-4584-ae8f-1d62beb971cd.png
--------------------------------------------------------------------------------
/eci-spark/pics/1574411721752-3d77457d-4aa1-4008-938f-eb291bf16ce6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574411721752-3d77457d-4aa1-4008-938f-eb291bf16ce6.png

--------------------------------------------------------------------------------
/eci-spark/pics/1574432924198-263f5929-062d-4cfe-886a-7979dde56d21.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aliyuneci/BestPractice-Serverless-Kubernetes/HEAD/eci-spark/pics/1574432924198-263f5929-062d-4cfe-886a-7979dde56d21.png

--------------------------------------------------------------------------------
/eci-gpu-tensorflow/imagecache.yaml:
--------------------------------------------------------------------------------
apiVersion: eci.alibabacloud.com/v1
kind: ImageCache
metadata:
  name: tensorflow
spec:
  images:
  - registry-vpc.cn-zhangjiakou.aliyuncs.com/eci/tensorflow:1.0 # image for the training job; hosting it in an Alibaba Cloud VPC private registry is recommended

--------------------------------------------------------------------------------
/eci-gitlab-runner/mvn-pvc.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gitlab-runner-maven-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  volumeName: gitlab-runner-maven-pv

--------------------------------------------------------------------------------
/eci-gitlab-runner/nas-pvc.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gitlab-runner-cache-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  volumeName: gitlab-runner-cache-pv

--------------------------------------------------------------------------------
/eci-gitlab-runner/java-demo/src/main/webapp/index.jsp:
--------------------------------------------------------------------------------
<%@ page import="java.net.InetAddress" %>
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<html>
<head>
<title>Aliyun Container Service</title>
</head>
<body>
Hello Gitlab
</body>
</html>

--------------------------------------------------------------------------------
/eci-gitlab-runner/secret.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: Secret
metadata:
  name: gitlab-runner-secret
type: kubernetes.io/tls
data:
  ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FUR***********
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS*********
  tls.key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSB********

--------------------------------------------------------------------------------
/eci-gitlab-runner/java-demo/src/main/webapp/WEB-INF/web.xml:
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
         version="3.0">
</web-app>

--------------------------------------------------------------------------------
/eci-wordpress/create.json:
--------------------------------------------------------------------------------
{
    "cluster_type":"ManagedKubernetes",
    "profile":"Serverless",
    "name":"wordpress-demo",
    "region_id": "cn-hangzhou",
    "endpoint_public_access": true,
    "snat_entry":true,
    "addons":[
        {
            "name": "csi-provisioner",
            "config": ""
        }
    ],
    "zoneid": "cn-hangzhou-j"
}

--------------------------------------------------------------------------------
/eci-gitlab-runner/imagecache.yaml:
--------------------------------------------------------------------------------
apiVersion: eci.alibabacloud.com/v1
kind: ImageCache
metadata:
  name: gitlab-runner
spec:
  images:
  - gitlab/gitlab-runner-helper:x86_64-latest
  - gitlab/gitlab-runner:latest
  - registry.cn-hangzhou.aliyuncs.com/eci/kaniko:1.0
  - registry.cn-hangzhou.aliyuncs.com/eci/kubectl:1.0
  - registry.cn-hangzhou.aliyuncs.com/eci/java-demo:1.0

--------------------------------------------------------------------------------
/eci-gitlab-runner/mvn-pv.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gitlab-runner-maven-pv
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 100Gi
  mountOptions:
    - nolock,noresvport,noacl,hard
    - vers=3
    - rsize=1048576
    - wsize=1048576
    - proto=tcp
    - timeo=600
    - retrans=2
  nfs:
    path: /share
    server: 0079f226-s3vx.cn-hangzhou.extreme.nas.aliyuncs.com

--------------------------------------------------------------------------------
/eci-gitlab-runner/nas-pv.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gitlab-runner-cache-pv
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 100Gi
  mountOptions:
    - nolock,noresvport,noacl,hard
    - vers=3
    - rsize=1048576
    - wsize=1048576
    - proto=tcp
    - timeo=600
    - retrans=2
  nfs:
    path: /share
    server: 0024d203-lbsy.cn-hangzhou.extreme.nas.aliyuncs.com

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# maven ignore
target/
*.jar
!.mvn/wrapper/*
*.war
*.zip
*.tar
*.tar.gz
.flattened-pom.xml

# eclipse ignore
.settings/
.project
.classpath

# idea ignore
.idea/
*.ipr
*.iml
*.iws

# temp ignore
*.log
*.cache
*.diff
*.patch
*.tmp

# system ignore
.DS_Store
Thumbs.db
*.orig

# license check result
license-list

# grpc compiler
compiler/gradle.properties
compiler/build/*
compiler/.gradle/*

--------------------------------------------------------------------------------
/eci-gitlab-runner/java-demo/.gitignore:
--------------------------------------------------------------------------------
# maven ignore
target/
*.jar
!.mvn/wrapper/*
*.war
*.zip
*.tar
*.tar.gz
.flattened-pom.xml

# eclipse ignore
.settings/
.project
.classpath

# idea ignore
.idea/
*.ipr
*.iml
*.iws

# temp ignore
*.log
*.cache
*.diff
*.patch
*.tmp

# system ignore
.DS_Store
Thumbs.db
*.orig

# license check result
license-list

# grpc compiler
compiler/gradle.properties
compiler/build/*
compiler/.gradle/*

--------------------------------------------------------------------------------
/eci-gitlab-runner/java-demo/deployment.yaml:
--------------------------------------------------------------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: java-demo
  template:
    metadata:
      labels:
        app: java-demo
      annotations:
        k8s.aliyun.com/eci-image-cache: "true"
    spec:
      containers:
      - name: java-demo
        image: registry.cn-hangzhou.aliyuncs.com/eci/java-demo:IMAGE_TAG
        imagePullPolicy: Always
        ports:
        - containerPort: 8080

---
apiVersion: v1
kind: Service
metadata:
  name: java-demo
spec:
  ports:
  - port: 80
    targetPort: 8080
    name: java-demo
  selector:
    app: java-demo
  type: LoadBalancer

--------------------------------------------------------------------------------
/eci-gpu-tensorflow/gpu_pod.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: Pod
metadata:
  name: tensorflow
  annotations:
    k8s.aliyun.com/eci-gpu-type: "P4" # GPU type; alternatively you can specify the same instanceType as for ECS, e.g. k8s.aliyun.com/eci-instance-type: "ecs.gn5i-c2g1.large"
    k8s.aliyun.com/eci-image-cache: "true" # enable automatic image cache matching
spec:
  containers:
  - image: registry-vpc.cn-zhangjiakou.aliyuncs.com/eci/tensorflow:1.0 # image for the training job
    name: tensorflow
    command:
    - "sh"
    - "-c"
    - "python models/tutorials/image/imagenet/classify_image.py" # script that kicks off the training job
    resources:
      limits:
        nvidia.com/gpu: "1" # number of GPUs the container needs
    volumeMounts:
    - name: nfs-pv
      mountPath: /tmp/imagenet
  volumes:
  - name: nfs-pv # persist training results to NAS file storage
    nfs:
      path: /share
      server: 0912430d-1nsl.cn-zhangjiakou.extreme.nas.aliyuncs.com
  restartPolicy: OnFailure
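A hypothetical usage sketch for the GPU pod above (an illustration, not a file in this repo; it assumes kubectl already points at the target cluster):

```bash
kubectl apply -f gpu_pod.yaml
# watch the pod come up, then follow the training output
kubectl get pod tensorflow -w
kubectl logs -f tensorflow
```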
--------------------------------------------------------------------------------
/eci-gitlab-runner/java-demo/pom.xml:
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.aliyun</groupId>
    <artifactId>jenkins-demo-web</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <packaging>war</packaging>

    <build>
        <finalName>demo</finalName>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF8</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

--------------------------------------------------------------------------------
/eci-gitlab-runner/java-demo/.gitlab-ci.yml:
--------------------------------------------------------------------------------
cache:
  paths:
    - /cache
stages:
  - package
  - build
  - deploy
mvn_package_job:
  image: registry.cn-hangzhou.aliyuncs.com/eci/kaniko:1.0
  stage: package
  tags:
    - test
  script:
    - mvn clean package -DskipTests
    - cp -f target/demo.war /cache
build_and_publish_docker_image_job:
  image: registry.cn-hangzhou.aliyuncs.com/eci/kaniko:1.0
  stage: build
  tags:
    - test
  script:
    - mkdir target
    - cp /cache/demo.war target/demo.war
    - echo $CI_PIPELINE_ID
    - kaniko -f `pwd`/Dockerfile -c `pwd` --destination=registry.cn-hangzhou.aliyuncs.com/eci/java-demo:$CI_PIPELINE_ID
deploy_k8s_job:
  image: registry.cn-hangzhou.aliyuncs.com/eci/kubectl:1.0
  stage: deploy
  tags:
    - test
  script:
    - mkdir -p ~/.kube
    - echo $kube_config |base64 -d > ~/.kube/config
    - sed -i "s/IMAGE_TAG/$CI_PIPELINE_ID/g" deployment.yaml
    - cat deployment.yaml
    - kubectl apply -f deployment.yaml

--------------------------------------------------------------------------------
/eci-gitlab-runner/gitlab-runner-deployment.yaml:
--------------------------------------------------------------------------------
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-runner
spec:
  selector:
    matchLabels:
      app: gitlab-runner
  template:
    metadata:
      labels:
        app: gitlab-runner
      annotations:
        k8s.aliyun.com/eci-image-cache: "true"
    spec:
      containers:
      - image: gitlab/gitlab-runner:latest
        imagePullPolicy: IfNotPresent
        name: gitlab-runner
        volumeMounts:
        - mountPath: /etc/gitlab-runner
          name: config
      volumes:
      - name: config
        projected:
          defaultMode: 420
          sources:
          - secret:
              items:
              - key: ca.crt
                path: ca.crt
              - key: tls.crt
                path: tls.crt
              - key: tls.key
                path: tls.key
              name: gitlab-runner-secret
          - configMap:
              items:
              - key: config.toml
                path: config.toml
              name: gitlab-runner-config

--------------------------------------------------------------------------------
/eci-spark/wordcount-operator-example.yaml:
--------------------------------------------------------------------------------
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: wordcount
  namespace: default
spec:
  type: Java
  mode: cluster
  image: "registry.cn-beijing.aliyuncs.com/liumi/spark:2.4.4-example"
  imagePullPolicy: IfNotPresent
  mainClass: com.aliyun.liumi.spark.example.WordCount
  mainApplicationFile: "local:///opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar"
  sparkVersion: "2.4.4"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 2
    onFailureRetryInterval: 5
    onSubmissionFailureRetries: 2
    onSubmissionFailureRetryInterval: 10
  timeToLiveSeconds: 36000
  sparkConf:
    "spark.kubernetes.allocation.batch.size": "10"
  driver:
    cores: 2
    memory: "4096m"
    labels:
      version: 2.4.4
      spark-app: spark-wordcount
      role: driver
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
    serviceAccount: spark
  executor:
    cores: 1
    instances: 100
    memory: "1024m"
    labels:
      version: 2.4.4
      role: executor
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
--------------------------------------------------------------------------------
/eci-gitlab-runner/imagecache-crd.yaml:
--------------------------------------------------------------------------------
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: imagecaches.eci.alibabacloud.com
spec:
  group: eci.alibabacloud.com
  version: v1
  names:
    kind: ImageCache
    plural: imagecaches
    shortNames:
    - ic
    categories:
    - all
  scope: Cluster
  subresources:
    status: {}
  validation:
    openAPIV3Schema:
      required:
      - spec
      properties:
        spec:
          type: object
          required:
          - images
          properties:
            imagePullSecrets:
              type: array
              items:
                type: string
            images:
              minItems: 1
              type: array
              items:
                type: string
            imageCacheSize:
              type: integer
  additionalPrinterColumns:
  - name: Age
    type: date
    JSONPath: .metadata.creationTimestamp
  - name: CacheId
    type: string
    JSONPath: .status.imageCacheId
  - name: Phase
    type: string
    JSONPath: .status.phase
  - name: Progress
    type: string
    JSONPath: .status.progress

--------------------------------------------------------------------------------
/eci-spark/wordcount-operator-example-ack.yaml:
--------------------------------------------------------------------------------
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: wordcount
  namespace: default
spec:
  type: Java
  mode: cluster
  image: "registry.cn-beijing.aliyuncs.com/liumi/spark:2.4.4-example"
  imagePullPolicy: IfNotPresent
  mainClass: com.aliyun.liumi.spark.example.WordCount
  mainApplicationFile: "local:///opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar"
  sparkVersion: "2.4.4"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 2
    onFailureRetryInterval: 5
    onSubmissionFailureRetries: 2
    onSubmissionFailureRetryInterval: 10
  timeToLiveSeconds: 36000
  sparkConf:
    "spark.kubernetes.allocation.batch.size": "10"
  driver:
    cores: 2
    memory: "4096m"
    labels:
      version: 2.4.4
      spark-app: spark-wordcount
      role: driver
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
    serviceAccount: spark
  executor:
    cores: 1
    instances: 100
    memory: "1024m"
    labels:
      version: 2.4.4
      role: executor
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
    #nodeName: virtual-kubelet
    nodeSelector:
      type: virtual-kubelet
    tolerations:
    - key: virtual-kubelet.io/provider
      operator: Exists

--------------------------------------------------------------------------------
/eci-gitlab-runner/README.md:
--------------------------------------------------------------------------------
### 1. Prepare an ASK cluster
https://cs.console.aliyun.com/?spm=5176.eciconsole.0.0.68254a9cNv12zh#/k8s/cluster/createV2/serverless
Create a standard serverless Kubernetes cluster in the Container Service console.


### 2. Prepare the PVs/PVCs
Prepare two NAS volumes, one as the GitLab runner cache and one as the Maven repository. Replace the NAS server address and path with your own.

``` shell
kubectl apply -f mvn-pv.yaml
kubectl apply -f mvn-pvc.yaml
kubectl apply -f nas-pv.yaml
kubectl apply -f nas-pvc.yaml
```
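Before moving on, it can help to confirm that the volumes bound correctly (a suggested check, not part of the original steps; the object names come from the yaml files above):

``` shell
# all four objects should report a Bound status
kubectl get pv gitlab-runner-maven-pv gitlab-runner-cache-pv
kubectl get pvc gitlab-runner-maven-pvc gitlab-runner-cache-pvc
```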
### 3. Prepare the Secrets
* Copy the certificate public/private keys from your kubeconfig into the secret (secret.yaml):
``` shell
kubectl apply -f secret.yaml
```

* Docker registry credentials: ECI supports password-free image pulls, but pushing images still requires them.
``` shell
kubectl create secret docker-registry registry-auth-secret --docker-server=registry.cn-hangzhou.aliyuncs.com --docker-username=${xxx} --docker-password=${xxx}
```

To inspect the generated secret, use:
``` shell
kubectl get secret registry-auth-secret --output=yaml
```

### 4. Prepare the ConfigMap
Copy the GitLab runner's URL and token and the ASK cluster's API server address into config.toml, then apply:
``` shell
kubectl apply -f config-map.yaml
```

### 5. Prepare the imageCache (optional; saves image pull time)
ASK currently installs the imagecache CRD by default. Check with the commands below and install it yourself if it is missing.
``` shell
# check whether the image cache CRD is installed
kubectl get crd
# install the image cache CRD
kubectl apply -f imagecache-crd.yaml
# build the image cache
kubectl apply -f imagecache.yaml
```

### 6. Deploy the GitLab runner
``` shell
kubectl apply -f gitlab-runner-deployment.yaml
```

### 7. Import the git repo; see the java-demo directory for the Java demo
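As a quick sanity check after deploying (a suggested sketch; the label comes from gitlab-runner-deployment.yaml), confirm the runner pod is up and has loaded its configuration:

``` shell
kubectl get pods -l app=gitlab-runner
kubectl logs deploy/gitlab-runner --tail=20
```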
"/etc/gitlab-runner/ca.crt" 20 | namespace = "default" 21 | pull_policy = "if-not-present" 22 | cpu_limit = "0.5" 23 | cpu_request = "0.5" 24 | memory_limit = "1Gi" 25 | memory_request = "1Gi" 26 | helper_cpu_limit = "0.5" 27 | helper_cpu_request = "0.5" 28 | helper_memory_limit = "1Gi" 29 | helper_memory_request = "1Gi" 30 | helper_image = "gitlab/gitlab-runner-helper:x86_64-latest" 31 | [runners.kubernetes.pod_annotations] 32 | "k8s.aliyun.com/eci-image-cache" = "true" 33 | [runners.kubernetes.volumes] 34 | [[runners.kubernetes.volumes.pvc]] 35 | name = "gitlab-runner-cache-pvc" 36 | mount_path = "/cache" 37 | readonly = false 38 | [[runners.kubernetes.volumes.pvc]] 39 | name = "gitlab-runner-maven-pvc" 40 | mount_path = "/root/.m2" 41 | readonly = false 42 | [[runners.kubernetes.volumes.secret]] 43 | name = "registry-auth-secret" 44 | mount_path = "/root/.docker" 45 | read_only = false 46 | [runners.kubernetes.volumes.secret.items] 47 | ".dockerconfigjson" = "config.json" 48 | -------------------------------------------------------------------------------- /eci-wordpress/README.md: -------------------------------------------------------------------------------- 1 | # 使用 ECI + ACK Serverless 一分钟体验 WordPress 2 | 3 | 进入WordPress目录 4 | ```bash 5 | cd eci-wordpress 6 | ``` 7 | 8 | ## 创建 Serverless Kubernetes 集群 9 | 10 | 您可以使用 Aliyun CLI 命令方便的创建集群 11 | 12 | ```bash 13 | aliyun cs POST /clusters --header "Content-Type=application/json" --body "$(cat create.json)" 14 | ``` 15 | 16 | 其中 `create.json` 文件保存有创建 Serverless Kubernetes 集群的参数,您可以自定义来配置自己的集群。 17 | 18 | - cluster_type:集群类型,Serverless Kubernetes 集群类型为 "ManagedKubernetes" 19 | - profile:集群标识,参数cluster_type取值为ManagedKubernetes,同时该参数配置为Serverless,表示创建ACK Serverless集群。 20 | - name:集群名称 21 | - region_id:集群所在地域ID 22 | - endpoint_public_access:是否开启公网API Server 23 | - snat_entry:是否在VPC中创建NAT网关并配置SNAT规则 24 | - addons:Kubernetes集群安装的组件列表 25 | - zoneid:集群所属地域的可用区ID,如果不指定vpcid和vswitch_ids的情况下,必须指定zoneid。 26 | 27 | 创建成功后,您可以在控制台中看到执行完的输出,如下所示: 28 | 29 | ```json 30 | { 31 | "cluster_id": "c486508d6416045a9a434b0******", 32 | "instanceId": "c486508d6416045a9a434b0******", 33 | "request_id": "075417EF-8F86-51E6-******", 34 | "task_id": "T-6524fe49265d5c06******" 35 | } 36 | ``` 37 | 38 | 其中 `cluster_id` 为您创建的集群的唯一 id。 39 | 40 | 您现在可以登录[容器服务控制台](https://cs.console.aliyun.com)查看通过 Aliyun CLI 创建的 Serverless Kubernetes 集群。 41 | 42 | ## 安装 WordPress 43 | 44 | **注意:请确保上一步中创建的 Serverless Kubernetes 集群,已完成初始化(一般需要3~5分钟),再开始以下的操作** 45 | 46 | 使用 Cloud Shell 来管理上一步中创建中的 Serverless Kubernetes 集群 47 | ```bash 48 | source use-k8s-cluster ${集群ID} 49 | ``` 50 | 51 | 执行 WordPress 安装 yaml 文件 52 | ```bash 53 | kubectl apply -f wordpress-all-in-one-pod.yaml 54 | ``` 55 | 56 | 观察安装进度,直到 STATUS 为 Running 57 | ```bash 58 | kubectl get pods 59 | ``` 60 | 61 | 查询 EIP 地址 62 | ```bash 63 | kubectl get -o json pod wordpress |grep "k8s.aliyun.com/allocated-eipAddress" 64 | ``` 65 | 预期返回 66 | ```shell 67 | "k8s.aliyun.com/allocated-eipAddress": "39.105.XX.XX" 68 | ``` 69 | 70 | 由于安全组默认没有开放 80 端口的访问,需要给安全组添加 80 端口的 ACL 71 | 72 | 首先获取安全组ID 73 | 74 | ```bash 75 | kubectl get -o json pod wordpress |grep "k8s.aliyun.com/eci-security-group" 76 | ``` 77 | 78 | 预期返回 79 | ```shell 80 | "k8s.aliyun.com/eci-security-group": "sg-2zef08a606ey91******" 81 | ``` 82 | 83 | 对安全组进行授权操作 84 | ```bash 85 | aliyun ecs AuthorizeSecurityGroup --RegionId ${Region ID} --SecurityGroupId ${安全组ID} --IpProtocol tcp --PortRange 80/80 --SourceCidrIp 0.0.0.0/0 --Priority 100 86 | ``` 87 | 请根据实际替换上述命令的Region 
Replace the Region ID and security group ID in the command above with your actual values. For example:
```bash
aliyun ecs AuthorizeSecurityGroup --RegionId cn-hangzhou --SecurityGroupId sg-2zef08a606ey91****** --IpProtocol tcp --PortRange 80/80 --SourceCidrIp 0.0.0.0/0 --Priority 100
```

## Use WordPress

Enter the EIP address obtained above in your browser to start using WordPress.

--------------------------------------------------------------------------------
/eci-spark/wordcount-spark-driver-svc.yaml:
--------------------------------------------------------------------------------
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-serverless
  namespace: default

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-serverless-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
- kind: ServiceAccount
  name: spark-serverless
  namespace: default

---
apiVersion: v1
kind: Service
metadata:
  name: wordcount-spark-driver-svc
  namespace: default
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-private-zone-enable: "true"
spec:
  clusterIP: None
  ports:
  - name: driver-rpc-port
    port: 7078
    protocol: TCP
    targetPort: 7078
  - name: blockmanager
    port: 7079
    protocol: TCP
    targetPort: 7079
  selector:
    spark-app-selector: spark-9b7952456a86413b94c70fe2b3f8496c
    spark-role: driver
  sessionAffinity: None
  type: ClusterIP

---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    spark-app-name: WordCount
    k8s.aliyun.com/eci-image-cache: "true"
  labels:
    spark-app-selector: spark-9b7952456a86413b94c70fe2b3f8496c
    spark-role: driver
  name: wordcount-spark-driver
  namespace: default
spec:
  containers:
    - args:
        - driver
      env:
        - name: SPARK_DRIVER_MEMORY
          value: 1g
        - name: SPARK_DRIVER_CLASS
          value: com.aliyun.liumi.spark.example.WordCount
        - name: SPARK_DRIVER_ARGS
        - name: SPARK_DRIVER_BIND_ADDRESS
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: SPARK_MOUNTED_CLASSPATH
          value: >-
            /opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar:/opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar
        - name: SPARK_JAVA_OPT_0
          value: '-Dspark.submit.deployMode=cluster'
        - name: SPARK_JAVA_OPT_1
          value: '-Dspark.driver.blockManager.port=7079'
        - name: SPARK_JAVA_OPT_2
          value: '-Dspark.master=k8s://https://47.99.132.xxx:6443'
        - name: SPARK_JAVA_OPT_3
          value: '-Dspark.app.id=spark-9b7952456a86413b94c70fe2b3f8496c'
        - name: SPARK_JAVA_OPT_4
          value: '-Dspark.kubernetes.authenticate.driver.serviceAccountName=spark'
        - name: SPARK_JAVA_OPT_5
          value: >-
            -Dspark.kubernetes.driver.pod.name=wordcount-spark-driver
        - name: SPARK_JAVA_OPT_6
          value: '-Dspark.app.name=WordCount'
        - name: SPARK_JAVA_OPT_7
          value: >-
            -Dspark.kubernetes.container.image=registry.cn-beijing.aliyuncs.com/liumi/spark:2.3.0-hdfs-1.0
        - name: SPARK_JAVA_OPT_8
          value: '-Dspark.executor.instances=10'
        - name: SPARK_JAVA_OPT_9
          value: >-
            -Dspark.jars=/opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar,/opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar
        - name: SPARK_JAVA_OPT_10
          value: >-
            -Dspark.driver.host=wordcount-spark-driver-svc.default.svc.cluster.local.c132a4a4826814d579c14bf2c5cf933af
        - name: SPARK_JAVA_OPT_11
          value: >-
            -Dspark.kubernetes.executor.podNamePrefix=wordcount-spark
        - name: SPARK_JAVA_OPT_12
          value: '-Dspark.driver.port=7078'
        - name: SPARK_JAVA_OPT_13
          value: >-
            -Dspark.kubernetes.executor.annotation.k8s.aliyun.com/eci-image-cache=true
        - name: SPARK_JAVA_OPT_14
          value: >-
            -Dspark.kubernetes.allocation.batch.size=10

      image: 'registry.cn-beijing.aliyuncs.com/liumi/spark:2.3.0-hdfs-1.0'
      imagePullPolicy: IfNotPresent
      name: spark-kubernetes-driver
      resources:
        limits:
          memory: 16384Mi
        requests:
          cpu: '8'
          memory: 16Gi
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File

  dnsPolicy: None
  dnsConfig:
    nameservers:
      - 100.100.2.136
      - 100.100.2.138
    searches:
      - default.svc.cluster.local.c132a4a4826814d579c14bf2c5cf933af
      - svc.cluster.local.c132a4a4826814d579c14bf2c5cf933af
      - cluster.local.c132a4a4826814d579c14bf2c5cf933af
      - c132a4a4826814d579c14bf2c5cf933af
    options:
      - name: ndots
        value: "5"
  hostAliases:
    - ip: "47.99.132.xxx"
      hostnames:
        - "kubernetes.default.svc"
  priority: 0
  restartPolicy: Never
  serviceAccount: spark-serverless
  serviceAccountName: spark-serverless
  terminationGracePeriodSeconds: 30
  tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300

--------------------------------------------------------------------------------
/eci-spark/README.md:
--------------------------------------------------------------------------------
## Background

Since Google's three core technologies — GFS (2003), MapReduce (2004), and BigTable (2006) — were published in succession, big data processing frameworks, with Hadoop as their flagship, stepped onto the historical stage and entered a golden age. Apache Hadoop is the most successful open-source project among them, putting enterprise-grade big data processing within everyone's reach. Academic research and industrial exploration around Hadoop stayed red-hot for more than a decade.

On another timeline, container technology finally saw six years of rapid growth after Docker appeared. Meanwhile Kubernetes, the open-source container orchestration system, emerged from several years of fierce competition and, pushed by the CNCF community and the rise of cloud native, quickly became the de facto industry standard for container orchestration. Today almost every cloud vendor has a container ecosystem built around Kubernetes; at Alibaba Cloud, for example, we have ACK, ASK (Serverless Kubernetes), EDAS, and ECI (Alibaba Cloud Elastic Container Instance).

![spark-1.png](https://github.com/aliyuneci/BestPractice-Serverless-Kubernetes/blob/master/eci-spark/pics/spark-1.png)
Data from Google Trends
ASF (Apache Software Foundation) and CNCF (Cloud Native Computing Foundation), two largely independent camps, have quietly arrived at a historic turning point, and we are all curious what sparks will fly between them. Clearly, [Spark 2.3.0]() beginning to experiment with native support for running on Kubernetes is an important milestone. This article shares some conclusions from a recent investigation of Spark on Kubernetes.



## Starting from Hadoop

Hadoop consists of two main parts: the Hadoop Distributed File System (HDFS) and a distributed compute engine, which is an implementation of Google's MapReduce idea. Hadoop was once the standard for large-scale distributed data storage and processing.

### Hadoop to Spark

While Hadoop was being widely adopted by industry, it also had persistent problems:

1. Only Map and Reduce operators are supported; complex algorithms and business logic are hard to express and end up buried inside the operators. Besides making the code hard to maintain, this leaves the scheduler no room for optimization — it can only schedule along the single dimension of task count.

2. Intermediate results of the computation must also be written to HDFS, incurring unnecessary IO overhead.

3. The TaskTracker divides resources into map slots and reduce slots, which is inflexible; when one stage is absent, resource utilization drops badly.

4. …

Research on Hadoop has essentially revolved around optimizing resource scheduling, the MapReduce computation model, HDFS storage, and generality. Spark is the most successful of the many derived systems — arguably a milestone, after which research on Hadoop itself went quiet. Spark, developed in 2009 by the AMPLab at UC Berkeley, quickly became a top-level Apache project. [Apache Spark](https://spark.apache.org/) is a big data processing framework based on in-memory computation, supporting operators far richer than MapReduce and covering batch, streaming, and other scenarios.

![spark-2.png](https://github.com/aliyuneci/BestPractice-Serverless-Kubernetes/blob/master/eci-spark/pics/spark-2.png)
Spark module relationships
The main Spark concepts:

- **Application**: similar in concept to a MapReduce application in Hadoop — a Spark program written by the user. Compared with Hadoop it supports much richer operators, and the built-in libraries make it easy to build applications in machine learning, graph computation, and other domains.

- **Job**: a parallel computation composed of many Tasks; a job usually contains a set of RDDs and the operators applied to them.

- **Stage**: every job is split into several groups of Tasks; each group is a TaskSet, also called a Stage — one job consists of multiple Stages.

- **Task**: the unit of work assigned to a particular Executor; a Task can be understood as a piece of logic waiting to be scheduled onto one of the Executor's threads.

- **Operations**: operators, divided into 1) Actions, e.g. reduce, collect, count; and 2) Transformations, e.g. map, join, reduceByKey. Actions cut the whole job into multiple Stages.

- **Executor**: a process of the Application running on a Worker node, responsible for running Tasks; each Application has its own set of Executors. The number of Executors can be set statically or use dynamic resource allocation.

- **Driver**: the Driver creates the SparkContext for the submitted Application, i.e. prepares the program's runtime environment. The SparkContext communicates with the Cluster Manager to request resources and assign tasks; when all Executors have finished, the Driver closes the SparkContext.

- **Worker**: any node in the cluster that can run Application tasks.

- **Cluster Manager**: the service that schedules resources in the cluster — the Master in Standalone mode; YARN's ResourceManager in YARN mode.



### Hadoop to YARN

Early Hadoop clusters could already reach several thousand nodes, and as data processing demand kept growing, crudely adding nodes strained the original scheduler badly. Both application management and resource management lived inside Hadoop's JobTracker, and since the JobTracker cannot scale horizontally, it became overloaded. What was needed was a design that separated application management from resource management and decoupled the computation model from the JobTracker — YARN was born in this context. The "Hadoop" we commonly hear about today actually refers to YARN.
YARN's role in the cluster



![spark-3.png](https://github.com/aliyuneci/BestPractice-Serverless-Kubernetes/blob/master/eci-spark/pics/spark-3.png)

YARN module relationships
Spark's scheduling was designed to be open from the very beginning, and the relationships between its scheduling modules map onto YARN's concepts very naturally: the Spark Master corresponds to the ResourceManager, a Spark Worker to a NodeManager, the Spark Driver to the Application Master, and a Spark Executor to a Container. The number of Tasks an Executor can run in parallel depends on the CPU cores of the Container assigned to it.

After a client submits an application to the YARN ResourceManager, the Application Manager accepts the request and finds a Container in which to create the application's Application Master; the Application Master registers itself with the ResourceManager so the client can reach it. What runs on the Application Master is the Spark Driver. The Application Master requests and starts Containers; the Spark Driver then starts Spark Executors inside those Containers and schedules Spark Tasks onto the Executors' threads. Once all Tasks have finished, the Application Master deregisters and releases the resources.

#### The benefits

1. YARN, as the cluster's unified resource scheduling and application management layer, reduces the complexity of resource management while staying open to all application types — MapReduce, Spark, and others can be co-located, raising overall cluster utilization.

2. Two-level scheduling greatly reduces the pressure on the ResourceManager and improves cluster scalability.

3. The computation model and resource scheduling are decoupled. The scheduling layer hides the differences between the computation models of MapReduce, Spark, Flink, and the like, letting each framework focus purely on optimizing compute performance.

4. YARN's advanced features become available, e.g. 1) scheduling policies beyond the native FIFO: [CapacityScheduler](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) & [FairScheduler](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html); 2) queue-based resource isolation and allocation.



### YARN to Kubernetes

Hadoop and Spark could not have become the most widely used big data processing frameworks without YARN's strength. People do criticize its pessimistic locking (which limits concurrency granularity) and the limited resource visibility of two-level scheduling, but beyond that YARN has no major flaws in itself and remains the first choice of scheduling substrate in the big data field. History, however, tends to go like this: the incumbent is rarely brought down by a direct rival, but by an emerging force that at first seems to belong to another field. That is the challenge YARN inevitably faces now that the Google-led Kubernetes ecosystem has matured: if, in the future, 80% of a company's workloads already run on Kubernetes, will it still be willing to maintain a separate YARN cluster for the remaining 20% of big data workloads?

#### Kubernetes' advantages

Compared with traditional deployments such as on YARN, Spark on Kubernetes offers:

1. Unified resource management. Jobs of any type can run in one unified Kubernetes cluster; there is no longer any need to maintain a separate YARN cluster just for big data jobs.

2. Elastic cluster infrastructure. Both the resource layer and the application layer provide rich elasticity policies: we can scale out with ECS virtual machines, ECS Bare Metal instances, or GPU instances according to workload needs, and beyond the strong scaling built into Kubernetes itself, the ecosystem can be plugged in, e.g. virtual kubelet.

3. Resource isolation and limits for complex distributed applications become easy, freeing you from YARN's complex queue management and allocation.

4. The advantages of containerization. Every application packages its own dependencies in a Docker image and runs in an independent environment — even the Spark version is per-application — and all applications are isolated from one another.

5. Big data on the cloud. Today there are two common ways to move big data to the cloud: 1) building your own YARN (not limited to YARN) cluster on ECS; 2) buying the EMR service. Now there is a third option — Kubernetes: full cluster-level control, freedom from complex cluster management and operations, plus the elasticity and cost advantages of the cloud.

Starting with 2.3.0, Spark experimentally supports a new deployment mode besides Standalone, on YARN, and on Mesos: [Running Spark on Kubernetes](), which has been strengthened in subsequent releases.



The rest of this article is hands-on: deploying and running a Spark application in three scenarios — an ordinary Kubernetes cluster, a Serverless Kubernetes cluster, and Kubernetes + virtual kubelet.



## Spark on Kubernetes

### Prepare the data and the Spark application image
#### References:

[Accessing HDFS data from ECI](https://help.aliyun.com/document_detail/146235.html)

[Accessing OSS data from ECI](https://help.aliyun.com/document_detail/146237.html)



### Create the Kubernetes cluster

If you already have an Alibaba Cloud ACK cluster, skip this step.

For the creation procedure, see [Create a managed Kubernetes cluster]().



### Submit the job

#### Create an RBAC role for Spark

Create the service account (default namespace):

```bash
kubectl create serviceaccount spark
```

Bind the role:

```bash
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
```
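Optionally, verify the binding took effect before submitting (a suggested check using kubectl's built-in authorization probe — the driver needs exactly this kind of permission to create executor pods; it assumes your own identity is allowed to impersonate service accounts):

```bash
# should print "yes" once the edit role is bound to the spark service account
kubectl auth can-i create pods --as=system:serviceaccount:default:spark
```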
#### Submit directly with spark-submit (not the recommended way)

```bash
liumihustdeMacBook-Pro:spark-on-k8s liumihust$ ./spark-2.3.0-bin-hadoop2.6/bin/spark-submit
 --master k8s://121.199.47.XX:6443
 --deploy-mode cluster
 --name WordCount
 --class com.aliyun.liumi.spark.example.WordCount
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
 --conf spark.executor.instances=2
 --conf spark.kubernetes.container.image=registry.cn-beijing.aliyuncs.com/liumi/spark:2.4.4-example
 local:///opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar
```

#### Parameter notes

--master: the API server of the k8s cluster; this is what determines whether Spark runs on the k8s cluster or on YARN.

--deploy-mode: the driver can be deployed on the cluster's master node (client) or on a non-master node (cluster).

spark.executor.instances: the number of executors.

spark.kubernetes.container.image: the packaged Spark image (containing driver, executor, and the application; configuring them separately is also supported).



#### Basic submission flow
![spark-10.png](http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/pic/146238/cn_zh/1575978528864/b645a2d7a0a5b7fe918cb24d9b22d592.png)
Running Spark on Kubernetes

1. Spark first creates the Spark Driver (a pod) in the k8s cluster.

2. Once the Driver is up, it calls the k8s API to create the Executors (pods); the Executors are what actually run the job.

3. When the computation finishes, the Executor pods are reclaimed automatically, and the Driver pod enters the Completed state (a terminal state), where the user can still read its logs and so on.

4. The Driver pod is only cleaned up manually by the user, or garbage-collected by k8s.
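While a job runs, the flow above can be followed live with kubectl (a suggested sketch, not from the original walkthrough; substitute the driver pod name that spark-submit prints):

```bash
# the driver pod appears first, then the executor pods it creates
kubectl get pods -w
# stream the driver's log to watch job progress
kubectl logs -f <driver-pod-name>
```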
#### Result analysis

Screenshots from the run:
![spark-5.png](http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/pic/146238/cn_zh/1575978655905/7c5a3b2d598506b828c1d4707a08b4c8.png)



Our 30 GB of data took roughly 20 minutes to process with two 1C1G executors.

Check the result after the job finishes:

```bash
[root@liumi-hdfs ~]# $HADOOP_HOME/bin/hadoop fs -cat /pod/data/A-Game-of-Thrones-Result/*
(142400000,the)
(78400000,and)
(77120000,)
(62200000,to)
(56690000,of)
(56120000,a)
(43540000,his)
(35160000,was)
(30480000,he)
(29060000,in)
(26640000,had)
(26200000,her)
(23050000,as)
(22210000,with)
(20450000,The)
(19260000,you)
(18300000,I)
(17510000,she)
(16960000,that)
(16450000,He)
(16090000,not)
(15980000,it)
(15080000,at)
(14710000,for)
(14410000,on)
(12660000,but)
(12470000,him)
(12070000,is)
(11240000,from)
(10300000,my)
(10280000,have)
(10010000,were)
```

At this point we can deploy and run Spark jobs on a Kubernetes cluster.
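As noted in step 4 of the flow above, the terminal-state driver pod is left behind; once its logs are no longer needed it can be removed by hand (a hedged sketch):

```bash
# list completed driver pods, then clean up
kubectl get pods --field-selector=status.phase=Succeeded
kubectl delete pod <driver-pod-name>
```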
## Spark on Serverless Kubernetes

Compared with an ordinary Kubernetes cluster, a major advantage of Serverless Kubernetes (ASK) is that no resources need to be reserved before a job is submitted and there is no cluster scaling to worry about: all resources are requested automatically as the job is submitted and released automatically when it finishes. Once the job completes, all that remains is a SparkApplication and the Driver pod in its terminal state (only control-plane data is kept). The principle is illustrated below:
![spark-7.png](http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/pic/146238/cn_zh/1575978741796/a4235604c1c2c5cc9a19f089d73f426d.png)

Running Spark on Serverless Kubernetes
ASK schedules pods onto Alibaba Cloud Elastic Container Instances through virtual kubelet. Although this architecture clearly differs from ACK, both are fully compatible with the Kubernetes standard, so on ASK the preparation phase is basically the same as for Spark on Kubernetes above: HDFS data preparation, the Spark base image, the Spark application image, and so on. The differences are mainly in how the job is submitted, plus some additional basic environment configuration.



### Create the serverless Kubernetes cluster

Choose the standard serverless cluster:
![eci-spark-4](https://github.com/aliyuneci/BestPractice-Serverless-Kubernetes/blob/master/eci-spark/pics/1574233483142-9359d5e3-81c9-4154-8242-ed3a37a4e37b.png)

Basic parameters:

1. A custom cluster name.

2. The region and availability zone.

3. The VPC: use an existing one or let Container Service create one automatically.

4. Whether to expose the API server publicly; recommended if you need it.

5. Enable PrivateZone — this is mandatory.

6. Log collection — recommended.
![eci-spark-5](http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/pic/146238/cn_zh/1575978863114/1574233499811-efed418f-649b-45f0-b035-cdb09a15fa3d.png)


##### Notes:

1. Before submitting, be sure to upgrade the cluster's virtual kubelet to the latest version (newly created clusters can ignore this); only the latest VK can run Spark jobs.

2. ASK clusters rely on PrivateZone for service discovery, so the cluster must have PrivateZone enabled — tick it at creation time. If it was not ticked when the cluster was created, contact us to enable it; otherwise the Spark executors will not be able to find the driver service.



### *Build the image cache

Since we may be launching at large scale later, we cache the Spark application image on ECI in advance to speed up container startup, using the standard k8s CRD mechanism. For the detailed procedure, see [Use a CRD to accelerate pod creation]().
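As a sketch of what that looks like in practice (assuming an ImageCache manifest for the Spark application image, written in the same form as the imagecache.yaml files under eci-gitlab-runner and eci-gpu-tensorflow in this repo):

```bash
# build the cache, then wait for the Phase column to show completion before submitting
kubectl apply -f imagecache.yaml
kubectl get imagecaches    # the short name also works: kubectl get ic
```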
### Submission:

Because spark-submit currently supports only a very limited set of parameters, we do not recommend submitting directly with spark-submit in the ASK scenario; use the [Spark Operator]() instead. Before the Spark Operator existed, jobs could also be submitted with native Kubernetes yaml. Both approaches are described below.
#### Option 1: the native way — write the yaml

Write custom, standard Kubernetes yaml to create the resources.

The complete yaml file I tested with (based on Spark 2.3.0) is as follows.

wordcount-spark-driver-svc.yaml:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-serverless
  namespace: default

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-serverless-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
- kind: ServiceAccount
  name: spark-serverless
  namespace: default

---
apiVersion: v1
kind: Service
metadata:
  name: wordcount-spark-driver-svc
  namespace: default
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-private-zone-enable: "true"
spec:
  clusterIP: None
  ports:
  - name: driver-rpc-port
    port: 7078
    protocol: TCP
    targetPort: 7078
  - name: blockmanager
    port: 7079
    protocol: TCP
    targetPort: 7079
  selector:
    spark-app-selector: spark-9b7952456a86413b94c70fe2b3f8496c
    spark-role: driver
  sessionAffinity: None
  type: ClusterIP

---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    spark-app-name: WordCount
    k8s.aliyun.com/eci-image-cache: "true"
  labels:
    spark-app-selector: spark-9b7952456a86413b94c70fe2b3f8496c
    spark-role: driver
  name: wordcount-spark-driver
  namespace: default
spec:
  containers:
    - args:
        - driver
      env:
        - name: SPARK_DRIVER_MEMORY
          value: 1g
        - name: SPARK_DRIVER_CLASS
          value: com.aliyun.liumi.spark.example.WordCount
        - name: SPARK_DRIVER_ARGS
        - name: SPARK_DRIVER_BIND_ADDRESS
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: SPARK_MOUNTED_CLASSPATH
          value: >-
            /opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar:/opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar
        - name: SPARK_JAVA_OPT_0
          value: '-Dspark.submit.deployMode=cluster'
        - name: SPARK_JAVA_OPT_1
          value: '-Dspark.driver.blockManager.port=7079'
        - name: SPARK_JAVA_OPT_2
          value: '-Dspark.master=k8s://https://47.99.132.xxx:6443'
        - name: SPARK_JAVA_OPT_3
          value: '-Dspark.app.id=spark-9b7952456a86413b94c70fe2b3f8496c'
        - name: SPARK_JAVA_OPT_4
          value: '-Dspark.kubernetes.authenticate.driver.serviceAccountName=spark'
        - name: SPARK_JAVA_OPT_5
          value: >-
            -Dspark.kubernetes.driver.pod.name=wordcount-spark-driver
        - name: SPARK_JAVA_OPT_6
          value: '-Dspark.app.name=WordCount'
        - name: SPARK_JAVA_OPT_7
          value: >-
            -Dspark.kubernetes.container.image=registry.cn-beijing.aliyuncs.com/liumi/spark:2.3.0-hdfs-1.0
        - name: SPARK_JAVA_OPT_8
          value: '-Dspark.executor.instances=10'
        - name: SPARK_JAVA_OPT_9
          value: >-
            -Dspark.jars=/opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar,/opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar
        - name: SPARK_JAVA_OPT_10
          value: >-
            -Dspark.driver.host=wordcount-spark-driver-svc.default.svc.cluster.local.c132a4a4826814d579c14bf2c5cf933af
        - name: SPARK_JAVA_OPT_11
          value: >-
            -Dspark.kubernetes.executor.podNamePrefix=wordcount-spark
        - name: SPARK_JAVA_OPT_12
          value: '-Dspark.driver.port=7078'
        - name: SPARK_JAVA_OPT_13
          value: >-
            -Dspark.kubernetes.executor.annotation.k8s.aliyun.com/eci-image-cache=true
        - name: SPARK_JAVA_OPT_14
          value: >-
            -Dspark.kubernetes.allocation.batch.size=10

      image: 'registry.cn-beijing.aliyuncs.com/liumi/spark:2.3.0-hdfs-1.0'
      imagePullPolicy: IfNotPresent
      name: spark-kubernetes-driver
      resources:
        limits:
          memory: 16384Mi
        requests:
          cpu: '8'
          memory: 16Gi
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File

  dnsPolicy: None
  dnsConfig:
    nameservers:
      - 100.100.2.136
      - 100.100.2.138
    searches:
      - default.svc.cluster.local.c132a4a4826814d579c14bf2c5cf933af
      - svc.cluster.local.c132a4a4826814d579c14bf2c5cf933af
      - cluster.local.c132a4a4826814d579c14bf2c5cf933af
      - c132a4a4826814d579c14bf2c5cf933af
    options:
      - name: ndots
        value: "5"
  hostAliases:
    - ip: "47.99.132.xxx"
      hostnames:
        - "kubernetes.default.svc"
  priority: 0
  restartPolicy: Never
  serviceAccount: spark-serverless
  serviceAccountName: spark-serverless
  terminationGracePeriodSeconds: 30
  tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
```

The yaml defines four resources:

**ServiceAccount**: spark-serverless. The Driver needs to access the cluster's API server from inside the pod, so a ServiceAccount must be created. It does not need to be recreated on every submission.

**ClusterRoleBinding**: spark-serverless-role. Binds the RBAC role to the ServiceAccount, granting it permission to operate on resources. Also not per-submission.

**Service**: the Driver service, exposing the Driver pod; the executors reach the Driver through this service.

**Pod**: the Driver pod. No executor pod yaml needs to be defined — the executor pods' parameters are set through the Driver's environment variables, i.e. the -Dspark.kubernetes.* options.

kubectl submission:

```bash
liumihustdeMacBook-Pro:spark-on-k8s liumihust$ kubectl create -f wordcount-spark-driver-svc.yaml
serviceaccount/spark-serverless created
clusterrolebinding.rbac.authorization.k8s.io/spark-serverless-role created
service/wordcount-spark-driver-svc created
pod/wordcount-spark-driver created
```
#### Option 2: Spark Operator

The plain k8s yaml declaration above also lets native Kubernetes scheduling run Spark jobs, and it works on any cluster with minor changes. The problems are: 1) it is hard to maintain — the custom parameters are numerous and not intuitive (especially for users who only know Spark); 2) there is no Spark Application concept anymore, just bare pods and services, so once there are many applications the maintenance cost climbs and there is no unified management mechanism.

[Spark Operator]() was developed precisely to deploy and maintain Spark applications on Kubernetes clusters. It is a classic CRD + Controller, i.e. an implementation of the Kubernetes Operator pattern. The story of how Kubernetes Operators came about is itself somewhat legendary and worth reading up on. Operators opened a window for running stateful, domain-specific complex applications on Kubernetes, and the Spark Operator is one of the representative examples.

![eci-spark-6](https://github.com/aliyuneci/BestPractice-Serverless-Kubernetes/blob/master/eci-spark/pics/1574233531691-89664643-2afe-40fe-8ac0-462a8dba1910.png)

The main Spark Operator concepts:

**SparkApplication**: a standard k8s CRD; with the CRD comes a corresponding Controller, which watches create, update, and delete events on the CRD and takes the appropriate action.

**ScheduledSparkApplication**: an extension of SparkApplication that supports job submission on a custom time schedule, e.g. cron.

**Submission runner**: runs spark-submit for the creation requests issued by the Controller.

**Spark pod monitor**: watches the status and events of Spark pods and reports updates to the Controller.



##### Install the Spark Operator

[helm 3.0]() is recommended:

```bash
helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
helm install incubator/sparkoperator --namespace default --set operatorImageName=registry.cn-hangzhou.aliyuncs.com/eci_open/spark-operator --set operatorVersion=v1beta2-1.0.1-2.4.4 --generate-name --set enableWebhook=true
```

After installation you can see an extra spark operator pod in the cluster.
![eci-saprk-7](http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/pic/146238/cn_zh/1575979001598/1574233651888-ade8ea24-4e36-4189-817d-26572230970a.png)

Option notes:

1. --set operatorImageName: specifies the operator image. The default Google image cannot be pulled from inside Alibaba Cloud ECI; pull it locally first and push it to ACR.

2. --set operatorVersion: do not write the image repository and version together.

3. --generate-name lets you skip an explicit release name.

4. --set enableWebhook: off by default. Users who need ACK + ECI will use advanced features such as nodeSelector and tolerations, for which the webhook must be enabled — more on this below.

##### Note:

When creating the spark operator, make sure the image can actually be pulled; we recommend using the eci_open image directly, because uninstalling the spark operator starts a cleanup job from the same image — if the image cannot be pulled, the cleanup job also gets stuck and every resource has to be cleaned up by hand, which is painful.
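The same check can be made from the command line (a suggested sketch; the release name is generated because of --generate-name):

```bash
helm list
# the operator pod carries the generated release name
kubectl get pods | grep sparkoperator
```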
### Analyzing the results

We launch 100 1C1G (1 vCPU, 1 GiB) Executors concurrently, and the application image is about 500 MB.

Screenshots from the job run:
![eci-spark-8](http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/pic/146238/cn_zh/1575979055243/017c1cd4c74a5936acd4f9b93f089e81.png)
![eci-spark-9](http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/pic/146238/cn_zh/1575979090414/f22c2388786e00b104d677b545c69bc9.png)

All 100 concurrently launched pods finish starting within roughly 30 seconds, and 93% of them start within 20 seconds.

Now the job execution time (this includes the time for the virtual kubelet to schedule the 100 Executor pods, resource preparation for each Executor pod, and the actual job run):

```yaml
exitCode: 0
finishedAt: '2019-11-16T07:31:59Z'
reason: Completed
startedAt: '2019-11-16T07:29:01Z'
```

The whole run took only 178 seconds, an order of magnitude less than before.

### ACK + ECI

In Spark, the Driver and the Executors start strictly one after the other. ECI creates Executor pods with excellent concurrency, but the ASK architecture makes this serialization quite visible: starting a Driver pod on ECI typically takes about 20 seconds, and only then does the large-scale Executor launch begin. For latency-sensitive applications, Driver startup speed can matter more than the time the Executors spend on the job itself. In that case we can use ACK + ECI, i.e. a traditional Kubernetes cluster plus virtual kubelet:
![eci-spark-9](https://github.com/aliyuneci/BestPractice-Serverless-Kubernetes/blob/master/eci-spark/pics/1574233670142-d818a7c8-2edf-4d4f-ac68-816d18eb1b55.png)

From the user's point of view, a few simple steps are enough to schedule the Executors onto ECI's virtual node.

#### 1. Install ECI's virtual kubelet in the ACK cluster

In the Container Service console, open the App Catalog and search for "ack-virtual-node":

![eci-spark-10](http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/pic/146238/cn_zh/1575979191602/1574233691702-743d4526-f45a-4f92-8b06-397d7086d5fc.png)

Click through and select the cluster to install into.
![eci-spark-11](http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/pic/146238/cn_zh/1575979229956/1574233705801-03b05378-4723-4584-ae8f-1d62beb971cd.png)

Reference for the required parameters:

```yaml
virtualNode:
  image:
    repository: registry.cn-hangzhou.aliyuncs.com/acs/virtual-nodes-eci
    tag: v1.0.0.1-aliyun

affinityAdminssion:
  enabled: true
  image:
    repository: registry.cn-hangzhou.aliyuncs.com/ask/virtual-node-affinity-admission-controller
    tag: latest

env:
  ECI_REGION: "cn-hangzhou"                # region the cluster lives in
  ECI_VPC: vpc-bp187fy2e7l123456           # the cluster's VPC, same as at cluster creation; see the cluster overview page
  ECI_VSWITCH: vsw-bp1bqf53ba123456        # vSwitch for the resources, as above
  ECI_SECURITY_GROUP: sg-bp12ujq5zp12346   # security group for the resources, as above
  ECI_ACCESS_KEY: XXXXX                    # account AccessKey ID
  ECI_SECRET_KEY: XXXXX                    # account AccessKey Secret
  ALIYUN_CLUSTERID: virtual-kubelet
```
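Before moving on, it is worth confirming that the virtual node actually registered. A quick check; whether the node carries the `type=virtual-kubelet` label is an assumption based on the executor nodeSelector used in step 2:

```bash
# The virtual kubelet should show up as a node in the cluster.
kubectl get nodes -o wide

# If the chart labels the node with type=virtual-kubelet (the label the
# executor nodeSelector in step 2 relies on), this filter will find it.
kubectl get nodes -l type=virtual-kubelet
```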
#### 2. Modify the application YAML

Add the following to the executor section:

```yaml
nodeSelector:
  type: virtual-kubelet
tolerations:
  - key: virtual-kubelet.io/provider
    operator: Exists
```

The complete application spec:

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: wordcount
  namespace: default
spec:
  type: Java
  mode: cluster
  image: "registry.cn-beijing.aliyuncs.com/liumi/spark:2.4.4-example"
  imagePullPolicy: IfNotPresent
  mainClass: com.aliyun.liumi.spark.example.WordCount
  mainApplicationFile: "local:///opt/spark/jars/SparkExampleJava-1.0-SNAPSHOT.jar"
  sparkVersion: "2.4.4"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 2
    onFailureRetryInterval: 5
    onSubmissionFailureRetries: 2
    onSubmissionFailureRetryInterval: 10
  timeToLiveSeconds: 36000
  sparkConf:
    "spark.kubernetes.allocation.batch.size": "10"

  driver:
    cores: 2
    memory: "4096m"
    labels:
      version: 2.4.4
      spark-app: spark-wordcount
      role: driver
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
    serviceAccount: spark
  executor:
    cores: 1
    instances: 100
    memory: "1024m"
    labels:
      version: 2.4.4
      role: executor
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
    #nodeName: virtual-kubelet
    nodeSelector:
      type: virtual-kubelet
    tolerations:
      - key: virtual-kubelet.io/provider
        operator: Exists
```

With this, the Driver is scheduled onto ACK and the Executors onto ECI, and the two complement each other nicely.

#### 3. Submit

The result:
![eci-spark-12](http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/pic/146238/cn_zh/1575979291172/79efc84b99359e069b9e3e9d42e2dc8d.png)

Job execution time:

```yaml
exitCode: 0
finishedAt: '2019-11-16T07:25:05Z'
reason: Completed
startedAt: '2019-11-16T07:22:40Z'
```

145 seconds in total; more importantly, the Driver started locally and came up in only about 2 seconds.

## Summary

Raw job execution time is not the absolute advantage of Kubernetes + ECI; with enough node resources provisioned on ACK you can reach the same level. The real advantages are:

##### 1) Elasticity and cost

Whether you choose ACK + ECI or ASK + ECI, no resources need to be reserved before a job is submitted and there is no cluster scaling to manage: everything is requested automatically as the job is submitted and released automatically when it finishes. Once the job completes, all that remains is a SparkApplication object and the terminated Driver pod (control-plane data only). On top of that, ACK + ECI offers richer scheduling choices: 1) the Driver and the Executors can be scheduled separately; 2) scheduling targets can be chosen per job type and cost profile, covering a much wider range of scenarios.

##### 2) Separating compute and storage

Data storage has always been the thorny problem for big data on Kubernetes, and it is even more acute on serverless Kubernetes: with no nodes at all, standing up an HDFS/YARN cluster is out of the question. In practice, though, running Spark on an HDFS cluster is no longer a necessity; see [1, 2]. Alibaba Cloud's HDFS storage service addresses exactly this pain point, and in our tests its read/write performance was very good. We can separate compute from storage: jobs in the Kubernetes cluster access the HDFS data natively. Besides HDFS, Alibaba Cloud NAS and OSS are also viable data stores.

##### 3) Scheduling

Schedulers broadly fall into two-level scheduling, with YARN as the canonical example, and centralized scheduling. In two-level scheduling, a central scheduler handles coarse-grained resource allocation while fine-grained application scheduling is left to lower-level schedulers. Centralized scheduling handles all resource requests in one place; Kubernetes is the typical representative, caching the entire cluster's resource state locally and scheduling optimistically against that local data to keep the scheduler fast.

Once a Kubernetes cluster reaches a certain scale, scheduler performance becomes a bottleneck [3]. YARN has been hardened by many years of big data workloads, so whether Kubernetes' native scheduler can hold up when scheduling Spark jobs is still an open question.

Serverless Kubernetes turns this into something like two-level scheduling: the Kubernetes side is greatly simplified, with the scheduler only routing resources to the virtual kubelet, while the actual fine-grained scheduling sinks down to Alibaba Cloud's powerful elastic-compute scheduling.

The larger the data volume and the bigger the burst of Executor pods, the more pronounced this advantage becomes.

## References

[1] [HDFS vs. Cloud Storage: Pros, cons and migration tips]()

[2] [New release of Cloud Storage Connector for Hadoop: Improving performance, throughput and more]()

[3] [Understanding Scalability and Performance in the Kubernetes Master](), Xingyu Chen, Fansong Zeng, Alibaba Cloud

## Appendix

### Spark base image

This example uses Google's gcr.io/spark-operator/spark:v2.4.4.

ECI has already mirrored it into ACR; the per-region addresses are:

Public endpoint: registry.{regionId}.aliyuncs.com/eci_open/spark:2.4.4

VPC endpoint: registry-vpc.{regionId}.aliyuncs.com/eci_open/spark:2.4.4

### Spark Operator image

This example uses Google's gcr.io/spark-operator/spark-operator:v1beta2-1.0.1-2.4.4.

ECI has already mirrored it into ACR; the per-region addresses are:

Public endpoint: registry.{regionId}.aliyuncs.com/eci_open/spark-operator:v1beta2-1.0.1-2.4.4

VPC endpoint: registry-vpc.{regionId}.aliyuncs.com/eci_open/spark-operator:v1beta2-1.0.1-2.4.4
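For example, pulling the mirrored images in the cn-hangzhou region might look like this (substitute your own region ID; the commands assume a plain Docker client with access to the registry):

```bash
# Public endpoint, reachable from outside Alibaba Cloud:
docker pull registry.cn-hangzhou.aliyuncs.com/eci_open/spark:2.4.4
docker pull registry.cn-hangzhou.aliyuncs.com/eci_open/spark-operator:v1beta2-1.0.1-2.4.4

# VPC endpoint, for machines inside the same region's VPC:
docker pull registry-vpc.cn-hangzhou.aliyuncs.com/eci_open/spark:2.4.4
```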