├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── chatbot-ui ├── application │ ├── Dockerfile │ ├── app.py │ └── requirements.txt └── manifests │ ├── deployment.yaml │ └── ingress-class.yaml ├── helm.tf ├── main.tf ├── nodepool_automode.tf ├── static └── images │ ├── chatbot.jpg │ └── cloudshell.jpg └── vllm-chart ├── .helmignore ├── Chart.yaml ├── templates ├── NOTES.txt ├── _helpers.tpl ├── deployment.yaml └── service.yaml └── values.yaml /.gitignore: -------------------------------------------------------------------------------- 1 | # Created by https://www.toptal.com/developers/gitignore/api/terraform 2 | # Edit at https://www.toptal.com/developers/gitignore?templates=terraform 3 | 4 | ### Terraform ### 5 | # Local .terraform directories 6 | **/.terraform/* 7 | 8 | # .tfstate files 9 | *.tfstate 10 | *.tfstate.* 11 | 12 | # Crash log files 13 | crash.log 14 | crash.*.log 15 | 16 | # Exclude all .tfvars files, which are likely to contain sensitive data, such as 17 | # password, private keys, and other secrets. These should not be part of version 18 | # control as they are data points which are potentially sensitive and subject 19 | # to change depending on the environment. 20 | *.tfvars 21 | *.tfvars.json 22 | 23 | # Ignore override files as they are usually used to override resources locally and so 24 | # are not checked in 25 | override.tf 26 | override.tf.json 27 | *_override.tf 28 | *_override.tf.json 29 | 30 | # Include override files you do wish to add to version control using negated pattern 31 | # !example_override.tf 32 | 33 | # Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan 34 | # example: *tfplan* 35 | 36 | # Ignore CLI configuration files 37 | .terraformrc 38 | terraform.rc 39 | 40 | # End of https://www.toptal.com/developers/gitignore/api/terraform 41 | .terraform.lock.hcl -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT No Attribution 2 | 3 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of 6 | this software and associated documentation files (the "Software"), to deal in 7 | the Software without restriction, including without limitation the rights to 8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 9 | the Software, and to permit persons to whom the Software is furnished to do so. 10 | 11 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 12 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 13 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 14 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 15 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 16 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 17 | 18 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Hosting DeepSeek-R1 on Amazon EKS 2 | 3 | In this tutorial, we’ll walk you through how to host [**DeepSeek-R1**](https://github.com/deepseek-ai/DeepSeek-R1) model on AWS using **Amazon EKS**. We are using [**Amazon EKS Auto Mode**](https://aws.amazon.com/eks/auto-mode/?trk=309fae93-0dac-4940-8d50-5b585d53959f&sc_channel=el) for the the flexibility and scalability that it provides, while eliminating the need for you to manage the Kubernetes control plane, compute, storage, and networking components. 4 | 5 | ## Deploying DeepSeek-R1 on Amazon EKS Auto Mode 6 | 7 | For this tutorial, we’ll use the [***DeepSeek-R1-Distill-Llama-8B***](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) distilled model. 8 | While it requires fewer resources (like GPU) compared to the full [***DeepSeek-R1***](https://huggingface.co/deepseek-ai/DeepSeek-R1) model with 671B parameters, it provides a lighter, though less powerful, option compared to the full model. 9 | 10 | If you'd prefer to deploy the full DeepSeek-R1 model, simply replace the distilled model in the vLLM configuration. 11 | 12 | ### PreReqs 13 | 14 | - [Check AWS Instance Quota](https://docs.aws.amazon.com/ec2/latest/instancetypes/ec2-instance-quotas.html) 15 | - [Install kubectl](https://kubernetes.io/docs/tasks/tools/) 16 | - [Install terraform](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli) 17 | - [Install finch](https://runfinch.com/docs/getting-started/installation/) or [docker](https://docs.docker.com/get-started/get-docker/) 18 | 19 | ### Create an Amazon EKS Cluster w/ Auto Mode using Terraform 20 | We'll use Terraform to easily provision the infrastructure, including a VPC, ECR repository, and an EKS cluster with Auto Mode enabled. 21 | 22 | ``` bash 23 | # Clone the GitHub repo with the manifests 24 | git clone https://github.com/aws-samples/deepseek-using-vllm-on-eks 25 | cd deepseek-using-vllm-on-eks 26 | 27 | # Apply the Terraform configuration 28 | terraform init 29 | terraform apply -auto-approve 30 | 31 | $(terraform output configure_kubectl | jq -r) 32 | ``` 33 | 34 | ### Deploy DeepSeek Model 35 | 36 | In this step, we will deploy the **DeepSeek-R1-Distill-Llama-8B** model using vLLM on Amazon EKS. 37 | We will walk through deploying the model with the option to enable GPU-based, Neuron-based (Inferentia and Trainium), 38 | or both, by configuring the parameters accordingly. 39 | 40 | #### Configuring Node Pools 41 | The `enable_auto_mode_node_pool` parameter can be set to `true` to automatically create node pools when using EKS AutoMode. 42 | This configuration is defined in the [nodepool_automode.tf](./nodepool_automode.tf) file. If you're using EKS AutoMode, this will ensure that the appropriate node pools are provisioned. 43 | 44 | #### Customizing Helm Chart Values 45 | To customize the values used to host your model using vLLM, check the [helm.tf](./helm.tf) file. 46 | This file defines the model to be deployed (**deepseek-ai/DeepSeek-R1-Distill-Llama-8B**) and allows you to pass additional parameters to vLLM. 47 | You can modify this file to change resource configurations, node selectors, or tolerations as needed. 48 | 49 | ``` bash 50 | # Let's start by just enabling the GPU based option: 51 | terraform apply -auto-approve -var="enable_deep_seek_gpu=true" -var="enable_auto_mode_node_pool=true" 52 | 53 | # Check the pods in the 'deepseek' namespace 54 | kubectl get po -n deepseek 55 | ``` 56 | 57 |
58 | Click to deploy with Neuron based Instances 59 | 60 | ``` bash 61 | # Before Adding Neuron support we need to build the image for the vllm deepseek neuron based deployment. 62 | 63 | # Let's start by getting the ECR repo name where we'll be pushing the image 64 | export ECR_REPO_NEURON=$(terraform output ecr_repository_uri_neuron | jq -r) 65 | 66 | # Now, let's clone the official vLLM repo and use its official container image with the neuron drivers installed 67 | git clone https://github.com/vllm-project/vllm 68 | cd vllm 69 | 70 | # Building image 71 | finch build --platform linux/amd64 -f Dockerfile.neuron -t $ECR_REPO_NEURON:0.1 . 72 | 73 | # Login on ECR repository 74 | aws ecr get-login-password | finch login --username AWS --password-stdin $ECR_REPO_NEURON 75 | 76 | # Pushing the image 77 | finch push $ECR_REPO_NEURON:0.1 78 | 79 | # Remove vllm repo and container image from local machine 80 | cd .. 81 | rm -rf vllm 82 | finch rmi $ECR_REPO_NEURON:0.1 83 | 84 | # Enable additional nodepool and deploy vLLM DeepSeek model 85 | terraform apply -auto-approve -var="enable_deep_seek_gpu=true" -var="enable_deep_seek_neuron=true" -var="enable_auto_mode_node_pool=true" 86 | ``` 87 |
88 | 89 | Initially, the pod might be in a **Pending state** while EKS Auto Mode provisions the underlying EC2 instances with the required drivers. 90 | 91 |
92 | Click if your pod is stuck in a "pending" state for several minutes 93 | 94 | ``` bash 95 | # Check if the node was provisioned 96 | kubectl get nodes -l owner=data-engineer 97 | ``` 98 | If no nodes are displayed, verify that your AWS account has sufficient service quota to launch the required instances. 99 | Check the quota limits for G, P, or Inf instances (e.g., GPU or Neuron based instances). 100 | 101 | For more information, refer to the [AWS EC2 Instance Quotas documentation](https://docs.aws.amazon.com/ec2/latest/instancetypes/ec2-instance-quotas.html). 102 | 103 | **Note:** Those quotas are based on vCPUs, not the number of instances, so be sure to request accordingly. 104 | 105 |
106 | 107 | ``` bash 108 | # Wait for the pod to reach the 'Running' state 109 | kubectl get po -n deepseek --watch 110 | 111 | # Verify that a new Node has been created 112 | kubectl get nodes -l owner=data-engineer -o wide 113 | 114 | # Check the logs to confirm that vLLM has started 115 | # Select the command based on the accelerator you choose to deploy. 116 | kubectl logs deployment.apps/deepseek-gpu-vllm-chart -n deepseek 117 | kubectl logs deployment.apps/deepseek-neuron-vllm-chart -n deepseek 118 | ``` 119 | 120 | You should see the log entry **Application startup complete** once the deployment is ready. 121 | 122 | ### Interact with the LLM 123 | 124 | Next, we can create a local proxy to interact with the model using a curl request. 125 | 126 | ``` bash 127 | # Set up a proxy to forward the service port to your local terminal 128 | # We are exposing Neuron based on port 8080 and GPU based on port 8081 129 | kubectl port-forward svc/deepseek-neuron-vllm-chart -n deepseek 8080:80 > port-forward-neuron.log 2>&1 & 130 | kubectl port-forward svc/deepseek-gpu-vllm-chart -n deepseek 8081:80 > port-forward-gpu.log 2>&1 & 131 | 132 | # Send a curl request to the model (change the port according to the accelerator you are using) 133 | curl -X POST "http://localhost:8080/v1/chat/completions" -H "Content-Type: application/json" --data '{ 134 | "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", 135 | "messages": [ 136 | { 137 | "role": "user", 138 | "content": "What is Kubernetes?" 139 | } 140 | ] 141 | }' 142 | ``` 143 | The response may take a few seconds to build, depending on the complexity of the model’s output. 144 | You can monitor the progress via the `deepseek-gpu-vllm-chart` or `deepseek-neuron-vllm-chart` deployment logs. 145 | 146 | ### Build a Chatbot UI for the Model 147 | 148 | While direct API requests work fine, let’s build a more user-friendly Chatbot UI to interact with the model. The source code for the UI is already available in the GitHub repository. 149 | 150 | ``` bash 151 | # Retrieve the ECR repository URI created by Terraform 152 | export ECR_REPO=$(terraform output ecr_repository_uri | jq -r) 153 | 154 | # Build the container image for the Chatbot UI 155 | finch build --platform linux/amd64 -t $ECR_REPO:0.1 chatbot-ui/application/. 156 | 157 | # Login to ECR and push the image 158 | aws ecr get-login-password | finch login --username AWS --password-stdin $ECR_REPO 159 | finch push $ECR_REPO:0.1 160 | 161 | # Update the deployment manifest to use the image 162 | sed -i "s#__IMAGE_DEEPSEEK_CHATBOT__#$ECR_REPO:0.1#g" chatbot-ui/manifests/deployment.yaml 163 | 164 | # Generate a random password for the Chatbot UI login 165 | sed -i "s|__PASSWORD__|$(openssl rand -base64 12 | tr -dc A-Za-z0-9 | head -c 16)|" chatbot-ui/manifests/deployment.yaml 166 | 167 | # Deploy the UI and create the ingress class required for load balancers 168 | kubectl apply -f chatbot-ui/manifests/ingress-class.yaml 169 | kubectl apply -f chatbot-ui/manifests/deployment.yaml 170 | 171 | # Get the URL for the load balancer to access the application 172 | echo http://$(kubectl get ingress/deepseek-chatbot-ingress -n deepseek -o json | jq -r '.status.loadBalancer.ingress[0].hostname') 173 | ``` 174 | 175 | To access the Chatbot UI, you'll need the username and password stored in a Kubernetes secret. 176 | 177 | ``` bash 178 | echo -e "Username=$(kubectl get secret deepseek-chatbot-secrets -n deepseek -o jsonpath='{.data.admin-username}' | base64 --decode)\nPassword=$(kubectl get secret deepseek-chatbot-secrets -n deepseek -o jsonpath='{.data.admin-password}' | base64 --decode)" 179 | ``` 180 | After logging in, you'll see a new **Chatbot tab** where you can interact with the model! 181 | In this tab, you'll notice a dropdown menu that lets you switch between Neuron-based and GPU-based deployments! 182 | 183 | ![chatbot-ui](/static/images/chatbot.jpg) 184 | 185 | --- 186 | ### Disclaimer 187 | 188 | **This repository is intended for demonstration and learning purposes only.** 189 | It is **not** intended for production use. The code provided here is for educational purposes and should not be used in a live environment without proper testing, validation, and modifications. 190 | 191 | Use at your own risk. The authors are not responsible for any issues, damages, or losses that may result from using this code in production. 192 | -------------------------------------------------------------------------------- /chatbot-ui/application/Dockerfile: -------------------------------------------------------------------------------- 1 | # Use an official Python runtime as a parent image 2 | FROM python:3.12-slim 3 | 4 | # Set the working directory in the container 5 | WORKDIR /app 6 | 7 | # Copy the current directory contents into the container at /app 8 | COPY . /app 9 | 10 | # Install any needed packages specified in requirements.txt 11 | RUN pip install --no-cache-dir -r requirements.txt 12 | 13 | # Make port 7860 available to the world outside this container 14 | EXPOSE 7860 15 | 16 | # Run app.py when the container launches 17 | CMD ["python", "app.py"] -------------------------------------------------------------------------------- /chatbot-ui/application/app.py: -------------------------------------------------------------------------------- 1 | import gradio as gr 2 | import requests 3 | import os 4 | import re 5 | import json 6 | 7 | # Get credentials from environment variables 8 | ADMIN_USERNAME = os.getenv("ADMIN_USERNAME", "admin") 9 | ADMIN_PASSWORD = os.getenv("ADMIN_PASSWORD", "password") 10 | 11 | # List of possible model base URLs 12 | MODEL_URLS = json.loads(os.getenv("MODEL_URLS", "http://localhost")) 13 | 14 | 15 | def authenticate(username, password): 16 | return username == ADMIN_USERNAME and password == ADMIN_PASSWORD 17 | 18 | 19 | def query_model(question, model_base_url): 20 | url = f"{model_base_url}/v1/chat/completions" 21 | headers = {"Content-Type": "application/json"} 22 | data = { 23 | "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", 24 | "messages": [ 25 | { 26 | "role": "user", 27 | "content": question 28 | } 29 | ] 30 | } 31 | 32 | response = requests.post(url, headers=headers, json=data) 33 | 34 | if response.status_code == 200: 35 | result = response.json() 36 | content = result['choices'][0]['message']['content'] 37 | # Remove and tags 38 | content = re.sub(r'|', '', content).strip() 39 | return content 40 | else: 41 | return f"Error: {response.status_code} - {response.text}" 42 | 43 | 44 | def chatbot(message, history, model_base_url): 45 | response = query_model(message, model_base_url) 46 | history.append((message, response)) 47 | return history 48 | 49 | 50 | def create_demo(): 51 | with gr.Blocks() as demo: 52 | gr.Markdown("# DeepSeek AI Chatbot") 53 | 54 | with gr.Tab("Login"): 55 | username = gr.Textbox(label="Username") 56 | password = gr.Textbox(label="Password", type="password") 57 | login_button = gr.Button("Login") 58 | login_message = gr.Markdown(visible=False) 59 | 60 | with gr.Tab("Chatbot", visible=False) as chatbot_tab: 61 | model_base_url_dropdown = gr.Dropdown( 62 | choices=MODEL_URLS, 63 | label="Select Model Base URL", 64 | value=MODEL_URLS[0] # Default to the first URL 65 | ) 66 | chatbot = gr.Chatbot(height=300) 67 | msg = gr.Textbox(placeholder="Type your message here...", label="User Input") 68 | clear = gr.Button("Clear") 69 | 70 | def user(user_message, history, model_base_url): 71 | return "", history + [[user_message, None]] 72 | 73 | def bot(history, model_base_url): 74 | if not history: # Check if history is empty 75 | return history 76 | user_message = history[-1][0] 77 | bot_message = query_model(user_message, model_base_url) 78 | history[-1][1] = bot_message 79 | return history 80 | 81 | msg.submit(user, [msg, chatbot, model_base_url_dropdown], [msg, chatbot], queue=False).then( 82 | bot, [chatbot, model_base_url_dropdown], chatbot 83 | ) 84 | clear.click(lambda: None, None, chatbot, queue=False) 85 | 86 | def login(username, password): 87 | if authenticate(username, password): 88 | return ( 89 | gr.update(value="Login successful!", visible=True), # login_message 90 | gr.update(visible=True), # chatbot_tab 91 | "", # username 92 | "", # password 93 | ) 94 | else: 95 | return ( 96 | gr.update(value="Invalid credentials. Please try again.", visible=True), # login_message 97 | gr.update(visible=False), # chatbot_tab 98 | gr.update(), # username (no change) 99 | gr.update(), # password (no change) 100 | ) 101 | 102 | login_button.click(login, inputs=[username, password], outputs=[login_message, chatbot_tab, username, password]) 103 | 104 | return demo 105 | 106 | if __name__ == "__main__": 107 | demo = create_demo() 108 | demo.launch(server_name="0.0.0.0", server_port=7860) 109 | -------------------------------------------------------------------------------- /chatbot-ui/application/requirements.txt: -------------------------------------------------------------------------------- 1 | aiofiles==23.2.1 2 | annotated-types==0.7.0 3 | anyio==4.8.0 4 | certifi==2024.12.14 5 | charset-normalizer==3.4.1 6 | click==8.1.8 7 | fastapi==0.115.7 8 | ffmpy==0.5.0 9 | filelock==3.17.0 10 | fsspec==2024.12.0 11 | gradio==5.13.1 12 | gradio_client==1.6.0 13 | h11==0.14.0 14 | httpcore==1.0.7 15 | httpx==0.28.1 16 | huggingface-hub==0.28.0 17 | idna==3.10 18 | Jinja2==3.1.5 19 | markdown-it-py==3.0.0 20 | MarkupSafe==2.1.5 21 | mdurl==0.1.2 22 | numpy==2.2.2 23 | orjson==3.10.15 24 | packaging==24.2 25 | pandas==2.2.3 26 | pillow==11.1.0 27 | pydantic==2.10.6 28 | pydantic_core==2.27.2 29 | pydub==0.25.1 30 | Pygments==2.19.1 31 | python-dateutil==2.9.0.post0 32 | python-multipart==0.0.20 33 | pytz==2024.2 34 | PyYAML==6.0.2 35 | requests==2.32.3 36 | rich==13.9.4 37 | ruff==0.9.3 38 | safehttpx==0.1.6 39 | semantic-version==2.10.0 40 | shellingham==1.5.4 41 | six==1.17.0 42 | sniffio==1.3.1 43 | starlette==0.45.3 44 | tomlkit==0.13.2 45 | tqdm==4.67.1 46 | typer==0.15.1 47 | typing_extensions==4.12.2 48 | tzdata==2025.1 49 | urllib3==2.3.0 50 | uvicorn==0.34.0 51 | websockets==14.2 52 | -------------------------------------------------------------------------------- /chatbot-ui/manifests/deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: deepseek-chatbot 5 | namespace: deepseek 6 | labels: 7 | app: deepseek-chatbot 8 | spec: 9 | replicas: 1 10 | selector: 11 | matchLabels: 12 | app: deepseek-chatbot 13 | template: 14 | metadata: 15 | labels: 16 | app: deepseek-chatbot 17 | spec: 18 | containers: 19 | - name: deepseek-chatbot 20 | image: __IMAGE_DEEPSEEK_CHATBOT__ 21 | ports: 22 | - containerPort: 7860 23 | env: 24 | - name: ADMIN_USERNAME 25 | valueFrom: 26 | secretKeyRef: 27 | name: deepseek-chatbot-secrets 28 | key: admin-username 29 | - name: ADMIN_PASSWORD 30 | valueFrom: 31 | secretKeyRef: 32 | name: deepseek-chatbot-secrets 33 | key: admin-password 34 | - name: MODEL_URLS 35 | value: '["http://deepseek-gpu-vllm-chart", "http://deepseek-neuron-vllm-chart"]' 36 | resources: 37 | requests: 38 | cpu: "250m" 39 | memory: "512Mi" 40 | limits: 41 | cpu: "500m" 42 | memory: "1Gi" 43 | --- 44 | apiVersion: v1 45 | kind: Service 46 | metadata: 47 | name: deepseek-chatbot-service 48 | namespace: deepseek 49 | spec: 50 | selector: 51 | app: deepseek-chatbot 52 | ports: 53 | - protocol: TCP 54 | port: 80 55 | targetPort: 7860 56 | type: ClusterIP 57 | --- 58 | apiVersion: v1 59 | kind: Secret 60 | metadata: 61 | name: deepseek-chatbot-secrets 62 | namespace: deepseek 63 | type: Opaque 64 | stringData: 65 | admin-username: admin 66 | admin-password: __PASSWORD__ 67 | --- 68 | apiVersion: networking.k8s.io/v1 69 | kind: Ingress 70 | metadata: 71 | name: deepseek-chatbot-ingress 72 | namespace: deepseek 73 | annotations: 74 | alb.ingress.kubernetes.io/scheme: internet-facing 75 | alb.ingress.kubernetes.io/target-type: ip 76 | spec: 77 | ingressClassName: alb 78 | rules: 79 | - http: 80 | paths: 81 | - path: / 82 | pathType: Prefix 83 | backend: 84 | service: 85 | name: deepseek-chatbot-service 86 | port: 87 | number: 80 88 | -------------------------------------------------------------------------------- /chatbot-ui/manifests/ingress-class.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: eks.amazonaws.com/v1 2 | kind: IngressClassParams 3 | metadata: 4 | name: alb 5 | spec: 6 | scheme: internet-facing 7 | --- 8 | apiVersion: networking.k8s.io/v1 9 | kind: IngressClass 10 | metadata: 11 | name: alb 12 | annotations: 13 | ingressclass.kubernetes.io/is-default-class: "true" 14 | spec: 15 | controller: eks.amazonaws.com/alb 16 | parameters: 17 | apiGroup: eks.amazonaws.com 18 | kind: IngressClassParams 19 | name: alb -------------------------------------------------------------------------------- /helm.tf: -------------------------------------------------------------------------------- 1 | resource "helm_release" "deepseek_gpu" { 2 | count = var.enable_deep_seek_gpu ? 1 : 0 3 | name = "deepseek-gpu" 4 | chart = "./vllm-chart" 5 | create_namespace = true 6 | wait = false 7 | replace = true 8 | namespace = "deepseek" 9 | 10 | values = [ 11 | <<-EOT 12 | nodeSelector: 13 | owner: "data-engineer" 14 | instanceType: "gpu" 15 | tolerations: 16 | - key: "nvidia.com/gpu" 17 | operator: "Exists" 18 | effect: "NoSchedule" 19 | resources: 20 | limits: 21 | cpu: "32" 22 | memory: 100G 23 | nvidia.com/gpu: "1" 24 | requests: 25 | cpu: "16" 26 | memory: 30G 27 | nvidia.com/gpu: "1" 28 | command: "vllm serve deepseek-ai/DeepSeek-R1-Distill-Llama-8B --max_model 2048" 29 | 30 | livenessProbe: 31 | httpGet: 32 | path: /health 33 | port: 8000 34 | initialDelaySeconds: 1800 35 | periodSeconds: 10 36 | 37 | readinessProbe: 38 | httpGet: 39 | path: /health 40 | port: 8000 41 | initialDelaySeconds: 1800 42 | periodSeconds: 5 43 | 44 | EOT 45 | ] 46 | depends_on = [module.eks, kubernetes_manifest.gpu_nodepool] 47 | } 48 | 49 | resource "helm_release" "deepseek_neuron" { 50 | count = var.enable_deep_seek_neuron ? 1 : 0 51 | name = "deepseek-neuron" 52 | chart = "./vllm-chart" 53 | create_namespace = true 54 | wait = false 55 | replace = true 56 | namespace = "deepseek" 57 | 58 | values = [ 59 | <<-EOT 60 | image: 61 | repository: ${aws_ecr_repository.neuron-ecr.repository_url} 62 | tag: 0.1 63 | pullPolicy: IfNotPresent 64 | 65 | nodeSelector: 66 | owner: "data-engineer" 67 | instanceType: "neuron" 68 | tolerations: 69 | - key: "aws.amazon.com/neuron" 70 | operator: "Exists" 71 | effect: "NoSchedule" 72 | 73 | command: "vllm serve deepseek-ai/DeepSeek-R1-Distill-Llama-8B --device neuron --tensor-parallel-size 2 --max-num-seqs 4 --block-size 8 --use-v2-block-manager --max-model-len 2048" 74 | 75 | env: 76 | - name: NEURON_RT_NUM_CORES 77 | value: "2" 78 | - name: NEURON_RT_VISIBLE_CORES 79 | value: "0,1" 80 | - name: VLLM_LOGGING_LEVEL 81 | value: "INFO" 82 | 83 | resources: 84 | limits: 85 | cpu: "30" 86 | memory: 64G 87 | aws.amazon.com/neuron: "1" 88 | requests: 89 | cpu: "30" 90 | memory: 64G 91 | aws.amazon.com/neuron: "1" 92 | 93 | livenessProbe: 94 | httpGet: 95 | path: /health 96 | port: 8000 97 | initialDelaySeconds: 1800 98 | periodSeconds: 10 99 | 100 | readinessProbe: 101 | httpGet: 102 | path: /health 103 | port: 8000 104 | initialDelaySeconds: 1800 105 | periodSeconds: 5 106 | EOT 107 | ] 108 | depends_on = [module.eks, kubernetes_manifest.neuron_nodepool] 109 | } 110 | -------------------------------------------------------------------------------- /main.tf: -------------------------------------------------------------------------------- 1 | variable "enable_deep_seek_gpu" { 2 | description = "Enable DeepSeek using GPUs" 3 | type = bool 4 | default = false 5 | } 6 | 7 | variable "enable_deep_seek_neuron" { 8 | description = "Enable DeepSeek using Neuron" 9 | type = bool 10 | default = false 11 | } 12 | 13 | variable "enable_auto_mode_node_pool" { 14 | description = "Enable EKS AutoMode NodePool" 15 | type = bool 16 | default = false 17 | } 18 | 19 | locals { 20 | region = "us-east-1" 21 | vpc_cidr = "10.0.0.0/16" 22 | name = "eks-automode" 23 | azs = slice(data.aws_availability_zones.available.names, 0, 3) 24 | 25 | tags = { 26 | Blueprint = local.name 27 | } 28 | } 29 | 30 | 31 | # Define the required providers 32 | provider "aws" { 33 | region = local.region # Change to your desired region 34 | } 35 | 36 | provider "kubernetes" { 37 | host = module.eks.cluster_endpoint 38 | cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) 39 | 40 | exec { 41 | api_version = "client.authentication.k8s.io/v1beta1" 42 | command = "aws" 43 | # This requires the awscli to be installed locally where Terraform is executed 44 | args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name] 45 | } 46 | } 47 | 48 | provider "helm" { 49 | kubernetes { 50 | host = module.eks.cluster_endpoint 51 | cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data) 52 | 53 | exec { 54 | api_version = "client.authentication.k8s.io/v1beta1" 55 | command = "aws" 56 | # This requires the awscli to be installed locally where Terraform is executed 57 | args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name] 58 | } 59 | } 60 | } 61 | 62 | data "aws_availability_zones" "available" { 63 | # Do not include local zones 64 | filter { 65 | name = "opt-in-status" 66 | values = ["opt-in-not-required"] 67 | } 68 | } 69 | 70 | # Use the Terraform VPC module to create a VPC 71 | module "vpc" { 72 | source = "terraform-aws-modules/vpc/aws" 73 | version = "5.17.0" # Use the latest version available 74 | 75 | name = "${local.name}-vpc" 76 | cidr = local.vpc_cidr 77 | 78 | azs = local.azs 79 | private_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 4, k)] 80 | public_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 8, k + 48)] 81 | 82 | enable_nat_gateway = true 83 | single_nat_gateway = true 84 | 85 | public_subnet_tags = { 86 | "kubernetes.io/role/elb" = 1 87 | } 88 | 89 | private_subnet_tags = { 90 | "kubernetes.io/role/internal-elb" = 1 91 | } 92 | 93 | tags = local.tags 94 | } 95 | 96 | # Use the Terraform EKS module to create an EKS cluster 97 | module "eks" { 98 | source = "terraform-aws-modules/eks/aws" 99 | version = "20.33.1" # Use the latest version available 100 | 101 | cluster_name = local.name 102 | cluster_version = "1.31" # Specify the EKS version you want to use 103 | 104 | cluster_endpoint_public_access = true 105 | enable_irsa = true 106 | enable_cluster_creator_admin_permissions = true 107 | 108 | cluster_compute_config = { 109 | enabled = true 110 | node_pools = ["general-purpose"] 111 | } 112 | 113 | 114 | vpc_id = module.vpc.vpc_id 115 | subnet_ids = module.vpc.private_subnets 116 | 117 | tags = local.tags 118 | } 119 | 120 | 121 | resource "aws_ecr_repository" "chatbot-ecr" { 122 | name = "${local.name}-chatbot" 123 | image_tag_mutability = "MUTABLE" 124 | } 125 | 126 | resource "aws_ecr_repository" "neuron-ecr" { 127 | name = "${local.name}-neuron-base" 128 | image_tag_mutability = "MUTABLE" 129 | } 130 | 131 | # Outputs 132 | output "configure_kubectl" { 133 | description = "Configure kubectl: make sure you're logged in with the correct AWS profile and run the following command to update your kubeconfig" 134 | value = "aws eks --region ${local.region} update-kubeconfig --name ${module.eks.cluster_name}" 135 | } 136 | 137 | output "ecr_repository_uri" { 138 | value = aws_ecr_repository.chatbot-ecr.repository_url 139 | } 140 | 141 | output "ecr_repository_uri_neuron" { 142 | value = aws_ecr_repository.neuron-ecr.repository_url 143 | } -------------------------------------------------------------------------------- /nodepool_automode.tf: -------------------------------------------------------------------------------- 1 | resource "kubernetes_manifest" "gpu_nodepool" { 2 | count = var.enable_auto_mode_node_pool && var.enable_deep_seek_gpu ? 1 : 0 3 | manifest = { 4 | apiVersion = "karpenter.sh/v1" 5 | kind = "NodePool" 6 | metadata = { 7 | name = "gpu-nodepool" 8 | } 9 | spec = { 10 | template = { 11 | metadata = { 12 | labels = { 13 | owner = "data-engineer" 14 | instanceType = "gpu" 15 | } 16 | } 17 | spec = { 18 | nodeClassRef = { 19 | group = "eks.amazonaws.com" 20 | kind = "NodeClass" 21 | name = "default" 22 | } 23 | taints = [ 24 | { 25 | key = "nvidia.com/gpu" 26 | value = "Exists" 27 | effect = "NoSchedule" 28 | } 29 | ] 30 | requirements = [ 31 | { 32 | key = "eks.amazonaws.com/instance-family" 33 | operator = "In" 34 | values = ["g5", "g6", "g6e", "p5", "p4"] 35 | }, 36 | { 37 | key = "kubernetes.io/arch" 38 | operator = "In" 39 | values = ["amd64"] 40 | }, 41 | { 42 | key = "karpenter.sh/capacity-type" 43 | operator = "In" 44 | values = ["spot", "on-demand"] 45 | } 46 | ] 47 | } 48 | } 49 | limits = { 50 | cpu = "1000" 51 | memory = "1000Gi" 52 | } 53 | } 54 | } 55 | 56 | depends_on = [module.eks] 57 | } 58 | 59 | resource "kubernetes_manifest" "neuron_nodepool" { 60 | count = var.enable_auto_mode_node_pool && var.enable_deep_seek_neuron ? 1 : 0 61 | manifest = { 62 | apiVersion = "karpenter.sh/v1" 63 | kind = "NodePool" 64 | metadata = { 65 | name = "neuron-nodepool" 66 | } 67 | spec = { 68 | template = { 69 | metadata = { 70 | labels = { 71 | owner = "data-engineer" 72 | instanceType = "neuron" 73 | } 74 | } 75 | spec = { 76 | nodeClassRef = { 77 | group = "eks.amazonaws.com" 78 | kind = "NodeClass" 79 | name = "default" 80 | } 81 | taints = [ 82 | { 83 | key = "aws.amazon.com/neuron" 84 | value = "Exists" 85 | effect = "NoSchedule" 86 | } 87 | ] 88 | requirements = [ 89 | { 90 | key = "eks.amazonaws.com/instance-family" 91 | operator = "In" 92 | values = ["inf2"] 93 | }, 94 | { 95 | key = "karpenter.sh/capacity-type" 96 | operator = "In" 97 | values = ["spot", "on-demand"] 98 | } 99 | ] 100 | } 101 | } 102 | limits = { 103 | cpu = "1000" 104 | memory = "1000Gi" 105 | } 106 | } 107 | } 108 | 109 | depends_on = [module.eks] 110 | } -------------------------------------------------------------------------------- /static/images/chatbot.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/deepseek-using-vllm-on-eks/d05003f1744b921081fabed9a3615bf71eea68aa/static/images/chatbot.jpg -------------------------------------------------------------------------------- /static/images/cloudshell.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/deepseek-using-vllm-on-eks/d05003f1744b921081fabed9a3615bf71eea68aa/static/images/cloudshell.jpg -------------------------------------------------------------------------------- /vllm-chart/.helmignore: -------------------------------------------------------------------------------- 1 | # Patterns to ignore when building packages. 2 | # This supports shell glob matching, relative path matching, and 3 | # negation (prefixed with !). Only one pattern per line. 4 | .DS_Store 5 | # Common VCS dirs 6 | .git/ 7 | .gitignore 8 | .bzr/ 9 | .bzrignore 10 | .hg/ 11 | .hgignore 12 | .svn/ 13 | # Common backup files 14 | *.swp 15 | *.bak 16 | *.tmp 17 | *.orig 18 | *~ 19 | # Various IDEs 20 | .project 21 | .idea/ 22 | *.tmproj 23 | .vscode/ 24 | -------------------------------------------------------------------------------- /vllm-chart/Chart.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v2 2 | name: vllm-chart 3 | description: A Helm chart for Kubernetes 4 | 5 | # A chart can be either an 'application' or a 'library' chart. 6 | # 7 | # Application charts are a collection of templates that can be packaged into versioned archives 8 | # to be deployed. 9 | # 10 | # Library charts provide useful utilities or functions for the chart developer. They're included as 11 | # a dependency of application charts to inject those utilities and functions into the rendering 12 | # pipeline. Library charts do not define any templates and therefore cannot be deployed. 13 | type: application 14 | 15 | # This is the chart version. This version number should be incremented each time you make changes 16 | # to the chart and its templates, including the app version. 17 | # Versions are expected to follow Semantic Versioning (https://semver.org/) 18 | version: 0.1.0 19 | 20 | # This is the version number of the application being deployed. This version number should be 21 | # incremented each time you make changes to the application. Versions are not expected to 22 | # follow Semantic Versioning. They should reflect the version the application is using. 23 | # It is recommended to use it with quotes. 24 | appVersion: "1.16.0" 25 | -------------------------------------------------------------------------------- /vllm-chart/templates/NOTES.txt: -------------------------------------------------------------------------------- 1 | {{- if contains "NodePort" .Values.service.type }} 2 | export NODE_PORT=$(kubectl get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ include "vllm-chart.fullname" . }}) 3 | export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}") 4 | echo http://$NODE_IP:$NODE_PORT 5 | {{- else if contains "LoadBalancer" .Values.service.type }} 6 | NOTE: It may take a few minutes for the LoadBalancer IP to be available. 7 | You can watch its status by running 'kubectl get --namespace {{ .Release.Namespace }} svc -w {{ include "vllm-chart.fullname" . }}' 8 | export SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ include "vllm-chart.fullname" . }} --template "{{"{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}"}}") 9 | echo http://$SERVICE_IP:{{ .Values.service.port }} 10 | {{- else if contains "ClusterIP" .Values.service.type }} 11 | export POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ include "vllm-chart.name" . }},app.kubernetes.io/instance={{ .Release.Name }}" -o jsonpath="{.items[0].metadata.name}") 12 | export CONTAINER_PORT=$(kubectl get pod --namespace {{ .Release.Namespace }} $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}") 13 | echo "Visit http://127.0.0.1:8080 to use your application" 14 | kubectl --namespace {{ .Release.Namespace }} port-forward $POD_NAME 8080:$CONTAINER_PORT 15 | {{- end }} 16 | -------------------------------------------------------------------------------- /vllm-chart/templates/_helpers.tpl: -------------------------------------------------------------------------------- 1 | {{/* 2 | Expand the name of the chart. 3 | */}} 4 | {{- define "vllm-chart.name" -}} 5 | {{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} 6 | {{- end }} 7 | 8 | {{/* 9 | Create a default fully qualified app name. 10 | We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec). 11 | If release name contains chart name it will be used as a full name. 12 | */}} 13 | {{- define "vllm-chart.fullname" -}} 14 | {{- if .Values.fullnameOverride }} 15 | {{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }} 16 | {{- else }} 17 | {{- $name := default .Chart.Name .Values.nameOverride }} 18 | {{- if contains $name .Release.Name }} 19 | {{- .Release.Name | trunc 63 | trimSuffix "-" }} 20 | {{- else }} 21 | {{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }} 22 | {{- end }} 23 | {{- end }} 24 | {{- end }} 25 | 26 | {{/* 27 | Create chart name and version as used by the chart label. 28 | */}} 29 | {{- define "vllm-chart.chart" -}} 30 | {{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} 31 | {{- end }} 32 | 33 | {{/* 34 | Common labels 35 | */}} 36 | {{- define "vllm-chart.labels" -}} 37 | helm.sh/chart: {{ include "vllm-chart.chart" . }} 38 | {{ include "vllm-chart.selectorLabels" . }} 39 | {{- if .Chart.AppVersion }} 40 | app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} 41 | {{- end }} 42 | app.kubernetes.io/managed-by: {{ .Release.Service }} 43 | {{- end }} 44 | 45 | {{/* 46 | Selector labels 47 | */}} 48 | {{- define "vllm-chart.selectorLabels" -}} 49 | app.kubernetes.io/name: {{ include "vllm-chart.name" . }} 50 | app.kubernetes.io/instance: {{ .Release.Name }} 51 | {{- end }} 52 | 53 | {{/* 54 | Create the name of the service account to use 55 | */}} 56 | {{- define "vllm-chart.serviceAccountName" -}} 57 | {{- if .Values.serviceAccount.create }} 58 | {{- default (include "vllm-chart.fullname" .) .Values.serviceAccount.name }} 59 | {{- else }} 60 | {{- default "default" .Values.serviceAccount.name }} 61 | {{- end }} 62 | {{- end }} 63 | -------------------------------------------------------------------------------- /vllm-chart/templates/deployment.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: apps/v1 2 | kind: Deployment 3 | metadata: 4 | name: {{ include "vllm-chart.fullname" . }} 5 | namespace: {{ .Values.namespace }} 6 | labels: 7 | {{- include "vllm-chart.labels" . | nindent 4 }} 8 | spec: 9 | replicas: {{ .Values.replicaCount }} 10 | selector: 11 | matchLabels: 12 | {{- include "vllm-chart.selectorLabels" . | nindent 6 }} 13 | template: 14 | metadata: 15 | labels: 16 | {{- include "vllm-chart.labels" . | nindent 8 }} 17 | spec: 18 | nodeSelector: 19 | {{- toYaml .Values.nodeSelector | nindent 8 }} 20 | tolerations: 21 | {{- toYaml .Values.tolerations | nindent 8 }} 22 | volumes: 23 | - name: cache-volume 24 | hostPath: 25 | path: {{ .Values.cacheVolume.path }} 26 | type: DirectoryOrCreate 27 | - name: shm 28 | emptyDir: 29 | medium: Memory 30 | sizeLimit: {{ .Values.shmVolume.sizeLimit }} 31 | containers: 32 | - name: {{ .Chart.Name }} 33 | image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}" 34 | imagePullPolicy: {{ .Values.image.pullPolicy }} 35 | command: ["/bin/sh", "-c"] 36 | args: 37 | - {{ .Values.command }} 38 | {{- if .Values.env }} 39 | env: 40 | {{- toYaml .Values.env | nindent 12 }} 41 | {{- end }} 42 | ports: 43 | - containerPort: {{ .Values.containerPort }} 44 | resources: 45 | {{- toYaml .Values.resources | nindent 12 }} 46 | volumeMounts: 47 | - mountPath: /root/.cache/huggingface 48 | name: cache-volume 49 | - name: shm 50 | mountPath: /dev/shm 51 | livenessProbe: 52 | {{- toYaml .Values.livenessProbe | nindent 12 }} 53 | readinessProbe: 54 | {{- toYaml .Values.readinessProbe | nindent 12 }} -------------------------------------------------------------------------------- /vllm-chart/templates/service.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: v1 2 | kind: Service 3 | metadata: 4 | name: {{ include "vllm-chart.fullname" . }} 5 | namespace: {{ .Values.namespace }} 6 | labels: 7 | {{- include "vllm-chart.labels" . | nindent 4 }} 8 | spec: 9 | type: {{ .Values.service.type }} 10 | ports: 11 | - name: http 12 | port: {{ .Values.service.port }} 13 | protocol: TCP 14 | targetPort: {{ .Values.service.targetPort }} 15 | selector: 16 | {{- include "vllm-chart.selectorLabels" . | nindent 4 }} -------------------------------------------------------------------------------- /vllm-chart/values.yaml: -------------------------------------------------------------------------------- 1 | namespace: deepseek 2 | replicaCount: 1 3 | 4 | containerPort: 8000 5 | 6 | image: 7 | repository: vllm/vllm-openai 8 | tag: latest 9 | pullPolicy: IfNotPresent 10 | 11 | nodeSelector: {} 12 | 13 | tolerations: [] 14 | 15 | cacheVolume: 16 | path: /tmp/deepseek 17 | 18 | shmVolume: 19 | sizeLimit: 2Gi 20 | 21 | command: "vllm serve __MODEL_NAME_AND_PARAMETERS__" 22 | 23 | resources: 24 | limits: 25 | cpu: "32" 26 | memory: 100G 27 | requests: 28 | cpu: "16" 29 | memory: 30G 30 | 31 | service: 32 | type: ClusterIP 33 | port: 80 34 | targetPort: 8000 35 | 36 | livenessProbe: 37 | httpGet: 38 | path: /health 39 | port: 8000 40 | initialDelaySeconds: 60 41 | periodSeconds: 10 42 | 43 | readinessProbe: 44 | httpGet: 45 | path: /health 46 | port: 8000 47 | initialDelaySeconds: 60 48 | periodSeconds: 5 49 | 50 | env: [] --------------------------------------------------------------------------------