├── AWS_Interview_Questions.md
├── Admission Controllers.png
├── Load balancers.png
├── Pause_Containers.png
├── Private_Subnet.png
├── Question5_Solution.png
├── README.md
├── Untitled-2023-10-04-1156.excalidraw
├── Untitled-2023-10-04-1156.png
├── app1-StatefulSet.yaml.png
├── app1.yaml.png
├── app2-StatefulSet.yaml.png
├── app2-deployment.yaml.png
├── app3-StatefulSet.yaml.png
├── app3-deployment.yaml.png
├── containers_kubernetes_trivia.md
├── docker_compose_trivia.md
├── image-1.png
├── image-10.png
├── image-11.png
├── image-12.png
├── image-13.png
├── image-14.png
├── image-15.png
├── image-16.png
├── image-17.png
├── image-18.png
├── image-19.png
├── image-2.png
├── image-20.png
├── image-21.png
├── image-22.png
├── image-23.png
├── image-24.png
├── image-25.png
├── image-26.png
├── image-27.png
├── image-28.png
├── image-29.png
├── image-3.png
├── image-4.png
├── image-5.png
├── image-6.png
├── image-7.png
├── image-8.png
├── image-9.png
├── image.png
├── initContainers-snippet.png
├── kubernetes_Scenario_Based_Questions.md
├── kubernetes_trivia.md
├── lock_id.png
├── monitoring_kuberetes.md
├── pause_containers_2.png
├── ray-so-export (13).png
├── reference.txt
├── setting_up_hpa.md
└── terraform_questions.md

/AWS_Interview_Questions.md:
--------------------------------------------------------------------------------

# AWS Interview Questions

---

## EC2

### What are the families of instance types in EC2?

- They are majorly classified as follows:

  - General Purpose (starts with M) - e.g. M5a, M5ad, M6a
  - Memory Optimized (starts with R, signifying RAM) - e.g. R5, R5a, R5ad
  - Compute Optimized (starts with C, signifying Compute) - e.g. C5, C5a, C5ad

- Further classified into:

  - Storage Optimized
  - High Performance Computing
  - Accelerated Computing

### What are Burstable Instances?

- The t2, t3, and t3a families provide the best price-to-performance ratio among the EC2 instance types.
- The t4g family uses Amazon's own `Graviton` processor.

### Does EC2 get a Public IP Address by default?

- `NO`, by default an EC2 instance only gets a Private IP Address.

---

## Networking in AWS

---

### What is CIDR?

- Classless Inter-Domain Routing (more widely used than classful routing).
- It is used to define how IP addresses are allocated.
- e.g. `172.16.0.0/16`

```
172.16.0.0 ==> The BASE IP
/16        ==> The subnet mask (prefix length)
```

### How to calculate the number of IP addresses available?

- e.g. `172.16.0.0/16` => It is divided into two parts: the `Network Address` and the `Host Address`.
- `Host Address` - Represents the number of IP addresses that can be allocated.
- Calculated as follows:

```
1 Byte = 8 Bits
172.16.0.0/16 ==> The mask is 11111111.11111111.00000000.00000000
The mask covers the leftmost 16 bits (the network portion); the remaining 16 bits on the right are the host portion.
Each byte can hold `256 values` => from `0 to 255`

Host Addresses ==> the last two octets (the 0.0 part)
N/W Addresses  ==> the first two octets (172.16)

=> Host Addresses ==> 0.0 => 256 * 256 ==> 65536 addresses - 2 (2 IPs are reserved for the Network Address and the Broadcast Address)

The number of usable IP addresses can also be derived from the formula => `[2 ^ (32 - subnet mask) - 2]`
```
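As a quick cross-check, Python's standard `ipaddress` module reproduces the same arithmetic; a minimal sketch using only the example CIDR blocks from this section:

```python
import ipaddress

# /16 example from above: 65536 total addresses, 65534 usable hosts
net16 = ipaddress.ip_network("172.16.0.0/16")
print(net16.num_addresses)       # 65536
print(net16.num_addresses - 2)   # 65534 (network + broadcast excluded)

# /15 example: strict=False accepts an address with host bits set
net15 = ipaddress.ip_network("172.8.10.12/15", strict=False)
print(net15)                     # 172.8.0.0/15
print(net15.num_addresses - 2)   # 131070

# Same result from the formula [2 ^ (32 - prefix) - 2]
print(2 ** (32 - 15) - 2)        # 131070
```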
- For example, for `172.8.10.12/15`:
  => Using `[2 ^ (32 - subnet mask) - 2] = [2 ^ (32 - 15) - 2] = 131070` IP addresses can be allocated.

### What is a VPC?

- It is your own private network within the cloud.
- It consists of `Public` and `Private` subnets.

- The `Public Subnet` routes to the `Internet Gateway`; generally the `Bastion Host (Jump Server)` or `Load Balancers` are kept inside the Public Subnet.
- The `Private Subnet` routes to the `NAT (Network Address Translation) Gateway`; generally the applications are placed inside the Private Subnet (least-privilege access). The internet can only be reached through the NAT Gateway ==> the NAT Gateway forwards traffic to the Internet Gateway and relays the replies back from the outside world.

### How do Public and Private Subnets differ from each other?

- A Public Subnet routes to the Internet Gateway, while a Private Subnet routes its outbound traffic through the NAT Gateway.

### What is incoming and outgoing traffic called in terms of the VPC?

- Incoming Traffic ==> `Ingress Traffic / Inbound Traffic`
- Outgoing Traffic ==> `Egress Traffic / Outbound Traffic`

### How does an app inside the Private Subnet connect to the Internet?

![alt text](Private_Subnet.png)

### NACL (Network Access Control List) VS SG (Security Groups)

- `NACLs` are a set of rules that act at the `Subnet Level`.
- In a `NACL` the `Inbound Rules` and `Outbound Rules` are evaluated independently; for example, inbound traffic can be allowed while outbound traffic is set to `Blocked`.
- `NACLs` are `stateless`.
- NACL rule numbers range from `1 to 32766`, and rules are evaluated in order starting from the lowest number.

- `SGs` are a set of rules that act at the `Instance Level`.
- In an `SG`, once the `Inbound Traffic` is `allowed`, the corresponding return (`Outbound`) traffic is `allowed by default`.
- `SGs` are `stateful`.

---

## LOAD BALANCERS

---

### What is the difference between High Availability [HA] and Fault Tolerance?

- `HA` is about keeping the application available to users with minimal downtime, e.g. running redundant instances so the service stays up even if one instance goes down.
- `Fault Tolerance` is a stronger guarantee: the application keeps operating without interruption even when components fail, typically by running redundant copies across `multiple Availability Zones in the Region` and avoiding any `single point of failure`.

### What is SSL Termination?

- It is relevant whenever we serve `HTTPS traffic`.
- When serving HTTPS traffic we need to install SSL/TLS certificates for encrypted communication; with SSL termination, the load balancer decrypts (terminates) the TLS connection and forwards plain traffic to the backend targets.

### OSI Reference Model Layers

- `Layer 7` - `Application Layer` `(ALB operates at this level)` - the layer users interact with; understands HTTP, HTTPS, path-based and host-based routing.
- `Layer 6` - `Presentation Layer` - defines the format in which data is represented, e.g. ZIP, JPEG.
- `Layer 5` - `Session Layer` - manages sessions: whether a session is created and whether it is valid.
- `Layer 4` - `Transport Layer` `(NLB operates at this level)` - manages packet delivery using `TCP` or `UDP`.
- `Layer 3` - `Network Layer` - finds the shortest route to transmit data.
- `Layer 2` - `Data Link Layer` - converts packets into frames.
- `Layer 1` - `Physical Layer` - converts frames into bits and transmits them over the physical medium.

### What are `target groups` in a LB?
- They are simply a set of instances grouped together.
- Target Groups are used to perform `health checks` on the instances.
- Based on listener `rules`, the load balancer routes traffic to the appropriate `Target Group`.

### What are some of the algorithms used by Load Balancers?

- These are some of the most common algorithms used by load balancers:

  - Round Robin
  - Weighted
  - `Sticky Sessions` => Once a user's request has been served by a particular EC2 instance, subsequent requests from the same user are routed to that same instance. `[Preferred for stateful applications]`

---

## DNS (ROUTE 53)

---

### What is DNS?

- DNS provides the mapping of a URL (domain name) to its corresponding IP address.

---

## Miscellaneous

---

### What is the difference between stopping and terminating an EC2 instance?

- Stopping an EC2 instance releases its public IP address (a new one is assigned on restart), but the instance is not removed from the AWS Console.
- Terminating the instance deletes it from the AWS Console as well.

### Can we add an existing instance to an Auto Scaling Group?

- YES, Attach Instance => Select Instance.

### CloudWatch vs CloudTrail

- `CloudWatch` => used for monitoring and log analysis of the application.
- `CloudTrail` => used for auditing API calls made by services and users, for compliance purposes; trails capture activity and can deliver it to cold storage.

### Reserved Instances VS On-Demand Instances

- `Reserved Instances` are used for long-term workloads (1 to 3 years), so they are provided at a discount; payment can be all upfront, partial upfront, or no upfront at all.
- `On-Demand Instances` are used for shorter intervals of time; no long-term commitment; PAY AS YOU USE.

### Which type of scaling is recommended for RDS?

- The types are `Vertical` and `Horizontal` scaling.
- Vertical => When the workload is moderate, the workload pattern is predictable, and only modest performance is required, go for vertical scaling (increase CPU and memory).
- Horizontal => When the workload is extreme, the workload pattern is unpredictable, and concurrency is required to achieve HA, fault tolerance, and scalability, use horizontal scaling (increase the count of replicas).

### What is the maintenance window in RDS? Can we access the RDS instance during the maintenance window?

- The maintenance window in RDS is used for software patching, hardware upgrades, engine upgrades, and other routine checks for hardware health and reliability.
- We can keep accessing the RDS instance during the maintenance window only if it is deployed to be fault tolerant, i.e. replicated across multiple Availability Zones (Multi-AZ).

### What are the different types of LB in AWS?

- Classic LB (previous generation, effectively deprecated)
- Application Load Balancer (widely used for microservice applications)
- Network Load Balancer (widely used for high-throughput applications such as gaming)
- Gateway Load Balancer (used to deploy and scale third-party virtual appliances such as firewalls and IDS/IPS)

- `ALB` => Operates at Layer 7 of the OSI model (Application Layer); supports HTTP and HTTPS traffic; used for microservice applications; it understands `paths` and `host headers`.
- `NLB` => Operates at Layer 4 of the OSI model (Transport Layer); supports TCP and UDP connections; suited for low-latency, high-throughput applications such as gaming.

### Explain the steps to set up a VPC with subnets and everything else

- Add the CIDR range (this defines the private IP range of the VPC).
- Create `Subnets`, both `Private` and `Public`.
- The applications are placed in the `Private Subnets`; for applications requiring internet access, the private subnets route through the `NAT Gateway`, which in turn connects to the `Internet Gateway`.
- The `Load Balancers`, `NAT Gateway`, `Internet Gateway`, and `Jump Servers (Bastion Hosts)` are placed in the `Public Subnets`.
- Set up `NACLs` to secure access at the `Subnet Level`, and add `Security Groups` to secure access at the `Instance Level`.
- Enable monitoring => enable `VPC Flow Logs` and monitor them using `CloudWatch`.

### In an AWS pipeline, how can we secure the API keys, secrets, and other credentials?

- In an AWS CI/CD pipeline we can use `CodeBuild`, `CodePipeline`, and `CodeCommit`.
- To secure the pipeline's credentials we can use `AWS KMS`, `AWS Secrets Manager`, or the `SSM Parameter Store`.
- These services can also be used to `rotate the secrets` and grant access to the CI/CD services.
- Further, we can enable `CloudTrail` to audit how users interact with these services.

### What are some of the services which are not region specific?

- `AWS IAM`, `AWS CloudFront`, `AWS Route 53`

### When to use EC2 and when to use Lambda?

- When we want to run long-lived servers (web servers, DB servers), we use `EC2 instances`.
- When we want to run a process for a very short amount of time, consuming few resources and without managing servers, we use `Lambda`.
- `Lambda` functions are event-driven and short-lived, with automatic scaling.

### CloudFormation has an error in a template that you have committed; what happens as a result of the error, and how would you correct it?

- If the template is invalid, CloudFormation will not create the infrastructure; if the error surfaces during stack creation, the creation fails and CloudFormation rolls back the stack by default. Correct the template, then create or update the stack again.

```
Stacks in AWS CloudFormation ~ infrastructure code in Terraform
Stacks can be managed as a single unit

A CloudFormation template is written in `YAML or JSON`
```

### How can we disable the EC2 instance's public IP address?

- Disable the `Auto-assign Public IP` setting.

### I have an on-prem data center and want private connectivity between the AWS network and the on-premises data center. How do I configure it, and with which services?

- We can use AWS services like `AWS Direct Connect` and `AWS VPN`.
- `AWS Direct Connect` => establishes a dedicated private connection that does `NOT go over the internet`.
- `AWS VPN` => establishes a private (encrypted) connection `over the internet`.

### If the IP range of the VPC or subnet is fully occupied by servers, what action do we need to perform?

- What we need to do here is:
- 1. Expand the address space: an existing subnet's CIDR block cannot be resized in place, but you can associate a secondary CIDR block with the VPC and create larger subnets from it.

- 2. Add `additional subnets`: if expanding the existing address space is not feasible, or if you need to segregate resources, create additional subnets within the VPC.
- 3. Consider `Elastic IP Addresses`: instead of statically assigning public IPs from the subnet pool, EIPs can be associated with and disassociated from instances as needed, allowing more efficient IP address utilization.

- 4. Consider using `IPv6`: if IPv6 is an option for your infrastructure, implementing it significantly expands the available IP address space.

### Explain the purpose of a VPC's route tables. How are they associated with subnets?

- The purpose of a VPC's route tables is to control the routing of network traffic within the VPC. Route tables determine where network traffic is directed based on its destination IP address; they essentially act as a set of rules that guide traffic flow within the VPC.
- Each subnet is associated with exactly one route table (the VPC's main route table by default), while a single route table can be associated with many subnets.

### What is a Virtual Private Network (VPN) connection in the context of an AWS VPC? How does it differ from Direct Connect?

- A VPN connection lets you connect your `on-premises data centers` to your `AWS VPC`; it establishes an encrypted tunnel between your network and the VPC, allowing secure communication between resources in your VPC and your on-premises infrastructure.

- AWS VPN creates a private (encrypted) connection `over the internet`.
- AWS Direct Connect creates a private connection `without using the internet` (a dedicated link).

### What are VPC Flow Logs, and why would you enable them?

- `VPC FLOW LOGS` capture information about the IP traffic going to and from network interfaces in your VPC.
- They are best used together with monitoring tools such as `Amazon CloudWatch`, and they complement auditing with `AWS CloudTrail`.

### Describe the process of migrating an EC2 instance from one VPC to another

- Prepare the target VPC with the necessary resources.
- Prepare the EC2 instance for migration.
- Create an AMI of the EC2 instance.
- Copy the AMI to the target region if needed.
- Launch a new instance in the target VPC using the AMI.
- Update DNS records and application configurations if applicable.
- Test and validate the new instance's functionality.
- Monitor the new instance and decommission the source instance once the migration is successful.

### What if the PEM file is not present? How can we connect to the EC2 instance?

- Use `AWS Systems Manager` ==> `Session Manager`: Session Manager uses IAM permissions (and the SSM agent) to connect to the EC2 instance, so no key pair is needed.
- Use an `EBS snapshot` ==> if we have a snapshot of the instance's `root volume`, create a `new volume` from it and attach it to a new instance whose key pair we do have.
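For the snapshot-based recovery path, a minimal boto3 sketch; the region, snapshot ID, instance ID, and device name below are placeholders, not values from this repository:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

# 1. Create a new volume from the existing root-volume snapshot (placeholder ID)
volume = ec2.create_volume(
    SnapshotId="snap-0123456789abcdef0",
    AvailabilityZone="us-east-1a",  # must match the rescue instance's AZ
)
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

# 2. Attach it to a rescue instance we can already log in to (placeholder ID)
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",
    Device="/dev/sdf",  # typically appears as /dev/xvdf inside the instance
)
# From the rescue instance, mount the volume and copy data or fix authorized_keys.
```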
295 | 296 | ### EBS vs S3 vs EFS 297 | 298 | | Feature | EBS (Elastic Block Store) | S3 (Simple Storage Service) | EFS (Elastic File System) | 299 | | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------------------- | 300 | | Storage Type | Block-level storage volumes | Object storage | File storage | 301 | | Protocol | N/A | N/A | NFS (Network File System) | 302 | | Accessibility | Attached to single EC2 instance at a time | Accessible via unique URLs | Accessible by multiple EC2 instances concurrently | 303 | | Durability | Replicated within an Availability Zone | Highly durable across multiple Availability Zones | Highly durable across multiple Availability Zones | 304 | | Scalability | Scales with instance type, can be manually resized | Infinitely scalable | Automatically scales based on demand | 305 | | Use Cases | Database storage, boot volumes, file systems | Static website hosting, data archiving, content distribution | Content management systems, development environments, analytics workloads | 306 | | Backup/Recovery | Snapshots | Versioning and Cross-Region Replication | Automated backups and point-in-time recovery | 307 | | Performance | Low-latency access, provisioned IOPS available | Designed for low-latency access at scale | Burstable performance with automatic scaling | 308 | | Pricing Model | Pay for provisioned storage, provisioned IOPS, and snapshots | Pay for storage used, requests, and data transfer | Pay for storage used | 309 | | Access Management | IAM roles and policies | Bucket policies and IAM policies | POSIX permissions and IAM policies | 310 | 311 | ### You want to store temporary data on an EC2 instance. Which storage option is ideal for this purpose? 312 | 313 | - The default storage for the EC2 instance is `EBS` 314 | - If we want to have the temporary usage then we can take care of it using the ` instance local storage` typically provided by `instance store volumes.`; they are ephemeral storage directly attached to EC2 instance; didnt get persisted beyond the lifetime of the instance. 315 | 316 | ### If my RDS is running out of space how will you resolve that without launching other RDS? 317 | 318 | - Can enable the `AutoScaling Feature` in AWS for AWS RDS. 319 | - Further we can make the unused / unnsesasary data cleanup (old logs, temporary tables, or outdated records) it would free up more data in RDS 320 | - ` Manually increase the allocated storage`; do this through AWS Console , CLI or SDK. 321 | - Take a snapshot of your RDS instance and restore it to a new instance with `larger storage capacity.` 322 | 323 | ### How will you take backups using Lambda? 
324 | 325 | ``` 326 | import boto3 327 | import datetime 328 | 329 | def lambda_handler(event, context): 330 | # Initialize AWS SDK clients 331 | rds_client = boto3.client('rds') 332 | ec2_client = boto3.client('ec2') 333 | 334 | # Define the list of RDS instances and EC2 instances to backup 335 | rds_instances = ['your-rds-instance-id'] 336 | ec2_instances = ['your-ec2-instance-id'] 337 | 338 | # Create RDS snapshots 339 | for instance_id in rds_instances: 340 | try: 341 | snapshot_id = 'rds-snapshot-' + instance_id + '-' + datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S') 342 | rds_client.create_db_snapshot(DBSnapshotIdentifier=snapshot_id, DBInstanceIdentifier=instance_id) 343 | print(f"Snapshot created for RDS instance {instance_id}: {snapshot_id}") 344 | except Exception as e: 345 | print(f"Error creating snapshot for RDS instance {instance_id}: {str(e)}") 346 | 347 | # Create EBS snapshots 348 | for instance_id in ec2_instances: 349 | try: 350 | volumes = ec2_client.describe_volumes(Filters=[{'Name': 'attachment.instance-id', 'Values': [instance_id]}])['Volumes'] 351 | for volume in volumes: 352 | snapshot_id = 'ebs-snapshot-' + volume['VolumeId'] + '-' + datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S') 353 | ec2_client.create_snapshot(VolumeId=volume['VolumeId'], Description=snapshot_id) 354 | print(f"Snapshot created for EBS volume {volume['VolumeId']}: {snapshot_id}") 355 | except Exception as e: 356 | print(f"Error creating snapshot for EC2 instance {instance_id}: {str(e)}") 357 | 358 | return { 359 | 'statusCode': 200, 360 | 'body': 'Backup process completed successfully.' 361 | } 362 | 363 | ``` 364 | 365 | ### What is VPC Peering ? 366 | 367 | - It is the network connection between the two VPC's it acts as the private Connection and VPCs can connect to each other as if they are part of the Same Network in the Same Region. 368 | - VPC peering enables you to `connect VPCs belonging to the same AWS account or different AWS accounts`, **`as long as they are in the same region`** 369 | - Overall, the aim of VPC peering is to enable `secure` and `efficient` communication between resources in different VPCs within the `same AWS region`, thereby facilitating the building of complex, multi-tiered architectures and enabling collaboration between different applications and environments. 
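As a rough illustration of the peering workflow, a boto3 sketch for peering two VPCs in the same account; all IDs and CIDR blocks are placeholders, and each side's route tables also need a route pointing at the peering connection:

```python
import boto3

ec2 = boto3.client("ec2")

# Request a peering connection between two VPCs (placeholder IDs)
peering = ec2.create_vpc_peering_connection(
    VpcId="vpc-11111111", PeerVpcId="vpc-22222222"
)
pcx_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# The owner of the peer VPC accepts the request
ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

# Add a route in each VPC's route table towards the other VPC's CIDR
ec2.create_route(RouteTableId="rtb-aaaaaaaa",
                 DestinationCidrBlock="10.1.0.0/16",   # peer VPC's CIDR
                 VpcPeeringConnectionId=pcx_id)
ec2.create_route(RouteTableId="rtb-bbbbbbbb",
                 DestinationCidrBlock="10.0.0.0/16",   # requester VPC's CIDR
                 VpcPeeringConnectionId=pcx_id)
```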
370 | -------------------------------------------------------------------------------- /Admission Controllers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/Admission Controllers.png -------------------------------------------------------------------------------- /Load balancers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/Load balancers.png -------------------------------------------------------------------------------- /Pause_Containers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/Pause_Containers.png -------------------------------------------------------------------------------- /Private_Subnet.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/Private_Subnet.png -------------------------------------------------------------------------------- /Question5_Solution.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/Question5_Solution.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # REFER the Links for the Getting the Insights from the Specific Content 2 | 3 | --- 4 | 5 | ### [Trivia about Containers](containers_kubernetes_trivia.md) 6 | 7 | --- 8 | 9 | ### [Trivia about Docker-Compose](docker_compose_trivia.md) 10 | 11 | --- 12 | 13 | ### [Trivia about Kubernetes](kubernetes_trivia.md) 14 | 15 | --- 16 | 17 | ### [Trivia about Monitoring Kubernetes](monitoring_kuberetes.md) 18 | 19 | --- 20 | 21 | ### [Trivia about Setting Up HPA](setting_up_hpa.md) 22 | 23 | --- 24 | 25 | ### [Trivia about AWS Bare-basics](AWS_Interview_Questions.md) 26 | 27 | --- 28 | 29 | ### [Trivia about K8s Scenarios](kubernetes_Scenario_Based_Questions.md) 30 | -------------------------------------------------------------------------------- /Untitled-2023-10-04-1156.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/Untitled-2023-10-04-1156.png -------------------------------------------------------------------------------- /app1-StatefulSet.yaml.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/app1-StatefulSet.yaml.png -------------------------------------------------------------------------------- /app1.yaml.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/app1.yaml.png -------------------------------------------------------------------------------- /app2-StatefulSet.yaml.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/app2-StatefulSet.yaml.png -------------------------------------------------------------------------------- /app2-deployment.yaml.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/app2-deployment.yaml.png -------------------------------------------------------------------------------- /app3-StatefulSet.yaml.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/app3-StatefulSet.yaml.png -------------------------------------------------------------------------------- /app3-deployment.yaml.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/app3-deployment.yaml.png -------------------------------------------------------------------------------- /containers_kubernetes_trivia.md: -------------------------------------------------------------------------------- 1 | # INTERVIEW QUESTION WITH ANSWERS 2 | 3 | ### What are Containers? 4 | - Containers are kind of advanced VMs 5 | - They are wrapped around the Application + Dependencies + System Binaries 6 | - In the containers are lightweight as they do not contains the whole OS in it. 7 | - It resolves the problem of "It runs on my machine" as it creates and environment for the Application with its required dependencies within. 8 | 9 | 10 | ### What if another person wants to use our application (running on our computer)? 11 | - We would provider them with the "Docker Images" 12 | - Docker Images will have the Application + Dependencies + System Binaries, and Docker Image's running instance is nothing but the Docker Container. 13 | 14 | ### What are the type dependencies which are included in the Containers? 15 | - These are the run-time dependencies that are needed when running the applications 16 | - There are also other types of Dependencies like Build Time Dependencies, and Compile Time Dependencies but Docker does not deal with it. 17 | 18 | ### What is docker damemon (dockerd) ? 19 | - dockerd is responsible for building running and managing the containers 20 | 21 | ### What is a Cluster? 22 | - A cluster is at least compromising of 2 major components they are 23 | - Master Node 24 | - Worker Node 25 | - Cluster can have multiple masters and workers also generally it comprises multiple master nodes; which are in the ratio master: worker set to 1:2. 26 | - For every master node we need to have ideally 2 worker nodes for the high availability [H.A.] 27 | 28 | ### Features of the Containers? 29 | - The containers provide isolation. 30 | - Each container is provided with the help of namespaces and groups [control groups]. 31 | - namespaces => Provides the isolation 32 | - groups => Provides manages the CPU and memory 33 | - In the Docker, all this is internally handled we do not directly need to deal with this in a detailed extent. 34 | 35 | ### What if we have 10000's of Containers? How to manage them? 36 | - It can be managed via the help of Orchestrators. Orchestarctors are the one who keeps the track of each container and maintains them according to our desired state. 
37 | - Orchestrator's only work is to manage the runtime of the containers; let it be 100 or 1 million it does matter to it it just needs to manage the runtime of containers. 38 | - To manage the resources of the scaled instances also they can be used / portfolio management of resources in K8's. 39 | - Some very famous orchestartors are Kubernetes[K8's] and Docker Swarm. 40 | 41 | ### What was before K8's then ? 42 | - People used to use the Docker Swarm (technically it is used also now but in less favor) for Orchestration. 43 | - Docker Swarm has some shortcomings like 44 | - Usable for simpler use cases; missing advanced orchestration features like auto-scaling, rolling updates, and support to stateful applications. 45 | - Docker swarm cannot be extended but K8's can with the help of CRD's [Custom Resource Definetions] and Operators 46 | - Docker swarm is losing the community adoption and getting replaced with the K8s. 47 | 48 | 49 | ### What is the granular most level of object in K8's? 50 | - Pod is the one granular level object that we can have control over. 51 | - even every component is boiled down to the k8's object; Which is nothing but the Pods running for the each of Objects. 52 | - It further boils down to Containers; Pods are nothing but a set of containers running. 53 | 54 | - Summarised it as ANY K8's Object ==> PODS ==> Containers 55 | - That is why Folk learning Containers is essential. 56 | 57 | ### Why generally don't go to learn the managed Cluster directly? 58 | - In the managed clusters whole of the master Node is just hidden and we are only given the control of the worker nodes. 59 | -But it is essential to know the workings of the API Server and how it interacts with the other components of the master as well as the worker nodes. 60 | - Some well-known managed services are AWS EKS and Azure AKS. 61 | 62 | ### How to determine that a Node is of type master or worker? 63 | - Master Node(s) should have the components like 64 | - **KUBE API SERVER** (THE MAIN COMPOENENT) 65 | - **ETCD** (KEY: Value Data Store; Stores the metadata of scheduling) 66 | - **Control Managers** (Determines Curr. vs Desired State; deals with components like Dep. ,RS, DS etc.) 67 | - **Scheduler** (Schedules the Pod according to the State of Cluster) 68 | - **Cloud Control Manager**[CCM] (Optional) 69 | 70 | - Worker Node(s) should have the components like 71 | - Application+ Dependencies + Sys Binaries in Docker Images 72 | - Docker Containers made from Docker images 73 | - **Kube Proxy** (Networking Slave of Kube API server) 74 | - **Kubelet** (It is a Slave of the Kube API Server for managing the creating and managing Pods; Runs on each Node) 75 | - Container Network Interface [C.N.I] (Weavenet, Calico, Tiger) 76 | - Conatainer Runtime Interface [C.R.I] (runc, rkt, cri-o) 77 | 78 | ### Where does the external user interact with the Kubernetes? 79 | - External users can only access the K8s via an Interface. 80 | The external user's endpoint would be the KubeAPI Server. In which they can connect but it needs to be with the proper authentication. 81 | 82 | ### What are the different interfaces through which we can connect to the K8's? 83 | - kubectl (Most widely used for the managed services to connect to the Kubernetes Kube API Server) 84 | - Browser 85 | - Golang Binaries 86 | - API's 87 | 88 | ### What is the Architectural flow of the K8's Architecture? 89 | 90 | ### What are the basic components in the VPC created by default? 
91 | - Public Subnet 92 | - Private Subnet (Optional) 93 | - Route Table 94 | - Internet Gateway 95 | 96 | ### Benefits of Python 97 | - Does not need to compile any code (It acts as both the High-level language and the low-level language) to interact with Kernel 98 | - Python is an Interpreted language so no need to convert it to the low-level language 99 | - In Java it needs to be converted to .class(acting as the low level language) to interact with Kernal 100 | - The Docker File for the interpreted language will be different from that of the compiled code language. 101 | - Python Docker File will not involve any build step in the Docker File 102 | - In Java, I suppose we need to first have a build (.war,.jar,.ear file) and then use it in the Docker File. 103 | 104 | ### Docker Architecture 105 | 106 | _ To build the Docker File 107 | - **Docker File** will be made with the instructions provided on Docker Client(Docker Client is nothing but the CLI) 108 | - 15-20 Instructions 109 | - Consists of Commands and Arguments 110 | - Each line in the Docker File is considered a Layers 111 | - Docker image is nothing but the bundling of all the Layers in a bundle in the Docker File. 112 | 113 | ## Line-by-Line Roles of Docker File 114 | - GO and Download the OS 115 | - Follow the instructions Provided 116 | - Other Meta Layers 117 | - ENDPOINT - Last Line gives how to start the application (Known as **ENTRYPINT/CMD**) 118 | 119 | ## Flow of Docker File (Internal Details) 120 | - The image created would only be the **Read Only Object** (Nobody should be able to manipulate the image once created) 121 | - When the Docker Image is shared it should only care about the Endpoint as all other above layers are already inculcated(hardcoded) in the image. 122 | 123 | - When the image starts it creates a virtual environment that is called a **Docker Container** 124 | - Docker Container will contain all the context of the Docker File. 125 | - Docker Container will run the Entrypoint to create the files, perform operations, and expose the network according to network configurations. 126 | - **NOTE** : Docker Containers are also **READ ONLY** (Cannot manipulate a Docker Container) ; **If needed those kind of changes only can be done while creating the Docker File.** 127 | - **NOTE** : Read ONLY also known as Ephemeral 128 | 129 | ## Whose responsibility is to write the DockerFile? 130 | - It is the responsibility of the Developer and the Devops Engineer to write the Dockerfile 131 | - They should work in the collabaration (insync) with each other. 132 | - Generating Docker File is a manual effort although there are automation tools for developing the DockerFile(as projects are not mature enough) 133 | - We can ask the chatgpt to identify the loopholes in the Docker Files. 134 | 135 | 136 | 137 | 138 | ## Prerequisite to Docker Architecture 139 | ![Alt text](image.png) 140 | 141 | ## What is Docker Daemon? 142 | - Docker Host actually consists of Docker Daemon and in turn **it manages the container runtime.** 143 | - Docker Daemon can have the Docker Client running(as it's a CLI for Docker). 144 | - All the containers and Images will run in the Docker Daemons. 145 | 146 | - The Docker Client actually runs the commands; the actual flow starts in the daemon; the Client actually connects to the Daemons; all the containers and Images will run in the Docker Daemons. 
147 | 148 | - When hitting the docker command it will not run if only the docker client is installed it needs both the docker client and the docker daemon to be installed in it. 149 | 150 | - Docker is not a replacement for anything; It is the concept of containerization 151 | 152 | ## Docker Internals for namespaces and Cgroups 153 | ![Alt text](image-1.png) 154 | 155 | ## TO check the logs of Docker? 156 | - docker logs -f 157 | 158 | ## Some important docker commands? 159 | - **docker container ls / docker container ps** - To show the running docker containers 160 | - **docker containers ls -a / docker container ps -a** - To show all the containers (running and stopped) 161 | - **docker build -t /path/to/the/dockerfile** 162 | - **docker pull nginx** - to pull the nginx image(it will not run this) 163 | - **docker run -p 9000:9000 nginx** -To run the nginx image; mapping the PORT 9000 of the Host(LHS) to the PORT 9000 of the Container (RHS). 164 | - **docker exec -it bash** - To get inside the container and interact with it using the bash shell. 165 | - **docker stop ** to stop the container 166 | - **docker rm ** only after stopping the container we can remove the container 167 | - **docker rmi ** only after removing all the containers associated with the docker image we can delete the docker image 168 | - **docker system prune** - will clean up any resources — images, containers, volumes, and networks — that are dangling (not tagged or associated with a container): 169 | - **docker rmi $(docker images -a -q)** - To remove all the images 170 | - **docker rm $(docker ps -a -q)** - To remove all the containers 171 | - **docker rm $(docker ps -a -f status=exited -q)** - To remove all the containers with the status as exited. 172 | - **docker volume prune** - TO remove all the volumes which are dangling (not assosiated with anybody) 173 | - **ocker rm -v container_name** - To remove a container and its volume 174 | 175 | -------------------------------------------------------------------------------- /docker_compose_trivia.md: -------------------------------------------------------------------------------- 1 | ### Need of Docker Compose ?! 2 | 3 | - In K8s the Pods are wrapped around the containers. 4 | - The Docker Compose the Service is wrapped around the Container. 5 | - ![K8_Compose](image-2.png) 6 | 7 | - In real-time Scenario we will have multiple containers at scale (Replication of the containers for multiple applications; depending on requirement) 8 | - So complexity starts to increase => We need to manage the routing. 9 | - Docker solves using the 2 solutions in Docker 10 | - Docker Compose (Provides structure to Docker; with the help of references of Services) 11 | - Docker Swarm (Competitor to K8) 12 | 13 | ### Docker Compose: 14 | 15 | - Docker Compose will still use the Docker commands, but the Docker Compose Services will be the way to talk to Docker Daemon. 16 | - When you call somebody on the basis of names in the container; we call it as the service. As name remains the same and the Private IP changes after the container gets down and restarts. 
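To make the name-versus-IP point concrete: inside any container attached to a Compose-managed (user-defined) network, the service name resolves through Docker's embedded DNS, so application code can target the stable name and ignore the changing IP. A tiny illustrative check, where the service name `flask` is only an example:

```python
import socket

# Inside a container on the same Compose network, the service name resolves
# via Docker's embedded DNS; the IP may change across restarts, the name won't.
ip = socket.gethostbyname("flask")   # example service name
print(f"service 'flask' currently resolves to {ip}")

# Application code should therefore hold on to the name, not the IP:
url = "http://flask:9000/health"     # stable reference across container restarts
```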
17 | 18 | ### Linmitation of Docker 19 | 20 | - Run a container for the python flask app(9000) and nginx(80) and try to connect for both of them as they lay in the same network(bridge) 21 | - Do it using curl ==> 22 | 23 | ``` 24 | # From the nginx container 25 | 26 | curl -v : 27 | ``` 28 | 29 | ![Alt text](image-3.png) 30 | 31 | - But in the above frame there is hard association of IP; when the container restarts we have to manually change / maintain the IP's in it (refactor) 32 | 33 | Refer to demos in session 1 34 | 35 | ### What if we refer via the container_name? [MUST READ] 36 | 37 | - But instead of IP if give the name of the container it will not be able to connect. 38 | 39 | - **Docker does not entertain the concept of Service Discovery** 40 | - Docker understands the concept of Service Inherently; even though it supports it. 41 | - **Docker understands the name, network, and gateway; but it will not entertain the communication over the Services.** 42 | - So refactoring and management becomes a tedious task. 43 | 44 | ``` 45 | curl: (6) Could not resolve host: modest_booth 46 | ``` 47 | 48 | - If we create a new network and add our containers to it; then we can successfully connect the container via the container names. 49 | 50 | ``` 51 | #TO create the network what we have: 52 | docker network create local_network 53 | 54 | #It will be of type bridge by default & scope will be set to local 55 | 56 | 57 | ``` 58 | 59 | - It means the communication from one container to another should not stop at any level now. (as we have scope=local) 60 | - But as **_Concept of Service is not inherent to Docker we need to explicitly need to create the network; then Docker realizes and allows us the DNS mapping (Name to IP Mapping)_** 61 | 62 | - Again restart the containers with the name and ports; Add the network (local_network that we created just now) 63 | 64 | ``` 65 | docker run --network local_network --name nginx -d -p 80:80 nginx 66 | 67 | docker run --network local_network --name flask -p 9000:9000 myflaskapp 68 | ``` 69 | 70 | - After doing this and executing the script in the nginx we are able to connect via the container names. 71 | 72 | ![Alt text](image-4.png) 73 | 74 | ### What if we need to have the network restrictions in place for the containers in the same network. 75 | 76 | - eg> Only the backend should talk to the DB but if the UI wants to connect directly to DB it will restrict it. 77 | - For this we need to manage multiple layers of Networks to restrict it. i.e. Concept Of Ingress and Egress comes into Play here **(Concepts of Kubernetes)** 78 | - The service filtering comes into play here and it's covered in Docker Compose 79 | - Docker does not provide the concept of **Load Balancing.** 80 | - So we need to manually manage the entries of applications (their gateway and the IP-addresses) 81 | 82 | ### How will the UI container decide where to go to i.e. in app 0, app 1, or app 2 container? 83 | 84 | - There needs to be a controller suggesting where to go due to some reasons. 85 | - Docker there is no way that it will give logic out of the box for suggesting the containers where to connect to the next container. 86 | 87 | ### Why Docker to DockerCompose then? 88 | 89 | - Docker works best in isolation. 90 | - Every application needs to be containerized but every container cannot be deployed into the production because of the flaws of Docker discussed above. 
91 | - 92 | 93 | ### Install Docker Compose 94 | 95 | Refer to Demos in session 2; part 1 96 | 97 | ``` 98 | sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose 99 | 100 | 101 | sudo chmod +x /usr/local/bin/docker-compose 102 | 103 | cat docker-compose.yml 104 | 105 | docker-compose up -d 106 | docker-compose ps 107 | docker-compose up -d --scale =2 108 | 109 | #part-2 110 | # understand how one has to create a self managed load balancer in docker 111 | docker-compose build 112 | docker-compose up 113 | 114 | ##clean up 115 | docker rm -vf $(docker ps -aq) 116 | docker rmi -f $(docker images -aq) 117 | 118 | ``` 119 | 120 | In docker-compose.yaml 121 | 122 | ``` 123 | version: '3.2' 124 | networks: 125 | mynetwork: #network name 126 | driver: bridge # type network 127 | services: #warpper on top[ of container] 128 | base: # Containers named base 129 | build: 130 | context: . #current folder 131 | dockerfile: Dockerfile.base 132 | ports: 133 | - "9002:9000" 134 | image: myapp:baseImage 135 | networks: 136 | - mynetwork 137 | nginx: #Container named nginx 138 | build: ./nginx 139 | ports: 140 | - "8080:80" 141 | networks: 142 | - mynetwork 143 | 144 | ``` 145 | 146 | Then run the docker compose build 147 | 148 | ``` 149 | docker compose build # acts to build the files 150 | 151 | docker compose up # it will up the services (run the containers) 152 | 153 | docker compose ps # will give the services running via the docker compose 154 | ``` 155 | 156 | ![Alt text](image-5.png) 157 | 158 | - Now here if I want to build the image I do not need to build them separately. 159 | 160 | - No need to Update them seprately also 161 | ![Alt text](image-6.png) 162 | 163 | - By default it attaches a name; so we do not need to provide it; 164 | - **_Important:_** Name of the container is the name of the service 165 | 166 | - It acts like minfied version of K8's 167 | 168 | To stop docker compose 169 | 170 | ``` 171 | docker compose stop 172 | ``` 173 | 174 | ![Alt text](image-7.png) 175 | 176 | - Another proble with the Docker we had it was with the saling 177 | 178 | - So we can scale easily in the Docker Compose like 179 | 180 | ``` 181 | docker-compose up -d --scale =2 182 | ``` 183 | 184 | Let us use the scale to 4 containers with the service "base" as in our above example. It will look like 185 | 186 | ``` 187 | docker-compose up -d --scale base=4 188 | ``` 189 | 190 | ![Alt text](image-8.png) 191 | 192 | - It will not run as the PORTS will get occupied by the first instance of base; when scaled it will also try to run on the same PORT but as we know the port once allocated cannot be occupied by other scaled instances. 193 | 194 | ``` 195 | # Stop docker-compose 196 | 197 | docker-compose stop 198 | ``` 199 | 200 | We need to make changes in the docker compose.yml file 201 | 202 | ``` 203 | version: '3.2' 204 | networks: 205 | mynetwork: #network name 206 | driver: bridge # type network 207 | services: #warpper on top[ of container] 208 | base: # Containers named base 209 | build: 210 | context: . 
#current folder 211 | dockerfile: Dockerfile.base 212 | ports: 213 | - "9002:9000" 214 | image: myapp:baseImage 215 | networks: 216 | - mynetwork 217 | nginx: #Container named nginx 218 | build: ./nginx 219 | ports: 220 | - "8080:80" 221 | networks: 222 | - mynetwork 223 | 224 | 225 | Will be chnaged to 226 | 227 | version: '3.2' 228 | networks: 229 | mynetwork: #network name 230 | driver: bridge # type network 231 | services: #warpper on top[ of container] 232 | base: # Containers named base 233 | build: 234 | context: . #current folder 235 | dockerfile: Dockerfile.base 236 | ports: 237 | - "9000" # remove the local machines port and try to assign it any random port it will only be 9000 then(port of container) 238 | image: myapp:baseImage 239 | networks: 240 | - mynetwork 241 | nginx: #Container named nginx 242 | build: ./nginx 243 | ports: 244 | - "8080:80" 245 | networks: 246 | - mynetwork 247 | ``` 248 | 249 | Now again try the scale command we will get it as follows 250 | 251 | ``` 252 | docker-compose up -d --scale base=4 253 | ``` 254 | 255 | ![Alt text](image-9.png) 256 | It gets it running 257 | 258 | ``` 259 | docker compose ps 260 | ``` 261 | 262 | ![Alt text](image-10.png) 263 | 264 | - Resolves scaling Problem 265 | 266 | ### But here PORT numbers for the containers are given randomly; so for every docker compose restart our ports will change so it will create havoc on Network Managers. 267 | 268 | - Workaround for that will be just like 269 | 270 | ``` 271 | docker-compose down 272 | ``` 273 | 274 | - Refer the session 2 part 2 275 | 276 | - In the docker-compose.yml file it will contain the 277 | 278 | ``` 279 | version: '3.2' 280 | networks: 281 | mynetwork: 282 | driver: bridge 283 | services: 284 | base: 285 | build: 286 | context: . 287 | dockerfile: Dockerfile.base 288 | ports: 289 | - "9002:9000" # HARD attaching the ports here 290 | image: myapp:baseImage 291 | networks: 292 | - mynetwork 293 | multistaged: 294 | build: 295 | context: . 296 | dockerfile: Dockerfile 297 | ports: 298 | - "9001:9000" # HARD attaching the ports here 299 | image: myapp:productionImage 300 | networks: 301 | - mynetwork 302 | multistaged-replica: 303 | build: 304 | context: . 305 | dockerfile: Dockerfile 306 | ports: 307 | - "9000:9000" # HARD attaching the ports here 308 | image: myapp:productionImage 309 | networks: 310 | - mynetwork 311 | nginx: 312 | build: ./nginx 313 | ports: 314 | - "8080:80" 315 | networks: 316 | - mynetwork 317 | ~ 318 | ``` 319 | 320 | In this context of Docker-compose.yml file we have the HARD scaling enabled and **BUT no Auto-scaling is available as we need to remap it at the service.** 321 | 322 | ``` 323 | # Then run 324 | docker compose build 325 | docker compose up 326 | ``` 327 | 328 | ![Alt text](image-11.png) 329 | 330 | ``` 331 | docker compose ps 332 | ``` 333 | 334 | ![Alt text](image-12.png) 335 | 336 | - Now if we change the order for the base service it should identify the production and the base version. 337 | 338 | ``` 339 | localhost:9000 #PRoduction Version 340 | localhost:9002 #Base Version 341 | ``` 342 | 343 | ## Production Version 344 | 345 | ![Alt text](image-13.png) 346 | 347 | ## Base Version 348 | 349 | ![Alt text](image-14.png) 350 | 351 | ## But how will Docker Compose manage the Load Balancing? 352 | 353 | - We need to create our Load Balancer from scratch. 
354 | - For that we have configured our own nginx load balancer and have made changes in the nginx.conf file (By removing the default.conf from the original nginx LB) 355 | - Referring the docker file it looks like 356 | 357 | ``` 358 | FROM nginx 359 | RUN rm /etc/nginx/conf.d/default.conf 360 | COPY nginx.conf /etc/nginx/conf.d/default.conf 361 | 362 | ``` 363 | 364 | - In the nginx.conf file it will contain 365 | 366 | ``` 367 | upstream loadbalancer { 368 | 369 | server 172.17.0.1:9002 weight=5; 370 | server 172.17.0.1:9001 weight=5; 371 | 372 | # server 0.0.0.0:56360 weight=4; 373 | # server 0.0.0.0:56361 weight=3; 374 | # server 0.0.0.0:56364 weight=3; 375 | 376 | } 377 | 378 | server { 379 | location / { 380 | proxy_pass http://loadbalancer; 381 | } 382 | } 383 | 384 | ``` 385 | 386 | - It actually distributes the load with the load across the servers according to weights. 387 | - We should manage the responsibility of nginx.conf to make the changes while upscaling or downscaling the containers. 388 | - But it will be difficult to scale in this scenario for dynamically scaling. 389 | - This all will be covered with the help of Kubernetes Dynamic Scaling. 390 | 391 | ## In Docker Compose; if our container goes down it will not automatically start the container 392 | 393 | - We need to manually do it on our own. 394 | 395 | - Docker Compose is the builder of the image, and Kubernetes is the Deployer of the Image. 396 | 397 | ```` 398 | ### Important Commands to remember 399 | - If the ports are occupied and want to free the Port we can use 400 | ``` 401 | sudo lsof -i -P -n | grep 402 | kill 403 | ``` 404 | 405 | ## In production we should not have an image with Privileged Access; it should pop an error when you try to change it! 406 | 407 | ## We need to maintain both images; we need both the base (unrestricted) image and the production(staged Filtered image) new image to debug here. 408 | 409 | - We are going to deploy only the stage and filtered image in the production but to debug scenarios we will need to have the BASE (Unrestricted image) 410 | 411 | ## Q. Why Prefer Multistage Builds? 
412 | - The security it provides with the image is top-notch 413 | The scaling it provides is great 414 | ```` 415 | 416 | ![Alt text](image-16.png) 417 | ![Alt text](image-15.png) 418 | -------------------------------------------------------------------------------- /image-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-1.png -------------------------------------------------------------------------------- /image-10.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-10.png -------------------------------------------------------------------------------- /image-11.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-11.png -------------------------------------------------------------------------------- /image-12.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-12.png -------------------------------------------------------------------------------- /image-13.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-13.png -------------------------------------------------------------------------------- /image-14.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-14.png -------------------------------------------------------------------------------- /image-15.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-15.png -------------------------------------------------------------------------------- /image-16.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-16.png -------------------------------------------------------------------------------- /image-17.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-17.png -------------------------------------------------------------------------------- /image-18.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-18.png -------------------------------------------------------------------------------- /image-19.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-19.png -------------------------------------------------------------------------------- /image-2.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-2.png -------------------------------------------------------------------------------- /image-20.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-20.png -------------------------------------------------------------------------------- /image-21.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-21.png -------------------------------------------------------------------------------- /image-22.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-22.png -------------------------------------------------------------------------------- /image-23.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-23.png -------------------------------------------------------------------------------- /image-24.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-24.png -------------------------------------------------------------------------------- /image-25.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-25.png -------------------------------------------------------------------------------- /image-26.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-26.png -------------------------------------------------------------------------------- /image-27.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-27.png -------------------------------------------------------------------------------- /image-28.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-28.png -------------------------------------------------------------------------------- /image-29.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-29.png -------------------------------------------------------------------------------- /image-3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-3.png -------------------------------------------------------------------------------- /image-4.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-4.png -------------------------------------------------------------------------------- /image-5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-5.png -------------------------------------------------------------------------------- /image-6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-6.png -------------------------------------------------------------------------------- /image-7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-7.png -------------------------------------------------------------------------------- /image-8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-8.png -------------------------------------------------------------------------------- /image-9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image-9.png -------------------------------------------------------------------------------- /image.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/image.png -------------------------------------------------------------------------------- /initContainers-snippet.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/initContainers-snippet.png -------------------------------------------------------------------------------- /kubernetes_Scenario_Based_Questions.md: -------------------------------------------------------------------------------- 1 | ### This are some of the Questions which were Posted by `Syed Nadeem` from his Series `Devops 101` 2 | 3 | --- 4 | 5 | ### Q1: Design a Kubernetes deployment setup where you have 3 apps that need to run in HA to create a solution. However, the apps have starting and running dependencies and requirements. 6 | 7 | a. app1 should always start first; 8 | b. app2 should always start after app1 is Ready; 9 | c. app3 should always start after app2 is Ready 10 | 11 | Follow up: 12 | 13 | 1. How will you design a check mechanism for the applications in such a way that restarts and schedular reschedules dont break the dependence requirements 14 | 2. 
How will you plan to incorporate such check mechanism with CD tools where deployments are gitops driven 15 | 16 | ### MY take on the Solution: 17 | 18 | ![alt text](Question5_Solution.png) 19 | 20 | #### Approach 1 21 | 22 | ##### For Part A> Application 1 should always start first 23 | 24 | - For each App ==> Have Different "Deployment" configuration 25 | 26 | - Need to make use of the K8's "Deployment" Object with `"Probes"` and `"InitContainers"` 27 | 28 | - `Probes` => Used to check the status of the containers (Application) running inside the PODs 29 | 30 | - `InitContainers` ==> They are the Containers that run before our main application gets started; [InitContainers generally get completed when our "main" application is in the "Pending Phase"] 31 | ![alt text](app1.yaml.png) 32 | - For this, we will make use of the "Readiness probe" to check if the App inside the container is ready to serve the traffic received via Load balancers and Services 33 | 34 | - Checks for health at path "/health" if not "healthy" then after the "initialDelay"; try to monitor the health after every "periodSeconds" defined. 35 | 36 | ##### For Part B> Application 2 should always start after Application 1 is Ready 37 | 38 | - We will make use of the `"Readiness Probe"` and `"init containers"` 39 | 40 | - NOTE: But before that, we need to ensure that we have created the "Services" beforehand which will be required by the running Applications 41 | 42 | - Services with appropriate configurations like PORTS, and SELECTORS and ensuring they are working as desired. 43 | 44 | - In the "Init Containers," we will use the Sandbox Image like "Busybox" to check if the service for App1 exists or not 45 | 46 | - NOTE: Init Container will add the internal Dependency in-between the Service for the App1 and App2 47 | 48 | - Init Container could create a dependency using commands like "nslookup" OR "ping" until service-1 is reachable; sleep else could execute. 49 | 50 | ![alt text](initContainers-snippet.png) 51 | 52 | - For app2 deployment file 53 | ![alt text](app2-deployment.yaml.png) 54 | 55 | ##### For Part C> Application 3 should always start after Application 2 is Ready 56 | 57 | - We can plan it similarly as we have done for App 2 which would Span out as follows 58 | ![alt text](app3-deployment.yaml.png) 59 | 60 | #### Approach 2: Using Stateful Sets 61 | 62 | - We can make use of Stateful Sets as they maintain a certain order of execution of PODS. 63 | 64 | - StatefulSets ensures that pods are deployed and scaled in a predictable and ordered manner. Pods are created sequentially, starting with ordinal index 0, which can be crucial for applications with dependencies between instances or for initialization processes that need to run in a specific order. 65 | 66 | - StatefulSets provides features for managing the lifecycle of stateful pods, including pod identity, rolling updates, and graceful termination. These features are designed to minimize disruption to stateful applications during updates or maintenance operations. 
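As a rough, hedged sketch (the names and image here are placeholders, not taken from the screenshots referenced below), an ordered StatefulSet skeleton for app1 could look like this; `OrderedReady` together with the readiness probe is what enforces the one-by-one startup:

```
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app1                        # hypothetical name
spec:
  serviceName: app1-headless        # assumes a headless Service with this name exists
  replicas: 3
  podManagementPolicy: OrderedReady # default: pods are created one at a time, in ordinal order
  selector:
    matchLabels:
      app: app1
  template:
    metadata:
      labels:
        app: app1
    spec:
      containers:
        - name: app1
          image: nginx:1.25         # placeholder image
          readinessProbe:           # the next ordinal is only created once this pod is Ready
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
```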
67 | 68 | - We will make use of the stateful state as follows 69 | 70 | - For application 1 Stateful Sets 71 | ![alt text](app1-StatefulSet.yaml.png) 72 | - For Application 2 Stateful Sets 73 | ![alt text](app2-StatefulSet.yaml.png) 74 | - For Application 3: Stateful Sets 75 | ![ ](app3-StatefulSet.yaml.png) 76 | 77 | #### DISADVANTAGES of using Deployments 78 | 79 | - Complex to maintain the order of Execution 80 | 81 | - Requires lots of Configuration 82 | 83 | - Probes will consume lots of resources 84 | 85 | #### For follow-up question 86 | 87 | ##### Q. How will you plan to incorporate such a check mechanism with CD tools where deployments are GitOps-driven 88 | 89 | - Maintaining via GIT-OPS TOOLS; 90 | 91 | - Consider using a different repository for storing all the "Manifest Files"; so that it acts as a single source of truth and using a reconciler mechanism that monitors the repository continuously 92 | 93 | - Using the FLUX-CD we could monitor the changes in the repository using "Reconciler"(Kustomize Controller) 94 | 95 | ## SOME IMPORTANT KUBERNETES COMMANDS 96 | 97 | ### How to check health of Kubernetes API Endpoint? 98 | 99 | ``` 100 | kubectl get --raw='readyz?verbose' 101 | 102 | or 103 | 104 | kubectl get --raw='heakthz?verbose' 105 | 106 | ``` 107 | 108 | ### How to verify service account permissions? 109 | 110 | - One of the most common scenarios comes up especially when we are checking the permissions of the certain service accounts 111 | - It cab be done using the 112 | 113 | ``` 114 | kubectl auth can-i create pods --all-namespaces 115 | 116 | OR 117 | 118 | # can we read the pods logs from it 119 | kubectl auth can-i get pods --subresource=log 120 | ``` 121 | 122 | - It will reply with either `YES` or `NO`; access level of service account is really helpful in the case of the maintining the least privilaged principles. 123 | 124 | - If the nessasary services are not added then the `role` can be modified which is bounded to the service account. 125 | 126 | ### How to get the skeleton of YAML for the various K8's native objects ? 127 | 128 | - Using the flag `--dry-run=client` => Such that it wont get created in the cluster 129 | 130 | ``` 131 | kubectl run nginx-pod --image=nginx -o yaml --dry-run=client > nginx-pod.yaml 132 | 133 | OR 134 | 135 | kubectl create pod nginx-pod --image=nginx --dry-run=client -o yaml 136 | ``` 137 | 138 | ### Are there any alternatives for ETCD in Kubernetes ? 139 | 140 | - `YES there are`, ETCD is used for Data Persistence of the K8 Cluster. 141 | - We can ue the `MYSQL`, `SQLite` and `Postgres` as a replacement for data Persistence in K8s. 142 | - In fact the `K3's uses it using KINE` 143 | - Kine (Acts as ETCD SHIM) converts the `ETCD API` to `MYSQL, SQLite, Postgres, SQLite SQL Queries` 144 | ![alt text](image-29.png) 145 | 146 | ### Can we list K8 API Resources? 147 | 148 | - `YES`; Everything in the API can be accessed via the API's; kubectl internally uses the API Commands that are converted from our `HIGH LEVEL inputs` eg (kubectl get pods). 149 | - API Server provides the API Endpoints to manage the resources such as Pods, Namespaces, ConfigMaps, ReplicaSets etc. 150 | - These object-specific endpoints are called 𝗔𝗣𝗜 𝗿𝗲𝘀𝗼𝘂𝗿𝗰𝗲𝘀 𝗼𝗿 𝗿𝗲𝘀𝗼𝘂𝗿𝗰𝗲𝘀. 151 | 152 | - For example, the API endpoint used to create a pod is referred to as a 𝗣𝗼𝗱 𝗿𝗲𝘀𝗼𝘂𝗿𝗰𝗲. 153 | 154 | - In simpler terms, a resource is a 𝘀𝗽𝗲𝗰𝗶𝗳𝗶𝗰 𝗔𝗣𝗜 𝗨𝗥𝗟 used to access an object, and they can be accessed through HTTP verbs such as GET, POST, and DELETE. 
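As a hedged illustration of that point (not part of the original notes), the same Pod listing can be fetched either through the raw resource URL or through the usual high-level command:

```
# Hit the Pod resource endpoint on the API server directly
kubectl get --raw /api/v1/namespaces/default/pods

# The high-level equivalent that kubectl translates into the call above
kubectl get pods -n default
```

The command below then lists every such resource name exposed by the API server: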
155 | 156 | ``` 157 | kubectl api-resources --verbs=list --no-headers | awk '{print $1}' 158 | ``` 159 | 160 | ### Can we add the custom columns in the K8s? 161 | 162 | - `YES` we can have the custom column name in the K8s; it acts as `VIEWS` in MYSQl (Just for the ease of the Human readable format). 163 | - Eg > command that gives custom output showing the pod name and CPU & memory requests. 164 | 165 | ``` 166 | kubectl get pod -o custom-columns='POD NAME:.metadata.name,CPU 167 | 168 | O/p Will be as 169 | 170 | POD NAME CPU REQUEST MEMORY REQUEST 171 | 172 | frontend-prod-v1 500m 512Mi 173 | backend-prod-v2 1 1Gi 174 | database-prod 2 4Gi 175 | cache-prod 250m 128Mi 176 | analytics-worker-prod 200m 256Mi 177 | ``` 178 | 179 | - Eg> lists the pod name and volumes used by a pod. 180 | 181 | ``` 182 | kubectl get pod -o custom-columns='POD NAME:.metadata.name, VOLUMES:.spec.volumes[*].name' 183 | 184 | O/P will be as follows 185 | 186 | POD NAME VOLUMES 187 | multi-container-pod nginx-logs,kube-api-access-56rhl 188 | web-app-01 nginx-logs-1,kube-api-access-8nkwn 189 | web-app-02 nginx-logs-2,kube-api-access-68hgd 190 | web-app-04 nginx-logs-2,kube-api-access-5d6xh 191 | ``` 192 | 193 | ### Misceallaneous HELP 194 | 195 | - **`JSON Objects to Graph Diagrams`** 196 | 197 | ``` 198 | Github Link: https://github.com/AykutSarac/jsoncrack.com 199 | Online at : jsoncrack.com 200 | ``` 201 | 202 | - **`YQ: Handy CLI to Parse YAML`** yq is a lightweight command-line utility to parse YAML, also works with the JSON, XML, LUA, Shell output, and properties files. 203 | - Written in Go; Perfect for those times when you need to tweak files on the fly. 204 | - eg> Pair yq with kustomize to patch values directly into your YAML files. 205 | - Lets try to do a task such as read an image name from an environment variable and add it to the YAML. 206 | 207 | - POD.yaml will be containing 208 | 209 | ``` 210 | apiVersion: v1 211 | kind: Pod 212 | metadata: 213 | name: my-nginx-pod 214 | labels: 215 | app: nginx 216 | spec: 217 | containers: 218 | - name: nginx 219 | image: nginx:1.14.2 220 | ports: 221 | - containerPort: 80 222 | ``` 223 | 224 | - To get the apiVersion 225 | 226 | ``` 227 | yq eval '.apiVersion' pod.yaml 228 | ``` 229 | 230 | - TO change the image name 231 | 232 | ``` 233 | yq eval -i '.spec.containers[0].image = "nginx:1.18"' pod.yaml 234 | ``` 235 | 236 | - **`How to get Pulic IP of Server`** 237 | - Generally do not consider to use any 3rd party tool for getting this not a standard practice; so we can use our native commands like as follows 238 | 239 | ``` 240 | curl https://checkip.amazonaws.com 241 | 242 | OR 243 | 244 | curl -s https://api.ipify.org -w "\n""" 245 | 246 | OR 247 | 248 | curl https://icanhazip.com 249 | 250 | OR 251 | 252 | curl https://ipinfo.io/ip && echo "" 253 | 254 | OR 255 | 256 | # FOR AWS MACHINES (Enable SMTP for geting the ping from google.com in the inbound Rules) 257 | curl http://169.254.169.254/latest/meta-data/public-ipv4 && echo "" 258 | 259 | ``` 260 | 261 | - `List namespaces in the linux?` => namespaces are used to create the isolation among the processes 262 | 263 | ``` 264 | lsns 265 | ``` 266 | 267 | - **`Check the Shell Script using the "ShellCheck" `**: Its a a powerful static analysis tool for bash/sh shell scripts; provides warnings and suggestions to help you identify and fix potential issues. 
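A quick usage sketch (the script name here is hypothetical):

```
# Lint one script; a non-zero exit code means ShellCheck found problems
shellcheck deploy.sh

# Lint every shell script in the repo, e.g. as a CI linting step
find . -name "*.sh" -print0 | xargs -0 shellcheck
```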
268 | 269 | - It can be integrated with CI pipelines as linter, ensuring that you shell scripts are bug-free even in the `GitHub Actions` 270 | 271 | - **`Utility for Monitoring Data Transfer PIPE UTILITY`**: While performing the data migration for DB Pipe Viewer (pv) utility. 272 | - Pipe viewer, also known as pv, is a terminal-based tool that can be used to monitor the progress of data transfer. 273 | 274 | ``` 275 | pv backup.sql | mysql -h rds.amazonaws.com -u bibinwilson -p my_db 276 | 277 | # It gives out features like 278 | 279 | Features 280 | - Visual progress bar 281 | – Estimated time remaining 282 | – Latency 283 | ``` 284 | 285 | - **`SSHLog: Monitor SSH Activity`** : tool to monitor SSH Activity. 286 | - SSHLog is an eBPF-based tool written in C++ and Python that passively monitors OpenSSH servers. 287 | - eBPF is a technology that allows programs to be run securely in the Linux kernel using sandboxed programs. 288 | 289 | - **`How Does SSHLog Agent Work?`** 290 | - SSHLog uses eBPF to monitor the following events: 291 | 292 | ``` 293 | SSH connections 294 | SSH commands 295 | SSH output 296 | ``` 297 | 298 | - These events are tracked using syscalls like connect, execve, read etc. When an SSH event occurs, SSHLog records the event in a log file. 299 | --- 300 | (Scenario Credits - Mayank Jadhav) 301 | I recently noticed that our K8s cluster contains several unused PVCs. They were around "194" when I listed them, and I have to delete those. 302 | 303 | I noticed that the PVC that needed to be deleted included the prefix "pvc-COUNTRY_ISO," in it's name like "pvc-fr," "pvc-in," "pvc-hk," "pvc-cn," and "pvc-gb." 304 | 305 | To remove those PVCs, I used the kubectl cli and a few Linux tools like grep and awk. 306 | 307 | This is the command I executed: 308 | ``` 309 | kubectl get pvc -n NAMESPACE_NAME | grep -E '^pv-gb|^pv-fr|^pv-cn|^pv-in' | awk '{print $1}' | xargs kubectl delete pvc -n NAMESPACE_NAME 310 | ``` 311 | 312 | If I breakdown the command then - 313 | 1. `kubectl get pvc -n NAMESPACE_NAME` -> Will list all the PVC's of the particular namespace which includes information such as PVC name, status, capacity, and associated persistent volume (PV). 314 | 315 | 2. `grep -E '^pv-gb|^pv-fr|^pv-cn|^pv-in'` -> Filters the output to only include PVCs whose names start with "pv-gb", "pv-fr", "pv-cn", or "pv-in". 316 | 317 | 3. `awk '{print $1}'` -> Extracts only the first field (PVC name) from each filtered line and prints it. 318 | 319 | 4.` xargs kubectl delete pvc -n NAMESPACE_NAME` -> Takes the list of PVC names from the previous step as input and deletes the specified PVC within the given namespace. 320 | 321 | You can alter and utilise this command if you need to perform a similar type of task for pods, PVs, secrets, configmaps, cronjobs, etc. 322 | -------------------------------------------------------------------------------- /kubernetes_trivia.md: -------------------------------------------------------------------------------- 1 | # Kubernetes Trivia 2 | 3 | ## WHy go with K8 rather than Docker Compose ? 4 | 5 | - In Docker Compose; if our container goes down it will not automatically start the container 6 | - We need to configure our own load balancer mechanism in the compose whereas it is handled very easily in the Kubernetes Services of Load Balancer. 7 | - Docker Compose we need to manage the scaling manually but in Kubernetes, we can manage it by using the controllers(replicaset) for that matter. 
8 | - DNS mapping not available by default in Compose 9 | - K8 easily manages the Current vs Desired state with the help of Controllers that were not featured in the Compose. 10 | 11 | ## K8's basics 12 | 13 | - This is how our application lies 14 | ![K8 Flow](image-17.png) 15 | 16 | - The main purpose of `K8 is to expose our PODS to the external workladd with best security practices possible` 17 | - The granular control of K8's comes from the POD (smallest level that we can control in K8s) 18 | - Pod can contain one or multiple containers. [Preferred is one container in one pod.] 19 | - [NOTE] Pod if goes down cannot restart on its own it needs the help of controllers to get them restarted. 20 | - `Controllers` are the ones who take the responsibility of maintaining the POD's. 21 | - Whenever the external user wants to connect to our Pod it can connect via the K8's services like 22 | - a> ClusterIp( Default; For internal Communication of Pods), 23 | - b> NodePort (Preferred for Communication within the same VPC). 24 | - c> LoadBalancer (Depends on the Cloud Provider for the External Communication) 25 | - d> Ingress (Most widely used in the Industry; it maintains a route table inside it for the path based routing) 26 | - e> External Name (Newly introduced not widely used) 27 | 28 | ## Some commands 29 | 30 | ``` 31 | alias k="kubectl" # alias k as kubectl 32 | 33 | k get pods / k get po # get pods 34 | 35 | k desrcibe po # To get more details about the PODS 36 | 37 | k edit po # To edit the details of the PODS 38 | 39 | # Create the pod named as "nginx-pod" with the image as "nginx" getting our o/p in yaml format and dry running our code (it will not create the pod it will show what will be the content of the pod if it gets created) 40 | 41 | k run nginx-pod --image=nginx -o yaml --dry-run=client 42 | 43 | k create pod nginx-pod --image=nginx -o yamkl --dry-run=client 44 | 45 | # To get all the api-resources (from here we can retrieve the short names) 46 | k api-resources 47 | 48 | #It gives us IP important in the context of Kubernetes 49 | k get po - o wide 50 | 51 | ``` 52 | 53 | --- 54 | 55 | ### Status of POD 56 | 57 | - Pending - (The pod has been accepted by the Kubernetes system, but one or more of its containers are not yet running.) 58 | - Running - (All containers in the pod are up and running.) 59 | - Succeeded - (All containers in the pod have successfully terminated, and won't be restarted.) 60 | - Failed - ( All containers in the pod have terminated, and at least one container has terminated in failure.) 61 | - Unknown - (The state of the pod could not be determined.) 62 | 63 | --- 64 | 65 | ### What is Probe in context to K8? 66 | 67 | - Probes are mechanisms to check the status and health of the container within the POD 68 | - Used for improving the reliability and resilience of the application. 69 | - Probes are generally identified into two types they are 70 | 71 | - `Liveliness Probe` - 72 | 73 | - **Purpose**: This is to check the application within the container is alive and healthy also checks if it is functioning as expected. 74 | - **Working**: Kubernetes actually periodically executes the commands or HTTP requests defined in the POD Specification. If the probe succeeds then the container is considered healthy if not (the application is considered in a bad state). 
Kubernetes restarts the container in an attempt to recover it 75 | 76 | ``` 77 | livenessProbe: 78 | httpGet: 79 | path: /healthz 80 | port: 8080 81 | initialDelaySeconds: 3 82 | periodSeconds: 3 83 | 84 | ``` 85 | 86 | - `Readiness Probe` - 87 | 88 | - **Purpose**: The readiness probe is used to determine if a container is ready to accept incoming network traffic. It signals whether the container is in a state to serve requests. 89 | - **Working**: Similar to the liveness probe, the readiness probe executes a command or HTTP request at specified intervals. If the probe succeeds, the container is considered ready to receive traffic. If it fails, the container is marked as not ready, and it is removed from the service's load balancer until it becomes ready again. 90 | 91 | ``` 92 | readinessProbe: 93 | httpGet: 94 | path: /ready 95 | port: 8080 96 | initialDelaySeconds: 5 97 | periodSeconds: 5 98 | ``` 99 | 100 | ![Alt text](image-18.png) 101 | 102 | - 103 | 104 | ### Why are some resources in the API Version set as v1 and apps/v1 ? 105 | 106 | - The use of different API versions often signifies the evolution and changes in the Kubernetes API. When Kubernetes introduces new features, it may include those features in a new API version to maintain backward compatibility with existing resources. 107 | - The API version indicates the format of the resource definition and the set of fields it supports. 108 | - `apps/v1` - It represents especially those related to certain higher-level controllers and objects, might use a different API version 109 | - `v1` - v1 API version often refers to core Kubernetes resources, such as Pods, Services, and ConfigMaps. 110 | - It's important to note that Kubernetes continues to evolve, and new API versions may be introduced to accommodate changes and improvements in the platform. 111 | 112 | --- 113 | 114 | ### What are services in K8? 115 | 116 | - Service is a way that provides an endpoint of PODs. 117 | - Service provides the `service discovery` to the PODS 118 | - The Pods can be easily identified with the help of selector `labels` in the PODs which helps in grouping the PODs. 119 | - `Service Discovery` - identify the PODs without having to remember its IP with the concept of Namespace 120 | - eg. service.namespace.cluster.svc.local becomes like the Doman name. 121 | - Even if pods restart then it can easily identify the pods with the selector labels and URL in this format `service.namespace.cluster.svc.local` 122 | - Services are classified into multiple types 123 | - a> ClusterIp( Default; For internal Communication of Pods), 124 | - b> NodePort (Preferred for Communication within the same VPC). 125 | - c> LoadBalancer (Depends on the Cloud Provider for the External Communication) 126 | - d> Ingress (Most widely used in the Industry; it maintains a route table inside it for the path based routing) 127 | - e> External Name (Newly introduced not widely used) 128 | 129 | --- 130 | 131 | ### What are the Endpoints in K8s? 132 | 133 | - Endpoints associate POD's to the services. 134 | 135 | - Pods have IP addresses 136 | - Service has an Ip address and a PORT 137 | - Also service gets and URL like service_name.namespace.cluster.service.local. This service_name and namespace will keep on changing. 138 | - eg> my-app.default.cluster.svc.local 139 | 140 | - If a pod goes down then it should be updated in Loadbalancer; so LB should not route the traffic to the pod which is down. Who would manage that? 
(Service does not do that) 141 | - Service jobs is to identify what traffic is coming in and what endpoints need to mention. 142 | 143 | - Somebody in the middle is responsible for a one-to-one mapping of the list of endpoints; This component is called `Endpoints`. 144 | - It is similar to nginx.config folder. 145 | - **[NOTE]** We do not need to explicitly create the Endpoint (auto-created) as we have done for service and deployments. 146 | - Whenever a new pod is added the entry will be added to the table and if pod gets down then the entry in the table also gets deleted. 147 | ![Alt text](image-19.png) 148 | 149 | - As a user, we will hit the service on the service URL which internally translates to the IP Address of the service (Service Discovery) and looks for Endpoints. 150 | 151 | - Service Endpoint also acts as a Controller. 152 | - Endpoints are system-created as opposed to services, deployments, etc. 153 | - **[NOTE]** Pod mapping is done with the help of Endpoint and not the Service. 154 | 155 | ``` 156 | # To get the list of Endpoints 157 | kubectl get endpoints 158 | 159 | kubectl get ep 160 | ``` 161 | 162 | --- 163 | 164 | ### DNS vs Service Discovery 165 | 166 | | Feature | DNS (Domain Name System) | Service Discovery | 167 | | ------------------ | ------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- | 168 | | **Purpose** | Translates domain names to IP addresses and vice versa. | Allows dynamic discovery of services in a network. | 169 | | **Functionality** | Resolves domain names to IP addresses. Provides a decentralized naming system. | Enables services to register their availability and discover the location of other services dynamically. | 170 | | **Use Cases** | Mainly used for host name resolution on the internet. | Crucial in distributed systems, microservices architectures, and dynamic computing environments. | 171 | | **Dynamic Nature** | Typically static mappings that change infrequently. | Dynamic registration and discovery of services to accommodate changes in the network or system. | 172 | | **Communication** | Provides a way for humans to access resources using human-readable names. | Facilitates communication between services within a network, allowing them to discover and interact with each other. | 173 | | **Examples** | Translates www.example.com to an IP address. | Allows a service to discover the location (IP address and port) of another service in a microservices architecture. | 174 | | **Dependency** | Often used as the initial step in service discovery to resolve initial hostnames to IP addresses. | Relies on DNS for name resolution in some cases. | 175 | 176 | --- 177 | 178 | ### Endpoints vs Services 179 | 180 | | Feature | Endpoints (Kubernetes) | Services (Kubernetes) | 181 | | --------------------- | -------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------- | 182 | | **Definition** | A set of backend Pods exposed through a Service. | An abstraction that defines a logical set of Pods and a policy by which to access them. | 183 | | **Scope** | Represents individual Pods providing the actual application functionality. | Represents a higher-level abstraction that groups multiple Pods and provides a stable network endpoint. 
| 184 | | **Functionality** | Directly corresponds to backend application functionality, often specific to a microservice. | Acts as a stable virtual IP and port that can route traffic to one or more Pods. | 185 | | **Granularity** | Fine-grained, dealing with individual Pod instances. | Coarse-grained, dealing with a group of related Pods. | 186 | | **Usage** | Provides a direct interface to the functionality of a single Pod or set of similar Pods. | Offers a single entry point for accessing multiple Pods, providing load balancing and service discovery. | 187 | | **Example** | `my-app-pod-1`, `my-app-pod-2` | `my-app-service` | 188 | | **Dependency** | Associated with a specific workload, typically a Pod or set of similar Pods. | Represents a higher-level abstraction that may depend on one or more Pods. | 189 | | **Scalability** | Scaling is done at the Pod level to handle increased demand for specific functionalities. | Scaling involves replicating Pods and maintaining service availability and load balancing. | 190 | | **Relationship** | Directly associated with the backend implementation of a specific functionality. | Acts as an abstraction layer, decoupling consumers from the specifics of Pod implementations. | 191 | | **Abstraction Level** | Low-level, dealing with individual Pod instances and their network accessibility. | Higher-level, providing a stable and abstracted entry point to access related Pods. | 192 | | | 193 | 194 | --- 195 | 196 | ### What happens if we delete the ALL pods which are linked to a service? 197 | 198 | - Service will not tell us that it is broken. 199 | - We have to manually scavenge the Endpoints in the endpoints; 200 | - The endpoint will show us none in the Endpoints if the Pods are not available. 201 | 202 | --- 203 | 204 | ### What are Controllers? 205 | 206 | - Whenever a manual effort gets automated in a way that the system is taking care of Operation it is known as Controller. 207 | 208 | --- 209 | 210 | ### POD TO POD Communication 211 | 212 | - For POD to POD communication we need to have the `Route(Permission)` and `Address(IP)` need to have the Route as well as the address. 213 | - Now Address can be of 2 types ==> Private and Public Address 214 | - In the earlier setup we have managed to keep it in the CNI Range of the Flannel i.e. the Private IP Range. 215 | - If an external user wants to connect to the private IP range it **CANNOT** connect to it as private ip ranges are not accessible outside the cluster. 216 | - As the private IPs are not resolvable; they can resolve only by Gateway of the Infrastructure which will connect to the Nodes > Pods > Application. eg> of Private IP Range (10.10.9.8) 217 | - 218 | 219 | ### Types of Services in K8 220 | 221 | `ClusterIP` - 222 | 223 | - It is the default service type 224 | - Used for internal communication of the PODs 225 | ![Alt text](image-20.png) 226 | - They can connect just via localhost:port 227 | 228 | --- 229 | 230 | `NodePort` 231 | 232 | - The simplest way to expose the application to the outside world. 233 | - It is used to connect the external user to the Nodes ==> then to the PODs 234 | - It would need a gateway to communicate with the Nodes and then it connects to the Nodes via IP Address / DNS Names and then to the PODs. 235 | - Expose the service to the external world by using IP or the DNS to the external world. 
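A minimal NodePort Service sketch (the app name and ports are hypothetical); the terminology notes that follow map directly onto these fields:

```
apiVersion: v1
kind: Service
metadata:
  name: myapp-nodeport        # hypothetical name
spec:
  type: NodePort
  selector:
    app: myapp                # must match the Pod labels
  ports:
    - port: 80                # the Service's own port inside the cluster
      targetPort: 8080        # the container port on the Pods
      nodePort: 30080         # optional; if set, must fall in the 30000-32767 range
```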
236 | 237 | `Terminologies to understand in NodePort` 238 | 239 | - `Target Port` is the PORT of the PODS 240 | - `PORT` is the Port of the Node from where the external user needs to connect 241 | - It is best suitable for Managed VPC or VPN managed Infrastructure. 242 | - ![Alt text](image-21.png) 243 | 244 | --- 245 | 246 | `Load Balancer` 247 | 248 | - When having Load balancer Service of 3rd party Cloud Provided Component is needed here. 249 | - It will associate with the POD and the Cloud Vendor[MOST NEEDED]. 250 | - To make POD public we need to have something as PUblic needs the capability of Hosting. 251 | - This Load Balancer service is Expensive; as we need to have Domains. 252 | - Load Balancer Service will not run unless we have the dedicated controllers for them. 253 | ![Alt text](image-22.png) 254 | 255 | --- 256 | 257 | ### Why go with Ingress? 258 | 259 | - For every module we can have a load balancer associated with it. 260 | - This will be a pretty expensive affair to manage LB for each module and also we will need the service for each module. 261 | ![Alt text](image-23.png) 262 | 263 | --- 264 | 265 | `Ingress` 266 | 267 | - Most widely used and popular among all the K8 Services. 268 | - It is also a paid service (but a workaround to add our IP in the host folder) 269 | - In Ingress, we maintain a table consisting of the Rules and Destinations in an Ingress Resource. 270 | - Entire traffic will be routed by the Rules present in the Ingress Resources where to navigate further. 271 | 272 | - K8 does not come with the default installation of Ingress Controllers 273 | - Nginx Controller is a flavor of Ingress Controllers. 274 | - Ingress gives an idea about what is happening inside the cluster. 275 | 276 | ![Alt text](image-24.png) 277 | 278 | --- 279 | 280 | ### LAYER 7 vs Layer 4 Routing? 281 | 282 | `Layer 7` 283 | 284 | - Layer 7 is the application layer in the OSI model 285 | - When traffic can the Route at Layer 7 then it hits at Application layer known as App Layer Routing 286 | - Layer 7 is based on the ` path-based routing` 287 | - Layer 7 has the affinity towards the API Endpoints aka Layer7 understands what the Endpoints actually are and routes according to the API endpoints. 288 | 289 | `Layer 4` 290 | 291 | - It is network Layer Routing. 292 | - Layer 4-based routing is performed on the basis of the IP Addresses; it cannot understand the path-based routing. 293 | - As there is no concept of application in the Layer4 294 | 295 | ### What are Admission Controllers? 296 | 297 | - It is an integral part of the API Server; it does not hold a separate existence such as Pods, or Deployments. 298 | - They allow the resources to be Validated and Mutate as specified in the Policies 299 | - Used to Enforce the Security Policies, Validating and Mutating Resources 300 | 301 | Workflow for Admission Controllers: 302 | 303 | - We Submit a Desired State to the API Server 304 | - The Admission Controller intercepts the request before saving the state into the ETCD(Current State of Cluster) 305 | - Can be managed using the K8 native Objects such as ValidatingWebhookConfiguration and MutatingWebhookConfiguration; 306 | - ValidatingWebhookConfiguration can Validate or Reject the object created using the Predefined Rules or Agreements 307 | - MutatingWebhookConfigurationhas the capacity to Modify / Update Resources based on the Predefined Rules or Agreements. 
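As a hedged sketch of what such a policy hook looks like on the API-server side (all names are hypothetical, and the webhook server itself is assumed to already run as a Service in the cluster):

```
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: require-team-label            # hypothetical policy name
webhooks:
  - name: require-team-label.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail               # reject the request if the webhook is unreachable
    rules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
    clientConfig:
      service:
        namespace: default
        name: label-policy-webhook    # hypothetical Service fronting the webhook server
        path: /validate
```

A MutatingWebhookConfiguration looks almost identical, except that the webhook's response may also carry a patch that modifies the object before it is persisted to ETCD.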
308 | 309 | Admission Controllers have many Flavours such as Kyverno, KubeWarden, OPA Cloud Custodian, etc; Using any of them depends on the Requirements of Compliance that need to be met according to Org Standards. 310 | 311 | PS: Just scratching the Surface here still need to dig in; share your thoughts regarding this. 312 | 313 | ![alt text]() 314 | 315 | ### What are Pause Containers? 316 | 317 | - They are an integral part of every K8's Flavours (Baremetal, Managed) 318 | 319 | - Pause Containers remain running throughout the Lifecycle of Pods and they start first among all the containers within POD. 320 | 321 | - They Share the Network Namespace(providing the Sharing IP address) and Inter Process Communication Namespace(USing mechanism like System V IPC, POSIX Message Queue) 322 | 323 | 👉 Benefits of using it 324 | 325 | - Pause Containers create a separate "Network Namespace" Isolation improves security and performance by reducing the amount of traffic that can flow in between PODS 326 | - Pause Containers is standard component of K8s; available in all flavors of K8's(Baremetal, Managed) 327 | ![alt text](Pause_Containers.png) 328 | ![alt text](pause_containers_2.png) 329 | 330 | --- 331 | 332 | ## EKS 333 | 334 | ### 335 | -------------------------------------------------------------------------------- /lock_id.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/lock_id.png -------------------------------------------------------------------------------- /monitoring_kuberetes.md: -------------------------------------------------------------------------------- 1 | ## Monitoring 2 | 3 | ## HELM 4 | 5 | - Helm is a package manager for Kubernetes applications 6 | 7 | - The Kubernetes Resources and the Articatory resources; we can package or bundle it using the `HELM` 8 | 9 | - It streamlines the process of installing, upgrading, and managing applications deployed on Kubernetes clusters. 10 | 11 | - Helm uses charts, which are packages of pre-configured Kubernetes resources, to simplify the deployment and management of complex applications. 12 | 13 | ### HELM CHARTS 14 | 15 | - A chart is a package of pre-configured Kubernetes resources that can be easily deployed. 16 | - It includes YAML manifests describing Kubernetes resources (such as deployments, services, and ingress) and customizable templates for these resources. 17 | - Charts can be versioned and shared, making it easy to distribute and reuse configurations for applications. 18 | 19 | ### HELM CLI 20 | 21 | - Helm provides a command-line interface (CLI) for interacting with charts and managing Kubernetes applications. 22 | - Developers and operators use the Helm CLI to create, package, install, upgrade, and uninstall charts. 23 | 24 | ### HELM REPOSITORY 25 | 26 | - Helm charts can be stored in a repository, making it easy to share and distribute charts across teams and organizations. 27 | - Helm supports both public and private chart repositories. 28 | 29 | ### HELM Tiller (Deprecated in Helm 3): 30 | 31 | - Tiller was the server-side component of Helm in Helm 2. It interacted with the Kubernetes API server to manage releases. 32 | 33 | - In Helm 3, Tiller has been deprecated, and Helm now interacts directly with the Kubernetes API server. This improves security and simplifies Helm's architecture. 34 | 35 | --- 36 | 37 | ## Helm Workflow: 38 | 39 | ### 1. 
Install Helm: 40 | 41 | Install the Helm CLI on your local machine. Helm is available for Linux, macOS, and Windows. 42 | 43 | ### 2. Create a Chart: 44 | 45 | Create a Helm chart to define the structure and configuration of your application. 46 | 47 | ### 3. Package the Chart: 48 | 49 | Package the chart into a compressed archive (.tgz file). 50 | 51 | ### 4. Install the Chart: 52 | 53 | Install the chart on a Kubernetes cluster using the Helm CLI. 54 | Helm will create a release, which is an instance of a chart running on the cluster. 55 | 56 | ### 5.Upgrade and Rollback: 57 | 58 | Use Helm to upgrade or roll back releases as needed. This allows you to make changes to your application's configuration or deploy new versions. 59 | 60 | ### 6. Explore Helm Repositories: 61 | 62 | Explore public or private Helm repositories to discover and use charts created by others. 63 | 64 | --- 65 | 66 | ## Helm 3: 67 | 68 | Helm 3, the latest version of Helm, introduced several improvements and changes, including the removal of Tiller. In Helm 3, Helm directly interacts with the Kubernetes API server, enhancing security and simplifying Helm's architecture. 69 | 70 | To get started with Helm 3, you can use the following commands: 71 | 72 | ``` 73 | # Initialize Helm (one-time setup) 74 | helm init --upgrade 75 | 76 | # Create a new Helm chart 77 | helm create mychart 78 | 79 | # Install a chart 80 | helm install my-release ./mychart 81 | 82 | # Upgrade a release 83 | helm upgrade my-release ./mychart 84 | 85 | # Uninstall a release 86 | helm uninstall my-release 87 | 88 | ``` 89 | 90 | - Helm is widely used in the Kubernetes ecosystem to manage the deployment and lifecycle of applications, making it easier to package, version, and share Kubernetes configurations. 91 | 92 | ### Installing Helm 93 | 94 | ``` 95 | https://helm.sh/docs/intro/install/ 96 | ``` 97 | 98 | ### Install monitoring components using the helm 99 | 100 | ``` 101 | # Create a Namespace "monitoring" 102 | kubectl create namespace monitoring 103 | 104 | # Download the repo for Prometheus 105 | helm repo add prometheus-community https://prometheus-community.github.io/helm-charts 106 | 107 | # Update the Repo 108 | helm repo update 109 | 110 | # To list the Charts in all namespaces 111 | helm ls -A 112 | 113 | # Install Prometheus from the Charts 114 | helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring 115 | 116 | ## Gives us the output like 117 | NAME: prometheus 118 | LAST DEPLOYED: Fri Dec 22 13:55:57 2023 119 | NAMESPACE: monitoring 120 | STATUS: deployed 121 | REVISION: 1 122 | NOTES: 123 | kube-prometheus-stack has been installed. Check its status by running: 124 | kubectl --namespace monitoring get pods -l "release=prometheus" 125 | 126 | Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator. 
127 | 128 | ## Switch the namespace to `monitoring` [Refer to kubens installation below] 129 | kubens monitoring 130 | 131 | ## Chekout pods in monitoring namespaces 132 | kubectl get pods 133 | 134 | 135 | ## Gives out the response as 136 | NAME READY STATUS RESTARTS AGE 137 | alertmanager-prometheus-kube-prometheus-alertmanager-0 2/2 Running 0 9m57s 138 | prometheus-grafana-ff7876654-rqxrs 3/3 Running 0 10m 139 | prometheus-kube-prometheus-operator-5f84b5dc75-2qkjh 1/1 Running 0 10m 140 | prometheus-kube-state-metrics-6bbff75769-shznd 1/1 Running 0 10m 141 | prometheus-prometheus-kube-prometheus-prometheus-0 2/2 Running 0 9m57s 142 | prometheus-prometheus-node-exporter-q5xd8 1/1 Running 0 10m 143 | 144 | 145 | ## It not only creates the Pods but also other type of resources like deployments etc within itself like replica set, deployments, daemon sets etc 146 | 147 | ## We will expose our services for the Prometheus and Grafana 148 | kubectl expose service prometheus-grafana --type=NodePort --name=grafana-lb --port=3000 --target-port=3000 -n monitoring 149 | kubectl expose service prometheus-kube-prometheus-prometheus --type=NodePort --name=prometheus-lb -n monitoring 150 | 151 | ## Observe the services 152 | kubectl get svc 153 | 154 | ## O/p as 155 | NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE 156 | grafana-lb NodePort 10.108.102.55 3000:32352/TCP 18s 157 | prometheus-lb NodePort 10.110.87.71 9090:31993/TCP,8080:31641/TCP 18s 158 | 159 | ``` 160 | 161 | - Prometheus is running on the `31993` as we have not provided it the `NODE PORT` it randomly chooses it from the `30000 - 32767` 162 | - Access it on the browser using Minikube IP 163 | 164 | ``` 165 | # TO access the minikube IP 166 | minikube ip 167 | 168 | #GIves out the IP 169 | 192.168.49.2 170 | ``` 171 | 172 | ### TO access the Prometheus Dashboard in the Browser 173 | 174 | ``` 175 | # MINIKUBE IP: NODE_PORT 176 | 192.168.49.2:31993 177 | ``` 178 | 179 | ### It will give out the Prometheus Dashboard 180 | 181 | ![Alt text](image-25.png) 182 | 183 | - It is a 3rd Factory Product which is provided by some engineers and wee are using and releasing it; such softwares are called as `OPEN SOURCE` 184 | - Prometheus does NOT monitor the service/pods 185 | - Prometheus only monitors the `Endpoints`; we can have 1 different pod but Prometheus will monitor only the endpoints. 186 | - For the Application to be monitored by Prometheus there needs to be certain rules which need to be followed by the Application 187 | 188 | ``` 189 | # Lets monitor the enpoints for the HPA then it will acts as 190 | kubectl get hpa -n default 191 | 192 | # Whenever the deployment scales then PODS increases and all entries are maintained in the Endpoints; Prometheus monitoring the endpoints will also get the updated entries of the POD; we can monitor it in the Prometheus as well 193 | kubectl get hpa -n default 194 | kubectl get ep -n default 195 | 196 | NAME ENDPOINTS AGE 197 | myapp-production-service 10.244.0.66:9000 9d 198 | 199 | ``` 200 | 201 | - In the Status Dropdown => Targets we will get our Endpoints that are observed in the Prometheus. 
202 | ![Alt text](image-26.png) 203 | 204 | ### RUNNING GRAFANA DASHBOARD IN BROWSER 205 | 206 | - It will be the same as that of the Prometheus steps 207 | - Get the NODEPORT from the service 208 | 209 | ``` 210 | kubectl get svc grafana-lb -n monitoring 211 | ``` 212 | 213 | - We will be getting it as 214 | 215 | ``` 216 | NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE 217 | grafana-lb NodePort 10.108.102.55 3000:32352/TCP 29m 218 | ``` 219 | 220 | - The Grafana Dashboard will run on the PORT `32352` and can access it using the `MinikubeIP` 221 | - Access it in the browser using the 222 | 223 | ``` 224 | # minikubeip:NODEPORT 225 | 192.168.49.2:32352 226 | 227 | ``` 228 | 229 | - It will pop the Grafana Dashboard 230 | - Login with the credentials like 231 | 232 | ``` 233 | username: admin 234 | password: prom-operator / admin 235 | ``` 236 | 237 | - The Dashboard will look as follows 238 | ![Alt text](image-27.png) 239 | 240 | - Grafana acts as another layer that acts on Prometheus 241 | 242 | - The flow will work as 243 | An application running on K8's Cluster => Prometheus monitoring the Cluster [Application that follows the rules(adherent to the rules); can only be monitored by Prometheus] => Grafana works on top of Prometheus 244 | - This can be illustrated as follows 245 | ![Alt text](image-28.png) 246 | - For Alerting we have some other layers like `OpsGenie` => responsible for getting automated the calls; so DevOps folks can do `on call changes` 247 | 248 | - Remember the Flow from Automated Calls to K8's Endpoints 249 | 250 | - Automated Calls[Oncall] => Dashboard of Grafana ==> Made using Prometheus metrics ==> Gets data from Endpoints ==> Endpoint is being monitored by Prometheus 251 | 252 | - The Dashboard we can have monitoring tabs on PODS, DEPLOYMENTS, etc on various other resources. 253 | - The K8's Infrastructure and the Underlying Application will require some integrations os as to get the application metrics in the Prometheus. 254 | - The required Integrations are `Prometheus Object` and `Service Monitoring Object[SMO]` 255 | - The `Prometheus Object is taken care by the Helm`(Helm ensures that the Prometheus object is made when pulling the repo) 256 | - `Prometheus Object` Creation is a one-time task 257 | - The `Service Monitoring Object` needs to be created by us for the underlying application. 258 | - In the `SMO` Object we need to define the POD and the ENDPOINT(PORT) 259 | - SMO Object can keep on increasing for the UI, App-layer, DB, Nginx 260 | - We have it present under the file named `smon.yaml` 261 | 262 | - Contents of which are as follows 263 | 264 | ``` 265 | apiVersion: monitoring.coreos.com/v1 266 | kind: ServiceMonitor 267 | metadata: 268 | labels: 269 | name: myapp-production 270 | namespace: default 271 | spec: 272 | endpoints: 273 | - interval: 30s 274 | port: web 275 | selector: 276 | matchLabels: 277 | app: myapp-production 278 | ``` 279 | 280 | ### Decipher the ServiceMonitor Yaml file 281 | 282 | - In the context of Kubernetes monitoring, a `ServiceMonitor` is an object used by Prometheus to define **how it should monitor a specific service.** 283 | - This YAML file contains configuration details for Prometheus to scrape metrics from a service named `myapp-production` in the `default` namespace. 284 | - `spec`: Contains the specification or configuration for the ServiceMonitor. 285 | 286 | - `endpoints`: Defines the endpoints that Prometheus should scrape metrics from. 
287 | 288 | - `interval`: Specifies the interval at which Prometheus should scrape metrics from the specified endpoint. In this case, it's set to 30s. 289 | 290 | - `port`: Specifies the service port from which Prometheus should scrape metrics. In this example, it's named web. 291 | 292 | - `selector`: Specifies the selector to identify the target service. 293 | 294 | - `matchLabels`: Specifies the labels that Prometheus should use to identify the target service. In this example, it's looking for a service with the label app: myapp-production. 295 | 296 | - Apply the smon.yaml file 297 | 298 | ``` 299 | kubectl apply -f smon.yaml 300 | 301 | ## Get the SMO object 302 | 303 | kubectl get smon -n default 304 | ``` 305 | 306 | ### WORKAROUND TO get only our resources ENDPOINTS AND SERVICE DISCOVERY IN PROMETHEUS 307 | 308 | - This will break the `Grafana` 309 | - Switch to the monitoring namespace 310 | 311 | ``` 312 | kubens monitoring 313 | ``` 314 | 315 | - Get the Prometheus Object and Edit the Details 316 | 317 | ``` 318 | kubectl edit prometheus prometheus-kube-prometheus-prometheus 319 | ``` 320 | 321 | - Replace the following snippet 322 | 323 | ``` 324 | ### replace the following code 325 | serviceMonitorNamespaceSelector: {} 326 | serviceMonitorSelector: 327 | matchLabels: 328 | release: prometheus 329 | 330 | ### With the following code 331 | serviceMonitorNamespaceSelector: 332 | matchLabels: 333 | kubernetes.io/metadata.name: default 334 | serviceMonitorSelector: {} 335 | 336 | ### Restart the prometheus-kube-prometheus-prometheus 337 | kubectl get pods 338 | 339 | kubectl delete pods prometheus-kube-prometheus-operator- 340 | 341 | ### new pods should come up in some time 342 | kubectl get pods 343 | 344 | ### Verify it in the Prometheus Dashboard 345 | ``` 346 | 347 | --- 348 | 349 | ### Q. IF we want to observe our system in a 24\*7 environment How will we monitor in PROD? 350 | 351 | - K8 facilitates us with the logging, but the dashboard for monitoring is one thing that the dashboarding and default monitoring setup. 352 | 353 | - Monitoring Setup is essential in the Production; it provides automated alerts and a tool that is able to call us based on certain pre-filled automation rules and Prometheus, grafana provide us with the complete setup/picture that is missing from Kubernetes. 354 | 355 | - So we can do the data analysis on the Logs and can monitor the Logs even of the previous years. 356 | 357 | - Kubernetes itself does not directly manage or store node-level logs; it delegates this responsibility to the underlying container runtime. Therefore, the exact location and method for accessing logs depend on the container runtime in use in your Kubernetes cluster. 358 | 359 | - In case of Container Runtime as "Docker" logs are stored under `/var/lib/docker/containers//-json.log` 360 | 361 | --- 362 | 363 | ### What is KUBENS? 364 | 365 | - `kubens` is a command-line utility that helps you switch between Kubernetes namespaces quickly. It is part of the Kubectx project, which provides enhancements to working with Kubernetes contexts and namespaces. The Kubectx project includes two main tools: kubectx and kubens. 366 | 367 | - `kubens (Kube Namespace Switcher)`: 368 | - The Kubens tool simplifies the process of switching between Kubernetes namespaces. 369 | - It provides an easy-to-use command to list available namespaces and switch to a different namespace. 
370 | - The primary goal is to streamline namespace-related operations, making it more convenient for users who work with multiple namespaces in Kubernetes clusters. 371 | 372 | ### Instllation of KUBENS 373 | 374 | 1. Clone the `kubectx` from the Git 375 | 376 | ``` 377 | git clone https://github.com/ahmetb/kubectx.git ~/.kubectx 378 | 379 | ``` 380 | 381 | 2. Add the following lines to your shell profile file (e.g., ~/.bashrc, ~/.zshrc, etc.): 382 | 383 | ``` 384 | export PATH=~/.kubectx:$PATH 385 | alias kubectx='kubectx' 386 | alias kubens='kubens' 387 | ``` 388 | 389 | Source the updated profile 390 | 391 | ``` 392 | source ~/.bashrc 393 | ``` 394 | -------------------------------------------------------------------------------- /pause_containers_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/pause_containers_2.png -------------------------------------------------------------------------------- /ray-so-export (13).png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adityadhopade/interview_deck/256aa3119ddc550aba345b1cdf8d7ae46ac23550/ray-so-export (13).png -------------------------------------------------------------------------------- /reference.txt: -------------------------------------------------------------------------------- 1 | nginx 2 | "Gateway": "172.17.0.1", 3 | "IPAddress": "172.17.0.3", 4 | 5 | 6 | flask_app: 7 | "Gateway": "172.17.0.1", 8 | "IPAddress": "172.17.0.2", 9 | 10 | 11 | _________________________________________ 12 | 13 | New references 14 | 15 | flask_app 16 | "Gateway": "172.19.0.1", 17 | "IPAddress": "172.19.0.3", 18 | nginx 19 | "Gateway": "172.19.0.1", 20 | "IPAddress": "172.19.0.2", 21 | -------------------------------------------------------------------------------- /setting_up_hpa.md: -------------------------------------------------------------------------------- 1 | ## The images are present locally and not in the DockerHub and want to be used locally in the minikube then you can make use of the setup steps as follows 2 | 3 | ``` 4 | # Start the minikube 5 | minikube start 6 | 7 | # Set docker env 8 | eval $(minikube docker-env) # Unix shells 9 | 10 | # Check if the images are present locally 11 | docker image ls 12 | 13 | # IF not present build it using Docker 14 | docker build -t myapp:baseImage 15 | 16 | # Run in Minikube 17 | kubectl apply -f myapp-base-deployment.yaml 18 | 19 | # Add the image pull policy set to Never; IfNotPresent 20 | image-pull-policy=Never 21 | ``` 22 | 23 | --- 24 | 25 | # SEtting UP the HPA 26 | 27 | ###################################################################################### 28 | 29 | # Setup 30 | 31 | ###################################################################################### 32 | 33 | ### Before running deployment make sure the images are present in the system, use updated-image-with-metrics for building. 
34 | 35 | ``` 36 | ### Deploy the production deployments 37 | kubectl apply -f example-myapp-production/deployment.yaml 38 | 39 | ### deploy the production deployments svc 40 | kubectl apply -f example-myapp-production/svc.yaml 41 | 42 | ### Create nginx for testing 43 | kubectl run nginx --image=nginx 44 | 45 | ### curl the myapp-production service from nginx prod 46 | kubectl exec -it nginx bash 47 | curl myapp-production-service.default.svc.cluster.local:80 48 | ``` 49 | 50 | ## For setting up the HPA what we need is to have the following 51 | 52 | ## In order for HPA to work it needs a `KuberMetricsServer` 53 | 54 | - It will take the image from the `adityadho/myapp:productionImage` from `DockerHub` 55 | - Check if the hpa is present `k top nodes` 56 | - If it is not installed it will pop not available; we can install the `Kube- Metric Server` 57 | 58 | ``` 59 | kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml 60 | ``` 61 | 62 | - if the READY value is 0/1 for deployment.apps/metrics-server after running the below command 63 | 64 | ``` 65 | kubectl get deploy -n kube-system 66 | ``` 67 | 68 | - then do the following as per https://dev.to/docker/enable-kubernetes-metrics-server-on-docker-desktop-5434 69 | 70 | ``` 71 | wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml 72 | ``` 73 | 74 | - add below arg at Deployment->spec->template->spec->args 75 | 76 | ``` 77 | --kubelet-insecure-tls 78 | ``` 79 | 80 | - HPA Comes as an Object to Kubernetes 81 | 82 | ``` 83 | kubectl api-resources 84 | 85 | kubectl explain hpa --recursive 86 | 87 | ``` 88 | 89 | ## What is HPA? 90 | 91 | - It points to one deployment and keep monitoring the resources of deployment with the target percentages that we provide. 92 | - Let us try to dry run the HPA first 93 | 94 | ``` 95 | kubectl autoscale deployment myapp-production-deployment --cpu-percent=50 --min=1 --max=5 -o yaml --dry-run=client 96 | ``` 97 | 98 | - Dry run data will be as shown below 99 | 100 | ``` 101 | apiVersion: autoscaling/v1 102 | kind: HorizontalPodAutoscaler 103 | metadata: 104 | creationTimestamp: null 105 | name: myapp-production-deployment 106 | spec: 107 | maxReplicas: 5 108 | minReplicas: 1 109 | scaleTargetRef: 110 | apiVersion: apps/v1 111 | kind: Deployment 112 | name: myapp-production-deployment 113 | targetCPUUtilizationPercentage: 50 114 | status: 115 | currentReplicas: 0 116 | desiredReplicas: 0 117 | ``` 118 | 119 | ## Explaination of the HPA dry RUN !? 120 | 121 | - Whenever my CPU percentage remains 50%; autoscaling should trigger for the `kind:Deployment` and for the deployment name `myapp-production-deployment` and if the POD reaches the `CPUcapacity of 50%` we `trigger one more POD` 122 | - It can `NOT only scale-up` ; it can `scale down also` 123 | 124 | - Let us try and bring down the CPU Utilization to `30%` we can get it by changing the `--cpu-precent` field set to `--cpu-precent=30` 125 | 126 | - By default, HPA provides the autoscaling using the triggers of `CPUUtilization` and `MemoryUtilization` we can use them both and in isolation as well. 127 | 128 | - **[NOTE]** If we need some custom triggers then we need to define them on our own; so the parameters on which we are judging will be different but the process of Triggering will be the same. 
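To combine the CPU and memory triggers mentioned above, a hedged `autoscaling/v2` manifest (reusing the deployment name from the dry run) might look like this:

```
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-production-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-production-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 30      # scale out when average CPU crosses 30%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70      # hypothetical memory threshold
```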
129 | 130 | - If we have an application and have certain endpoints; here we are managing the infrastructure; if we want to manage the application of the POD we need to know and monitor them using the `som objects` 131 | 132 | - Service => Reaches to POD ==> Within POD we need to have defined Endpoints to monitor ==> It cannot be like a piece of code that is not exposed outside 133 | 134 | ``` 135 | #Changing the CPU utilization to 30% 136 | kubectl autoscale deployment myapp-production-deployment --cpu-percent=30 --min=1 --max=5 137 | 138 | # Checkout the POD's 139 | kubectl get pods 140 | 141 | # below command would only work if we have the metric server installed 142 | 143 | k top pods ==> gets us the usage of the pods 144 | 145 | ``` 146 | 147 | - IT can even try to scale down if the CPU Utilization is not triggering the CPULIMITS provided in the HPA Autoscaler. `(It takes time to autoscale down, autoscale up)` 148 | 149 | - If we try to get the HPA object it will give us the 150 | 151 | ``` 152 | NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE 153 | myapp-production-deployment Deployment/myapp-production-deployment 10%/30% 1 5 2 3m14s 154 | ``` 155 | 156 | ## Generate the Load for HPA !? 157 | 158 | - We can also start generating loads on it to test out the HPA working as per our expectations or not?! 159 | 160 | - The commands to add the load on the resource can be used as follows 161 | 162 | ``` 163 | kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "count=0; while sleep 0.01; count=$((count+1)); do wget -q -O- http://myapp-production-service.default.svc.cluster.local:80; echo "\n"; done" 164 | 165 | ## Chnage the name of the container form load-generator to load-generator1, load-generator2 etc (If runnning opn one or more terminals) 166 | ``` 167 | 168 | ## Decipher the command used above 169 | 170 | - `kubectl run`: Initiates the creation of a resource in the cluster. 171 | 172 | - `i --tty`: Allocates an interactive terminal(pseudo terminal) and connects it to the pod. 173 | 174 | - `load-generator` : The name of the pod being created. 175 | 176 | - `--rm` : Deletes the pod upon termination. This makes it a one-time job; (can be used with the Container as well) 177 | 178 | - `--image=busybox:1.28`: Specifies the Docker image to be used for the pod. In this case, it's BusyBox version 1.28. 179 | 180 | - `--restart=Never`: Specifies that the pod should not be restarted automatically. 181 | 182 | - `/bin/sh -c "count=0; while sleep 0.01; count=$((count+1)); do wget -q -O- http://myapp-production-service.default.svc.cluster.local:80; echo "\n"; done":` The command to run inside the pod. It's a shell script that uses a while loop to repeatedly make HTTP requests to the specified URL (http://myapp-production-service.default.svc.cluster.local:80). It increments a counter (count) and sleeps for 0.01 seconds between each request. 
- We would now see multiple replica pods running:

```
NAME                          REFERENCE                                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myapp-production-deployment   Deployment/myapp-production-deployment   85%/30%   1         5         3          17m
```

- To keep tabs on the load generated we can make use of `watch`

```
watch kubectl get hpa
```

- Observing it for some time, the replicas reach the maximum capacity; here we have set the maximum to 5, so it will go up to 5 `(scaled to the maximum by the autoscaler)`.
- The `minimum` number set here also overrides what is written in the deployment manifest.

```
aditya@aditya-Inspiron-3576:~/instructors/syed nadeem/project_sessions/session-6$ k get hpa
NAME                          REFERENCE                                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myapp-production-deployment   Deployment/myapp-production-deployment   22%/30%   1         5         5          22m
```

- Let us try to scale the pods down by `stopping the load-generators`; the replicas will start to disappear after some time and can go down to the minimum of 1 `(de-scaling to the minimum by the autoscaler)`.

```
# check out the pods
kubectl get pods
```

- This will take some time to `de-scale`.

```
aditya@aditya-Inspiron-3576:~/instructors/syed nadeem/project_sessions/session-6$ k get hpa
NAME                          REFERENCE                                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myapp-production-deployment   Deployment/myapp-production-deployment   10%/30%   1         5         2          33m
```

- `Kubernetes has controller objects; HPA comes inherently with Kubernetes as it is a native Kubernetes object.`

- **[NOTE]**: `In autoscaling we need to set a cap on MAXPODS for infra cost savings.`
- There are no built-in notifications for scaling up or scaling down.
- Teams need to perform `performance testing` and release only after examining the `TPS (Transactions Per Second)` the application can handle.

- It is not recommended to use HPA and VPA ("beef me up") together; the system would not know whether to scale horizontally or upgrade the existing infra (beefing up the pods).

- **[NOTE]** VPA is not a native Kubernetes object; we need to install something like a Kubernetes operator for VPA.

### Q How to define/calculate `thresholds` like the min and max of the application for HPA, TPS (Transactions per Second), etc.?

- While initially building the application we do not need to define `MAXPODS`.
- Once unit testing and functionality testing are done, we move on to the higher environments like QA, STAGING (PRE-PROD), and PROD.
- We define our thresholds there, in the higher environments, e.g. TPS (Transactions Per Second) and other parameters like that.
- In Prometheus we monitor network latency, network throughput, and response/request times; these are things we cannot alter during a deployment, but CPU and memory we can change.
- There is also something called `Quality of Service` (covered below).

- In a production environment the requests and limits are set differently (e.g. cpu limit = 100 and cpu request = 10), so generally they are kept in a ratio like `limit:request = 1:10`.
- This acts like `vertical scaling` => as a pod's CPU usage grows from its request towards the CPU limit (100 in our case), it can keep consuming more CPU until the whole limit is used.
- So what we do is add a trigger for `Horizontal Pod Autoscaling` before the CPU usage gets to the `CPU limit (100)`; suppose we set it at `60%`: whenever the CPU usage hits 60% and beyond, it scales horizontally and generates replicas to distribute the load (a sketch follows below).
- **[NOTE]** The replicas generated will have the same CPU limits and CPU requests as the original one.
- **[NOTE]** So if a replica's CPU utilization also hits 60%, a new replica will be generated again.
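- A minimal sketch of how that container spec could look; the values are assumptions chosen to follow the 1:10 request:limit ratio mentioned above:

```
# resources section of the container in the deployment spec (illustrative values)
resources:
  requests:
    cpu: 100m    # what the scheduler reserves for the container
  limits:
    cpu: "1"     # 10x the request; the pod can burst vertically up to this
```

- An HPA created with `--cpu-percent=60` then layers the horizontal trigger on top of this (the percentage is measured against the requested CPU).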
### What is QoS (Quality of Service) in Kubernetes?

- In Kubernetes, Quality of Service (QoS) refers to the categorization of pods based on their resource requests and limits.
- Kubernetes uses QoS classes to decide how to prioritize and evict pods when the cluster is under resource pressure.

- **Guaranteed (QoS Class: Guaranteed):**

  - Pods in the Guaranteed QoS class have both memory and CPU requests and limits specified for every container, with the requests equal to the limits.
  - These pods are guaranteed to get the resources they request. They won't be evicted due to resource constraints unless they exceed their specified resource limits.
  - The resources reserved for these pods are exclusive; they are not shared with other pods.

- **Burstable (QoS Class: Burstable):**

  - Pods in the Burstable QoS class have at least one resource request or limit (memory and/or CPU) set, but they do not meet the Guaranteed criteria.
  - These pods can consume resources beyond their requests, up to their limits if set (or up to what is available on the node); however, they may be throttled or evicted if the node becomes resource-constrained.
  - Burstable pods share resources with other burstable and best-effort pods on the node.

- **Best-Effort (QoS Class: BestEffort):**

  - Pods in the Best-Effort QoS class have neither resource requests nor resource limits specified.
  - These pods get whatever is available on the node. They have no resource guarantees, and they are the first to be evicted if the node runs out of resources.

- The QoS classes are used by the Kubernetes scheduler and eviction manager when the cluster is under resource pressure. For example, when a node is running out of resources, best-effort pods are evicted first, then burstable pods, and guaranteed pods are evicted only if necessary.

- Understanding and setting appropriate QoS for your pods is important for resource management and for ensuring that critical workloads get the resources they need. It helps Kubernetes make intelligent decisions when there is contention for resources on a node.

---
--------------------------------------------------------------------------------
/terraform_questions.md:
--------------------------------------------------------------------------------

## TERRAFORM QUESTIONS

### How to limit access control on the tfstate file?

- Add a DynamoDB table with a `LockID` key; it ensures that only one person can hold the state lock at a time.
- An illustration is shown below.
![alt text](lock_id.png)

### How to determine if changes were made to an AWS resource? How to find who did it?

- If done via AWS services (console/API), we can use => CloudTrail.
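- Before moving on to other backends, here is a minimal sketch of the S3 backend with DynamoDB state locking described in the first question; the bucket, key, region, and table names are assumptions:

```
terraform {
  backend "s3" {
    bucket         = "my-tfstate-bucket"            # assumed bucket name
    key            = "envs/prod/terraform.tfstate"  # assumed state file path
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"              # table whose partition key is "LockID" (string)
    encrypt        = true
  }
}
```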
### What are the different backends for storing the TF state?

- The state is most popularly stored in `AWS S3`; other backends include `HashiCorp Consul`, `Azure Blob Storage`, and `Google Cloud Storage`.
- When the state is kept in a remote backend, the local state file can be deleted and it becomes easier to maintain versions.

### What are the benefits of using a remote backend?

- Improved collaboration among team members.
- State locking can be provided to prevent concurrent operations.
- Enhanced security by storing the state data in a centralized location.

### How to use the TF state file in a team environment to prevent conflicts?

- Terraform provides `state locking` on remote backends so that only one user can run a Terraform operation at a time.
- `S3` - it uses a DynamoDB table for state locking, with `LockID` as the partition key.

### How to manage external sensitive data, such as API keys, in Terraform state?

- We can use an external secret management tool like `HashiCorp Vault` for storing API keys and secrets.
- Or we can use `environment variables` [but they are gone as soon as the session terminates].

### What strategies can you employ for managing Terraform state across multiple environments (Dev, Staging, Prod, etc.)?

- We can make use of `terraform workspace` to manage multiple environments with the same configuration (each workspace keeps its own state).

```
terraform workspace list
terraform workspace select production
terraform workspace show
terraform workspace delete test
```

### What is Drift Detection?

- It is the process of identifying differences between the desired state declared in the Terraform files and the actual state of the deployed infrastructure.
- Drift in infra can happen due to => someone manually changing the resources in the cloud provider's console.
- It compares {the current state of the real resources} vs {the state recorded in the tf state files}.
- Check for drift in the infrastructure using ==> `terraform plan`

### What are tainted resources?

- They are `resources marked to be destroyed & recreated on the next "terraform apply"`

```
terraform taint aws_resource.my_example
```

### Does TF support multi-provider deployment?

- YES; TF is cloud agnostic and can use multiple providers in the same configuration.

### What are some of the built-in provisioners?

- `file`, `local-exec`, and `remote-exec`.

### How can you upgrade plugins in TF?

- `terraform init -upgrade`

### How can you define dependencies in TF?

- Using `depends_on`.

### What is terraform show?

- It is used to provide human-readable output from a state or plan file.

### You have existing infrastructure in AWS that is not in the TF code. How to bring that infra under Terraform control?

- If we just want the state, we can import it using `terraform import`.
- If we want the whole code to be generated as well, either write it ourselves or make use of an open-source tool like `Terraformer`.

### If N people are using TF, how to prevent the team from bringing up resources in AWS/GCP that are too expensive?

- One way is to use the `Open Policy Agent` to enforce policies on plans.

### How to tackle secrets in TF?

- We can mark the variable as `sensitive` (see the sketch below).
- Integrate it with an external secret provider like Vault.
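- A minimal sketch of a sensitive variable; the variable name and its use are assumptions:

```
variable "db_password" {
  type      = string
  sensitive = true    # value is redacted in plan/apply output
}

output "db_password" {
  value     = var.db_password
  sensitive = true    # outputs that expose it must also be marked sensitive
}
```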
### What is the use of data resources?

- Data resources (`data` blocks) are used to refer to resources that already exist outside the configuration, e.g. an existing AWS AMI.

### What is a Terraform workspace?

- Workspaces are used to provide isolation; each environment like `DEV, QA, Staging, Production` gets its own separate state file.

```
terraform workspace select <workspace-name>
```

### Terraform variable precedence order [1-4; 4 being the highest]?

- 1. Environment variables
- 2. terraform.tfvars
- 3. `*.auto.tfvars`
- 4. `-var` or `-var-file` on the command line

- **Variable types**: string, number, bool, list, set, map, object, tuple

### How can we modify only certain resources in Terraform?

- Use the `-target` flag in the tf command; it limits the plan/apply to that resource (and its dependencies).
- Or we can use `terraform taint` to force recreation, but we should be careful with it.

### What happens if your state file is accidentally deleted?

Terraform loses track of managed infrastructure. On the next apply, it attempts to recreate everything, causing duplicates or failures. Recovery requires manual imports or restoring backups. Always enable versioning on S3 state storage.

### How do you handle large-scale refactoring without downtime?

Use "terraform state mv" to rename resources without destroying them. Control changes with targeted applies. Split refactoring into multiple non-destructive PRs and verify plans carefully to prevent resource destruction.

### What happens if a resource fails halfway through a terraform apply?

Terraform creates a partial deployment with successful resources running but failed ones marked as tainted. Use targeted applies and "-refresh-only" to recover systematically.

### How do you manage secrets in Terraform?

Use external secret stores (Vault, AWS Secrets Manager), ensure state encryption, mark outputs as sensitive, and integrate securely with CI/CD. For highly sensitive values, consider managing them outside Terraform completely.

### What happens if terraform plan shows no changes but infrastructure was modified outside Terraform?

Terraform remains unaware until "terraform refresh" is run. Implement regular drift detection in your CI/CD process to catch unauthorized changes.

### What happens if you delete a resource definition from your configuration?

Terraform destroys the corresponding infrastructure. Either use "terraform state rm" first or implement "lifecycle { prevent_destroy = true }" for critical resources.

### What happens if Terraform provider APIs change between versions?

Compatibility issues may arise. Always read release notes, use version constraints, test upgrades in lower environments, and consider targeted updates for gradual migration.

### How do you implement zero-downtime infrastructure updates?

Use "create_before_destroy" lifecycle blocks, blue-green deployments, health checks, and state manipulation for complex scenarios. For databases, use replicas or managed services with failover capabilities.

### What happens if you have circular dependencies in your Terraform modules?

Terraform fails with "dependency cycle" errors. Refactor the module structure using data sources, outputs, or restructured resources to establish a clear dependency hierarchy.

### What happens if you rename a resource in your Terraform code?

Terraform sees this as destroying and recreating the resource. Use "terraform state mv" to update the state while preserving the infrastructure, avoiding rebuilds and downtime.
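For example, a hypothetical rename of `aws_instance.web` to `aws_instance.web_server` can be carried over in the state like this (the resource addresses are assumptions):

```
# rename the resource block in the .tf file first, then move the existing state entry
terraform state mv aws_instance.web aws_instance.web_server

# the plan should now show no destroy/recreate for that resource
terraform plan
```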
--------------------------------------------------------------------------------